7:30am - 5pm Registration Open
7:30am - 8:50am Breakfast
8:20am - 8:50am Introduction to HBase Session
HBase: Just the Basics
As optional pre-conference prep for attendees who are new to HBase, this talk offers a brief, Cliff's Notes-level overview of architecture, API, and schema design. The architecture section will cover the daemons and their functions; the API section will cover HBase's GET, PUT, and SCAN classes; and the schema design section will cover how HBase differs from an RDBMS and how much effort to put into schema and row-key design.

Jesse Anderson - Instructor, Cloudera University
Jesse is a curriculum developer and instructor for Cloudera University.

9am-10:30am
General Session
Welcome Messages
The hosts of HBaseCon welcome the Apache HBase community to the conference and preview the day ahead.

Michael Stack - Software Engineer, Cloudera
Michael is a software engineer on Cloudera's HBase team. He is Chair of the HBase PMC and a member of the Hadoop PMC.

Amr Awadallah, Ph.D. - CTO, Cloudera
Before co-founding Cloudera in 2008, Amr was an Entrepreneur-in-Residence at Accel Partners. Prior to joining Accel, he served as Vice President of Product Intelligence Engineering at Yahoo!, and ran one of the very first organizations to use Hadoop for data analysis and business intelligence. Amr joined Yahoo! after it acquired his first startup, VivaSmart, in July 2000.

Bigtable at Google: Yesterday, Today, and Tomorrow
Bigtable is the world's largest multi-purpose database, supporting 90% of Google's applications around the world. This talk provides a brief overview of Bigtable evolution since it was originally described in an OSDI '06 paper, its current use cases at Google, and future directions.

Avtandil Garakanidze - Group Product Manager, Google
Avtandil is group product manager for Google's core storage infrastructure team, covering a number of storage services, including Bigtable. Prior to Google, Avtandil was VP of Engineering and Products at Composite Software, where he helped grow the company into a recognized leader in Data Virtualization and eventual acquisition by Cisco Systems. Prior to that, Avtandil held various product and engineering management roles with VERITAS, Starfish Software (acquired by Motorola), Yahoo!, and Compressent.

Carter Page - Engineering Manager, Google
Carter Page wears two hats as engineer and manager of the Bigtable development team in New York City. Prior to Google, he worked at Amplify Education as Director of Engineering, delivering instructional analytics solutions to teachers across the country. Carter has worked on high-performance distributed software for 18 years across several industries, including media, finance, and education.

HydraBase: Facebook's Highly Available and Strongly Consistent Storage Service Based on Replicated HBase Instances
HBase powers multiple mission-critical online applications at Facebook. However, providing a highly available online storage system on top of a single HDFS cluster has been challenging. HydraBase is built to provide a highly available, strongly consistent online storage service. It allows Facebook to synchronously replicate transactions across multiple geographically dispersed HBase instances and support seamless failover among HBase instances at Region-level granularity. This talk will cover the design of HydraBase, including the replication protocol, an analysis of failure scenarios, and a contribution plan to HBase.

Liyin Tang - Software Engineer, Facebook
Liyin is a software engineer for the Data Infrastructure team at Facebook, where he focuses on building highly available and reliable storage services and helping them scale in the face of exponential data growth. Liyin is an HBase Committer and contributes to other open source projects including HDFS and Apache Hive.

HBase @ Salesforce.com
Lars explains how Salesforce.com's scalability requirements led it to HBase and the multiple use cases for Apache HBase there today. You'll also learn how Salesforce.com works with the HBase community, and get a detailed look into its operational environment.

Lars Hofhansl - Architect, Salesforce.com

10:30am - 11am Break
Operations
Features & Internals
Ecosystem
Case Studies
11am - 11:40am
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity
In early 2013, Yahoo! introduced multi-tenancy to HBase to offer it as a platform service for all Hadoop users. A certain degree of customization per tenant (a user or a project) was achieved through RegionServer groups, namespaces, and customized configs for each tenant. This talk covers how to accommodate the diverse needs of individual tenants on the cluster, as well as operational tips and techniques that allow Yahoo! to automate the management of multi-tenant clusters at petabyte scale without errors.

Dheeraj Kapur - IT Tech Lead, Yahoo!
Dheeraj has eight years of IT experience including experience in the areas of cloud computing, advanced system automation & tools design, and system administration. He is skilled in management of infrastructure and rollout of technology to support large user groups, supporting users at corporate headquarters as well as multiple remote locations, and effectively managing high-end Hadoop clusters.

Rajiv Chittajallu - Senior Principal Engineer, Yahoo!
Rajiv is a Senior Principal Engineer in the Grid Operations Team at Yahoo! He has been involved with Hadoop at Yahoo! since 2006, starting with a 400-node development/research cluster and since moving to a 42,000+ node production environment. He worked at The Center for Development of Advanced Computing before joining Yahoo! in 2005.

Anish Mathew - Principal Engineer, Yahoo!
Anish has been responsible for supporting Hadoop ecosystem projects since 2009. He focuses on building highly reliable backend systems to manage/onboard projects to use Hadoop and HBase.
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase has ACID semantics within a row that make it a perfect candidate for a lot of real-time serving workloads. However, single homing a region to a server implies some periods of unavailability for the regions after a server crash. Although the mean time to recovery has improved a lot recently, for some use cases, it is still preferable to do possibly stale reads while the region is recovering. In this talk, you will get an overview of our design and implementation of region replicas in HBase, which provide timeline-consistent reads even when the primary region is unavailable or busy.

Enis Söztutar - Member of Technical Staff, Hortonworks
Enis is an HBase, Hadoop, and Gora committer and a member of the Apache Software Foundation. He has been using and developing Hadoop ecosystem projects since 2007. He is currently working at Hortonworks as a part of the HBase engineering team.

Devaraj Das - Co-founder, Hortonworks
Devaraj has made significant contributions to MapReduce and Hadoop Security. He is a committer on Apache Hadoop, Apache HBase, Apache Ambari, and Apache Tez, and a mentor on a couple of other Apache projects. He currently works at Hortonworks, of which he is a co-founder, and actively contributes to HBase.
Taming HBase with Apache Phoenix and SQL
HBase is the Turing machine of the Big Data world. It's been scientifically proven that you can do *anything* with it. This is, of course, a blessing and a curse, as there are so many different ways to implement a solution. Apache Phoenix (incubating), the SQL engine over HBase, to the rescue. Come learn about the fundamentals of Phoenix and how it hides the complexities of HBase while giving you optimal performance, and hear about new features from our recent release, including updatable views that share the same physical HBase table and n-way equi-joins through a broadcast hash join mechanism. We'll conclude with a discussion of our roadmap and plans to implement cost-based query optimization that dynamically adapts query execution based on your data sizes.

Eli Levine - Principal Member of Technical Staff, Salesforce.com
Eli is building Salesforce's next-generation customer data storage platform using Apache Phoenix and HBase.

James Taylor - Engineer, Salesforce.com
James is an engineer at Salesforce.com in the Big Data Group. He founded the Apache Phoenix project and has led the development effort on that for the past several years. Prior to working at Salesforce.com, James worked at BEA Systems on a federated query processing system and an event driven programming platform.

Maryann Xue - Software Engineer, Intel
Maryann is a software engineer in the big data team at Intel and a committer on the Phoenix project. She has been focusing on enhancing the capabilities and performance of HBase for its application in customer solutions.

Data Evolution in HBase
Managing the evolution of data within HBase over time is not easy: Data resulting from Hadoop processing pipelines or otherwise placed in HBase is subject to the same kinds of oversights, bugs, and faulty assumptions inherent to the software that creates it. While the development of this software is often effectively managed through revision control systems, data itself is rarely modeled in a way that affords the same flexibility. In this session, we'll talk about how to build a versioned, time-series data store using HBase that can provide significantly greater adaptability and performance than similar systems.

Eric Czech - Chief Architect, Next Big Sound
As Chief Architect at music data analytics company Next Big Sound, Eric has led the development and deployment of purely open source solutions for measuring trillions of consumer actions taken on platforms such as Spotify, iTunes, YouTube, Twitter, Facebook, and the like.

Alec Zopf
Alec is a Senior Data Engineer at the music analytics company Next Big Sound. He is an avid music fan, a drummer for 18 years, and a biomedical and electrical engineer by training. After a stint in high-frequency trading, Alec joined Next Big Sound to help usher in a new era of advanced analytics and data warehousing for the music industry, combining data about social engagement and streaming with traditional digital sales and radio metrics to discover what drives modern music consumption.

11:50am - 12:30pm  
The State of HBase Replication

Jean-Daniel Cryans - Software Engineer, Cloudera
Jean-Daniel Cryans is a software engineer at Cloudera on the Storage team where he spends his days making Apache HBase better. Previous to that, he worked on HBase at StumbleUpon. Jean-Daniel became a committer and PMC member on the project in 2008.

New Security Features in Apache HBase 0.98: An Operator's Guide
HBase 0.98 introduces several new security features: visibility labels, cell ACLs, transparent encryption, and coprocessor framework changes. This talk will cover the new capabilities available in HBase 0.98+, the threat models and use cases they cover, how these features stack up against other data stores in the Apache big data ecosystem, and how operators and security architects can take advantage of them.

Andrew Purtell - Systems Architect, Intel
Andrew is a Committer and PMC member for the HBase project. Andrew is a Principal Architect at Intel in the Big Data Platform Engineering Group, whose current focus includes security, coprocessors, the next generation of commodity hardware, and the constraints of the Java sandbox on the Hadoop ecosystem. Previously, Andrew worked at Trend Micro, Sparta, and McAfee.

Ramkrishna Vasudevan - Senior Software Engineer, Intel
Ramkrishna is a Senior Software Engineer at Intel and an HBase committer and PMC member. Prior to Intel, he worked at Huawei and was part of developing the secondary index feature in HBase (HBASE-9203). Currently, he is working on HBase off-heap storage ideas and sometimes on Phoenix.

Tasmo: Building HBase Applications From Event Streams
Tasmo is a system that enables application development on top of event streams and HBase. Its functionality is similar to a materialized view in a relational database, where data is maintained at write time in the forms it is needed at read time for display and indexing. Tasmo is designed for significantly read-heavy applications that display the same underlying data in multiple forms, where repeatedly performing the required selects and joins at read time can be prohibitively expensive. In this talk, we'll explore the features and roadmap for Tasmo.

Pete Matern - Principal Engineer, Jive Software
Pete is currently the architect for Jive's next generation architecture and co-creator of Tasmo. He is one of the original authors of Jive's flagship product, where he focused on clustering and caching functionality.

Jonathan Colt
Jonathan is co-creator of Tasmo, author of Jive's recommendations engine, and tech lead for the Jive Data Platform team.

Blackbird: Storing Billions of Rows a Couple of Milliseconds Away
Would you use HBase to make billions of rows available for real-time lookup in under 10 ms with a 99% guarantee? We at Rocket Fuel do just that. Blackbird, our system built on top of HBase, makes billions of rich user profiles available for AI-based optimization under the tight latency requirements of real-time auctions. It relies on our novel collections API, a constrained yet useful append-only model that is sympathetic to HBase internals and allows us to scale our writes easily while keeping strict read performance guarantees. In this talk, we describe the key abstractions Blackbird exposes, the utilities we built over time to support our use cases, and the hardware and software configuration (including HBase configs) that helps us achieve our strict latency guarantees. We also share the key challenges and lessons learned from scaling the system tenfold in a short span of time, along with some common beginner mistakes that we made and later fixed, and that you should avoid.

Ishan Chhabra
Ishan is a Systems Engineer at Rocket Fuel, helping to build the next generation of infrastructure at a data-driven company. When not hard at work, he is looking at raising the level of abstraction for building distributed systems or playing God of War in Titan Mode.

Shrijeet Paliwal - Senior Engineer, Rocket Fuel
Shrijeet was the second member of the Data Infrastructure group at Rocket Fuel. He has contributed to HBase, Hadoop, and real-time stream processing efforts and helped Rocket Fuel scale from a single cluster of 10 nodes to 1000s of nodes across multiple data centers. In the past, Shrijeet was part of Stony Brook University's data science lab, focusing on analyzing large-scale text streams such as news, blogs, and social media to identify cultural trends around the world's people, places, and things.

Abhijit Pol
At Rocket Fuel, Abhijit architects big data systems that learn and gain insights over petabytes of data. He was an architect on Yahoo's big data platforms. Abhijit holds a PhD in databases, has authored several papers at prestigious database conferences, is a recipient of the SIGMOD best paper award, and co-authored the popular textbook Decision Support Systems.

12:30pm - 1:30pm Lunch with the Committers
1:30pm - 2:10pm
Real-time HBase: Lessons from the Cloud
Running HBase in real time in the cloud provides an interesting and ever-changing set of challenges -- instance types are not ideal, neighbors can degrade your performance, and instances can randomly die in unanticipated ways. This talk will cover what HubSpot has learned about running in production on Amazon EC2, how it handles DR and redundancy, and the tooling the team has found to be the most helpful.

Bryan Beaudreault - Senior Technical Lead, HubSpot
Bryan is a senior technical lead at HubSpot currently leading the Data Operations team, which focuses on improving usage and operations of HBase and Hadoop within HubSpot. Before founding Data Ops, Bryan worked on several teams at HubSpot, notably rewriting HubSpot's Analytics platform to utilize the distributed nature of HBase.

Bulk Loading in the Wild: Ingesting the World's Energy Data
HBase is designed to store your big data and provide low latency random access to that data. One of its most compelling features is Bulk Loading, which enables the generation of HFiles that can then be passed to the RegionServers. Opower's energy insights platform uses it to ingest the hundreds of millions of meter reads it receives daily from its partner utility companies. This presentation will walk you through the HBase Bulk Loading process and Opower's adoption of it as an important piece of its HBase ecosystem.

Eric Chang - Tech Lead, Opower
Eric Chang is the Tech Lead for Data Infrastructure at Opower. He has been building features and code infrastructure on top of Hadoop and HBase and has been managing Opower's Hadoop operations since production roll out in 2011.

Jean-Daniel Cryans - Software Engineer, Cloudera
Jean-Daniel Cryans is a software engineer at Cloudera on the Storage team where he spends his days making Apache HBase better. Previous to that, he worked on HBase at StumbleUpon. Jean-Daniel became a committer and PMC member on the project in 2008.

Cross-Site BigTable using HBase
As HBase continues to expand in application and enterprise or government deployments, there is a growing demand for storing data across geographically distributed datacenters for improved availability and disaster recovery. The Cross-Site BigTable extends HBase to make it well-suited for such deployments, providing the capabilities of creating and accessing HBase tables that are partitioned and asynchronously backed up over a number of distributed datacenters. This talk reveals how the Cross-Site BigTable manages data access over multiple datacenters and removes the datacenter itself as a single point of failure in geographically distributed HBase deployments.

Jingcheng Du
Jingcheng is a Software Engineer on the Intel Hadoop team participating in the development of the Intel Distribution for Apache Hadoop. He mainly focuses on the development of features based on HBase.

Ramkrishna Vasudevan - Senior Software Engineer, Intel
Ramkrishna is a Senior Software Engineer at Intel and an HBase committer and PMC member. Prior to Intel, he worked at Huawei and was part of developing the secondary index feature in HBase (HBASE-9203). Currently, he is working on HBase off-heap storage ideas and sometimes on Phoenix.

Content Identification using HBase
The motivation behind content identification is to determine the media people are consuming (via TV shows, movies, or streaming). Nielsen collects that data via its Fingerprints system, which generates significant amounts of structured data that is stored in HBase. This presentation will review the options a developer has for HBase querying and retrieval of hash data. Also covered is the use of wire protocols (Protocol Buffers), and how they can improve network efficiency and throughput, especially when combined with an HBase coprocessor.

Daniel Nelson - Lead Software Engineer/Researcher, Nielsen
During the past 15 years at Nielsen, Daniel has been passionate about developing and refining content identification and audience measurement technologies. Daniel has over 25 patents granted in related fields, and finds himself drawn towards solving problems that require finding the "signal in the noise".

Digital Library Collection Management using HBase
OCLC has been working over the last year to move its massive repository to HBase. This talk will focus on the impetus behind the move, implementation details and technology choices we've made (key design, shredding PDFs and other digital objects into HBase, scaling), and the value-add that HBase brings to digital collection management.

Ron Buckley - Senior Technical Manager, OCLC
Ron has over 20 years of experience building data systems for libraries. Currently, he's leading the implementation of Hadoop and HBase at OCLC. Previous to that, he led the OCLC search team, and he holds a patent for ranking library search results.

2:20pm - 3pm
Tales from the Cloudera Field
After supporting HBase 0.90.x, 0.92, 0.94, and 0.96 installations on clusters ranging from tens to hundreds of nodes, Cloudera has seen it all. Having automated the upgrade paths from the different Apache releases, we have developed a smooth path that can help the community with upcoming upgrades. In addition to automation best practices, in this talk you'll also learn proactive configuration tweaks and operational best practices to keep your HBase cluster up and running. We'll also walk through how to contain an application bug let loose in production, how to minimize the impact of faulty hardware on HBase, and the direct correlation between inefficient schema design and HBase performance.

Kevin O'Dell - Systems Engineer, Cloudera
Kevin was the HBase Customer Operations Team Lead at Cloudera and is now part of the Systems Engineer team. He has supported enterprises for the past 6 years. Previously at NetApp, he focused on performance analysis and storage, and later moved to Data Domain/EMC where he specialized in storage. Kevin is also a contributor to the HBase project.

Aleksandr Shulman - Software Engineer, Cloudera
Aleks works as a test engineer at Cloudera. He is responsible for HBase as a part of the CDH distribution. Before Cloudera, Aleks worked on the Platform API team at Salesforce.com, where he focused on test automation.

Kathleen Ting - Technical Account Manager, Cloudera
Kathleen is currently a technical account manager at Cloudera where she helps strategic customers deploy and use the Hadoop ecosystem in production. She's a frequent conference speaker, has contributed to several projects in the open source community, and is a committer and PMC member on Sqoop. Kathleen is also a co-author of The Apache Sqoop Cookbook.

HBase at Xiaomi
This talk covers the HBase environment at Xiaomi, including thoughts and practices around latency, hardware/OS/VM configuration, GC tuning, the use of a new write thread model and reverse scan, and block index optimization. It will also include some discussion of planned JIRAs based on these approaches.

Liang Xie - Software Engineer, Xiaomi
Liang is a software engineer at Xiaomi, an active HBase committer, and a Hadoop contributor. His interests include distributed storage, RDBMS, and software performance.

Honghua Feng - Software Engineer, Xiaomi
Honghua is a software engineer at Xiaomi, focusing on HBase. He has also worked at Panasonic, IBM, Microsoft, and Tencent. At Microsoft, he participated in the development of Kirin Store, the back-end, structured distributed storage system for Bing.

Design Patterns for Building 360-degree Views with HBase and Kiji
Many companies aspire to have 360-degree views of their data. Whether they're concerned about customers, users, accounts, or more abstract things like sensors, organizations are focused on developing capabilities for analyzing all the data they have about these entities. This talk will introduce the concept of entity-centric storage, discuss what it means, what it enables for businesses, and how to develop an entity-centric system using the open-source Kiji framework and HBase. It will also compare and contrast traditional methods of building a 360-degree view on a relational database versus building against a distributed key-value store, and why HBase is a good choice for implementing an entity-centric system.

Jonathan Natkins - Member of Technical Staff, WibiData
Jonathan "Natty" Natkins is a Field Engineer at WibiData, helping customers use their data to create better application experiences. Prior to WibiData, Jonathan spent time working on both relational and non-relational technologies at Vertica, and later, Cloudera. Jonathan holds an Sc.B. in Math-Computer Science from Brown University.

HBase at Bloomberg: High Availability Needs for the Financial Industry
Bloomberg is a financial data and analytics provider, so data management is core to what we do. There's tremendous diversity in the type of data we manage, and HBase is a natural fit for many of these datasets - from the perspective of the data model as well as in terms of a scalable, distributed database. This talk covers data and analytics use cases at Bloomberg and operational challenges around HA. We'll explore the work currently being done under HBASE-10070, further extensions to it, and how this solution is qualitatively different from how failover is handled by Apache Cassandra.

Sudarshan Kadambi - Software Engineer, Bloomberg LP
Sudarshan works in the Foundational Services team at Bloomberg where he focuses on the company's data infrastructure needs. The team broadly works on distributed databases and compute frameworks for real-time analytics and search. Sudarshan has a background in distributed systems from his days at Yahoo!, where he worked on PNUTS, Yahoo's NoSQL database.

Matthew Hunt
Matthew is the architect of Bloomberg's Portfolio Analytics system, which comprises real time and historical analytics for returns, risk, optimization, and attribution. He has previously been the CTO at several startups, including Omnipod, which became part of Symantec's Cloud solution, and Thor, now Oracle's Identity Management product, and served as the president of LUNY!, the Linux Users of New York.

HBase Design Patterns @ Yahoo!

Francis Liu - Principal Software Engineer, Yahoo!
Francis is a Principal Software Engineer at Yahoo! working mainly on HBase. He is also an Apache Hive contributor and a Podling Project Management Committee (PPMC) member of the Apache HCatalog project. Prior to this, he was involved in the development of a workflow management and incremental processing platform built on top of Hadoop.

3pm - 3:20pm Break
3:20pm - 4pm
HBase Backups
This talk provides an overview of enterprise-scale backup strategies for HBase: Jesse Yates will describe how Salesforce.com runs backup and recovery on its multi-tenant, enterprise-scale HBase deployments; Demai Ni, Songqinq Ding, and Jing Chen of the IBM InfoSphere BigInsights development team will then follow with a description of IBM's recently open-sourced disaster recovery solution based on HBase snapshots and replication.

Jesse Yates - Senior Member of Technical Staff, Salesforce.com
Jesse Yates has been nerd-ing out about distributed systems since college. Currently, he is a committer on HBase and working to make HBase 'enterprise ready' at Salesforce.com - hacking on everything from security, to backup and disaster recovery. In his free time, he writes more code and trains for triathlons.

Demai Ni - Senior Software Engineer, IBM
Demai is a senior developer at IBM on the BigInsights Engineering team and a contributor to the HBase project. Demai currently focuses on replication and disaster recovery. Previously, Demai worked on IBM's relational database product, DB2 for z/OS.

Richard Ding
Prior to joining the IBM big data team, Richard worked at Yahoo! for three years on Hadoop-related projects. He is an Apache Pig committer and PMC member. At IBM, his focus is on the open source stack, especially on building enterprise features of open source components.

Jing Chen He
Jing Chen is a software engineer with IBM BigInsights -- IBM's big data platform. He has extensive experience in information management technologies, including RDBMSs, data warehouses, and Big Data. He is an active contributor to HBase.

HBase: Where Online Meets Low Latency
HBase is an online database, so response latency is critical. This talk will examine sources of latency in HBase, detailing steps along the read and write paths. We'll examine the entire request lifecycle, from client to server and back again. We'll also look at the different factors that impact latency, including GC, cache misses, and system failures. Finally, the talk will highlight some of the work done in 0.96+ to improve the reliability of HBase.

Nick Dimiduk - Member of Technical Staff, Hortonworks
Nick found Hadoop and HBase in 2008 when his nightly ETL jobs started taking 20+ hours to complete. Since then, he has applied these tools to projects over social media, social gaming, click-stream analysis, climatology, and geographic data. Nick also helped establish Seattle's Scalability Meetup and tried his hand at entrepreneurship. He is an HBase committer and coauthored HBase in Action, the unofficial user's guide for HBase. His passion is scalable, online access to scientific data.

Nicolas Liochon
Nicolas has stayed focused on the software architecture business at various positions, including Head of Architecture at Thomson Reuters for the Risk Management product line. He has been deeply involved in the Big Data arena for more than two years, working especially with Hortonworks on HBase MTTR. He combines traditional software and enterprise architecture skills with a deep knowledge of Big Data architecture. Nicolas is a PMC member for the HBase project. He is also cofounder of Scaled Risk, a company that provides a Big Data solution on top of Hadoop and HBase.

HBase Data Modeling and Access Patterns with Kite SDK
The Kite SDK is a set of libraries and tools focused on making it easier to build systems on top of the Hadoop ecosystem. HBase support has recently been added to the Kite SDK Data Module, which allows a developer to model and access data in HBase consistent with how they would model data in HDFS using Kite. This talk will focus on Kite's HBase support by covering Kite basics and moving through the specifics of working with HBase as a data source. This feature overview will be supplemented by specifics of how that feature is being used in production applications at Cloudera.

Adam Warrington - Tools Team Manager, Cloudera
Adam manages a development team at Cloudera where he focuses on building Big Data applications on top of Hadoop, HBase, Impala, and Cloudera Search. These applications give Cloudera insight into customer environments, enabling more efficient support operations, and allowing the business to be information driven. He led the development of an HBase client library that has since been rolled into the Kite SDK project to form Kite's HBase module.

Large-scale Web Apps @ Pinterest
Over the past year, HBase has become an integral component of Pinterest's storage stack. HBase has enabled us to quickly launch and iterate on new products and create amazing pinner experiences. This talk briefly describes some of these applications, the underlying schema, and how our HBase setup stays highly available and performant despite billions of requests every week. It will also include some performance tips for running on SSDs. Finally, we will talk about a homegrown serving technology we built from a mashup of HBase components that has gained wide adoption across Pinterest.

Varun Sharma - Software Engineer, Pinterest
Varun is a Software Engineer on the infrastructure team at Pinterest. His work encompasses fighting core scaling problems and leveraging new cutting edge technologies to augment the Pinterest tech stack. He is an active contributor to HBase and has built large-scale storage solutions on top of HBase at Pinterest. Prior to Pinterest, he was at Google for over 4 years, where he worked on a variety of projects including ad targeting/optimization, computer vision and mobile search.

4:10pm - 4:50pm
From MongoDB to HBase in Six Easy Months
Pushing well past MongoDB's limits (2TB data every week) is an interesting exercise in operational frustration. It also severely hampers flexibility of design for new use cases. This talk covers the architectural journey from MongoDB/Redis to HBase at Optimizely -- including the performance, design flexibility, speed of implementation, and other gains made. It also covers the operational setup needed to monitor and maintain the system as well as lessons learned from the migration process itself.

Shreeganesh Ramanan - Senior Software Engineer, Optimizely
Shreeganesh (SG) is a software engineer on the Analytics team at Optimizely. He has been working on a new analytics backend using Hadoop and HBase. Prior to Optimizely, he worked on the Merchants platform at Amazon and on Core OS, Windows at Microsoft.

Mike Davis - Software Engineer, Optimizely
Mike is a software engineer at Optimizely. As a member of the Analytics team since 2012, he worked on a new analytics platform built on top of Apache Flume, HBase, and Hadoop. He is currently the tech lead for the Results team harvesting actionable data from HBase and MongoDB.

State of HBase: Meet the Release Managers
HBase release managers Lars Hofhansl, Andrew Purtell, Enis Soztutar, Michael Stack, and Liyin Tang jointly present highlights from their releases, and take your questions throughout.

OpenTSDB 2.0
The OpenTSDB community continues to grow, with users looking to store massive amounts of time-series data in a scalable manner. In this talk, we will discuss a number of use cases and best practices around naming schemas and HBase configuration. We will also review OpenTSDB 2.0's new features, including the HTTP API, plugins, annotations, millisecond support, and metadata, as well as what's next in the roadmap.

Chris Larsen - Software Architect, Limelight Networks
Chris is a software architect at Limelight Networks and the release manager for OpenTSDB 2.0.

Benoit Sigoure
Benoit is the original author of OpenTSDB, the distributed monitoring system built on top of HBase. He also wrote AsyncHBase, the alternative HBase client that is fully asynchronous, non-blocking, and thread-safe. Benoit is currently working on building distributed systems for next-generation datacenter networks at Arista Networks. He also works on network extensibility, Hadoop integration, APIs, and network programmability in general.

A Graph Service for Global Web Entities Traversal and Reputation Evaluation Based on HBase
Trend Micro collects lots of threat knowledge data for clients containing many different threat (web) entities. Most threat entities will be observed along with relations, such as malicious behaviors or interaction chains among them. So, we built a graph model on HBase to store all the known threat entities and their relationships, allowing clients to query threat relationships via any given threat entity. This presentation covers the problems we try to solve, the design decisions we made and how we made them, how we designed the graph model, and the graph computation tasks involved.

Chris Huang - Hadoop Architect, Trend Micro
Chen-hsiu (Chris) Huang is responsible for Trend Micro's cloud infrastructure to fulfill the Smart Protection Network (SPN) strategy. He has worked on the Hadoop ecosystem since 2009 and has designed many distributed systems. As a member of the Taiwan Hadoop User Group, he has also helped to conduct the Hadoop in Taiwan conference since 2012, and actively promotes Hadoop/HBase in the local community.

Scott Miao - Senior Engineer, Trend Micro
Scott Miao is a senior engineer at Trend Micro, responsible for anything related to the Hadoop ecosystem used in the company. He has worked with Hadoop for three years.

5pm - 5:40pm
Smooth Operators Panel
Moderated by Eric Sammer.

Jeremy Carroll - Operations Engineer, Pinterest
Jeremy is an engineer for the Technical Operations team at Pinterest, where he builds, monitors, and scales systems infrastructure that handles billions of monthly page views. Prior to Pinterest, Jeremy worked at Klout building large-scale data processing systems using Hadoop and HBase.

Adam Frank - Operations Lead, Flurry
Adam runs the operations team at Flurry. He joined in 2012 specifically to get experience with HBase.

Paul Tuckfield - Production Engineer, Facebook
Paul has been working with large, high-availability, and high-performance databases for a long time at various companies including PayPal, YouTube, and Facebook. Currently Paul is a Production Engineer at Facebook, working on HBase performance and operations.

HBase: Extreme Makeover
This talk introduces a totally new implementation of multilayer caching in HBase called BigBase. BigBase has a big advantage over HBase 0.94/0.96 because of its ability to utilize all available server RAM in the most efficient way, and because of a novel implementation of an L3-level cache on fast SSDs. The talk will show that different types of caches in BigBase work best for different types of workloads, and that a combination of these caches (L1/L2/L3) increases the overall performance of HBase by a very wide margin.

Vladimir Rodionov - Founder, bigbase.org
Vladimir is a software design, development, and delivery expert, with many years in the software development industry. He has been involved in designing and building Big Data solutions for the past several years as a member of the Carrier IQ platform team. Vladimir's areas of expertise include (but are not limited to) Java, HBase, Hadoop, Hive, large-scale OLAP, and in-memory data processing.

Presto + HBase: A Distributed SQL Query Execution Engine on Top of HBase
Presto is a distributed SQL query engine optimized for ad hoc analysis at interactive speed in use at Facebook. At Facebook scale, having ad hoc SQL query capabilities for high-volume NoSQL data stores has been a very valuable asset, and Presto enabled this by supporting connectors on top of HDFS and other data providers. To effectively process the Presto SQL-based workload, HBase needs to be able to efficiently support a critical set of data access patterns over large data sets with high performance. This talk covers the improvements we've made to enhance scan performance and optimize the read path, as well as a number of other new features that help push down the work from the query execution to the database.

Manukranth Kolloju - Software Engineer, Facebook
Manukranth is a software engineer at Facebook and works on HBase and related technologies.

A Survey of HBase Application Archetypes
Today, there are hundreds of production HBase clusters running a multitude of applications and use cases. Many well-known implementations exercise opposite ends of HBase's capabilities, emphasizing either entity-centric schemas or event-based schemas. This talk presents these archetypes and others based on a use-case survey of clusters conducted by Cloudera's development, product, and services teams. By analyzing the data from the nearly 20,000 HBase cluster nodes Cloudera has under management, we'll categorize HBase users and their use cases into a few simple archetypes, describe workload patterns, and quantify the usage of advanced features. We'll also explain what an HBase user can do to alleviate pressure points from these fundamentally different workloads, and these results will provide insight into what lies in HBase's future.

Lars George - Director EMEA Services, Cloudera
Lars George has been involved with HBase since 2007, and became an HBase committer in 2009. He has spoken at many Hadoop User Group meetings, and conferences such as ApacheCon, FOSDEM, QCon, and Hadoop World. Lars works for Cloudera, as the Director EMEA Services, managing a team of Hadoop solutions architects in and around Europe. He is also the author of the O'Reilly Media book, HBase - The Definitive Guide.

Jon Hsieh - Software Engineer, Cloudera
Jonathan is a Software Engineer with Cloudera, currently focused on the HBase project. He is an HBase committer and PMC member, a committer and founder of the Apache Flume project, and a committer on the Apache Sqoop project.

5:40pm - 8pm HBaseCon Party!
HBaseCon 2014