Schedule

HBaseCon 2013 is a wrap, thanks to our speakers, sponsors, and attendees! Session video and slides will be available here soon.

HBaseCon 2013 sessions are organized into four tracks: Operations, Internals, Ecosystem, and Case Studies. (See Track descriptions below.) Sessions are 20 minutes or 40 minutes long.

Track Descriptions

Operations, Internals, Ecosystem, Case Studies

7:30am – 6:00pm

Registration Open

8:00am – 9:00am

Breakfast with the Committers
Have a donut and coffee with some of the committers behind Apache HBase!

9:00am – 10:30am

General Session

  • Michael Stack, Software Engineer, Cloudera
  • Amr Awadallah, CTO and Co-founder, Cloudera
  • Lars Hofhansl, Architect, Salesforce.com
  • Aaron Kimball, Chief Architect, WibiData
  • Liyin Tang, Software Engineer, Facebook

10:30am – 11:00am

Break

11:00am – 11:40am

Apache HBase Operations at Pinterest
Presented by Jeremy Carroll (Pinterest)
Located in Nob Hill CD

This presentation explains how Pinterest successfully operates HBase on Amazon EC2. Pinterest runs many of its products on HBase, and recently switched most HBase deployments to hi1.4xlarge instances (SSD-backed) to enable interesting features, such as using HBase Snapshot...

Apache HBase Table Snapshots
Presented by Jonathan Hsieh (Cloudera), Matteo Bertozzi (Cloudera), and Jesse Yates (Salesforce.com)
Located in Nob Hill AB

Table snapshotting is a new Apache HBase 0.94.6+/0.95+ feature that introduces the ability to quickly capture the data in a table with little impact on the cluster, as well as the ability to roll back or make read-write clones of a table....

SQL Over HBase: A Case for Apache Hive (11:00am-11:20am)
Presented by Enis Söztutar (Hortonworks), and Ashutosh Chauhan (Hortonworks)
Located in Yerba Buena 10-12

In this talk we will look at the current status of using Hive for querying your data stored in HBase. The talk will include a running example of a web table storing web crawl data in HBase, and Hive queries...

How (and Why) Phoenix Puts the SQL Back into NoSQL (11:20am-11:40am)
Presented by James Taylor (Salesforce.com)
Located in Yerba Buena 10-12

Phoenix is an open source project from Salesforce.com that puts a SQL skin on top of HBase. This talk will focus on answering: 1) why put a SQL skin on top of HBase? and 2) how does Phoenix marry the...

Multi-tenant Apache HBase at Yahoo!
Presented by Sumeet Singh (Yahoo!), and Francis Liu (Yahoo!)
Located in Yerba Buena 13-15

Yahoo! has been using HBase for a long time in isolated instances, most notably for the personalization platform powering its homepage experiences. The introduction of multi-tenancy has lowered the barriers for all Hadoop users to use HBase. We will cover...

11:50am – 12:30pm

Reliability: More 9’s for Apache HBase
Presented by Amitanand Aiyer (Facebook)
Located in Nob Hill CD

HBase has been powering Facebook’s messaging system for over two years. This talk will discuss common failure scenarios that Facebook has encountered in this period and efforts to make HBase more reliable/available. Finally, we will explore techniques for handling correlated...

How to Get the MTTR Below 1 Minute and More
Presented by Devaraj Das (Hortonworks), and Nicolas Liochon (Scaled Risk)
Located in Nob Hill AB

This talk will explain how to get to an MTTR for regions under one minute. It will first cover the possible failures (.meta., master, region server, datanode, software, hardware) and then explore how HBase detects the failures, recovers the data,...

Impala: Using SQL to Extract Value from Apache HBase (11:50am-12:10pm)
Presented by Elliott Clark (Cloudera)
Located in Yerba Buena 10-12

Cloudera Impala is an open source project that allows low latency and analytical queries over big data in Apache Hadoop; with Impala it is now possible to use SQL in conjunction with HBase. With a developer friendly interface on HBase’s...

Apache Drill: A Community-driven Initiative to Deliver ANSI SQL Capabilities for Apache HBase (12:10pm-12:30pm)
Presented by Jacques Nadeau (MapR)
Located in Yerba Buena 10-12

This session provides an overview of Apache Drill, which delivers full ANSI SQL capability to HBase users. Apache Drill, which is based on Google’s Dremel technology, has an ambitious design goal of providing sub-second query capability over trillions of database...

Near Real Time Indexing for eBay Search
Presented by Swati Agarwal (eBay), and Raj Tanneru (eBay)
Located in Yerba Buena 13-15

eBay search powers search on the ebay.com website and is in the critical path of eBay’s user experience and revenue. Sellers and buyers are continuously updating the underlying data ecosystem and the Search system has to process these changes in...

12:30pm – 1:30pm

Lunch

1:30pm – 2:10pm

OpenTSDB at Scale
Presented by Jonathan Creasy (Box), and Geoffrey Anderson (Box)
Located in Nob Hill CD

This presentation covers operating an OpenTSDB cluster, including best practices for scaling and how to maximize throughput, and shares metrics related to the scale of Box’s cluster. We will also cover data collection processes and how Box uses OpenTSDB as an...

A Developer’s Guide to Coprocessors
Presented by John Weatherford (Telescope)
Located in Nob Hill AB

This talk will cover all the specifics needed for a Java developer to start creating coprocessors. It will start with a brief introduction to what a coprocessor is and why it is useful, describing the difference between observers and endpoints...

Honeycomb: MySQL Backed by Apache HBase (1:30pm-1:50pm)
Presented by Dan Burkert (Near Infinity)
Located in Yerba Buena 10-12

Honeycomb is an exciting new open source storage engine plugin for MySQL that enables MySQL to store and query tables directly in HBase. By storing tables directly in HBase and allowing direct access to MySQL for queries and modification, Honeycomb...

Using Coprocessors to Index Columns in an Elasticsearch Cluster (1:50pm-2:10pm)
Presented by Dibyendu Bhattacharya (HappiestMinds)
Located in Yerba Buena 10-12

This presentation explores the design and challenges HappiestMinds faced while implementing a storage and search infrastructure for a large publisher, where records related to books, documents, and artifacts are stored in Apache HBase. Upon bulk insert of book records into HBase, the Elasticsearch index...

Apache HBase at Pinterest: Scaling Our Feed Storage (1:30pm-1:50pm)
Presented by Varun Sharma (Pinterest)
Located in Yerba Buena 13-15

At Pinterest, we have been increasingly using HBase for a variety of applications – real-time, interactive, and batch oriented. In this talk, we discuss our experience with architecting and scaling our Feed storage on HBase. “Feeds” are central to user...

Deal Personalization Engine with HBase @ Groupon (1:50pm-2:10pm)
Presented by Ameya Kantikar (Groupon)
Located in Yerba Buena 13-15

At Groupon, HBase now powers most of the backend technology for real-time delivery of the “deal” experience across all platforms, and also powers our batch clusters for consolidated user data. We have over 40 billion data points in our...

2:20pm – 3:00pm

Apache HBase on Flash
with Matt Kennedy (Fusion-io)

Panel: Apache HBase Futures (Moderated by Todd Lipcon)
Located in Nob Hill AB

Attend this panel to learn about the new development efforts that are moving the HBase code base into the future, as well as ones on the community’s wish list. Panelists: Nick Dimiduk, Hortonworks; Jonathan Gray, Continuuity; Lars Hofhansl, Salesforce.com...

Full-Text Indexing for Apache HBase (2:20pm-2:40pm)
Presented by Maryann Xue (Intel)
Located in Yerba Buena 10-12

Because BigTable-flavored systems support retrieval only by a single row key, the lack of indexing on non-key columns pushes all details of more complicated indexing schemes up to the application. Intel has extended HBase with a general full-text indexing framework...

HBase SEP: Reliable Maintenance of Auxiliary Index Structures (2:40pm-3:00pm)
Presented by Steven Noels (NGDATA)
Located in Yerba Buena 10-12

One of the most common applications of HBase is its use as a random-access, planet-sized data store for semi-structured data. However, HBase doesn’t offer indexed access via non-primary keys out of the box – the main challenge being the maintenance of...

Apache Hadoop and Apache HBase for Real-Time Video Analytics (2:40pm-3:00pm)
Presented by Suman Srinivasan (LongTail Video)
Located in Yerba Buena 13-15

LongTail Video recently launched a real-time video analytics service on top of Hadoop and HBase running on Amazon’s AWS cloud. In this talk, we will discuss our architecture, specifically how we use Hadoop for real-time analytics by processing data in...

3:00pm – 3:20pm

Break

3:20pm – 4:00pm

Scalable Network Designs for Apache HBase
Presented by Benoit Sigoure (Arista Networks)
Located in Nob Hill CD

Based on production experience with HBase since 2009, as well as recent benchmark results, this talk will take you through network designs and optimizations for HBase, as well as possible ways to make the network work better for Apache Hadoop/HBase....

Apache HBase and HDFS: Understanding Filesystem Usage in HBase
Presented by Enis Söztutar (Hortonworks)
Located in Nob Hill AB

This talk will take an HDFS-centric look at the filesystem issues in HBase. It will dissect the interface between HBase and HDFS, with a focus on the filesystem services that HBase relies on, durability, crash recovery, and performance characteristics of...

Streaming Data into Apache HBase using Apache Flume: Experience with High Speed Writes (3:20pm-3:40pm)
Presented by Hari Shreedharan (Cloudera)
Located in Yerba Buena 10-12

Flume NG has added capabilities to write to HBase over the last year, using the “standard” HBase client API and the AsyncHBase API. In addition to being able to “put” events into HBase, Flume also supports doing “increments” on data...

High-Throughput, Transactional Stream Processing on Apache HBase (3:40pm-4:00pm)
Presented by Andreas Neumann (Continuuity), and Alex Baranau (Continuuity)
Located in Yerba Buena 10-12

In the Continuuity AppFabric, a flow processes events in real time with an exactly-once guarantee, passing data for processing downstream via queues. All operations involved in processing of a data object, from de-queuing through data operations in the course of processing to...

ETL for Apache HBase (3:20pm-3:40pm)
Presented by Manoj Khanwalkar (Experian), and Govind Asawa (Experian)
Located in Yerba Buena 13-15

Experian sends 700+ million emails daily, which it analyzes in real time to see campaign performance and create new segments. Experian’s ETL framework can source data from various systems, transform it, and persist it in HBase. The framework also provides the...

Rebuilding for Scale on Apache HBase (3:40pm-4:00pm)
Presented by Rob Roland (Simply Measured)
Located in Yerba Buena 13-15

Simply Measured originally built out its entire data storage platform on MongoDB. Things seemed rosy for a while, but they were crumbling at the edges. Consistency was an issue, and sharding for each customer’s data source didn’t work well as...

4:10pm – 4:50pm

Apache HBase, Meet Ops. Ops, Meet Apache HBase.
Presented by Jean-Daniel Cryans (Cloudera), and Kevin O'Dell (Cloudera)
Located in Nob Hill CD

When rolling out Big Data infrastructure, especially HBase, it is important to have your Operations team on board. The best way to do this is to relate HBase back to their previous experience (with SQL databases, for example) and...

Compaction Improvements in Apache HBase
Presented by Sergey Shelukhin (Hortonworks)
Located in Nob Hill AB

Compactions are a critical aspect of HBase storage design, yet they are frequently a pain point in cluster management, affecting availability and requiring manual tuning. This talk will provide a brief overview of the existing HBase compaction algorithm, the problems it...

Real-Time Model Scoring in Recommender Systems (4:10pm-4:30pm)
Presented by Jonathan Natkins (WibiData), and Juliet Hougland (WibiData)
Located in Yerba Buena 10-12

Consumers are constantly searching for something new, and to stay competitive, organizations must act immediately based on up-to-date data. Outdated recommendations decrease the likelihood of presenting the right offer and make it harder to develop loyal customers, and to provide...

Using Apache HBase for Large Matrices (4:30pm-4:50pm)
Presented by Gokhan Capan (Dilisim)
Located in Yerba Buena 10-12

Many real-world machine learning applications run on Big Data matrices; they operate on a matrix input with a very large number of rows, some kinds of which, such as text data, also have plenty of columns and are extremely...

Evolving a First-Generation Apache HBase Deployment to Second Generation and Beyond (4:10pm-4:30pm)
Presented by Doug Meil (Explorys)
Located in Yerba Buena 13-15

Explorys has been using HBase and Hadoop since HBase 0.20, and will walk through lessons learned over years of usage from their first HBase implementation through a series of upgrades and changes, including impacts to schema design, data loading, data...

Realtime User Segmentation using Apache HBase: Architectural Case Study (4:30pm-4:50pm)
Presented by Murtaza Doctor (RichRelevance), and Giang Nguyen (RichRelevance)
Located in Yerba Buena 13-15

At RichRelevance, we service 10 of the top 20 Internet retailer chains and deliver more than $5.5 billion in attributable sales. Our Hadoop infrastructure has the capacity to handle upwards of 1.5 PB. Behavioral Targeting, specifically user segmentation and building...

5:00pm – 5:40pm

Panel: Apache HBase Operations (Moderated by Eric Sammer)
Located in Nob Hill CD

Attend this panel to learn about best practices from the operators of world-class HBase deployments. Panelists: Jeremy Carroll, Pinterest; Dave Latham, Flurry; Alex Levchuk, Facebook; Rajiv Chittajallu, Yahoo! Moderator: Eric Sammer, Cloudera

1500 JIRAs in 20 Minutes (5:00pm-5:20pm)
Presented by Ian Varley (Salesforce.com)
Located in Nob Hill AB

We all know there’s a thunderous pace of development on HBase. But what’s actually going into all of these JIRAs? In this Thunder Talk, we’ll cover the most interesting commits from the past year at a pace that will make...

Apache HBase Replication (5:20pm-5:40pm)
Presented by Chris Trezzo (Twitter)
Located in Nob Hill AB

HBase Replication is a rich feature that enables users to asynchronously copy table data between HBase clusters. It plays an integral part in highly available HBase deployments, acting as a key piece of disaster recovery strategies and providing a valuable tool...

Using Metrics to Monitor and Debug Apache HBase (5:00pm-5:20pm)
Presented by Elliott Clark (Cloudera)
Located in Yerba Buena 10-12

In this session we will talk about the metrics exposed by HBase. We’ll cover what metrics are there, what they mean, and how to access them. We’ll also look at examples of region hot spotting, region server queue times, replication...

Project Valta: A Resource Management Layer over Apache HBase (5:20pm-5:40pm)
Presented by Andrew Wang (Cloudera), and Lars George (Cloudera)
Located in Yerba Buena 10-12

While it is very common to find resource management functionality in commercial, enterprise grade data storage systems, it is the one feature that HBase still lacks. In fact, a single malicious or foolish user can take down the entire cluster...

Mixing Low Latency with Analytical Workloads for Customer Experience Management (5:00pm-5:20pm)
Presented by Neil Ferguson (Causata)
Located in Yerba Buena 13-15

Causata’s event-based HBase data store has two main access patterns: low latency access (sub-50 ms) to individual customers’ profiles for offer management, and streaming access to profiles matching certain query criteria, for predictive analytics. We recently migrated to HBase from...

Apache HBase, Apache Hadoop, DNA and YOU! (5:20pm-5:40pm)
Presented by Jeremy Pollack (Ancestry.com)
Located in Yerba Buena 13-15

Every week, Ancestry DNA analyzes thousands of people’s DNA, decoding their family origins and finding their long-lost relatives. To that end, we used GERMLINE, an algorithm for finding hidden family relationships within a pool of DNA. However, the reference implementation...

5:40pm – 8:00pm

Party Time!