HBaseCon 2013 is a wrap, thanks to our speakers, sponsors, and attendees! Session video and slides will be available here soon.
HBaseCon 2013 sessions are organized into four tracks: Operations, Internals, Ecosystem, and Case Studies. (See Track descriptions below.) Sessions are 20 minutes or 40 minutes long.
Click here to view other training/community events occurring before and after HBaseCon!
Track Descriptions
| | Operations | Internals | Ecosystem | Case Studies |
|---|---|---|---|---|
| 7:30am – 6:00pm | Registration Open | | | |
| 8:00am – 9:00am | Breakfast with the Committers | | | |
| 9:00am – 10:30am | | | | |
| 10:30am – 11:00am | Break | | | |
| 11:00am – 11:40am | This presentation explains how Pinterest successfully operates HBase on Amazon EC2. Pinterest runs many of its products on HBase, and recently switched most HBase deployments to hi1.4xlarge instances (SSD backed) to enable interesting features, such as using HBase Snapshot... | Table snapshotting is a new Apache HBase 0.94.6+/0.95+ feature that introduces the ability to quickly capture the data in a table with little impact on the cluster, as well as the ability to roll back or make read-write clones of a table.... | In this talk we will look at the current status of using Hive for querying your data stored in HBase. The talk will include a running example of a web table storing web crawl data in HBase, and Hive queries... Phoenix is an open source project from Salesforce.com that puts a SQL skin on top of HBase. This talk will focus on answering: 1) why put a SQL skin on top of HBase? and 2) how does Phoenix marry the... | Yahoo! has been using HBase for a long time in isolated instances, most notably for the personalization platform powering its homepage experiences. The introduction of multi-tenancy has lowered the barriers for all Hadoop users to use HBase. We will cover... |
| 11:50am – 12:30pm | HBase has been powering Facebook’s messaging system for over two years. This talk will discuss common failure scenarios that Facebook has encountered in this period and efforts to make HBase more reliable/available. Finally, we will explore techniques for handling correlated... | This talk will explain how to get to an MTTR for regions under one minute. It will first cover the possible failures (.meta., master, region server, datanode, software, hardware) and then explore how HBase detects the failures, recovers the data,... | Cloudera Impala is an open source project that allows low latency and analytical queries over big data in Apache Hadoop; with Impala it is now possible to use SQL in conjunction with HBase. With a developer friendly interface on HBase’s... This session provides an overview of Apache Drill, which delivers full ANSI SQL capability for HBase users. Apache Drill, which is based on Google’s Dremel technology, has an ambitious design goal of providing sub-second query capability over trillions of database... | eBay search powers search on the ebay.com website and is in the critical path of eBay’s user experience and revenue. Sellers and buyers are continuously updating the underlying data ecosystem and the Search system has to process these changes in... |
| 12:30pm – 1:30pm | Lunch | | | |
| 1:30pm – 2:10pm | This presentation covers operating an OpenTSDB cluster, including best practices for scaling and how to maximize throughput, and shares metrics related to the scale of Box’s cluster. We will also cover data collection processes and how Box uses OpenTSDB as an... | This talk will cover all the specifics needed for a Java developer to start creating coprocessors. It will start with a brief introduction to what a coprocessor is and why it is useful, describing the difference between observers and endpoints... | Honeycomb is an exciting new open source storage engine plugin for MySQL that enables MySQL to store and query tables directly in HBase. By storing tables directly in HBase and allowing direct access to MySQL for queries and modification, Honeycomb... This presentation explores the design and challenges HappiestMinds faced while implementing a storage and search infrastructure for a large publisher where books/documents/artifacts related records are stored in Apache HBase. Upon bulk insert of book records into HBase, the Elasticsearch index... | At Pinterest, we have been increasingly using HBase for a variety of applications: real-time, interactive, and batch oriented. In this talk, we discuss our experience with architecting and scaling our Feed storage on HBase. “Feeds” are central to user... At Groupon, HBase now powers most of the backend technology for real time delivery of “deal” experience across all platforms, as well as powers our batch clusters for consolidated user data. We have over 40 billion data points in our... |
| 2:20pm – 3:00pm | | Attend this panel to learn about the new development efforts that are moving the HBase code base into the future, as well as ones on the community’s wish list. Panelists: Nick Dimiduk, Hortonworks; Jonathan Gray, Continuuity; Lars Hofhansl, Salesforce.com... | Because BigTable-flavored systems support retrieval only by a single row key, the lack of indexing on non-key columns pushes all details of more complicated indexing schemes up to the application. Intel has extended HBase with a general full-text indexing framework... One of the most common applications of HBase is its use as a random-access, planet-sized data store for semi-structured data. HBase doesn’t offer indexed access via a non-primary key out of the box, however – the main challenge being the maintenance of... | LongTail Video recently launched a real-time video analytics service on top of Hadoop and HBase running on Amazon’s AWS cloud. In this talk, we will discuss our architecture, specifically how we use Hadoop for real-time analytics by processing data in... |
| 3:00pm – 3:20pm | Break | | | |
| 3:20pm – 4:00pm | Based on production experience with HBase since 2009, as well as recent benchmark results, this talk will take you through network designs and optimizations for HBase, as well as possible ways to make the network work better for Apache Hadoop/HBase.... | This talk will take an HDFS-centric look at the filesystem issues in HBase. It will dissect the interface between HBase and HDFS, with a focus on the filesystem services that HBase relies on, durability, crash recovery, and performance characteristics of... | Flume NG has added capabilities to write to HBase over the last year, using the “standard” HBase client API and the AsyncHBase API. In addition to being able to “put” events into HBase, Flume also supports doing “increments” on data... In the Continuuity AppFabric, a flow processes events in realtime with an exactly-once guarantee, passing data for processing downstream via queues. All operations involved in processing of a data object, from de-queuing through data operations in the course of processing to... | Experian sends 700+ million emails daily, which it analyzes in real time to see campaign performance and create new segments. Experian’s ETL framework can source data from various systems, transform it, and persist it in HBase. The framework also provides the... Simply Measured originally built out its entire data storage platform on MongoDB. Things seemed rosy for a while, but they were crumbling at the edges. Consistency was an issue, and sharding for each customer’s data source didn’t work well as... |
| 4:10pm – 4:50pm | It is important when looking at rolling out Big Data infrastructure, especially HBase, to have your Operations team on board. The best way to do this is to relate HBase back to their previous experience (with SQL databases, for example) and... | Compactions are a critical aspect of HBase storage design, yet they are frequently a pain point in cluster management, affecting availability and requiring manual tuning. This talk will provide a brief overview of the existing HBase compaction algorithm, the problems it... | Consumers are constantly searching for something new, and to stay competitive, organizations must act immediately based on up-to-date data. Outdated recommendations decrease the likelihood of presenting the right offer and make it harder to develop loyal customers, and to provide... Many real-world machine learning applications run on Big Data matrices; they operate on a matrix input with a very large number of rows, some kinds of which, such as text data, also have plenty of columns and are extremely... | Explorys has been using HBase and Hadoop since HBase 0.20, and will walk through lessons learned over years of usage from their first HBase implementation through a series of upgrades and changes, including impacts to schema design, data loading, data... At RichRelevance, we service 10 of the top 20 Internet retailer chains and deliver more than $5.5 billion in attributable sales. Our Hadoop infrastructure has the capacity to handle upwards of 1.5 PB. Behavioral Targeting, specifically user segmentation and building... |
| 5:00pm – 5:40pm | Attend this panel to learn about best practices from the operators of world-class HBase deployments. Panelists: Jeremy Carroll, Pinterest; Dave Latham, Flurry; Alex Levchuk, Facebook; Rajiv Chittajallu, Yahoo! Moderator: Eric Sammer, Cloudera | We all know there’s a thunderous pace of development on HBase. But what’s actually going into all of these JIRAs? In this Thunder Talk, we’ll cover the most interesting commits from the past year at a pace that will make... HBase Replication is a rich feature that enables users to asynchronously copy table data between HBase clusters. It plays an integral part in highly-available HBase deployments, acting as a key piece of disaster recovery strategies and providing a valuable tool... | In this session we will talk about the metrics exposed by HBase. We’ll cover what metrics are there, what they mean, and how to access them. We’ll also look at examples of region hot spotting, region server queue times, replication... While it is very common to find resource management functionality in commercial, enterprise grade data storage systems, it is the one feature that HBase still lacks. In fact, a single malicious or foolish user can take down the entire cluster... | Causata’s event-based HBase data store has two main access patterns: low latency access (sub 50ms) to individual customers’ profiles for offer management, and streaming access to profiles matching certain query criteria, for predictive analytics. We recently migrated to HBase from... Every week, Ancestry DNA analyzes thousands of people’s DNA, decoding their family origins and finding their long-lost relatives. To that end, we used GERMLINE, an algorithm for finding hidden family relationships within a pool of DNA. However, the reference implementation... |
| 5:40pm – 8:00pm | Party Time! | | | |