Some of the featured topics this year

This is not the complete schedule. We will add more sessions over the next few days as we confirm speakers' travel itineraries. Check back for updates. Descriptions of the talks can be found on the sessions and workshops pages.

Spark

10:00am - Salon C - Kurt Brown (Netflix): Elevating Your Data Platform
10:00am - Classroom 106 - Chris Fregly (IBM): Spark After Dark 1.6
11:00am - Classroom 106 - Michelle Casbon (Idibon): Under the Hood of Idibon’s Scalable NLP Services
11:00am - Salon DE - Ben Reiter: Delivering Real Time Analytics for Mobile Ad Serving
12:00pm - Salon C - Jay Kreps (Confluent): Apache Kafka and the Stream Data Platform
2:00pm - Salon C - Eric Schmidt (Google): Two Worlds Become A Much Better One
3:00pm - Salon C - Patrick McFadin (DataStax): Laying down the SMACK on your data pipelines
3:00pm - Classroom 203 - Michael Berthold (KNIME): Blending Tools and Data in KNIME: From .csv, R, and Python to Spark, MLlib and Hive
4:00pm - Salon DE - Holden Karau (IBM): Beyond Shuffling - Tips & Tricks for Scaling Apache Spark Programs
5:00pm - Salon C - Sarah Guido (Bitly): Spark: The Good, the Bad, and the Ugly

Data Science

10:00am - Salon DE - John Akred (Silicon Valley Data Science): Running Agile Data Science Teams
11:00am - Conference 301 - Michael Berthold (KNIME): Data Science for the Masses: Mission Impossible?
12:00pm - Amphitheater 204 - Hank Roark (H2O): Fast, Distributed Machine Learning for Python using H2O
2:00pm - Conference 301 - Diego Oppenheimer (Algorithmia): Algorithm Marketplaces and the new "algorithm economy"
2:00pm - Amphitheater 204 - Ryan Mitchell (LinkeDrive): Spelunking the Web with Python: Writing Scrapers for Any Situation
3:00pm - Classroom 203 - Michael Berthold (KNIME): Blending Tools and Data in KNIME: From .csv, R, and Python to Spark, MLlib and Hive
4:00pm - Salon C - Joel Grus (Google): Learning (and Teaching) Data Science from First Principles
5:00pm - Salon C - Sarah Guido (Bitly): Spark: The Good, the Bad, and the Ugly

Kafka

10:00am - Classroom 106 - Chris Fregly (IBM): Spark After Dark 1.6
11:00am - Salon C - Eric Sammer (Rocana): High cardinality time series search: a new level of scale
12:00pm - Salon C - Jay Kreps (Confluent): Apache Kafka and the Stream Data Platform
3:00pm - Salon C - Patrick McFadin (DataStax): Laying down the SMACK on your data pipelines
4:00pm - Amphitheater 304 - Jonathan Gray (Cask): Data Lake Architectures and the fast path with Cask Hydrator
5:00pm - Amphitheater 204 - Fangjin Yang (Imply): Open Source Lambda Architecture with Kafka, Samza, Hadoop, and Druid

Machine Learning

10:00am - Classroom 202 - Preetha Appan (Indeed): Building Recommendations at Scale: Lessons Learned
11:00am - Classroom 106 - Michelle Casbon (Idibon): Under the Hood of Idibon’s Scalable NLP Services
12:00pm - Amphitheater 204 - Hank Roark (H2O): Fast, Distributed Machine Learning for Python using H2O
5:00pm - Amphitheater 204 - Fangjin Yang (Imply): Open Source Lambda Architecture with Kafka, Samza, Hadoop, and Druid

Time Series Data

11:00am - Salon C - Eric Sammer (Rocana): High cardinality time series search: a new level of scale
12:00pm - Salon DE - Patrick McFadin (DataStax): Storing Time Series Data with Apache Cassandra
3:00pm - Conference 301 - Fintan Quill (KX): Time-series analytics for Big Data and IoT

Natural Language Processing / Text Analytics

10:00am - Amphitheater 204 - Rob Munro: NLP Keynote - Sentiment Analysis is a Market for Lemons... Here's How to Fix it
11:00am - Classroom 106 - Michelle Casbon (Idibon): Under the Hood of Idibon’s Scalable NLP Services
2:00pm - Classroom 106 - Christopher Moody (Stitchfix): TBA
3:00pm - Classroom 106 - Nick Gaylord (Idibon): Starting from scratch: Exploring and analyzing text data in Idibon Studio
4:00pm - Classroom 101 - Jason Kessler (CDK): From Sentiment to Persuasion Analysis
4:00pm - Classroom 106 - William Lyon (Neo4j): Natural Language Processing With Graph Databases
5:00pm - Classroom 101 - Brent Schneeman (Homeaway): TBA
5:00pm - Classroom 106 - Lukas Biewald: TBA

Teams / Organization

10:00am - Salon DE - John Akred (Silicon Valley Data Science): Running Agile Data Science Teams
3:00pm - Salon DE - Carl Anderson (Warby Parker): Creating a Data-Driven Organization

Graph Data / Processing

2:00pm - Salon DE - Luca Garulli (OrientDB): Polyglot Persistence vs Multi-Model Databases
4:00pm - Classroom 106 - William Lyon (Neo4j): Natural Language Processing With Graph Databases
5:00pm - Classroom 203 - Nakul Jeirath (Wellaware): A Journey from Relational to Graph Database

Cassandra

12:00pm - Salon DE - Patrick McFadin (DataStax): Storing Time Series Data with Apache Cassandra
3:00pm - Salon C - Patrick McFadin (DataStax): Laying down the SMACK on your data pipelines
5:00pm - Classroom 203 - Nakul Jeirath (Wellaware): A Journey from Relational to Graph Database

Python

2:00pm - Amphitheater 204 - Ryan Mitchell (LinkeDrive): Spelunking the Web with Python: Writing Scrapers for Any Situation
3:00pm - Classroom 203 - Michael Berthold (KNIME): Blending Tools and Data in KNIME: From .csv, R, and Python to Spark, MLlib and Hive

R

10:00am - Classroom 203 - Hadley Wickham (RStudio): Expressing yourself with R (2 hours)
3:00pm - Classroom 203 - Michael Berthold (KNIME): Blending Tools and Data in KNIME: From .csv, R, and Python to Spark, MLlib and Hive

Databases: SQL / NoSQL

2:00pm - Salon DE - Luca Garulli (OrientDB): Polyglot Persistence vs Multi-Model Databases
2:00pm - Classroom 202 - Ed Capriolo (Huffington Post): Building a NoSQL database from scratch

Ad Analytics

Conference 301 - Claudia Perlich (Dstillery): Eulogy to the Click: How predictive modeling and big data is killing our favorite metrics!
11:00am - Salon DE - Ben Reiter: Delivering Real Time Analytics for Mobile Ad Serving