The chicken and egg of big data solutions
Before I came to O’Reilly I was building the “big data and disruptive analytics practice” at a major systems integrator. It was a blast to spend every week talking to customers in different industries...
Strata Week: Google unveils its Knowledge Graph
Here’s what caught my attention in the data space this week. Google’s Knowledge Graph “Google does the semantic Web,” says O’Reilly’s Edd Dumbill, “except they call it the Knowledge Graph.” That...
Top Stories: May 14-18, 2012
Here’s a look at the top stories published across O’Reilly sites this week. A federal judge learned to code The judge presiding over the Oracle/Google case learned Java, and that skill came in handy...
Strata Week: Visualizing a better life
Here are a few of the data stories that caught my attention this week: Visualizing a better life How do you compare the quality of life in different countries? As The Guardian’s Simon Rogers points...
Strata Week: Data prospecting with Kaggle
Here are a few of the data stories that caught my attention this week: Prospecting for data The data science competition site Kaggle is extending its features with a new service called Prospect....
Four short links: 9 July 2012
Personalized Leukemia Treatment (NY Times) — sequenced the tumor’s DNA, found the misbehaving gene, realized there was an existing experimental treatment to tackle that gene, and it worked. Reminds me...
Heavy data and architectural convergence
Recently I spent a day at the Hadoop Summit in San Jose. One session in particular caught my attention because it hints at a continued merging of the RDBMS and Hadoop worlds. EMC’s Lei Chang gave a...
Seven reasons why I like Spark
A large portion of this week’s Amp Camp at UC Berkeley is devoted to an introduction to Spark – an open source, in-memory, cluster computing framework. After playing with Spark over the last month,...
Four short links: 25 October 2012
Big Data: the Big Picture (Vimeo) — Jim Stogdill’s excellent talk: although Big Data is presented as part of the Gartner Hype Cycle, it’s an epoch of the Information Age which will have significant...
Four short links: 12 November 2012
Teaching Programming to a Highly Motivated Beginner (CACM) — I don’t think there is any better way to internalize knowledge than first spending hours upon hours growing emotionally distraught over...
Predicting the future: Strata 2014 hot topics
Conferences like Strata are planned a year in advance. The logistics and coordination required for an event of this magnitude take a lot of planning, but they also take a decent amount of prediction:...
How to analyze 100 million images for $624
Jetpac is building a modern version of Yelp, using big data rather than user reviews. People are taking more than a billion photos every single day, and many of these are shared publicly on social...
Four short links: 23 May 2014
How to Educate Users (Luke Wroblewski) — help new users in your app, not in a video. Hardware By The Numbers (Renee DiResta) — slides from her keynote at the Solid conference. The mean success rate...
Interactive Big Data analysis using approximate answers
Interactive query analysis for Hadoop-scale data has recently attracted the attention of many companies and open source developers – some examples include Cloudera’s Impala, Shark, Pivotal’s HAWQ,...
Running batch and long-running, highly available service jobs on the same...
As organizations increasingly rely on large computing clusters, tools for leveraging and efficiently managing compute resources become critical. Specifically, tools that allow multiple services and...
Working in the Hadoop Ecosystem
I recently sat down with Mark Grover (@mark_grover), a Software Engineer at Cloudera, to talk about the Hadoop ecosystem. He is a committer on Apache Bigtop and a contributor to Apache Hadoop, Hive,...
Stream Processing and Mining just got more interesting
Largely unknown outside data engineering circles, Apache Kafka is one of the more popular open source, distributed computing projects. Many data engineers I speak with either already use it or are...
Databricks aims to build next-generation analytic tools for Big Data
Key technologists behind the Berkeley Data Analytics Stack (BDAS) have launched a company that will build software – centered around Apache Spark and Shark – for analyzing big data. Details of their...
Dealing with Data in the Hadoop Ecosystem
Kathleen Ting (@kate_ting), Technical Account Manager at Cloudera, and our own Andy Oram (@praxagora) sat down to discuss how to work with structured and unstructured data as well as how to keep a...
An Introduction to Hadoop 2.0: Understanding the New Data Operating System
By Rich Raposa Apache Hadoop 2.0 represents a generational shift in the architecture of Apache Hadoop. With YARN, Apache Hadoop is recast as a significantly more powerful platform – one that takes...