Wednesday, May 25, 2011

IBM’s Big Data Platform

IBM has recently announced a new strategy for bringing Big Data to the enterprise. In particular, this includes InfoSphere Streams v2 and InfoSphere BigInsights 1.1. Big Data is an issue, of course, largely because the amount of data available to organizations is growing rapidly. Surveys show that many managers already make decisions based on data they don’t trust and that many don’t have the data they need. This is why 83% of CIOs cite business analytics as a top issue and 60% of CEOs think that they need to make better use of data. Much of this data, like data in general, is unstructured. This rising volume will make existing data challenges worse unless organizations can bring together more data from more sources to make better decisions.

So why Big Data?
Well, it really comes down to what we call the three V's: the volume of data, the velocity of data, and the variety of data.
Volume - Scale from terabytes to zettabytes
Velocity - Streaming data and large volume data movement
Variety - Manage the complexity of multiple relational and non-relational data types and schemas

The challenge is to bring together a large volume and variety of data to find new insights, for example:
* Identify criminals, networks, and threats from disparate video, audio, and data feeds.
* Make risk decisions based on real-time transactional data.
* Analyze multi-channel customer sentiment and experience.

While we at IBM feel that a new platform is called for, it should not be a silo. Big Data should be a permanent part of the information architecture and should be used alongside more traditional data management and analysis tools. The key requirements for the platform are:

* Support the variety, velocity and volume of Big Data.

* Provide analytics for data in its native format and adjust the analysis automatically: text, video, image, time series, statistics, data mining, geospatial, etc. The platform must be able to do predictive analytics too, and it allows all the data to be used to build a model rather than just a sample or recent records. This generally improves the accuracy of models; one customer went from using 30 days of data in its fraud models to using 7 years.

* Provide ease of use for developers and users.

* Be enterprise class, with fault tolerance and scale.

* Provide integration capabilities to bring in many sources and leverage existing integration technologies, with support for governance and for incorporating Big Data insights into the data warehouse.

The platform is based on open source foundational components (Hadoop, HBase, Pig, Lucene, Jaql) with two Big Data Enterprise Engines (Streaming Analytics and Internet Scale Analytics) on top. User environments for administrators, developers and end users are layered on top of these, and all of this plugs into the usual integration products and solutions. IBM is contributing to various open source projects based on this work, notably Jaql.

Specific products announced:

InfoSphere BigInsights 1.1
IBM InfoSphere BigInsights is an analytics platform built on top of the open source Apache Hadoop framework for storing, managing and gaining insights from Internet-scale data. InfoSphere BigInsights provides capabilities that let both IT and line-of-business users get up and running quickly.

The platform leverages significant contributions from the IBM Research and Emerging Technologies groups to deliver a robust Big Data platform that provides the following features and benefits:
* Internet-scale storage, workloads and analytics for the enterprise
* Highly flexible workloads
* Java and open source based, built on open standards, with support for Apache Hadoop and related projects
* Enables customer choice of hardware platforms, including commodity hardware, thereby lowering costs and letting customers benefit from an entirely new scale of information

Learn more about InfoSphere BigInsights

InfoSphere Streams 2.0
InfoSphere Streams is a high-performance computing platform that allows user-developed applications to rapidly ingest, analyze, and correlate information as it arrives from thousands of real-time sources.

Version 2.0 extends streaming analytics, simplifies the development of streaming applications, and improves performance:
* Runtime optimizations based on large numbers of Java virtual machines
* More operators and functions out of the box, with analytics for text, data mining, and statistics

Learn more about InfoSphere Streams 2.0
