Sunday, January 12, 2014

Big Data Revolution and Vision

Big Data is one of the biggest buzzwords around at the moment, and it will definitely change the world. Big Data refers to data sets that are too large to be processed and analyzed by traditional IT technologies.

The Big Data universe is changing right before our eyes and beginning to explode. Big data has the potential to change the way governments, organizations, and academic institutions conduct business and make discoveries, and it’s likely to change how everyone lives their day-to-day lives. In the next five years, we’ll generate more data as humankind than we generated in the previous 5,000 years. Records and data exist in electronic, digital form, generated by everything from mobile communications and surveillance cameras to emails, web sites and transaction receipts; they can be combined with daily news, social media feeds and videos.

What is big data?
Every day, we create 2.5 quintillion bytes of data — so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals to name a few. This data is big data. 
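
To put the 2.5 quintillion bytes figure above into more familiar units, here is a small back-of-the-envelope conversion. It is only a sketch: it assumes decimal units (1 EB = 10^18 bytes, 1 TB = 10^12 bytes) and a perfectly even flow of data across the day, neither of which comes from the original figure.

```python
# 2.5 quintillion bytes per day, the figure quoted above.
BYTES_PER_DAY = 2.5e18
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: decimal units (1 EB = 1e18 bytes, 1 TB = 1e12 bytes).
exabytes_per_day = BYTES_PER_DAY / 1e18
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 1e12

print(f"{exabytes_per_day:.1f} EB per day")          # 2.5 EB per day
print(f"{terabytes_per_second:.1f} TB per second")   # 28.9 TB per second
```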

Gartner defines Big Data as high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. According to IBM, 80% of data captured today is unstructured: it comes from sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. All of this unstructured data is Big Data.

In other words, big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization. The trend to larger data sets is due to the additional information (VALUE) derivable from analysis of a single large set of related data, allowing correlations to be found to "spot business trends, determine quality of research, prevent diseases, link legal citations, combat crime, and determine real-time roadway traffic conditions."

What does Hadoop solve?

  • Organizations are discovering that important predictions can be made by sorting through and analyzing Big Data.
  • However, since 80% of this data is "unstructured", it must be formatted (or structured) in a way that makes it suitable for data mining and subsequent analysis.
  • Hadoop is a core platform for structuring Big Data, and it addresses the problem of making that data useful for analytics purposes.
In 2004, Google published a paper on a process called MapReduce, which runs on a cluster of many machines working in parallel. The MapReduce framework provides a parallel processing model and an associated implementation for processing huge amounts of data. With MapReduce, queries are split, distributed across parallel nodes and processed in parallel (the Map step). The results are then gathered and delivered (the Reduce step). The framework was incredibly successful, so others wanted to replicate the algorithm. As a result, an implementation of the MapReduce framework was adopted by an Apache open source project named Hadoop. The original paper is "MapReduce: Simplified Data Processing on Large Clusters" by Jeffrey Dean and Sanjay Ghemawat.
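
The Map and Reduce steps described above can be illustrated with a tiny word count. This is only a single-process sketch of the idea, not Hadoop's API: the function names (`map_step`, `shuffle`, `reduce_step`, `word_count`) and the sample input are made up for illustration, and the grouping ("shuffle") stage between Map and Reduce is an addition that the paragraph above does not name but that the MapReduce model relies on. In a real cluster, each chunk would be mapped on a different node.

```python
from collections import defaultdict


def map_step(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_chunks):
    """Group emitted values by key so each key can be reduced on its own."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Reduce: combine all values collected for one key into one result."""
    return key, sum(values)


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a separate node;
    # here the chunks are simply processed one after another.
    mapped = [map_step(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    return dict(reduce_step(key, values) for key, values in grouped.items())


# Hypothetical input: two "chunks" standing in for blocks of a large file.
chunks = ["big data is big", "data is everywhere"]
counts = word_count(chunks)
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each chunk is mapped independently and each key is reduced independently, both stages can be spread across many machines, which is what makes the model suitable for very large data sets.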

Big data spans four dimensions, the 4 Vs that characterize it:

  • Volume – the vast amounts of data generated every second. Examples: terabytes, records, transactions, tables and files.
  • Velocity – the speed at which new data is generated and moves around (credit card fraud detection is a good example, where millions of transactions are checked for unusual patterns in almost real time). Examples: batch, near real time, real time and streams.
  • Variety – the increasingly different types of data (from financial data to social media feeds, from photos to sensor data, from video capture to voice recordings). Examples: structured, unstructured and semi-structured data, or a mix of all three.
  • Veracity – the messiness of the data (just think of Twitter posts with hashtags, abbreviations, typos and colloquial speech).
How the Big Data Explosion Is Changing the World?
Big data is the term increasingly used to describe the process of applying serious computing power (the latest in machine learning and artificial intelligence) to seriously massive and often highly complex sets of information. Big data can mean comparing utility costs with meteorological data to spot trends and inefficiencies. It can mean comparing ambulance GPS information with hospital records on patient outcomes to determine the correlation between response time and survival. It can also be the tiny device you wear to track your movement, calories and sleep in order to monitor your own health and fitness.

Our daily lives generate an enormous collection of data. Whether you’re surfing the Web, shopping at the store, driving your smart car around town, boarding an airplane, visiting a doctor or attending class at university, each day you are generating a variety of data. The benefit of the data depends on where you are and whom you’re talking to, but a lot of the ultimate potential lies in the ability to discover connections and to predict outcomes in a way that wasn’t really possible before.

With more data than ever available in digital form, progressively inexpensive data storage, and more advanced computers at the ready to help process and analyze it all, companies believe that big data has the power to drive practical insights that just weren’t possible before. It’s about managing all that data and providing tools that enable everyone to answer questions, including questions they might not have even known they had. IBM CEO Ginni Rometty says big data and predictive decisions will reshape organizations, and computers that learn, like Watson, will be tech's next big wave.

It’s a vision of the future. A hospital uses rapid gene sequencing to stop an outbreak of antibiotic-resistant bacteria, saving lives. A railroad company gets an alert from a train’s sensor that a preventative fix is needed, saving the cost and time of removing the train from the tracks later. A university notices that a student’s activity has started to drop to a level consistent with dropouts, and reaches out to assist.

Classic use cases and their implementation in real-world scenarios:
----------------------------------------------------------------------------
1) Retailers can exploit the data to track sales and consumer behavior, in store and online.

2) Health professionals and epidemiologists trying to predict the spread of disease combine data from health services, border agencies and a variety of other sources.

3) Organizers of the London Olympics planned to analyze big data to establish traffic patterns, policing needs and potential terrorist threats.

4) The finance sector seeks to exploit one of the most valuable mother lodes of data through powerful tools that can make sense of patterns in news, trading activities and other more esoteric sources.

5) India’s Unique Identification project (the Aadhaar project), spearheaded by Nandan Nilekani, will collect and process billions of data points to provide identification for each resident across the country, and would be used primarily as the basis for efficient delivery of welfare services. It would also act as a tool for effective monitoring of various programs and schemes of the Government.

6) Sports analytics: developing strategies for cricket teams by analyzing bowling patterns and pitch behavior, detecting match fixing, etc.

7) Predicting crime: Chicago is designing a predictive software platform to identify crime patterns. Beyond the public safety uses, the platform could also help officials make better decisions for city services like restaurant inspections, snow plowing or garbage collection.

Data scientists are building specialized systems that can read through billions of bits of data, analyze them via self-learning algorithms and package the insights for immediate use.
------------------------------------------

In the next few years, millions of big data-related IT jobs will be created worldwide, and there is a major shortage of the “analytical and managerial talent necessary to make the most of big data.” The United States alone faces a shortage of more than 140,000 workers with big data skills, as well as up to 1.5 million managers and analysts needed to analyze and make decisions based on big data findings.