Sunday, December 29, 2013

Big Data: Hadoop 2.x (YARN/MRv2) - Single Node Installation

Apache Hadoop 2/ YARN/MR2 Installation for Beginners :

Big Data spans three dimensions: Volume, Velocity and Variety. (IBM defined 4th  dimension or property of Big Data i.e Veracity). Apache™ Hadoop® is an open source software project that enables the distributed processing of large data sets (Big Data) across clusters of commodity Machines(Low-cost Servers). It is designed to scale up to thousands of machines, with a  high degree of fault tolerance and software has the intelligence to detect & handle the failures at the application layer.
NOTE: More details are available@
  • The Apache Hadoop component introduced two new terms for Hadoop 1.0 users - MapReduce2 (MR2) and YARN. 
  • Apache Hadoop YARN is the next-generation Hadoop framework designed  to take Hadoop beyond MapReduce for data-processing- resulted in better cluster utilization that  permit Hadoop to scale to accommodate more and larger jobs. 
  • This blog provides information for users to migrate their Apache Hadoop MapReduce applications from Apache Hadoop 1.x to Apache Hadoop 2.x

Steps to Install Hadoop2.0 on CentOS/RHEL6 on single node Cluster setup:

Step1:  Install Java from link :
            Set the environment variable $JAVA_HOME  properly

NOTE: Java-1.6.0-openjdk  OR other Hadoop Java Versions  listed in a below link are  more preferable. 


Step2: Download Apache Hadoop2.2  to folder /opt from link :

Step 3: Add all hadoop and java environment path variables  to .bashrc  file.
Example :
                 Configure $HOME/.bashrc
                         -  HODOOP_HOME
                         -  JAVA_PATH
                         -  PATH
                         -  HADOOP_HDFS_HOME
                         -  HADOOP_YARN_HOME
                         -  HADOOP_MAPRED_HOME
                         -  HADOOP_CONF_DIR
                         -  YARN_CLASS_PATH
Step 4 : Create a separate Group for Hadoop setup
              # groupadd hadoop

Step 5:  Add  3 user-accounts  in  Group "hadoop"
              #  useradd   -g   hadoop   yarn
              #  useradd   -g   hadoop   hdfs
              #  useradd   -g   hadoop   mapred

          NOTE: Its  good to run  daemons with a  related accounts

Step 6:  Create Data Directories for namenode,datanode and secondary namenode
              #  mkdir  -p  /var/data/hadoop/hdfs/nn
              #  mkdir  -p  /var/data/hadoop/hdfs/dn
              #  mkdir  -p  /var/data/hadoop/hdfs/snn

Step 7:  Set permission for "hdfs" account
           #  chown  hdfs:hadoop  /var/data/hadoop/hdfs  -R

Step 8: Create Log Directories
             #  mkdir  -p /var/log/hadoop/yarn
             #  mkdir logs  (at installation directory  Example /opt/hadoop2.2.0/logs)

Step 9:  Set ownership to yarn

              #  chown  yarn:hadoop   /var/log/hadoop/yarn -  R

              Go to  Hadoop directory  "/opt/hadoop2.2.0/  "

              # chmod g+w logs
              # chown yarn:hadoop  . -R

Step 10: Configure  below listed XML files  at  $HADOOP_PREFIX/etc/hadoop

i)  core-site.xml
iii)  hdfs-site.xml
iv)   mapred-site.xml
vi)   yarn-site.xml

Step 11: Create a passwordless ssh session for "hdfs" user account :
   #  su - hdfs
   hdfs@localhost$    ssh-keygen -t rsa
   hdfs@localhost$   cat ~/.ssh/ >> ~/.ssh/authorized_keys
   hdfs@localhost$    chmod 0600 ~/.ssh/authorized_keys

     ssh-copy-id -i /home/user1/.ssh/ hostname1
     ssh-copy-id -i /home/user1/.ssh/ hostname2
     ssh-copy-id -i /home/user1/.ssh/ hostname3
Step 12:
  Now you are allowed to  login without prompting for the password :
[hdfs@localhost]$ ssh localhost
Last login: Sun Dec 29 04:31:44 2013 from localhost
[hdfs@localhost ~]$


Step 13:  Format  Hadoop File system :
Format the NameNode directory as the HDFS superuser ( "hdfs" user account)
#su - hdfs
$ cd /opt/hadoop2.2/bin
$./hdfs namenode -format

It should show the message : /var/data/hadoop/hdfs/nn has been successfully formated as shown below:

[hdfs@localhost bin]$ ./hdfs namenode -format
13/12/29 02:36:52 INFO namenode.NameNode: STARTUP_MSG:
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = localhost.localdomain/
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.2.0
STARTUP_MSG:   classpath = /opt/hadoop-2.2.0/etc/hadoop/:/opt/hadoop-2.2.0/share/hadoop/common/lib/jetty-6.1.26.jar:/opt/hadoop-2.2.0/share/hadoop/common/lib/commons-el-1.0.jar:

STARTUP_MSG:   java = 1.7.0_45
13/12/29 02:36:52 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /opt/hadoop-2.2.0/lib/native/ which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
13/12/29 02:36:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-d47a364a-edc6-455f-b3c8-4d2ba54458d5
13/12/29 02:36:54 INFO namenode.HostFileManager: read includes:
13/12/29 02:36:54 INFO namenode.HostFileManager: read excludes:
13/12/29 02:36:54 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
13/12/29 02:36:54 INFO util.GSet: Computing capacity for map BlocksMap
13/12/29 02:36:54 INFO util.GSet: VM type       = 64-bit
13/12/29 02:36:54 INFO util.GSet: 2.0% max memory = 96.7 MB
13/12/29 02:36:54 INFO util.GSet: capacity      = 2^18 = 262144 entries
13/12/29 02:36:54 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
13/12/29 02:36:54 INFO blockmanagement.BlockManager: defaultReplication         = 1
13/12/29 02:36:54 INFO blockmanagement.BlockManager: maxReplication             = 512
13/12/29 02:36:54 INFO blockmanagement.BlockManager: minReplication             = 1
13/12/29 02:36:54 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
13/12/29 02:36:54 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
13/12/29 02:36:54 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
13/12/29 02:36:54 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
13/12/29 02:36:54 INFO namenode.FSNamesystem: fsOwner             = hdfs (auth:SIMPLE)
13/12/29 02:36:54 INFO namenode.FSNamesystem: supergroup          = supergroup
13/12/29 02:36:54 INFO namenode.FSNamesystem: isPermissionEnabled = true
13/12/29 02:36:54 INFO namenode.FSNamesystem: HA Enabled: false
13/12/29 02:36:54 INFO namenode.FSNamesystem: Append Enabled: true
13/12/29 02:36:54 INFO util.GSet: Computing capacity for map INodeMap
13/12/29 02:36:54 INFO util.GSet: VM type       = 64-bit
13/12/29 02:36:54 INFO util.GSet: 1.0% max memory = 96.7 MB
13/12/29 02:36:54 INFO util.GSet: capacity      = 2^17 = 131072 entries
13/12/29 02:36:54 INFO namenode.NameNode: Caching file names occuring more than 10 times
13/12/29 02:36:54 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
13/12/29 02:36:54 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
13/12/29 02:36:54 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
13/12/29 02:36:54 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
13/12/29 02:36:54 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
13/12/29 02:36:54 INFO util.GSet: Computing capacity for map Namenode Retry Cache
13/12/29 02:36:54 INFO util.GSet: VM type       = 64-bit
13/12/29 02:36:54 INFO util.GSet: 0.029999999329447746% max memory = 96.7 MB
13/12/29 02:36:54 INFO util.GSet: capacity      = 2^12 = 4096 entries
13/12/29 02:36:55 INFO common.Storage: Storage directory /var/data/hadoop/hdfs/nn has been successfully formatted.
13/12/29 02:36:56 INFO namenode.FSImage: Saving image file /var/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
13/12/29 02:36:56 INFO namenode.FSImage: Image file /var/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 of size 196 bytes saved in 0 seconds.
13/12/29 02:36:56 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
13/12/29 02:36:56 INFO util.ExitUtil: Exiting with status 0
13/12/29 02:36:56 INFO namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/
[hdfs@localhost bin]$

Step 14:  Start HDFS service - Namenode Daemon process

$cd  ../sbin
[hdfs@localhost bin]$ cd ../sbin/
[hdfs@localhost sbin]$ ./ start namenode
starting namenode, logging to /opt/hadoop-2.2.0/logs/hadoop-hdfs-namenode localhost.localdomain.out

Step 15: Check the status of namenode daemon
[hdfs@localhost ]$ jps
4537 Jps
4300 NameNode   =====> started successfully

[hdfs@localhost sbin]$ ps -ef | grep java
hdfs      4300     1 11 02:38 pts/1    00:00:04 /usr/java/default/bin/java -Dproc_namenode -Xmx100m -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop-2.2.0 -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/hadoop-2.2.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=hadoop-hdfs-namenode-localhost.localdomain.log -Dhadoop.home.dir=/opt/hadoop-2.2.0 -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop-2.2.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml,RFAS -Dhdfs.audit.logger=INFO,NullAppender,RFAS -Dhdfs.audit.logger=INFO,NullAppender,RFAS -Dhdfs.audit.logger=INFO,NullAppender,RFAS org.apache.hadoop.hdfs.server.namenode.NameNode
Step 16 :  Start HDFS service - Secondary Namenode Daemon process

[hdfs@localhost sbin]$ ./ start secondarynamenode
starting secondarynamenode, logging to /opt/hadoop-2.2.0/logs/hadoop-hdfs-secondarynamenode-localhost.localdomain.out
[hdfs@localhost sbin]$

Step 17 : Check the status of Secondarynamenode daemon
[hdfs@localhost bin]$ jps
4300 NameNode
4913 SecondaryNameNode ======> started successfully

[hdfs@localhost sbin]$ ps -ef | grep java | grep 4913
 hdfs      4913     1  7 02:46 pts/1    00:00:04 /usr/java/default/bin/java -Dproc_secondarynamenode -Xmx100m -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop-2.2.0 -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/hadoop-2.2.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=hadoop-hdfs-secondarynamenode-localhost.localdomain.log -Dhadoop.home.dir=/opt/hadoop-2.2.0 -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop-2.2.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml,RFAS -Dhdfs.audit.logger=INFO,NullAppender,RFAS -Dhdfs.audit.logger=INFO,NullAppender,RFAS -Dhdfs.audit.logger=INFO,NullAppender,RFAS org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
 Step 18: Start HDFS service - DataNode Daemon process
[hdfs@localhost sbin]$ ./ start datanode
starting datanode, logging to /opt/hadoop-2.2.0/logs/hadoop-hdfs-datanode-localhost.localdomain.out
 [hdfs@localhost sbin]$

Step 19: Check the status of Datanode daemon
[hdfs@localhost bin]$ jps
4300 NameNode
4913 SecondaryNameNode
4949 Jps
4373 DataNode ======> started successfully

 [hdfs@localhost sbin]$ ps -ef | grep java | grep 4373
hdfs      4373     1 34 02:39 pts/1    00:00:06 /usr/java/default/bin/java -Dproc_datanode -Xmx100m -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/hadoop-2.2.0 -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/hadoop-2.2.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=hadoop-hdfs-datanode-localhost.localdomain.log -Dhadoop.home.dir=/opt/hadoop-2.2.0 -Dhadoop.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop-2.2.0/lib/native -Dhadoop.policy.file=hadoop-policy.xml -server,RFAS,RFAS,RFAS,RFAS org.apache.hadoop.hdfs.server.datanode.DataNode
Step 20:Start YARN  service - resourcemanager Daemon process

[hdfs@localhost sbin]$ ./ start resourcemanager
starting resourcemanager, logging to /opt/hadoop-2.2.0/logs/yarn-hdfs-resourcemanager-localhost.localdomain.out

Step 21 : Check the status of  ResourceManager daemon
[hdfs@localhost bin]$  jps
4300 NameNode
4913 SecondaryNameNode
4949 Jps
4373 DataNode
4500 ResourceManager  ======> started successfully

 [hdfs@localhost sbin]$ ps -ef | grep java | grep 4500
hdfs      4500     1  3 02:41 pts/1    00:00:08 /usr/java/default/bin/java -Dproc_resourcemanager -Xmx200m -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dyarn.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=yarn-hdfs-resourcemanager-localhost.localdomain.log -Dyarn.log.file=yarn-hdfs-resourcemanager-localhost.localdomain.log -Dyarn.home.dir= -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop-2.2.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dyarn.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=yarn-hdfs-resourcemanager-localhost.localdomain.log -Dyarn.log.file=yarn-hdfs-resourcemanager-localhost.localdomain.log -Dyarn.home.dir=/opt/hadoop-2.2.0 -Dhadoop.home.dir=/opt/hadoop-2.2.0 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop-2.2.0/lib/native -classpath /opt/hadoop-2.2.0/etc/hadoop/:/opt/hadoop-2.2.0/etc/hadoop/:/opt/hadoop-2.2.0/etc/hadoop/:/opt/hadoop-2.2.0/share/hadoop/common/lib/*:/opt/hadoop-2.2.0/share/hadoop/common/*:/opt/hadoop-2.2.0/share/hadoop/hdfs:/opt/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/opt/hadoop-2.2.0/share/hadoop/hdfs/*:/opt/hadoop-2.2.0/share/hadoop/yarn/lib/*:/opt/hadoop-2.2.0/share/hadoop/yarn/*:/opt/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.2.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/opt/hadoop-2.2.0/share/hadoop/yarn/*:/opt/hadoop-2.2.0/share/hadoop/yarn/lib/*:/opt/hadoop-2.2.0/etc/hadoop//rm-config/ org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

Step 22:Start YARN  service - NodeManager Daemon process
[hdfs@localhost sbin]$ ./ start nodemanager
starting nodemanager, logging to /opt/hadoop-2.2.0/logs/yarn-hdfs-nodemanager-localhost.localdomain.out
[hdfs@localhost sbin]$

Step 23 : Check the status of  Nodemanager daemon
[hdfs@localhost bin]$ jps
4300 NameNode  
4744 NodeManager  ======> started successfully
4913 SecondaryNameNode
4949 Jps
4373 DataNode
4500 ResourceManager
[root@localhost bin]#

 [hdfs@localhost sbin]$ ps -ef | grep java | grep 4744
hdfs      4744     1  2 02:42 pts/1    00:00:03 /usr/java/default/bin/java -Dproc_nodemanager -Xmx200m -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dyarn.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=yarn-hdfs-nodemanager-localhost.localdomain.log -Dyarn.log.file=yarn-hdfs-nodemanager-localhost.localdomain.log -Dyarn.home.dir= -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop-2.2.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dhadoop.log.dir=/opt/hadoop-2.2.0/logs -Dyarn.log.dir=/opt/hadoop-2.2.0/logs -Dhadoop.log.file=yarn-hdfs-nodemanager-localhost.localdomain.log -Dyarn.log.file=yarn-hdfs-nodemanager-localhost.localdomain.log -Dyarn.home.dir=/opt/hadoop-2.2.0 -Dhadoop.home.dir=/opt/hadoop-2.2.0 -Dhadoop.root.logger=INFO,RFA -Dyarn.root.logger=INFO,RFA -Djava.library.path=/opt/hadoop-2.2.0/lib/native -classpath /opt/hadoop-2.2.0/etc/hadoop/:/opt/hadoop-2.2.0/etc/hadoop/:/opt/hadoop-2.2.0/etc/hadoop/:/opt/hadoop-2.2.0/share/hadoop/common/lib/*:/opt/hadoop-2.2.0/share/hadoop/common/*:/opt/hadoop-2.2.0/share/hadoop/hdfs:/opt/hadoop-2.2.0/share/hadoop/hdfs/lib/*:/opt/hadoop-2.2.0/share/hadoop/hdfs/*:/opt/hadoop-2.2.0/share/hadoop/yarn/lib/*:/opt/hadoop-2.2.0/share/hadoop/yarn/*:/opt/hadoop-2.2.0/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.2.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/contrib/capacity-scheduler/*.jar:/opt/hadoop-2.2.0/share/hadoop/yarn/*:/opt/hadoop-2.2.0/share/hadoop/yarn/lib/*:/opt/hadoop-2.2.0/etc/hadoop//nm-config/ org.apache.hadoop.yarn.server.nodemanager.NodeManager ________________________________________________________
Step 24: This command gives you information on  hdfs system

[hdfs@localhost bin]$ ./hadoop dfsadmin -report

Configured Capacity: 16665448448 (15.52 GB)
Present Capacity: 12396371968 (11.55 GB)
DFS Remaining: 12396347392 (11.54 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Datanodes available: 1 (1 total, 0 dead)
Live datanodes:

Name: (localhost)
Hostname: localhost
Decommission Status : Normal
Configured Capacity: 16665448448 (15.52 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4269076480 (3.98 GB)
DFS Remaining: 12396347392 (11.54 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.38%
Last contact: Sun Dec 29 03:11:02 PST 2013
[hdfs@localhost bin]$

Step25:  Stop all the services by running "  "

[hdfs@localhost sbin]$ ./
This script is Deprecated. Instead use and
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [] stopping secondarynamenode
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
[hdfs@localhost sbin]$
Step 26:  Start all the services by running " " 

Added the YARN architecture block diagram to  locate the presence of daemons in different components .

[hdfs@localhost sbin]$ ./

check the status of  all services :

[hdfs@localhost sbin]$ jps
6161 NameNode
6260 DataNode
6719 NodeManager
6750 Jps
6355 SecondaryNameNode
6429 ResourceManager
[root@localhost bin]#

                                     Job Definition and control Flow  between Hadoop/Yarn components:

Step 27:  Run sample application program "pi"  from hadoop-mapreduce-examples-2.2.0.jar

First test with hadoop  to run  existing hadoop program  -  launch the program, monitor progress, and get/put files on the HDFS. This program calculates the value of " pi " in parallel  i.e  2 maps  with 10 samples:

              $ hadoop jar   /usr/lib/hadoop/hadoop-examples.jar    pi    2   10 

[hdfs@localhost bin]$ ./hadoop jar ../share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 10
Number of Maps  = 2
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Starting Job
13/12/29 04:33:12 INFO client.RMProxy: Connecting to ResourceManager at /
13/12/29 04:33:13 INFO input.FileInputFormat: Total input paths to process : 2
13/12/29 04:33:13 INFO mapreduce.JobSubmitter: number of splits:2
13/12/29 04:33:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1388320369543_0001
13/12/29 04:33:15 INFO impl.YarnClientImpl: Submitted application application_1388320369543_0001 to ResourceManager at /
13/12/29 04:33:15 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1388320369543_0001/
13/12/29 04:33:15 INFO mapreduce.Job: Running job: job_1388320369543_0001
13/12/29 04:33:38 INFO mapreduce.Job: Job job_1388320369543_0001 running in uber mode : false
13/12/29 04:33:38 INFO mapreduce.Job:  map 0% reduce 0%
13/12/29 04:35:22 INFO mapreduce.Job:  map 83% reduce 0%
13/12/29 04:35:23 INFO mapreduce.Job:  map 100% reduce 0%
13/12/29 04:36:10 INFO mapreduce.Job:  map 100% reduce 100%
13/12/29 04:36:16 INFO mapreduce.Job: Job job_1388320369543_0001 completed successfully
13/12/29 04:36:16 INFO mapreduce.Job: Counters: 43
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=238681
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=528
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=208977
        Total time spent by all reduces in occupied slots (ms)=39840
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=292
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=1712
        CPU time spent (ms)=3320
        Physical memory (bytes) snapshot=454049792
        Virtual memory (bytes) snapshot=3515953152
        Total committed heap usage (bytes)=268247040
    Shuffle Errors
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 184.356 seconds

Estimated value of Pi is 3.80000000000000000000
[hdfs@localhost bin]$
Step 28 : Verify the  Running Services Using the Web Interface:

Web Interface for the resource Manager can be viewed by
Shows the running application on single node cluster

Application Overview -Final Status( FINISHED)

Step 29 :   Create a Directory on HDFS 

[hdfs@localhost bin]$ ./hadoop fs -mkdir test1
Step 30:   Put local file "hellofile" into HDFS (/test1)

[hdfs@localhost bin]$ ./hadoop fs -put hellofile /test1

Step 31: Check the  input file "hellofile" on HDFS

[hdfs@localhost bin]$ ./hadoop fs -ls /test1
Found 1 items
-rw-r--r--   1 hdfs supergroup        113 2013-12-29 04:56 /test1/hellofile
[hdfs@localhost bin]$
Step 32: Run application program "WordCount"  from hadoop-mapreduce-examples-2.2.0.jar

WordCount Example:
WordCount example reads text files and counts how often words occur. The input is text files and the output is text files, each line of which contains a word and the count of how often it occured, separated by a tab.Each mapper takes a line as input and breaks it into words. It then emits a key/value pair of the word and 1. Each reducer sums the counts for each word and emits a single key/value with the word and sum.

To run the example, the command syntax is
bin/hadoop jar hadoop-*-examples.jar wordcount <in-dir> <out-dir>

All of the files in the input directory (called in-dir in the command line above) are read and the counts of words in the input are written to the output directory (called out-dir above).It is assumed that both inputs and outputs are stored in HDFS.If your input is not already in HDFS, but is rather in a local file system somewhere, you need to copy the data into HDFS as shown in above steps 29-31.
NOTE: Similarly you could think of processing  bigger Data Files ( Weather data , Healthcare data, Machine Log data ...etc).