Saturday, June 8, 2019

Hewlett Packard Enterprise to Acquire Supercomputer Pioneer Cray



Supercomputers have long been a mainstay of military and intelligence agencies, used for chores ranging from cracking codes to designing nuclear weapons. They have many civilian uses as well, like predicting weather, creating new drugs and simulating the effect of crashes on auto designs. Computing firms are racing to reach exascale performance, or a quintillion operations per second.

Hewlett Packard Enterprise will pay about $1.3 billion to acquire Cray, which has designed some of the most powerful computer systems in use. The deal will also enable HPE to start selling supercomputer components to corporate clients and others. HPE will integrate Cray’s supercomputer technology into its product portfolio and build HPC-as-a-service and AI cloud services on HPE GreenLake. The acquisition gives HPE access to one of the only differentiated stacks in the market, letting it compete more aggressively in key vertical markets. The combination of Cray and HPE creates an industry leader in the fast-growing High-Performance Computing (HPC) and AI markets, and opens up a number of opportunities that neither company would likely be able to capture on its own.

Interestingly, Cray is actually the second supercomputer manufacturer HPE has picked up over its lifetime; the company also acquired the remaining assets of Silicon Graphics back in 2016. On November 1, 2016, Hewlett Packard Enterprise completed its acquisition of SGI (formerly Silicon Graphics) for $275 million.

HPE is already the HPC server market leader, but by adding Cray to the fold, HPE strengthens its position against its two chief rivals, Dell EMC and IBM. Currently, IBM is the maker of the world’s two fastest supercomputers, Summit and Sierra. In 2018, HPE ranked first in HPC server sales with 34.8% market share, followed by Dell EMC with 20.8% and IBM with 7.1% of the market. All other HPC players, including Lenovo, Atos, Inspur, and Sugon, have single-digit market share; Cray itself held 2.3%.

Cray, based in Seattle, traces its lineage to a company founded in 1972 in Minnesota by the computer designer Seymour Cray, often called the father of supercomputing. That company was bought in 1996 by Silicon Graphics and sold in 2000 to Tera Computer, which adopted the Cray name. Cray was impressively successful for a small company, but standing alone against much bigger rivals grew difficult. Cray morphed into an integrator and scale-out specialist, combining processors from the likes of Intel, AMD, and NVIDIA into supercomputers, and applying its own software, I/O, and interconnect technologies. The company is currently working with AMD on a US government-backed project to build the world's first exascale supercomputer at Oak Ridge National Laboratory, a system capable of a quintillion calculations per second.

Frontier Supercomputer at ORNL, partnership with AMD


Cray is currently contracted to build two of the world’s fastest supercomputers for two US Department of Energy labs: Oak Ridge National Laboratory and Argonne National Laboratory. Both systems, one called Frontier being built in partnership with AMD and one called Aurora with Intel, are promised to deliver so-called “exascale” performance of more than a quintillion calculations per second; Frontier is expected to exceed 1.5 exaflops. Both systems will support the converged use of analytics, AI, and HPC at extreme scale, using Cray’s new Shasta system architecture, software stack, and Slingshot interconnect.

Aurora Supercomputer at ANL, partnership with Intel

What worries some officials in the United States is the rapid rise of suppliers based in China. One of them, Lenovo, which bought IBM’s former x86 server business, led the Top500 rankings with 140 supercomputers installed; two others, Inspur and Sugon, were second and third with 84 and 54 systems, respectively. With the acquisition, HPE gets Cray’s current-generation XC and CS supercomputers and the next-generation Shasta supercomputer, which features a new high-speed interconnect code-named Slingshot. Cray’s products also include high-performance storage, a full HPC system software stack, and fully integrated data analytics and AI solutions.


Most HPC hardware vendors today sell commoditized cluster stacks built on InfiniBand or Ethernet-based networking. IBM and HPE are now the only HPC vendors that can differentiate themselves: IBM with its Power9 processors, and HPE with Cray’s Slingshot interconnect and ClusterStor HPC storage hardware. Cray gives HPE a unique technological edge: Slingshot provides an extremely high-bandwidth network for HPC and AI workloads and enables the new network topologies required for extreme-scale HPC and AI. Supercomputers of this scale can be massively beneficial to data-intensive fields like astronomy, climate science, medicine, neuroscience, and physics. Increasingly, these systems are used in artificial intelligence research, which in turn can help accelerate many other areas of scientific inquiry. That said, a supercomputer like Aurora or Frontier tends to be built and financed only by governments, at least initially for military applications.

Thankfully, over time these upcoming exascale supercomputers will likely be freed from the military apparatus and put to work divining new insights from data. According to HPE, the acquisition of Cray is primarily meant to give it an edge in AI research and in the hardware required to train ever-larger neural nets. Cray was also trying to make further inroads into the enterprise market, and HPE has the experience and enterprise customers at a time when many businesses recognize the need to invest in HPC for AI, data analytics, and other business operations such as marketing and ERP. Requirements in the commercial markets are pushing up into the HPC space. Cray lacked the experience to reach commercial customers on its own; now it is part of a company with a big position in those markets. For many enterprises this is a new type of compute, but they have to start embracing it because workloads increasingly require MPC (Massively Parallel Computing). You can’t compete anymore without doing AI, big data analytics, and increasingly, simulation and modeling. Yet enterprises don’t necessarily want to purchase the hardware and put it all together themselves; it’s complex. If they can get it with a cloud-like consumption model, as HPE GreenLake offers, they can turn it into an OpEx investment.


Broadly speaking, major acquisitions and mergers in the supercomputing space are rare events. Because of their ever-increasing price tags, only a small number of world-class supercomputers are sold each year, and the buyers are often governments, which inevitably gives supercomputer construction a nationalistic element. Nonetheless, costs keep climbing: Frontier is the US’s most expensive system yet, at over $500M for the system alone. HPE’s plans include using Cray’s technologies to improve HPE GreenLake, the company’s HPC-as-a-Service offering.



Sunday, May 12, 2019

Apache Hadoop 3.x installation on multinode cluster RHEL7 (ppc64le)


Hadoop is an open-source Apache project that allows creation of parallel processing applications on large data sets, distributed across networked nodes. It’s composed of the Hadoop Distributed File System (HDFS™) that handles scalability and redundancy of data across nodes, and Hadoop YARN: a framework for job scheduling that executes data processing tasks on all nodes.

Apache Hadoop 3.x Benefits

  •     Supports multiple standby NameNodes.
  •     Supports multiple NameNodes for multiple namespaces.
  •     Reduces storage overhead from 200% to 50% via erasure coding.
  •     Supports GPUs.
  •     Intra-node disk balancing.
  •     Support for opportunistic containers and distributed scheduling.
  •     Support for Microsoft Azure Data Lake and Aliyun Object Storage System filesystems.
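The storage-overhead figures follow from Hadoop 3's erasure coding support. With classic 3-way replication every block is stored three times, while a Reed–Solomon RS(6,3) policy (the common default, used here for illustration) adds 3 parity blocks per 6 data blocks:

```latex
% Extra storage per unit of user data:
\text{3-way replication:}\quad \frac{\text{redundant copies}}{\text{data blocks}} = \frac{2}{1} = 200\%
\qquad
\text{RS}(6,3)\ \text{erasure coding:}\quad \frac{\text{parity blocks}}{\text{data blocks}} = \frac{3}{6} = 50\%
```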

Architecture of Hadoop Cluster: 

Apache Hadoop has two core components:
1) HDFS - for storage
2) YARN - for computation

The HDFS and YARN architectures are shown below:


HDFS ARCHITECTURE

 
YARN ARCHITECTURE

Before configuring the master and worker nodes, it’s good to understand the different components of a Hadoop cluster. A master node keeps knowledge about the distributed file system, like the inode table on an ext3 filesystem, and schedules resource allocation. hadoopNode1 will handle this role in this guide, and hosts two daemons:
•    The NameNode: manages the distributed file system and knows where data blocks are stored inside the cluster.
•    The ResourceManager: manages the YARN jobs and takes care of scheduling and executing processes on worker nodes.
Worker nodes store the actual data and provide processing power to run the jobs; each hosts two daemons:
•    The DataNode manages the actual data physically stored on the node.
•    The NodeManager manages the execution of tasks on the node.



Prerequisites for Implementing Hadoop

  •     Operating system – RHEL 7.6
  •     Hadoop – the Hadoop 3.x package
  •     Passwordless SSH connections between nodes in the cluster
  •     Firewall settings that allow Hadoop traffic between machines in the cluster
  •     Machine details:
    Master node: hadoopNode1 (Power8 server with K80 GPUs running RHEL7)
    Worker nodes: hadoopNode1, hadoopNode2 (Power8 servers with K80 GPUs running RHEL7)
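The passwordless-SSH prerequisite can be set up with a sketch like the following (run as the Hadoop user on the master node; the hostnames match the machine list above, and the key path is the OpenSSH default):

```shell
# Generate an RSA key pair on the master if one does not already exist:
if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    mkdir -p "$HOME/.ssh"
    ssh-keygen -t rsa -b 4096 -N "" -f "$HOME/.ssh/id_rsa"
fi

# Copy the public key to every node (prompts once per node for a password):
# for host in hadoopNode1 hadoopNode2; do ssh-copy-id "$host"; done

# Verify: this should print the remote hostname without a password prompt.
# ssh hadoopNode1 hostname
```

The ssh-copy-id and verification lines are commented out because they contact remote machines; run them once per node in your cluster.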

-------------------------------------  ------------------------------------------------------------------

Hadoop Installation Steps  :

Step 1: Download the Java 8 package and save the file in your home directory.

Java is the primary requirement for running Hadoop on any system. The Hadoop 3.x jar files are compiled with the Java 8 runtime, so Java 8 must be installed to use Hadoop 3.x; users on JDK 7 have to upgrade to JDK 8.

If your machine is IBM Power architecture (ppc64le), you need to get the IBM Java package from the link below:

Download link: https://developer.ibm.com/javasdk/downloads/sdk8/

Step 2: Extract the Java TarFile.

Step 3: Download the Hadoop 3.x Package.

Download a stable version of Hadoop:
wget http://apache.spinellicreations.com/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz

Step 4: Extract the Hadoop tar File.
Extract the files at /home/users/sachinpb/sachinPB/:

 tar xzvf hadoop-3.2.0.tar.gz

Under the top-level directory /home/users/sachinpb/sachinPB/hadoop-3.2.0, you will see the following layout:

├── bin
│   ├── container-executor
│   ├── hadoop
│   ├── hadoop.cmd
│   ├── hdfs
│   ├── hdfs.cmd
│   ├── mapred
│   ├── mapred.cmd
│   ├── oom-listener
│   ├── test-container-executor
│   ├── yarn
│   └── yarn.cmd
├── etc
│   └── hadoop
│       ├── core-site.xml
│       ├── hadoop-env.sh
│       ├── hdfs-site.xml
│       ├── log4j.properties
│       ├── mapred-site.xml
│       ├── workers
│       ├── yarn-env.sh
│       └── yarn-site.xml
├── include
├── lib
│   └── native
│       ├── examples
│       ├── libhadoop.a
│       ├── libhadooppipes.a
│       ├── libhadoop.so -> libhadoop.so.1.0.0
│       ├── libhadoop.so.1.0.0
│       ├── libhadooputils.a
│       ├── libnativetask.a
│       ├── libnativetask.so -> libnativetask.so.1.0.0
│       └── libnativetask.so.1.0.0
├── logs
│ 
├── sbin
│   ├── hadoop-daemon.sh
│   ├── httpfs.sh
│   ├── mr-jobhistory-daemon.sh
│   ├── refresh-namenodes.sh
│   ├── start-all.sh
│   ├── start-balancer.sh
│   ├── start-dfs.sh
│   ├── start-secure-dns.sh
│   ├── start-yarn.sh
│   ├── stop-all.cmd
│   ├── stop-all.sh
│   ├── stop-balancer.sh
│   ├── stop-dfs.sh
│   ├── stop-secure-dns.sh
│   ├── stop-yarn.sh
│   ├── workers.sh
│   ├── yarn-daemon.sh
│ 
└── share
    ├── doc
    │   └── hadoop
    └── hadoop
        ├── client
        ├── common
        ├── hdfs
        ├── mapreduce
        ├── tools
        └── yarn


 Step 5: Add the Hadoop and Java paths to your bash file (.bashrc).

Update ~/.bashrc:
export HADOOP_HOME=$HOME/sachinPB/hadoop-3.2.0
export HADOOP_CONF_DIR=$HOME/sachinPB/hadoop-3.2.0/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/sachinPB/hadoop-3.2.0
export HADOOP_COMMON_HOME=$HOME/sachinPB/hadoop-3.2.0
export HADOOP_HDFS_HOME=$HOME/sachinPB/hadoop-3.2.0
export YARN_HOME=$HOME/sachinPB/hadoop-3.2.0
export PATH=$PATH:$HOME/sachinPB/hadoop-3.2.0/bin

#set Java Home

export JAVA_HOME=/opt/ibm/java-ppc64le-80
export PATH=$PATH:/opt/ibm/java-ppc64le-80/bin

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH

source ~/.bashrc

 Step 6: Edit the Hadoop configuration files as per your application requirements.

              Configuration files are at: /home/users/sachinpb/sachinPB/hadoop-3.2.0/etc/hadoop

 Step 7: Open core-site.xml and edit the property mentioned below inside the configuration tag.

SET NAMENODE LOCATION

core-site.xml
-----------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <!-- fs.default.name is deprecated in Hadoop 3; fs.defaultFS is the current name -->
    <name>fs.defaultFS</name>
    <value>hdfs://hadoopNode1:9000</value>
  </property>
</configuration>
-----------------------------------------------------------------------------

Step 8: Open hdfs-site.xml and edit the properties mentioned below inside the configuration tag.

SET PATH FOR HDFS

hdfs-site.xml
-----------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/tmp/data/hadoop/hdfs/nn</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/tmp/data/hadoop/hdfs/dn</value>
 </property>
 <property>
   <!-- dfs.permissions is the deprecated name; dfs.permissions.enabled is current -->
   <name>dfs.permissions.enabled</name>
   <value>false</value>
 </property>
</configuration>
----------------------------------------------------------------------------

Step 9: Edit the mapred-site.xml file and edit the property mentioned below inside the configuration tag.

SET YARN AS JOB SCHEDULER

mapred-site.xml
-------------------------------------------------------------------------
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
-----------------------------------------------------------------------------

Step 10: Open yarn-site.xml and edit the property mentioned below inside the configuration tag.

CONFIGURE YARN

yarn-site.xml
----------------------------------------------------------------
<?xml version="1.0"?>

<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
</configuration>
-----------------------------------------------------------------

Step 11: Edit hadoop-env.sh/yarn-env.sh and add the Java Path as mentioned below.

 export JAVA_HOME=/opt/ibm/java-ppc64le-80
 export HADOOP_HOME=$HOME/sachinPB/hadoop-3.2.0

Step 12: The file 'workers' is used by the startup scripts to start the required daemons on all nodes. This is a change in Hadoop 3: the file was named 'slaves' in Hadoop 2.x and has been renamed to 'workers'.

CONFIGURE WORKERS
workers
-------------------------------
hadoopNode1
hadoopNode2
---------------------------------

Check the Java and Hadoop versions:

[sachinpb@hadoopNode1 hadoop]$ java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)


Check Hadoop version:
[sachinpb@hadoopNode1 hadoop]$ hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /storage/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
[sachinpb@hadoopNode1 hadoop]$

-----------------------------------------------

Step 13: Next,  format the NameNode.

HDFS needs to be formatted like any classical file system. On the master node (hadoopNode1), run the following command: "hdfs namenode -format"

[sachinpb@hadoopNode1]$ hdfs namenode -format -clusterId CID-dd2f4c82-65d6-4c9b-a100-5d153c4a9512
2019-05-07 03:44:04,380 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoopNode1/$HOST1_IPADDRESS
STARTUP_MSG:   args = [-format, -clusterId, CID-dd2f4c82-65d6-4c9b-a100-5d153c4a9512]
STARTUP_MSG:   version = 3.2.0
STARTUP_MSG:   classpath = /home/users/sachinpb/sachinPB/hadoop-3.2.0/etc/hadoop:/home/users/sachinpb/sachinPB/hadoop-.
.
.
.
.

 [/tmp/data/hadoop/hdfs/nn/current/VERSION, /tmp/data/hadoop/hdfs/nn/current/seen_txid, /tmp/data/hadoop/hdfs/nn/current/fsimage_0000000000000000000.md5, /tmp/data/hadoop/hdfs/nn/current/fsimage_0000000000000000000, /tmp/data/hadoop/hdfs/nn/current/edits_0000000000000000001-0000000000000000002, /tmp/data/hadoop/hdfs/nn/current/edits_0000000000000000003-0000000000000000004, /tmp/data/hadoop/hdfs/nn/current/edits_0000000000000000005-0000000000000000006, /tmp/data/hadoop/hdfs/nn/current/edits_0000000000000000007-0000000000000000008, /tmp/data/hadoop/hdfs/nn/current/edits_inprogress_0000000000000000009]
2019-05-07 03:44:08,926 INFO common.Storage: Storage directory /tmp/data/hadoop/hdfs/nn has been successfully formatted.
2019-05-07 03:44:08,937 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
2019-05-07 03:44:09,063 INFO namenode.FSImageFormatProtobuf: Image file /tmp/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2019-05-07 03:44:09,089 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-05-07 03:44:09,104 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoopNode1/$HOST1_IPADDRESS
************************************************************/
[sachinpb@hadoopNode1 logs]$

------------------------------------------

Step 14: Once the NameNode is formatted, go to the hadoop-3.2.0/sbin directory and start all the daemons.

Your Hadoop installation is now configured and ready to run big data applications.

Step 15: Start the HDFS and YARN daemons:

From directory: /home/users/sachinpb/sachinPB/hadoop-3.2.0/sbin

NOTE: Copy the Hadoop home directory to all the nodes in your cluster (if it is not available on a shared directory).

[sachinpb@hadoopNode1 sbin]$ ./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as sachinpb in 10 seconds..
Starting namenodes on [hadoopNode1]
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
hadoopNode1: namenode is running as process 146418. 
Starting datanodes
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
hadoopNode2: Welcome to hadoopNode2!
hadoopNode2:
hadoopNode1: datanode is running as process 146666.
hadoopNode2: datanode is running as process 112502. 
Starting secondary namenodes [hadoopNode1]
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
hadoopNode1: secondarynamenode is running as process 147091.
[sachinpb@hadoopNode1 sbin]$

Step 16: All the Hadoop services are up and running. [On other platforms you could use the jps command to see the Hadoop daemons; IBM Java does not provide jps or jstat, so check the Hadoop processes with the ps command.]

[sachinpb@hadoopNode1 sbin]$ ps -ef | grep NameNode
sachinpb   105015      1  0 02:59 ?        00:00:27 /opt/ibm/java-ppc64le-80/bin/java -Dproc_namenode -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-namenode-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-namenode-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.NameNode
-----
sachinpb   105713      1  0 02:59 ?        00:00:12 /opt/ibm/java-ppc64le-80/bin/java -Dproc_secondarynamenode -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-secondarynamenode-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-secondarynamenode-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
 [sachinpb@hadoopNode1 sbin]$ ps -ef | grep DataNode
sachinpb   105268      1  0 02:59 ?        00:00:19 /opt/ibm/java-ppc64le-80/bin/java -Dproc_datanode -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-datanode-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-datanode-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
------
 [sachinpb@hadoopNode1 sbin]$ ps -ef | grep ResourceManager
sachinpb   106257      1  1 02:59 pts/3    00:00:50 /opt/ibm/java-ppc64le-80/bin/java -Dproc_resourcemanager -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dservice.libdir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/yarn,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/yarn/lib,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/hdfs,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/hdfs/lib,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/common,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/common/lib -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-resourcemanager-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-resourcemanager-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

[sachinpb@hadoopNode1 sbin]$ ps -ef | grep NodeManager
sachinpb   106621      1  1 02:59 ?        00:01:08 /opt/ibm/java-ppc64le-80/bin/java -Dproc_nodemanager -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-nodemanager-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-nodemanager-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager

Similarly, check the status of the Hadoop daemons on the other worker node [hadoopNode2]:

[sachinpb@hadoopNode2 ~]$ ps -ef | grep hadoop
sachinpb    77718      1  7 21:52 ?        00:00:07 /opt/ibm/java-ppc64le-80/bin/java -Dproc_datanode -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-datanode-hadoopNode2.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-datanode-hadoopNode2.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
sachinpb    78006      1 12 21:52 ?        00:00:11 /opt/ibm/java-ppc64le-80/bin/java -Dproc_nodemanager -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-nodemanager-hadoopNode2.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-nodemanager-hadoopNode2.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager
[sachinpb@hadoopNode2 ~]$

 Step 17: Now open a browser and go to localhost:9870/dfshealth.html to check the NameNode interface.

NOTE: In Hadoop 2.x the web UI port was 50070; in Hadoop 3.x it moved to 9870. You can access the HDFS web UI at localhost:9870.
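From the master node you can also check reachability of the NameNode UI on the command line, for example:

```shell
# Print the HTTP status code of the NameNode web UI (port 9870 in Hadoop 3.x).
# "200" means the UI is reachable; "000" means nothing answered on the port.
# The trailing "|| true" keeps scripts from aborting when the UI is down.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870/dfshealth.html || true
```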


Step 18: Run a Hadoop application. Example: the wordcount MapReduce program

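The run below assumes wordcount input already exists under /user/sachinpb/helloworld in HDFS. A minimal sketch to stage it (the local file name and its contents are illustrative assumptions):

```shell
# Create a small local input file (contents are just an example):
echo "hello world hello" > helloworld.txt

# Stage it into HDFS; guarded so the snippet is a no-op where the hadoop
# CLI is absent (requires the daemons started in Step 15 to be running):
if command -v hdfs >/dev/null 2>&1; then
    hdfs dfs -mkdir -p /user/sachinpb/helloworld
    hdfs dfs -put -f helloworld.txt /user/sachinpb/helloworld/
fi
```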
[sachinpb@hadoopNode1 hadoop-3.2.0]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /user/sachinpb/helloworld /user/sachinpb/helloworld_out
2019-05-07 04:04:35,044 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2019-05-07 04:04:36,137 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/sachinpb/.staging/job_1557225898252_0003
2019-05-07 04:04:36,374 INFO input.FileInputFormat: Total input files to process : 1
2019-05-07 04:04:36,486 INFO mapreduce.JobSubmitter: number of splits:1
2019-05-07 04:04:36,536 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-05-07 04:04:36,728 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1557225898252_0003
2019-05-07 04:04:36,729 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-05-07 04:04:36,939 INFO conf.Configuration: resource-types.xml not found
2019-05-07 04:04:36,939 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-05-07 04:04:36,997 INFO impl.YarnClientImpl: Submitted application application_1557225898252_0003
2019-05-07 04:04:37,029 INFO mapreduce.Job: The url to track the job: http://hadoopNode1:8088/proxy/application_1557225898252_0003/
2019-05-07 04:04:37,030 INFO mapreduce.Job: Running job: job_1557225898252_0003
2019-05-07 04:04:45,137 INFO mapreduce.Job: Job job_1557225898252_0003 running in uber mode : false
2019-05-07 04:04:45,138 INFO mapreduce.Job:  map 0% reduce 0%
2019-05-07 04:04:51,189 INFO mapreduce.Job:  map 100% reduce 0%
2019-05-07 04:04:59,223 INFO mapreduce.Job:  map 100% reduce 100%
2019-05-07 04:04:59,232 INFO mapreduce.Job: Job job_1557225898252_0003 completed successfully
2019-05-07 04:04:59,348 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=41
                FILE: Number of bytes written=443547
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=126
                HDFS: Number of bytes written=23
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3968
                Total time spent by all reduces in occupied slots (ms)=4683
                Total time spent by all map tasks (ms)=3968
                Total time spent by all reduce tasks (ms)=4683
                Total vcore-milliseconds taken by all map tasks=3968
                Total vcore-milliseconds taken by all reduce tasks=4683
                Total megabyte-milliseconds taken by all map tasks=4063232
                Total megabyte-milliseconds taken by all reduce tasks=4795392
        Map-Reduce Framework
                Map input records=1
                Map output records=3
                Map output bytes=29
                Map output materialized bytes=41
                Input split bytes=109
                Combine input records=3
                Combine output records=3
                Reduce input groups=3
                Reduce shuffle bytes=41
                Reduce input records=3
                Reduce output records=3
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=288
                CPU time spent (ms)=4030
                Physical memory (bytes) snapshot=350552064
                Virtual memory (bytes) snapshot=3825860608
                Total committed heap usage (bytes)=177668096
                Peak Map Physical memory (bytes)=226557952
                Peak Map Virtual memory (bytes)=1911750656
                Peak Reduce Physical memory (bytes)=123994112
                Peak Reduce Virtual memory (bytes)=1914109952
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=17
        File Output Format Counters
                Bytes Written=23
[sachinpb@hadoopNode1 hadoop-3.2.0]$

------------------------


Step 19: Verify the output file in HDFS:

[sachinpb@hadoopNode1 hadoop-3.2.0]$ hdfs dfs -cat /user/sachinpb/helloworld_out/part-r-00000
---------------------
2019    4
hello    6
world   7
---------------------

Step 20:  MONITOR YOUR HDFS CLUSTER

[sachinpb@hadoopNode1]$ hdfs dfsadmin -report
Configured Capacity: 1990698467328 (1.81 TB)
Present Capacity: 1794297528320 (1.63 TB)
DFS Remaining: 1794297511936 (1.63 TB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: $HOST1_IPADDRESS:9866 (hadoopNode1)
Hostname: hadoopNode1
Decommission Status : Normal
Configured Capacity: 995349233664 (926.99 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 118666317824 (110.52 GB)
DFS Remaining: 876682903552 (816.47 GB)
DFS Used%: 0.00%
DFS Remaining%: 88.08%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 09 23:23:50 PDT 2019
Last Block Report: Thu May 09 23:11:35 PDT 2019
Num of Blocks: 0


Name: $HOST2_IPADDRESS:9866 (hadoopNode2)
Hostname: hadoopNode2
Decommission Status : Normal
Configured Capacity: 995349233664 (926.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 77734621184 (72.40 GB)
DFS Remaining: 917614608384 (854.60 GB)
DFS Used%: 0.00%
DFS Remaining%: 92.19%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 09 23:23:49 PDT 2019
Last Block Report: Thu May 09 23:20:58 PDT 2019
Num of Blocks: 0

NOTE: You can see the two live datanodes (hadoopNode1 and hadoopNode2) in this cluster, along with details such as allocated HDFS space and block counts. This is how you check the health of the Hadoop cluster. We also tested the wordcount application on this cluster, as shown above.
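Since the report is plain text, a quick health check can be scripted. Below is a small, illustrative Python sketch (not part of Hadoop) that pulls each datanode's free-space percentage out of `hdfs dfsadmin -report` output; the field names are taken from the report shown above:

```python
def parse_dfsadmin_report(text):
    """Extract per-datanode 'DFS Remaining%' values from `hdfs dfsadmin -report` output."""
    nodes = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("DFS Remaining%:") and current:
            nodes[current] = float(line.split(":", 1)[1].strip().rstrip("%"))
    return nodes

# Trimmed-down sample of the report shown above:
report = """
Hostname: hadoopNode1
DFS Remaining%: 88.08%
Hostname: hadoopNode2
DFS Remaining%: 92.19%
"""
print(parse_dfsadmin_report(report))  # {'hadoopNode1': 88.08, 'hadoopNode2': 92.19}
```

A check like this could feed a monitoring alert when any datanode's remaining space drops below a threshold.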

Step 21: How to stop the Hadoop daemons in the cluster environment:

 cd to /home/users/sachinpb/sachinPB/hadoop-3.2.0/sbin

[sachinpb@hadoopNode1 sbin]$ ./stop-all.sh
WARNING: Stopping all Apache Hadoop daemons as sachinpb in 10 seconds.
WARNING: Use CTRL-C to abort.
Stopping namenodes on [hadoopNode1]
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
Stopping datanodes
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
hadoopNode2: Welcome to hadoopNode2!
hadoopNode2:
Stopping secondary namenodes [hadoopNode1]
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
Stopping nodemanagers
Stopping resourcemanagers on []
[sachinpb@hadoopNode1 sbin]$

I hope this blog helped you understand how to install Hadoop 3.x in a multi-node setup, i.e., a cluster, and how to perform operations on HDFS files.

----------------------------------------END-------------------------------------------
Reference:
1) https://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-common/SingleCluster.html
2) https://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-common/ClusterSetup.html
3) https://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
4) http://www.sachinpbuzz.com/2014/01/big-data-hadoop-20yarn-multi-node.html

Thursday, February 21, 2019

Kubernetes [K8s] Architecture and setup on RHEL

Kubernetes (k8s or Kube) is an open-source container management platform designed to run enterprise-class, cloud-enabled and web-scalable IT workloads. In other words, Kubernetes is a container orchestrator to provision, manage, and scale apps. With the rise of containerization in the world of DevOps, the need for a platform to effectively orchestrate these containers also grew. Since Kubernetes operates at the container level rather than at the hardware level, it provides some generally applicable features common to PaaS offerings. It is a vendor-agnostic cluster and container management tool, open-sourced by Google in 2014. Kubernetes allows you to manage the life cycle of containerized apps in a cluster.

Kubernetes Architecture : 

Similar to Apache Mesos and Docker Swarm, Kubernetes is a container orchestrator to provision, manage, and scale apps. The key paradigm of Kubernetes is its declarative model. Kubernetes is designed on the principles of scalability, availability, security and portability. It optimizes the cost of infrastructure by efficiently distributing the workload across available resources. Like most distributed computing platforms, a Kubernetes cluster consists of at least one master and multiple compute nodes (also known as worker nodes). The master is responsible for exposing the application program interface (API), scheduling the deployments and managing the overall cluster. Each node runs a container runtime, such as Docker or rkt (a container system developed by CoreOS as a lightweight and secure alternative to Docker), along with an agent that communicates with the master. The node also runs additional components for logging, monitoring, service discovery and optional add-ons. Nodes are the workhorses of a Kubernetes cluster: they expose compute, networking and storage resources to applications, and they can be virtual machines (VMs) running in a cloud or bare-metal servers running within the data center.
A pod is a collection of one or more containers and serves as Kubernetes' core unit of management. Pods act as the logical boundary for containers sharing the same context and resources. The grouping mechanism of pods makes up for the differences between containerization and virtualization by making it possible to run multiple dependent processes together. At runtime, pods can be scaled by creating replica sets, which ensure that the deployment always runs the desired number of pods.

Replica sets deliver the required scale and availability by maintaining a pre-defined set of pods at all times. A single pod or a replica set can be exposed to the internal or external consumers via services. Services enable the discovery of pods by associating a set of pods to a specific criterion. Pods are associated to services through key-value pairs called labels and selectors. Any new pod with labels that match the selector will automatically be discovered by the service. This architecture provides a flexible, loosely-coupled mechanism for service discovery.
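The label/selector association described above is equality-based matching. Here is a toy Python sketch of how a service discovers its pods (the pod names and labels are hypothetical; this is a model of the concept, not Kubernetes code):

```python
def selector_matches(selector, labels):
    # a pod matches when every key/value pair in the selector
    # is present in the pod's labels (equality-based selection)
    return all(labels.get(k) == v for k, v in selector.items())

pods = [
    {"name": "web-1", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "web-2", "labels": {"app": "web", "tier": "frontend"}},
    {"name": "db-1",  "labels": {"app": "db"}},
]
service_selector = {"app": "web"}

# the service's endpoints are exactly the pods whose labels match its selector
endpoints = [p["name"] for p in pods if selector_matches(service_selector, p["labels"])]
print(endpoints)  # ['web-1', 'web-2']
```

Any new pod created later with `app: web` in its labels would automatically appear in the endpoint list, which is what makes service discovery loosely coupled.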
The definition of Kubernetes objects, such as pods, replica sets and services, are submitted to the master. Based on the defined requirements and availability of resources, the master schedules the pod on a specific node. The node pulls the images from the container image registry and coordinates with the local container runtime to launch the container.

etcd is an open source, distributed key-value database from CoreOS, which acts as the single source of truth (SSOT) for all components of the Kubernetes cluster. The master queries etcd to retrieve various parameters of the state of the nodes, pods and containers.
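etcd's role as the single source of truth can be pictured as a versioned key-value store that cluster components query and watch. Below is a toy, in-memory Python sketch of that idea (not the real etcd API; keys and values are illustrative):

```python
class ToyStateStore:
    """A tiny stand-in for etcd: versioned key-value storage with watch callbacks."""
    def __init__(self):
        self.data = {}
        self.revision = 0
        self.watchers = []

    def put(self, key, value):
        self.revision += 1                      # every write bumps the global revision
        self.data[key] = (value, self.revision)
        for callback in self.watchers:          # notify watchers, like an etcd watch
            callback(key, value, self.revision)

    def get(self, key):
        return self.data.get(key, (None, None))[0]

    def watch(self, callback):
        self.watchers.append(callback)

store = ToyStateStore()
events = []
store.watch(lambda k, v, rev: events.append((k, v, rev)))  # e.g. a controller watching state
store.put("/pods/nginx", "Pending")
store.put("/pods/nginx", "Running")
print(store.get("/pods/nginx"), events)
```

The watch mechanism is the important part: instead of polling, the master's controllers react to state changes as they are written.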

 

Kubernetes Components:

 i) Master Components
Master components provide the cluster's control plane. They make global decisions about the cluster (for example, scheduling) and detect and respond to cluster events (such as starting a new pod when a replication controller's 'replicas' field is unsatisfied). The master stores the state and configuration data for the entire cluster in etcd, a persistent and distributed key-value data store. Each node has access to etcd, and through it nodes learn how to maintain the configurations of the containers they're running. You can run etcd on the Kubernetes master or in standalone configurations.


  • kube-apiserver
  • etcd
  • kube-scheduler
  • kube-controller-manager
  • cloud-controller-manager
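These components implement the declarative model mentioned earlier: a controller repeatedly compares desired state with observed state and acts on the difference. A simplified, illustrative sketch of one reconcile pass of a replica-set style controller (not Kubernetes code; pod names are hypothetical):

```python
def reconcile(desired_replicas, running_pods):
    """One pass of a control loop: return the actions needed
    to move observed state toward desired state."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # too few pods: schedule new ones
        return [("start-pod", None)] * diff
    if diff < 0:
        # too many pods: stop the surplus
        return [("stop-pod", name) for name in running_pods[diff:]]
    return []  # converged: nothing to do

print(reconcile(3, ["pod-a"]))           # two start-pod actions
print(reconcile(1, ["pod-a", "pod-b"]))  # one stop-pod action
```

Running this loop continuously against the state stored in etcd is what keeps the cluster converged on its declared configuration.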

ii) Node Components
Node components run on every node, maintaining running pods and providing the Kubernetes runtime environment. All nodes in a Kubernetes cluster must be configured with a container runtime, which is typically Docker. The container runtime starts and manages the containers as they're deployed to nodes in the cluster by Kubernetes.
  • kubelet
  • kube-proxy
  • Container Runtime

----------------------------------------------------------------------------

Installation Steps of Kubernetes 1.13 on RHEL

---------------------------------------------------------------------------


Let's do the installation on two x86 nodes installed with RHEL 7.5:
1) master-node
2) worker-node

Step 1: On master-node, disable SELinux and set up firewall rules
  1. setenforce 0
  2. sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

Set the following firewall rules and other configuration details.


  1.  firewall-cmd --permanent --add-port=6443/tcp
  2.  firewall-cmd --permanent --add-port=2379-2380/tcp
  3.  firewall-cmd --permanent --add-port=10250/tcp
  4.  firewall-cmd --permanent --add-port=10251/tcp
  5.  firewall-cmd --permanent --add-port=10252/tcp
  6.  firewall-cmd --permanent --add-port=10255/tcp
  7.  firewall-cmd --reload
  8.  modprobe br_netfilter
  9.  echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables

NOTE: You MUST disable swap in order for the kubelet to work properly.

Step 2: Configure Kubernetes Repository
cat /etc/yum.repos.d/kubernetes.repo
-----------------------------------------------------------
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
----------------------------------------------------------


Step 3: Configure the Docker repository (version 18.06 is validated for Kubernetes)

cat /etc/yum.repos.d/docker-main.repo
-------------------------------------------------------
[docker-main-repo]
name=Docker main Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
-------------------------------------------------------

Step 4: Install docker

[root@master-node ]# yum install docker-ce-18.06*
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package docker-ce.x86_64 0:18.06.3.ce-3.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=======================================================================================================================================
 Package                      Arch                      Version                              Repository                           Size
=======================================================================================================================================
Installing:
 docker-ce                    x86_64                    18.06.3.ce-3.el7                     docker-ce-stable                     41 M

Transaction Summary
=======================================================================================================================================
Install  1 Package

Total size: 41 M
Installed size: 168 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Installing : docker-ce-18.06.3.ce-3.el7.x86_64                                                                                   1/1
  Verifying  : docker-ce-18.06.3.ce-3.el7.x86_64                                                                                   1/1

Installed:
  docker-ce.x86_64 0:18.06.3.ce-3.el7

Complete!
[root@master-node ]#
-------------------------------------------------------------------

Step 5: Start and enable the kubelet

[root@master-node ~]# systemctl  start kubelet
[root@master-node ~]# systemctl  status kubelet
? kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           +-10-kubeadm.conf
   Active: active (running) since Thu 2019-02-21 01:00:56 EST; 1min 50s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 9551 (kubelet)
    Tasks: 69
   Memory: 56.3M
   CGroup: /system.slice/kubelet.service
           +-9551 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubele...

Feb 21 01:02:22 master-node kubelet[9551]: W0221 01:02:22.527780    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:22 master-node kubelet[9551]: E0221 01:02:22.528025    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Feb 21 01:02:27 master-node kubelet[9551]: W0221 01:02:27.528986    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:27 master-node kubelet[9551]: E0221 01:02:27.529196    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Feb 21 01:02:32 master-node kubelet[9551]: W0221 01:02:32.530265    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:32 master-node kubelet[9551]: E0221 01:02:32.530448    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Feb 21 01:02:37 master-node kubelet[9551]: W0221 01:02:37.531526    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:37 master-node kubelet[9551]: E0221 01:02:37.531645    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Feb 21 01:02:42 master-node kubelet[9551]: W0221 01:02:42.532552    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:42 master-node kubelet[9551]: E0221 01:02:42.532683    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Hint: Some lines were ellipsized, use -l to show in full.
[root@master-node ~]#

[root@master-node ]# systemctl enable kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /etc/systemd/system/kubelet.service.
[root@master-node]#
-----------------------------------

Step 6: Start and enable docker

[root@master-node]# systemctl restart docker
[root@master-node]# systemctl status docker
? docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-21 00:41:14 EST; 7s ago
     Docs: https://docs.docker.com
 Main PID: 30271 (dockerd)
    Tasks: 161
   Memory: 90.8M
   CGroup: /system.slice/docker.service
           +-30271 /usr/bin/dockerd
           +-30284 docker-containerd --config /var/run/docker/containerd/containerd.toml
           +-30490 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30523 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30541 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30639 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30667 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30747 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30764 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30810 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30938 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...

Feb 21 00:41:15 master-node dockerd[30271]: time="2019-02-21T00:41:15-05:00" level=info msg="shim docker-containerd-shim started...d=30541
Feb 21 00:41:15 master-node dockerd[30271]: time="2019-02-21T00:41:15-05:00" level=info msg="shim docker-containerd-shim started...d=30639
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30667
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30747
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30764
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30810
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30938
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30983
Feb 21 00:41:17 master-node dockerd[30271]: time="2019-02-21T00:41:17-05:00" level=info msg="shim reaped" id=cc95fdfdc1d7d0d6104...62d4d91
Feb 21 00:41:17 master-node dockerd[30271]: time="2019-02-21T00:41:17.122817061-05:00" level=info msg="ignoring event" module=li...Delete"
Hint: Some lines were ellipsized, use -l to show in full.
[root@master-node]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@master-node]#


----------------------------------------
Step 7: Check the version of kubernetes installed on the master node:
[root@master-node ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:05:53Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
[root@master-node ~]#

----------------------------------------------

Step 8: Check the version of docker installed:

[root@master-node]# docker version
Client:
 Version:           18.06.3-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        d7080c1
 Built:             Wed Feb 20 02:26:51 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.3-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       d7080c1
  Built:            Wed Feb 20 02:28:17 2019
  OS/Arch:          linux/amd64
  Experimental:     false


---------------------------------------------------------------

Step 9: Initialize Kubernetes Master with ‘kubeadm init’


[root@master-node ~]# kubeadm init
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master-node localhost] and IPs [IP_ADDRESS_master-node 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master-node localhost] and IPs [IP_ADDRESS_master-node 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master-node kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 IP_ADDRESS_master-node]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 20.502302 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master-node" as an annotation
[mark-control-plane] Marking the node master-node as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master-node as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: od9n1d.rltj6quqmm2kojd7
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join IP_ADDRESS_master-node:6443 --token od9n1d.rltj6quqmm2kojd7 --discovery-token-ca-cert-hash sha256:9ea1e1163550080fb9f5f63738fbf094f065de12cd38f493ec4e7c67c735fc7b

[root@master-node ~]#
-----------------------------------------------
If you get a "port already in use" error, run 'kubeadm reset' and then re-run 'kubeadm init'.
The Kubernetes master has been initialized successfully, as shown above.
------------------------------------------------------------------------------------------------------

Step 10: Set up the cluster configuration for the root user.

[root@master-node ~]# mkdir -p $HOME/.kube
[root@master-node ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master-node ~]# chown $(id -u):$(id -g) $HOME/.kube/config
[root@master-node ~]#


------------------------

Step 11: Check the status on the master node and deploy a pod network to the cluster

--------------------------
[root@master-node ~]# kubectl get nodes
NAME       STATUS     ROLES    AGE   VERSION
master-node   NotReady   master   12m   v1.13.3
[root@master-node ~]#


[root@master-node]# kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
default       nginx                              0/1     Pending   0          125m
kube-system   coredns-86c58d9df4-d4j4x           1/1     Running   0          3h49m
kube-system   coredns-86c58d9df4-sg8tk           1/1     Running   0          3h49m
kube-system   etcd-master-node                      1/1     Running   0          3h48m
kube-system   kube-apiserver-master-node            1/1     Running   0          3h48m
kube-system   kube-controller-manager-master-node   1/1     Running   0          3h48m
kube-system   kube-proxy-b6wcd                   1/1     Running   0          159m
kube-system   kube-proxy-qfdhq                   1/1     Running   0          3h49m
kube-system   kube-scheduler-master-node            1/1     Running   0          3h48m
kube-system   weave-net-5c46g                    2/2     Running   0          159m
kube-system   weave-net-7qsnj                    2/2     Running   0          3h35m
[root@master-node]#

-------------------


Step 12: Deploy the pod network

The Weave Net addon for Kubernetes comes with a Network Policy Controller that automatically monitors Kubernetes for any NetworkPolicy annotations on all namespaces and configures iptables rules to allow or block traffic as directed by the policies.


[root@master-node ~]#  export kubever=$(kubectl version | base64 | tr -d '\n')
[root@master-node ~]#  kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created
[root@master-node ~]#

------------------------

Step 13:Check the status on master node

[root@master-node ~]# kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-node   Ready    master   14m   v1.13.3
[root@master-node ~]#



-----------------------------
Perform the following steps on each worker node

Step 14: Disable SELinux and apply the other configuration details

  1. setenforce 0
  2. sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
  3. firewall-cmd --permanent --add-port=10250/tcp
  4. firewall-cmd --permanent --add-port=10255/tcp
  5. firewall-cmd --permanent --add-port=30000-32767/tcp
  6. firewall-cmd --permanent --add-port=6783/tcp
  7. firewall-cmd  --reload
  8. echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables

NOTE: You MUST disable swap in order for the kubelet to work properly
----------------------------
Step 15: Configure the Kubernetes and Docker repositories on the worker node (same as the steps above)

-------------
Step 16:
Install docker

--------------
Step 17:

Start and enable docker service

-----------------

Step 18:

Join the worker node to the master node.

When the Kubernetes master was initialized, the output included a 'kubeadm join' command with a token. Copy that command and run it on the worker node:

[root@worker-node ~]#  kubeadm join IP_ADDRESS_master-node:6443 --token od9n1d.rltj6quqmm2kojd7 --discovery-token-ca-cert-hash sha256:9ea1e1163550080fb9f5f63738fbf094f065de12cd38f493ec4e7c67c735fc7b
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "IP_ADDRESS_master-node:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://IP_ADDRESS_master-node:6443"
[discovery] Requesting info from "https://IP_ADDRESS_master-node:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "IP_ADDRESS_master-node:6443"
[discovery] Successfully established connection with API Server "IP_ADDRESS_master-node:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "worker-node" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

[root@worker-node ~]#

This will activate the required services on the worker node.
-------------------------

Step 19:

Now verify Nodes status from master node using kubectl command


[root@master-node]# kubectl get nodes
NAME       STATUS   ROLES    AGE    VERSION
master-node   Ready    master   119m   v1.13.3
worker-node   Ready    <none>   49m    v1.13.3
[root@master-node ]#


As we can see, the master and worker nodes are in Ready status. This confirms that Kubernetes 1.13 has been installed successfully and that the worker node has joined the cluster. Now we can create pods and services.


---------------------------------   oooooooooooooooooo -------------------------------------------

Reference:
1) https://docs.google.com/presentation/d/1mbjjxNlPzgZIH1ciyprMRoIAYiEZuFQlG7ElXUvP1wg/edit#slide=id.g3d4e7af7b7_2_52
2) https://github.com/kubernetes-sigs/kube-batch
3) https://github.com/intel/multus-cni
4) https://kubernetes.io/docs/tutorials/kubernetes-basics
5) http://www.developintelligence.com/blog/2017/02/kubernetes-actually-use
6) https://kubernetes.io/docs/setup/independent/install-kubeadm/
7) https://www.linuxtechi.com/install-kubernetes-1-7-centos7-rhel7/
8) https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/getting_started_with_kubernetes/get_started_orchestrating_containers_with_kubernetes
9) https://github.com/kubernetes/kubeadm/issues/339
10) https://kubernetes.io/docs
11) https://thenewstack.io/kubernetes-an-overview/
12) https://blog.newrelic.com/engineering/what-is-kubernetes/