Saturday, July 6, 2019

IBM's Summit & Sierra - The most powerful computers on the planet

Supercomputing is the Formula One of computing. It’s where companies test bleeding-edge technology at an unprecedented scale. Supercomputers are generally used for research purposes, including tasks such as the virtual testing of nuclear bombs, trying to understand how the universe was formed, forecasting climate change and aerodynamic modeling for aircraft. 

The U.S. now has two machines atop the world's supercomputer rankings, with a pair of IBM-built systems holding first and second place. Summit and Sierra, supercomputers at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory, are now ranked the #1 and #2 fastest computers. They are helping us model supernovas, pioneer new materials and explore cancer, genetics and the environment, using technologies available to all businesses.

Summit is owned by Oak Ridge National Laboratory and is designed for artificial intelligence workloads that pertain to high-energy physics and materials discovery, among other things. The lab claims it can perform more than 3 billion-billion calculations per second for certain mixed-precision AI workloads. Summit, an IBM-built supercomputer running at the Department of Energy's (DOE) Oak Ridge National Laboratory (ORNL), captured the number one spot with a performance of 122.3 petaflops on High Performance Linpack (HPL), the benchmark used to rank the TOP500 list. Summit has 4,356 nodes, each equipped with two 22-core Power9 CPUs and six NVIDIA Tesla V100 GPUs. The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.

Sierra is jointly operated by the DOE's National Nuclear Security Administration and Lawrence Livermore National Laboratory. Sierra debuted in the number three spot with 71.6 petaflops on HPL and has since moved up to number two. Built by IBM, Sierra's architecture is quite similar to that of Summit, with each of its 4,320 nodes powered by two Power9 CPUs plus four NVIDIA Tesla V100 GPUs and using the same Mellanox EDR InfiniBand as the system interconnect.

Both machines are powered by a combination of IBM’s Power9 central processing units and Nvidia Corp.’s V100 graphics processing units. They’re enormous too, made up of numerous rows of refrigerator-sized computer cabinets. Summit boasts 2.4 million processor cores in total, while Sierra has 1.6 million.

NOTE: The next world's-fastest supercomputer, Frontier, will be built by AMD and Cray for the US government. Frontier is expected to go online in 2021 with 1.5 exaflops of processing power.

General Purpose Computing on Graphics Processing Units [GPGPU]

GPGPU is the utilization of a GPU (graphics processing unit), which would typically only handle computer graphics, to assist in performing tasks that are traditionally handled solely by the CPU (central processing unit). GPGPU allows information to be transferred in both directions, from CPU to GPU and GPU to CPU. Such bidirectional processing can hugely improve efficiency in a wide variety of tasks related to images and video. If the application you use supports OpenCL or CUDA, you will normally see huge performance boosts when using hardware that supports the relevant GPGPU framework.
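
A quick, hedged way to check which GPGPU frameworks a given machine can use (this assumes the vendor driver and the clinfo utility are installed; the output will vary by system):

nvidia-smi -L                                   # list NVIDIA GPUs visible to the CUDA driver (NVIDIA systems only)
clinfo | grep -E "Platform Name|Device Name"    # list OpenCL platforms and devices from any vendor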

NVIDIA was an early and aggressive advocate of leveraging graphics processors for other massively parallel processing tasks (often referred to as general-purpose computing on graphics processing units, or GPGPU). As a result, GPGPU has been widely embraced in the HPC (high-performance computing) server space, and NVIDIA is the dominant supplier of GPUs for HPC. AMD is answering with its ROCm (Radeon Open Compute Platform) initiative; it is behind, but that doesn't mean it isn't trying to catch up.
  
How do OpenCL and CUDA fit into the equation? OpenCL is currently the leading open, vendor-neutral GPGPU framework. CUDA, on the other hand, is the leading proprietary GPGPU framework. It should be noted that Nvidia cards actually support OpenCL as well as CUDA; they just aren't quite as efficient at OpenCL computation as AMD GPUs.

CUDA and OpenCL offer two different interfaces for programming GPUs. OpenCL is an open standard that can be used to program CPUs, GPUs, and other devices from different vendors, while CUDA is specific to NVIDIA GPUs. Although OpenCL promises a portable language for GPU programming, capable of targeting very dissimilar parallel processing devices, its generality may entail a performance penalty. CUDA can be used in two different ways: (1) via the runtime API, which provides a C-like set of routines and extensions, and (2) via the driver API, which provides lower-level control over the hardware but requires more code and programming effort. Both OpenCL and CUDA call a piece of code that runs on the GPU a kernel. Unlike a CUDA kernel, an OpenCL kernel can be compiled at runtime, which can add to an OpenCL application's running time. On the other hand, this just-in-time compilation may allow the compiler to generate code that makes better use of the target GPU.

To compete with CUDA, AMD has shifted from OpenCL to its ROCm platform. AMD has also developed a thin "HIP" compatibility layer that compiles to either CUDA or ROCm, so code written in HIP runs on both AMD and NVIDIA hardware. AMD's hipBLAS, hipSPARSE, and hipDNN all translate to the cu- or roc- equivalents, depending on the hardware target; for example, hipBLAS links to either cuBLAS or rocBLAS. HIP also comes with tools that convert existing CUDA code, and once the CUDA code has been translated successfully, the software can run on both NVIDIA and AMD hardware without problems. On the hardware side, AMD's Radeon VII now looks competitive with, for example, Nvidia's 2080 Ti.

Radeon Open Compute Platform (ROCm):
ROCm is a universal platform for GPU-accelerated computing. A modular design lets any hardware vendor build drivers that support the ROCm stack. ROCm also integrates multiple programming languages and makes it easy to add support for other languages. 

The Department of Energy announced that Frontier, its forthcoming supercomputer due in 2021, will have AMD Radeon Instinct GPUs, so there will soon be growing pressure for cross-platform (Nvidia/AMD) programming models in the HPC space. The $600 million award marks the first system announcement to come out of the second CORAL (Collaboration of Oak Ridge, Argonne and Livermore) procurement process (CORAL-2). Poised to deliver "greater than 1.5 exaflops of HPC and AI processing performance," Frontier (ORNL-5) will be based on Cray's new Shasta architecture and Slingshot interconnect and will feature future-generation AMD Epyc CPUs and Radeon Instinct GPUs. This will start with Cray working with AMD to enhance the programming tools for optimized GPU scaling, with extensions for the Radeon Open Compute Platform (ROCm). These software enhancements will leverage low-level integration of AMD ROCm RDMA technology with Cray Slingshot, enabling the Slingshot NIC to read and write GPU memory directly for higher application performance.

Exploring AMD's Ambitious ROCm Initiative:
AMD released the ROCm hardware-accelerated, parallel computing environment a few years ago, and since then the company has continued to refine its bold vision for an open-source, multi-platform, high-performance computing environment. Over the past two years, ROCm developers have contributed many new features and components to the ROCm open software platform. Now, the much-anticipated release of the Vega 7nm GPU environment adds another important ingredient to the mix, empowering a second generation of high-performance applications that will benefit from ROCm's acceleration features and "write it once" programming paradigm.

ROCm even provides tools for porting vendor-specific CUDA code into a vendor-neutral ROCm format, which makes the massive body of source code written for CUDA available to AMD hardware and other hardware environments.
ROCm is designed as a universal platform, supporting multiple languages and GPU technologies. 
 

Lower in the stack, ROCm provides the Heterogeneous Computing Platform, a Linux driver, and a runtime stack optimized for “HPC and ultra-scale class computing.” ROCm’s modular design means the programming stack is easily ported to other environments.

At the heart of the ROCm platform is the Heterogeneous Compute Compiler (HCC). The open source HCC is based on the LLVM compiler with the Clang C++ preprocessor. HCC supports several versions of standard C++, including C++11, C++14, and some C++17 features. HCC also supports GPU-based acceleration and other parallel programming features, providing a path for programmers to access the advanced capabilities of AMD GPUs in the same way that the proprietary NVCC CUDA compiler provides access to NVIDIA hardware. 

    Important features include the following:
  • Multi-GPU coarse-grain shared virtual memory
  • Process concurrency and preemption
  • Large memory allocations
  • HSA signals and atomics
  • User-mode queues and DMA
  • Standardized loader and code-object format
  • Dynamic and offline-compilation support
  • Peer-to-peer multi-GPU operation with RDMA support
  • Profiler trace and event-collection API
  • Systems-management API and tools

    Solid Compilation Foundation and Language Support

  • LLVM compiler foundation
  • HCC C++ and HIP for application portability
  • GCN assembler and disassembler
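
A quick way to sanity-check a ROCm installation from the shell (hedged; this assumes ROCm is installed in its default location and the utilities below are on the PATH):

rocminfo                 # enumerate HSA agents (CPUs and ROCm-capable GPUs)
rocm-smi                 # report GPU temperature, clocks and utilization
hipcc --version          # confirm the HIP compiler driver is available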

How does HIP work?

The flow is as follows: CUDA source gets converted to HIP, and the HIP code then gets compiled for NVIDIA GPUs with NVCC and for AMD GPUs with AMD's C++ compiler, HCC.
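
In practice, the porting workflow looks roughly like the following (a hedged sketch; the file names are hypothetical and the HIP tools from a ROCm installation are assumed to be on the PATH):

hipify-perl vector_add.cu > vector_add.cpp   # translate CUDA API calls into their hip* equivalents
hipcc vector_add.cpp -o vector_add           # on AMD GPUs hipcc uses the ROCm toolchain; on NVIDIA GPUs it forwards to nvcc
./vector_add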

AMD announced its next-gen Navi-based Radeon RX 5700 and 5700 XT graphics cards recently. If you're an AMD fan hoping that this will be the moment in history when the company finally pulls ahead of Nvidia with a high-end video card (like it may be doing against Intel with desktop CPUs), this isn't that moment. Despite the new Navi architecture, which offers 1.25x the performance per clock and 1.5x the performance per watt, these aren't even as high-end as AMD's existing (and complicated) 13.8 TFLOP Radeon VII GPU. At up to 9.75 TFLOPs and 7.95 TFLOPs of raw computing power respectively, and with 8GB of GDDR6 memory instead of 16GB of HBM2, the 5700 series isn't a world-beater.


Intel announced that its first "discrete" graphics chip (GPU) is coming in 2020. By "discrete GPU", the company means a graphics chip on its own, an entirely separate component that isn't integrated into a processor chip (CPU); typically, Intel GPUs are integrated with its CPUs. Intel's GPU, to be released in 2020, will be designed for enterprise applications like machine learning, as well as consumer-level applications that benefit from the dedicated power of discrete GPUs.

We eagerly await doing the price/performance comparisons across these enterprise GPU compute engines.

Reference :
http://www.admin-magazine.com/HPC/Articles/Discovering-ROCm
https://www.bdti.com/InsideDSP/2016/12/15/AMD

Friday, June 28, 2019

Enable the PRE- and POST-execution processing feature of Spectrum LSF in a large-scale cluster

An HPC cluster consists of hundreds or thousands of compute servers that are networked together. InfiniBand is pervasively used in high-performance computing (HPC) to remove data-exchange bottlenecks, delivering very high throughput and very low latency. As HPC becomes more mainstream and embraced by enterprise users, there is a need for assurances that performance is optimized.

Users schedule their jobs to run on HPC cluster by submitting them through Spectrum LSF.  IBM® Spectrum LSF (formerly IBM® Platform™ LSF®) is a complete workload management solution for demanding HPC environments.

One of the important tasks before executing an application is preparing the job in order to get the best performance on the given HPC cluster. Select the most appropriate queue for each job and provide accurate wall-clock times in your job script; this helps LSF fit your job into the earliest possible run opportunity. Note the system's usable memory and configure your job script to maximize performance. Next, prepare/tune the nodes (or servers) to the desired values. This can be done by pre-execution processing. For example, set the CPUs statically to their highest frequency (or apply any other tuning requirement). After the LSF job finishes, post-execution processing can set the values back to what they were before.
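
A hedged sketch of the kind of node tuning a pre- and post-execution script might perform (requires root privileges; on RHEL the cpupower utility comes from the kernel-tools package, and the available governor names depend on the platform):

sudo cpupower frequency-set -g performance     # pin CPUs to the highest-frequency governor before the job
# ... the LSF job runs ...
sudo cpupower frequency-set -g ondemand        # restore the previous governor after the job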

Configuration to enable pre- and post-execution processing:



The pre- and post-execution processing feature is enabled by defining at least one of the pre/post-execution parameters (such as PRE_EXEC, POST_EXEC, HOST_PRE_EXEC, or HOST_POST_EXEC) at the application or queue level, or by using the -E option of the bsub command to specify a pre-execution command. In some situations, specifying a queue-level or application-level pre-execution command can have advantages over requiring users to use bsub -E. For example, license checking can be set up at the queue or application level so that users do not have to enter a pre-execution command every time they submit a job.
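
For job-level pre-execution, a hedged example of bsub -E (the script path and the job are hypothetical; LSF runs the -E command on the first execution host and only dispatches the job if the command exits with 0):

bsub -E "/shared/scripts/check_license.sh" -q Queue_pre_post -n 4 ./my_app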

The following example illustrates how job-based pre- and post-execution processing works at the queue or application level for setting the environment prior to job execution and for transferring resulting files after the job runs.

Host-based pre- and post-execution processing is different from job-based pre- and post-execution processing in that it is intended for parallel jobs (you can also use this feature for sequential jobs) and is executed on all execution hosts, as opposed to only the first execution host. The purpose is to set up the execution hosts (or servers) before job-based pre-execution and any other pre-processing that depends on host-based preparation, and to clean up the execution hosts after job-based post-execution and other post-processing.

There are two ways to enable host-based pre- and post-execution processing for a job:
  • Configure HOST_PRE_EXEC and HOST_POST_EXEC in lsb.queues.
  • Configure HOST_PRE_EXEC and HOST_POST_EXEC in lsb.applications.
     
Let's take the example of queue-level configuration with HOST_PRE_EXEC/HOST_POST_EXEC:

An LSF queue (QUEUE_NAME=Queue_pre_post) is set up with HOST_PRE_EXEC and HOST_POST_EXEC,

where HOST_PRE_EXEC points to "pre_setup_perf.sh", which tunes each server before the job starts. Similarly, HOST_POST_EXEC points to "post_setup_perf.sh", which sets the previous values back on the servers after the performance testing.

Modify the configuration file "lsb.queues". For example:
------------------------------------------------------------------------------

Begin Queue
QUEUE_NAME   = Queue_pre_post
PRIORITY     = 40
INTERACTIVE  = NO
FAIRSHARE    = USER_SHARES[[default,1]]
HOSTS        = server1 server2          # hosts on which jobs in this queue can run
EXCLUSIVE    = Y
HOST_PRE_EXEC     = /home/sachinpb/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC    = /home/sachinpb/post_setup_perf.sh >> /tmp/post.out
DESCRIPTION  = For P8 performance jobs, running only if hosts are lightly loaded.
End Queue

----------------------------------------------------------------------------------

After modification, please run the command.


badmin reconfig
-------------------------------------------------------------------------------


Check the queue status:


[sachinpb@server1 ~]$ bqueues -l Queue_pre_post
QUEUE: Queue_pre_post
  -- For P8 performance jobs, running only if hosts are lightly loaded.
PARAMETERS/STATISTICS
PRIO NICE STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN SSUSP USUSP  RSV PJOBS
 40    0  Open:Active       -    -    -    -     0     0     0     0     0    0     0
Interval for a host to accept two jobs is 0 seconds
SCHEDULING PARAMETERS
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -
SCHEDULING POLICIES:  FAIRSHARE  EXCLUSIVE  NO_INTERACTIVE
USER_SHARES:  [default, 1]
SHARE_INFO_FOR: Queue_pre_post/

USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME   ADJUST
sachinpb          1       0.326      0        0       350.7        0       0.000
USERS: all
HOSTS:  server1 server2
HOST_PRE_EXEC:  /home/sachinpb/sachin/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC:  /home/sachinpb/sachin/post_setup_perf.sh >> /tmp/post.out

[sachinpb@server1 ~]$
---------------------------------------------------------------------------------

HOST_PRE_EXEC=command (in lsb.queues):
  •     Enables host-based pre-execution processing at the queue level.
  •     The pre-execution command runs on all execution hosts before the job starts.
  •     If the HOST_PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.
  •     The HOST_PRE_EXEC command uses the same environment variable values as the job.
  •     The HOST_PRE_EXEC command can only be used for host-based pre- and post-execution processing.

HOST_POST_EXEC=command (in lsb.queues):
  •     Enables host-based post-execution processing at the queue level.
  •     The HOST_POST_EXEC command uses the same environment variable values as the job.
  •     The post-execution command for the queue remains associated with the job. The original post-execution command runs even if the job is requeued or if the post-execution command for the queue is changed after job submission.
  •     Before the post-execution command runs, LSB_JOBEXIT_STAT is set to the exit status of the job. The success or failure of the post-execution command has no effect on LSB_JOBEXIT_STAT.
  •     The post-execution command runs after the job finishes, even if the job fails.
  •     Specify the environment variable $USER_POSTEXEC to allow UNIX users to define their own post-execution commands.
  •     The HOST_POST_EXEC command can only be used for host-based pre- and post-execution processing.

-------------------------------------------------------
Now submit the LSF job as shown:

[sachinpb@server1 sachin]$ bsub  -q Queue_pre_post -n 8 -R "span[ptile=4]" < myjob.script
bsub> Job <19940> is submitted to queue <Queue_pre_post>.
[sachinpb@server1 ]$ bjobs
JOBID   USER    STAT  QUEUE          FROM_HOST   EXEC_HOST    JOB_NAME     SUBMIT_TIME
19940 sachinpb  RUN   Queue_pre_post  server1   server2     myjob.script  Jun 28 05:16
                                                                                  server2
                                                                                  server2
                                                                                  server2
                                                                                  server1
                                                                                  server1
                                                                                  server1
                                                                                  server1
[sachinpb@server1 ]$
---------------------------------------------------------------------------

Logs from each server are available at "$SACHIN_HOME/logs" for both pre- and post-execution processing, as shown below. Check for the log files after completion of LSF job ID 19940:

There should be 4 log files:
  • 2 pre-execution logs, one each from server1 and server2
  • Similarly, 2 post-execution logs, one each from server1 and server2

Example:

[sachinpb@server1 logs]$ ls -alsrt
 8 -rw-rw-r-- 1  sachinpb sachinpb  4657 Jun 28 05:16 preScript_28-Jun-05_16_server2.out
 8 -rw-r--r-- 1    sachinpb sachinpb  4657 Jun 28 05:16 preScript_28-Jun-05_16_server1.out
 8 -rw-r--r-- 1   sachinpb sachinpb  4653 Jun 28 05:17 postScript_28-Jun-05_17_server1.out
 8 -rw-rw-r-- 1  sachinpb sachinpb  4653 Jun 28 05:17 postScript_28-Jun-05_17_server2.out
[sachinpb@server1 logs]$


---------------------------------------------------------------------------

pre-exec script:

[sachinpb@server1]$ cat pre_setup_perf.sh
#!/bin/bash
echo "Start Pre-execution script on $(hostname)"
HOST=`hostname`
DATE=$(date +%d-%b-%H_%M)
sudo /home/sachinpb/sachin/tune_this_server.sh pre_check | tee $SACHIN_HOME/logs/preScript_${DATE}_${HOST}.out

echo "End of Pre-execution script on $(hostname)"
[sachinpb@server1 ]$

----------------------------------------------------------------------------

post-exec script:

[sachinpb@server1 ]$ cat post_setup_perf.sh
#!/bin/bash
echo "Start Post-execution script on $(hostname)"
HOST=`hostname`
DATE=$(date +%d-%b-%H_%M)
sudo /home/sachinpb/sachin/tune_this_server.sh post_check | tee $SACHIN_HOME/logs/postScript_${DATE}_${HOST}.out
echo "End of Post-execution script on $(hostname)"
[sachinpb@server1 ]$

-----------------------------------------------------------------------------

NOTE: Similarly, you could configure PRE_EXEC/POST_EXEC=command (in lsb.applications or lsb.queues) and HOST_PRE_EXEC/HOST_POST_EXEC=command (in lsb.applications) as per the application requirements; a sketch of the application-level variant follows.
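
For example, an application-level profile in lsb.applications might look like the following (a hedged sketch, not taken from a real cluster; the profile name is hypothetical, the script paths mirror the ones above, and jobs would select the profile with bsub -app perf_app):

Begin Application
NAME              = perf_app
DESCRIPTION       = Host-based pre/post execution at the application level
HOST_PRE_EXEC     = /home/sachinpb/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC    = /home/sachinpb/post_setup_perf.sh >> /tmp/post.out
End Application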

I hope this blog helped in understanding how to configure the pre- and post-execution processing feature of Spectrum LSF.

Saturday, June 8, 2019

Hewlett Packard Enterprise to Acquire Supercomputer Pioneer Cray



Supercomputers have long been a mainstay of military and intelligence agencies, used for chores ranging from cracking codes to designing nuclear weapons. They have many civilian uses as well, like predicting weather, creating new drugs and simulating the effect of crashes on auto designs. Computing firms are racing to reach exascale performance, or a quintillion operations per second.

Hewlett Packard Enterprise will pay about $1.3 billion to acquire Cray, which has designed some of the most powerful computer systems in use. The deal will also enable HPE to start selling supercomputer components to corporate clients and others. HPE will integrate Cray's supercomputer technology into its product portfolio and build HPC-as-a-service and AI cloud services on HPE GreenLake. The combination is synergistic: with this acquisition HPE gains one of the only differentiated stacks in the market and can compete more aggressively in key vertical markets. The combination of Cray and HPE creates an industry leader in the fast-growing High-Performance Computing (HPC) and AI markets and creates a number of opportunities that neither company would likely be able to capture on its own.

Interestingly, Cray is actually the second supercomputer manufacturer picked up by HPE over its lifetime; the company also picked up the remaining assets of Silicon Graphics back in 2016. On November 1, 2016, Hewlett Packard Enterprise completed its acquisition of SGI (formerly Silicon Graphics) for $275 million.

HPE is already the HPC server market leader, but by adding Cray into the fold, HPE strengthens its position against its two chief rivals, Dell EMC and IBM. Currently, IBM is the maker of the world’s two fastest supercomputers, Summit and Sierra. In 2018, HPE ranked first in HPC server sales with 34.8% market share, followed by Dell EMC with 20.8% and IBM with 7.1% of the market. All other HPC players, including Lenovo, Atos, Inspur, and Sugon, have single-digit market share, including Cray with 2.3%.

Cray, based in Seattle, traces its lineage to a company founded in 1972 in Minnesota by the computer designer Seymour Cray, "the father of supercomputing." That company was bought in 1996 by Silicon Graphics; it was sold in 2000 to Tera Computer, which adopted the Cray name. Cray was impressively successful for a small company, but life is more difficult as a standalone player against much bigger companies. Cray has morphed into an integrator and scale-out specialist, combining processors from the likes of Intel, AMD, and NVIDIA into supercomputers, and applying its own software, I/O, and interconnect technologies. The company is currently working with AMD on a US government-backed project to build the world's first exascale supercomputer at Oak Ridge National Laboratory. The system will be capable of a quintillion calculations per second.

Frontier Supercomputer at ORNL , partnership with AMD


Cray is currently contracted to build two of the world's fastest supercomputers for two US Department of Energy labs: Oak Ridge National Laboratory and Argonne National Laboratory. Both systems, one called Frontier being built in partnership with AMD and one called Aurora with Intel, are promised to bring so-called "exascale" performance, with Frontier slated to deliver raw performance in excess of 1.5 exaflops, or more than a quintillion calculations per second. Both systems will support the converged use of analytics, AI, and HPC at extreme scale, using Cray's new Shasta system architecture, software stack and Slingshot interconnect.

Aurora Supercomputer at ANL, partnership with Intel

What worries some officials in the United States is the rapid rise of suppliers based in China. One of them, Lenovo, which bought former IBM hardware operations, led the rankings with 140 supercomputers installed. Two others, Inspur and Sugon, were second with 84 and third with 54, respectively. With the acquisition, HPE will get Cray’s current generation XC and CS supercomputers and the next-generation Shasta supercomputer that features a new high-speed interconnect, code-named Slingshot. Cray’s products also include high-performance storage, a full HPC system software stack, and data analytics and AI solutions that are fully integrated.


Most HPC hardware vendors today sell commoditized cluster stacks comprising InfiniBand- or Ethernet-based networking. IBM and HPE are now the only HPC vendors that can truly differentiate themselves: IBM with its Power9 processors, and HPE with Cray's Slingshot interconnect and ClusterStor HPC storage hardware. Cray gives HPE a unique technological edge: Slingshot provides an extremely high-bandwidth network for HPC and AI workloads and enables new network topologies that are required for extreme-scale HPC and AI. Supercomputers of this scale can be massively beneficial to data-intensive fields like astronomy, climate science, medicine, neuroscience, and physics. Increasingly, these systems are used in artificial intelligence research, which, in turn, can help accelerate many other areas of scientific inquiry. That said, a supercomputer like Aurora or Frontier tends only to be built and financed by the government, at least initially, for military applications.

Thankfully, over time, these upcoming exascale supercomputers will likely be freed from the military apparatus and put to work divining new insights from data. According to HPE, the acquisition of Cray is primarily to help it gain an edge in AI research and the hardware required to train ever-larger neural nets. Cray was also trying to make further inroads into the enterprise market, and HPE brings the experience and the enterprise customers at a time when many businesses are recognizing the need to invest in HPC for AI, data analytics, and other business operations, such as marketing and ERP. The requirements in the commercial markets are pushing up into the HPC space. Cray was trying to get into the commercial market too, but it didn't have the experience to do that on its own; now it is part of a company with a big position in the commercial markets. For a lot of enterprises this is a new type of compute, but they have to start embracing it because workloads increasingly require massively parallel computing. You can't compete anymore without doing AI, big data analytics, and increasingly, simulation and modeling. Enterprises don't necessarily want to purchase the hardware and put all of that together themselves; it's complex. So if you can get it with a cloud-like consumption model, as HPE GreenLake offers, you turn it into an OpEx investment.


Broadly speaking, major acquisitions and mergers in the supercomputing space are rare events. Due to their ever-increasing price tags, only a small number of world-class supercomputers are sold each year, and the buyers are often governments, which inevitably gives supercomputer construction a nationalistic element. Nonetheless, costs keep increasing: Frontier is the most expensive US system yet, at over $500M for the system alone. HPE's plans include using Cray's technologies to improve HPE GreenLake, the company's HPC-as-a-Service offering.



Sunday, May 12, 2019

Apache Hadoop 3.x installation on multinode cluster RHEL7 (ppc64le)


Hadoop is an open-source Apache project that allows creation of parallel processing applications on large data sets, distributed across networked nodes. It’s composed of the Hadoop Distributed File System (HDFS™) that handles scalability and redundancy of data across nodes, and Hadoop YARN: a framework for job scheduling that executes data processing tasks on all nodes.

Apache Hadoop 3.x Benefits

  •     Support multiple standby NameNodes.
  •     Supports multiple NameNodes for multiple namespaces.
  •     Storage overhead reduced from 200% to 50%.
  •     Support GPUs.
  •     Intra-node disk balancing.
  •     Support for Opportunistic Containers and Distributed Scheduling.
  •     Support for Microsoft Azure Data Lake and Aliyun Object Storage System file-system connectors.

Architecture of Hadoop Cluster: 

Apache Hadoop has two core components:
1) HDFS - for storage
2) YARN - for computation

You could see the HDFS and YARN architecture as shown below:


HDFS ARCHITECTURE

 
YARN ARCHITECTURE

Before configuring the master and worker nodes, it's good to understand the different components of a Hadoop cluster. A master node keeps knowledge about the distributed file system, like the inode table on an ext3 filesystem, and schedules resource allocation. In this guide, hadoopNode1 handles this role and hosts two daemons:
•    The NameNode: manages the distributed file system and knows where the data blocks are stored inside the cluster.
•    The ResourceManager: manages the YARN jobs and takes care of scheduling and executing processes on worker nodes.
Worker nodes store the actual data and provide processing power to run the jobs; each worker hosts two daemons:
•    The DataNode manages the actual data physically stored on the node;
•    The NodeManager manages execution of tasks on the node.



Prerequisites for Implementing Hadoop

  •     Operating system – RHEL 7.6
  •     Hadoop – the Hadoop 3.x package
  •     Passwordless SSH connections between the nodes in the cluster
  •     Appropriate firewall settings on the machines in the cluster
  •     Machine details :
    Master node  : hadoopNode1 (Power8 server with K80 GPUs running RHEL7)
    Worker nodes: hadoopNode1 hadoopNode2 (Power8 servers with K80 GPUs running RHEL7)

-------------------------------------  ------------------------------------------------------------------

Hadoop Installation Steps  :

Step 1: Download the Java 8 package and save the file in your home directory.

Java is the primary requirement for running Hadoop on any system. The Hadoop 3.x jar files are compiled against the Java 8 runtime, so you must install Java 8 to use Hadoop 3.x; users on JDK 7 have to upgrade to JDK 8.

If your machine is IBM Power architecture (ppc64le), you need to get the IBM Java package from the link below:

Download Link : https://developer.ibm.com/javasdk/downloads/sdk8/

Step 2: Extract the Java tar file.

Step 3: Download the Hadoop 3.x Package.

Download stable version of Hadoop
wget http://apache.spinellicreations.com/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz

Step 4: Extract the Hadoop tar File.
Extract the files @ /home/users/sachinpb/sachinPB/

 tar xzvf hadoop-3.2.0.tar.gz

At the top level of "/home/users/sachinpb/sachinPB/hadoop-3.2.0", you will see the following directory layout:

├── bin
│   ├── container-executor
│   ├── hadoop
│   ├── hadoop.cmd
│   ├── hdfs
│   ├── hdfs.cmd
│   ├── mapred
│   ├── mapred.cmd
│   ├── oom-listener
│   ├── test-container-executor
│   ├── yarn
│   └── yarn.cmd
├── etc
│   └── hadoop
│       ├── core-site.xml
│       ├── hadoop-env.sh
│       ├── hdfs-site.xml
│       ├── log4j.properties
│       ├── mapred-site.xml
│       ├── workers
│       ├── yarn-env.sh
│       └── yarn-site.xml
├── include
├── lib
│   └── native
│       ├── examples
│       ├── libhadoop.a
│       ├── libhadooppipes.a
│       ├── libhadoop.so -> libhadoop.so.1.0.0
│       ├── libhadoop.so.1.0.0
│       ├── libhadooputils.a
│       ├── libnativetask.a
│       ├── libnativetask.so -> libnativetask.so.1.0.0
│       └── libnativetask.so.1.0.0
├── logs
│ 
├── sbin
│   ├── hadoop-daemon.sh
│   ├── httpfs.sh
│   ├── mr-jobhistory-daemon.sh
│   ├── refresh-namenodes.sh
│   ├── start-all.sh
│   ├── start-balancer.sh
│   ├── start-dfs.sh
│   ├── start-secure-dns.sh
│   ├── start-yarn.sh
│   ├── stop-all.cmd
│   ├── stop-all.sh
│   ├── stop-balancer.sh
│   ├── stop-dfs.sh
│   ├── stop-secure-dns.sh
│   ├── stop-yarn.sh
│   ├── workers.sh
│   ├── yarn-daemon.sh
│ 
└── share
    ├── doc
    │   └── hadoop
    └── hadoop
        ├── client
        ├── common
        ├── hdfs
        ├── mapreduce
        ├── tools
        └── yarn


 Step 5:  Add the Hadoop and Java paths in the bash file (.bashrc).

update ~/.bashrc
export HADOOP_HOME=$HOME/sachinPB/hadoop-3.2.0
export HADOOP_CONF_DIR=$HOME/sachinPB/hadoop-3.2.0/etc/hadoop
export HADOOP_MAPRED_HOME=$HOME/sachinPB/hadoop-3.2.0
export HADOOP_COMMON_HOME=$HOME/sachinPB/hadoop-3.2.0
export HADOOP_HDFS_HOME=$HOME/sachinPB/hadoop-3.2.0
export YARN_HOME=$HOME/sachinPB/hadoop-3.2.0
export PATH=$PATH:$HOME/sachinPB/hadoop-3.2.0/bin

#set Java Home

export JAVA_HOME=/opt/ibm/java-ppc64le-80
export PATH=$PATH:/opt/ibm/java-ppc64le-80/bin

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH

source .bashrc

 Step 6: Edit the Hadoop configuration files as per your application requirements.
              HOW TO CONFIGURE AND RUN BIG DATA APPLICATIONS?

              Configuration files at: /home/users/sachinpb/sachinPB/hadoop-3.2.0/etc/hadoop

Step 7: Open core-site.xml and edit the property mentioned below inside the configuration tag.

SET NAMENODE LOCATION

core-site.xml
-----------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://hadoopNode1:9000</value>
</property>
</configuration>
-----------------------------------------------------------------------------

Step 8: Edit hdfs-site.xml and edit the property mentioned below inside the configuration tag.

SET PATH FOR HDFS

hdfs-site.xml
-----------------------------------------------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
 <property>
   <name>dfs.replication</name>
   <value>1</value>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/tmp/data/hadoop/hdfs/nn</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/tmp/data/hadoop/hdfs/dn</value>
 </property>
 <property>
   <name>dfs.permissions</name>
   <value>false</value>
 </property>
</configuration>
----------------------------------------------------------------------------

Step 9: Edit the mapred-site.xml file and edit the property mentioned below inside the configuration tag.

SET YARN AS JOB SCHEDULER

mapred-site.xml
-------------------------------------------------------------------------
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>
-----------------------------------------------------------------------------

Step 10: Edit yarn-site.xml and edit the property mentioned below inside the configuration tag.

CONFIGURE YARN

yarn-site.xml
----------------------------------------------------------------
<?xml version="1.0"?>

<configuration>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
</configuration>
-----------------------------------------------------------------

Step 11: Edit hadoop-env.sh/yarn-env.sh and add the Java Path as mentioned below.

 export JAVA_HOME=/opt/ibm/java-ppc64le-80
 export HADOOP_HOME=$HOME/sachinPB/hadoop-3.2.0

Step 12: The workers file is used by the startup scripts to start the required daemons on all nodes. This is a change in Hadoop 3.x, where the workers file replaces the slaves file used in Hadoop 2.x.

CONFIGURE WORKERS
workers
-------------------------------
hadoopNode1
hadoopNode2
---------------------------------

Check the Java version:

[sachinpb@hadoopNode1 hadoop]$ java -version
openjdk version "1.8.0_181"
OpenJDK Runtime Environment (build 1.8.0_181-b13)
OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)


Check Hadoop version:
[sachinpb@hadoopNode1 hadoop]$ hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39
This command was run using /storage/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/common/hadoop-common-3.2.0.jar
[sachinpb@hadoopNode1 hadoop]$

-----------------------------------------------

Step 13: Next, format the NameNode.

HDFS needs to be formatted like any classical file system. On the master node (hadoopNode1), run the following command: "hdfs namenode -format"

[sachinpb@hadoopNode1]$ hdfs namenode -format -clusterId CID-dd2f4c82-65d6-4c9b-a100-5d153c4a9512
2019-05-07 03:44:04,380 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoopNode1/$HOST1_IPADDRESS
STARTUP_MSG:   args = [-format, -clusterId, CID-dd2f4c82-65d6-4c9b-a100-5d153c4a9512]
STARTUP_MSG:   version = 3.2.0
STARTUP_MSG:   classpath = /home/users/sachinpb/sachinPB/hadoop-3.2.0/etc/hadoop:/home/users/sachinpb/sachinPB/hadoop-.
.
.
.
.

 [/tmp/data/hadoop/hdfs/nn/current/VERSION, /tmp/data/hadoop/hdfs/nn/current/seen_txid, /tmp/data/hadoop/hdfs/nn/current/fsimage_0000000000000000000.md5, /tmp/data/hadoop/hdfs/nn/current/fsimage_0000000000000000000, /tmp/data/hadoop/hdfs/nn/current/edits_0000000000000000001-0000000000000000002, /tmp/data/hadoop/hdfs/nn/current/edits_0000000000000000003-0000000000000000004, /tmp/data/hadoop/hdfs/nn/current/edits_0000000000000000005-0000000000000000006, /tmp/data/hadoop/hdfs/nn/current/edits_0000000000000000007-0000000000000000008, /tmp/data/hadoop/hdfs/nn/current/edits_inprogress_0000000000000000009]
2019-05-07 03:44:08,926 INFO common.Storage: Storage directory /tmp/data/hadoop/hdfs/nn has been successfully formatted.
2019-05-07 03:44:08,937 INFO namenode.FSImageFormatProtobuf: Saving image file /tmp/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 using no compression
2019-05-07 03:44:09,063 INFO namenode.FSImageFormatProtobuf: Image file /tmp/data/hadoop/hdfs/nn/current/fsimage.ckpt_0000000000000000000 of size 401 bytes saved in 0 seconds .
2019-05-07 03:44:09,089 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
2019-05-07 03:44:09,104 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoopNode1/$HOST1_IPADDRESS
************************************************************/
[sachinpb@hadoopNode1 logs]$

------------------------------------------

Step 14: Once the NameNode is formatted, go to the hadoop-3.2.0/sbin directory and start all the daemons.

Your Hadoop installation is now configured and ready to run big data applications.

Step 15: Start the HDFS and YARN daemons:

From the directory: /home/users/sachinpb/sachinPB/hadoop-3.2.0/sbin

NOTE: Copy the Hadoop home directory to all the nodes in your cluster (if it is not available on a shared directory), for example as sketched below.
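
A hedged example of replicating the installation to the worker (the paths and hostname are the ones used in this guide; rsync must be available and passwordless SSH already set up):

rsync -a /home/users/sachinpb/sachinPB/hadoop-3.2.0/ hadoopNode2:/home/users/sachinpb/sachinPB/hadoop-3.2.0/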

[sachinpb@hadoopNode1 sbin]$ ./start-all.sh
WARNING: Attempting to start all Apache Hadoop daemons as sachinpb in 10 seconds..
Starting namenodes on [hadoopNode1]
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
hadoopNode1: namenode is running as process 146418. 
Starting datanodes
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
hadoopNode2: Welcome to hadoopNode2!
hadoopNode2:
hadoopNode1: datanode is running as process 146666.
hadoopNode2: datanode is running as process 112502. 
Starting secondary namenodes [hadoopNode1]
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
hadoopNode1: secondarynamenode is running as process 147091.
[sachinpb@hadoopNode1 sbin]$

Step 16: All the Hadoop services are up and running. [On other platforms, you could use the jps command to see the Hadoop daemons. IBM Java does not provide jps or jstat, so you need to check the Hadoop processes with the ps command.]

[sachinpb@hadoopNode1 sbin]$ ps -ef | grep NameNode
sachinpb   105015      1  0 02:59 ?        00:00:27 /opt/ibm/java-ppc64le-80/bin/java -Dproc_namenode -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-namenode-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-namenode-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.NameNode
-----
sachinpb   105713      1  0 02:59 ?        00:00:12 /opt/ibm/java-ppc64le-80/bin/java -Dproc_secondarynamenode -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dhdfs.audit.logger=INFO,NullAppender -Dhadoop.security.logger=INFO,RFAS -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-secondarynamenode-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-secondarynamenode-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
 [sachinpb@hadoopNode1 sbin]$ ps -ef | grep DataNode
sachinpb   105268      1  0 02:59 ?        00:00:19 /opt/ibm/java-ppc64le-80/bin/java -Dproc_datanode -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-datanode-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-datanode-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
------
 [sachinpb@hadoopNode1 sbin]$ ps -ef | grep ResourceManager
sachinpb   106257      1  1 02:59 pts/3    00:00:50 /opt/ibm/java-ppc64le-80/bin/java -Dproc_resourcemanager -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dservice.libdir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/yarn,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/yarn/lib,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/hdfs,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/hdfs/lib,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/common,/home/users/sachinpb/sachinPB/hadoop-3.2.0/share/hadoop/common/lib -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-resourcemanager-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-resourcemanager-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.resourcemanager.ResourceManager

[sachinpb@hadoopNode1 sbin]$ ps -ef | grep NodeManager
sachinpb   106621      1  1 02:59 ?        00:01:08 /opt/ibm/java-ppc64le-80/bin/java -Dproc_nodemanager -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-nodemanager-hadoopNode1.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-nodemanager-hadoopNode1.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager

Similarly, check the status of the Hadoop daemons on the other worker node [hadoopNode2]:

[sachinpb@hadoopNode2 ~]$ ps -ef | grep hadoop
sachinpb    77718      1  7 21:52 ?        00:00:07 /opt/ibm/java-ppc64le-80/bin/java -Dproc_datanode -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dhadoop.security.logger=ERROR,RFAS -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-datanode-hadoopNode2.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-datanode-hadoopNode2.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml org.apache.hadoop.hdfs.server.datanode.DataNode
sachinpb    78006      1 12 21:52 ?        00:00:11 /opt/ibm/java-ppc64le-80/bin/java -Dproc_nodemanager -Djava.library.path=/home/users/sachinpb/sachinPB/hadoop-3.2.0/lib -Dyarn.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dyarn.log.file=hadoop-sachinpb-nodemanager-hadoopNode2.log -Dyarn.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dyarn.root.logger=INFO,console -Dhadoop.log.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0/logs -Dhadoop.log.file=hadoop-sachinpb-nodemanager-hadoopNode2.log -Dhadoop.home.dir=/home/users/sachinpb/sachinPB/hadoop-3.2.0 -Dhadoop.id.str=sachinpb -Dhadoop.root.logger=INFO,RFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.yarn.server.nodemanager.NodeManager
[sachinpb@hadoopNode2 ~]$

 Step 17: Now open the browser and go to http://localhost:9870/dfshealth.html to check the NameNode interface.

NOTE: In Hadoop 2.x the web UI port is 50070, but in Hadoop 3.x it has moved to 9870. You can access the HDFS web UI at http://localhost:9870.


Step 18: Run a Hadoop application – example: the wordcount MapReduce program. The job reads its input from HDFS, so stage an input file first, as sketched below.
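
A hedged way to stage a small input file (the file name and contents here are hypothetical; the wordcount run that follows used the author's own input):

echo "hello world hello 2019" > helloworld
hdfs dfs -mkdir -p /user/sachinpb
hdfs dfs -put helloworld /user/sachinpb/helloworld
hdfs dfs -ls /user/sachinpb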

[sachinpb@hadoopNode1 hadoop-3.2.0]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /user/sachinpb/helloworld /user/sachinpb/helloworld_out
2019-05-07 04:04:35,044 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2019-05-07 04:04:36,137 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/sachinpb/.staging/job_1557225898252_0003
2019-05-07 04:04:36,374 INFO input.FileInputFormat: Total input files to process : 1
2019-05-07 04:04:36,486 INFO mapreduce.JobSubmitter: number of splits:1
2019-05-07 04:04:36,536 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2019-05-07 04:04:36,728 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1557225898252_0003
2019-05-07 04:04:36,729 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-05-07 04:04:36,939 INFO conf.Configuration: resource-types.xml not found
2019-05-07 04:04:36,939 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-05-07 04:04:36,997 INFO impl.YarnClientImpl: Submitted application application_1557225898252_0003
2019-05-07 04:04:37,029 INFO mapreduce.Job: The url to track the job: http://hadoopNode1:8088/proxy/application_1557225898252_0003/
2019-05-07 04:04:37,030 INFO mapreduce.Job: Running job: job_1557225898252_0003
2019-05-07 04:04:45,137 INFO mapreduce.Job: Job job_1557225898252_0003 running in uber mode : false
2019-05-07 04:04:45,138 INFO mapreduce.Job:  map 0% reduce 0%
2019-05-07 04:04:51,189 INFO mapreduce.Job:  map 100% reduce 0%
2019-05-07 04:04:59,223 INFO mapreduce.Job:  map 100% reduce 100%
2019-05-07 04:04:59,232 INFO mapreduce.Job: Job job_1557225898252_0003 completed successfully
2019-05-07 04:04:59,348 INFO mapreduce.Job: Counters: 54
        File System Counters
                FILE: Number of bytes read=41
                FILE: Number of bytes written=443547
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=126
                HDFS: Number of bytes written=23
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
                HDFS: Number of bytes read erasure-coded=0
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3968
                Total time spent by all reduces in occupied slots (ms)=4683
                Total time spent by all map tasks (ms)=3968
                Total time spent by all reduce tasks (ms)=4683
                Total vcore-milliseconds taken by all map tasks=3968
                Total vcore-milliseconds taken by all reduce tasks=4683
                Total megabyte-milliseconds taken by all map tasks=4063232
                Total megabyte-milliseconds taken by all reduce tasks=4795392
        Map-Reduce Framework
                Map input records=1
                Map output records=3
                Map output bytes=29
                Map output materialized bytes=41
                Input split bytes=109
                Combine input records=3
                Combine output records=3
                Reduce input groups=3
                Reduce shuffle bytes=41
                Reduce input records=3
                Reduce output records=3
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=288
                CPU time spent (ms)=4030
                Physical memory (bytes) snapshot=350552064
                Virtual memory (bytes) snapshot=3825860608
                Total committed heap usage (bytes)=177668096
                Peak Map Physical memory (bytes)=226557952
                Peak Map Virtual memory (bytes)=1911750656
                Peak Reduce Physical memory (bytes)=123994112
                Peak Reduce Virtual memory (bytes)=1914109952
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=17
        File Output Format Counters
                Bytes Written=23
[sachinpb@hadoopNode1 hadoop-3.2.0]$

------------------------


Step 19: Verify the output file in HDFS:

[sachinpb@hadoopNode1 hadoop-3.2.0]$ hdfs dfs -cat /user/sachinpb/helloworld_out/part-r-00000
---------------------
2019    4
hello    6
world   7
---------------------

Step 20:  MONITOR YOUR HDFS CLUSTER

[sachinpb@hadoopNode1]$ hdfs dfsadmin -report
Configured Capacity: 1990698467328 (1.81 TB)
Present Capacity: 1794297528320 (1.63 TB)
DFS Remaining: 1794297511936 (1.63 TB)
DFS Used: 16384 (16 KB)
DFS Used%: 0.00%
Replicated Blocks:
        Under replicated blocks: 0
        Blocks with corrupt replicas: 0
        Missing blocks: 0
        Missing blocks (with replication factor 1): 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0
Erasure Coded Block Groups:
        Low redundancy block groups: 0
        Block groups with corrupt internal blocks: 0
        Missing block groups: 0
        Low redundancy blocks with highest priority to recover: 0
        Pending deletion blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: $HOST1_IPADDRESS:9866 (hadoopNode1)
Hostname: hadoopNode1
Decommission Status : Normal
Configured Capacity: 995349233664 (926.99 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 118666317824 (110.52 GB)
DFS Remaining: 876682903552 (816.47 GB)
DFS Used%: 0.00%
DFS Remaining%: 88.08%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 09 23:23:50 PDT 2019
Last Block Report: Thu May 09 23:11:35 PDT 2019
Num of Blocks: 0


Name: $HOST2_IPADDESS:9866 (hadoopNode2)
Hostname: hadoopNode2
Decommission Status : Normal
Configured Capacity: 995349233664 (926.99 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 77734621184 (72.40 GB)
DFS Remaining: 917614608384 (854.60 GB)
DFS Used%: 0.00%
DFS Remaining%: 92.19%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu May 09 23:23:49 PDT 2019
Last Block Report: Thu May 09 23:20:58 PDT 2019
Num of Blocks: 0

NOTE: You can see the two live DataNodes (hadoopNode1 and hadoopNode2) in this cluster, with all the details about allocated HDFS space, block counts, etc. This is how you check the health of the Hadoop cluster; we also tested the wordcount application on this cluster, as shown above. A few more common health checks are sketched below.
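
A few additional, commonly used checks (hedged; run them as the Hadoop user on the master node):

hdfs dfsadmin -safemode get        # confirm the NameNode has left safe mode
hdfs fsck / -files -blocks         # check file-system integrity and block placement
yarn node -list                    # confirm both NodeManagers have registered with YARN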

Step 21: How to stop the Hadoop daemons in a cluster environment:

 cd to /home/users/sachinpb/sachinPB/hadoop-3.2.0/sbin

[sachinpb@hadoopNode1 sbin]$ ./stop-all.sh
WARNING: Stopping all Apache Hadoop daemons as sachinpb in 10 seconds.
WARNING: Use CTRL-C to abort.
Stopping namenodes on [hadoopNode1]
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
Stopping datanodes
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
hadoopNode2: Welcome to hadoopNode2!
hadoopNode2:
Stopping secondary namenodes [hadoopNode1]
hadoopNode1: Welcome to hadoopNode1!
hadoopNode1:
Stopping nodemanagers
Stopping resourcemanagers on []
[sachinpb@hadoopNode1 sbin]$

I hope this blog helped in understanding how to install Hadoop 3.x in a multinode setup (i.e., a cluster) and how to perform operations on HDFS files.

----------------------------------------END-------------------------------------------
Reference:
1) https://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-common/SingleCluster.html
2) https://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-common/ClusterSetup.html
3) https://hadoop.apache.org/docs/r3.0.3/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html
4) http://www.sachinpbuzz.com/2014/01/big-data-hadoop-20yarn-multi-node.html

Thursday, February 21, 2019

Kubernetes [K8s] Architecture and setup on RHEL

Kubernetes (k8s or Kube) is an open source container management platform designed to run enterprise-class, cloud-enabled and web-scalable IT workloads. In other words, Kubernetes is a container orchestrator to provision, manage, and scale apps. With the rise of containerization in the world of DevOps, the need for a platform to effectively orchestrate these containers also grew. Since Kubernetes operates at the container level rather than at the hardware level, it provides some generally applicable features common to PaaS offerings. It is a vendor-agnostic cluster and container management tool, open-sourced by Google in 2014. Kubernetes allows you to manage the life cycle of containerized apps in a cluster.

Kubernetes Architecture : 

Like Apache Mesos and Docker Swarm, Kubernetes orchestrates the provisioning, management, and scaling of apps. Its key paradigm is the declarative model: you describe the desired state, and Kubernetes works to maintain it. Kubernetes is designed on the principles of scalability, availability, security and portability, and it optimizes infrastructure cost by efficiently distributing the workload across available resources. Like most distributed computing platforms, a Kubernetes cluster consists of at least one master and multiple compute nodes (also known as worker nodes). The master is responsible for exposing the application program interface (API), scheduling the deployments and managing the overall cluster. Each node runs a container runtime, such as Docker or rkt (a container system developed by CoreOS as a lightweight and secure alternative to Docker), along with an agent that communicates with the master. Each node also runs additional components for logging, monitoring, service discovery and optional add-ons. Nodes are the workhorses of a Kubernetes cluster: they expose compute, networking and storage resources to applications, and they can be virtual machines (VMs) running in a cloud or bare-metal servers running within the data center.
source
A pod is a collection of one or more containers and serves as Kubernetes’ core unit of management. Pods act as the logical boundary for containers sharing the same context and resources. This grouping mechanism makes up for the differences between containerization and virtualization by making it possible to run multiple dependent processes together. At runtime, pods can be scaled by creating replica sets, which ensure that the deployment always runs the desired number of pods.

Replica sets deliver the required scale and availability by maintaining a pre-defined set of pods at all times. A single pod or a replica set can be exposed to internal or external consumers via services. Services enable the discovery of pods by associating a set of pods with a specific criterion: pods are associated with services through key-value pairs called labels and selectors, and any new pod whose labels match a service’s selector is automatically discovered by that service. This architecture provides a flexible, loosely-coupled mechanism for service discovery.
The definitions of Kubernetes objects, such as pods, replica sets and services, are submitted to the master. Based on the defined requirements and the availability of resources, the master schedules the pod on a specific node. The node pulls the images from the container image registry and coordinates with the local container runtime to launch the container.
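As a minimal illustration of this object model, here is a sketch using imperative kubectl commands (the 'web' deployment name and nginx image are just examples, and a working cluster such as the one built in the steps below is assumed):

kubectl create deployment web --image=nginx     # the master stores the object in etcd and schedules a pod onto a node
kubectl scale deployment web --replicas=3       # the underlying replica set keeps three pods running
kubectl expose deployment web --port=80         # creates a service that selects the pods via their app=web label
kubectl get pods -l app=web                     # the pods discovered through the label selector
kubectl get endpoints web                       # the pod IPs the service currently routes traffic to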

etcd is an open source, distributed key-value database from CoreOS, which acts as the single source of truth (SSOT) for all components of the Kubernetes cluster. The master queries etcd to retrieve various parameters of the state of the nodes, pods and containers.
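On a kubeadm-built cluster like the one below, a quick way to confirm that the master can reach etcd and the rest of the control plane (a sketch; 'kubectl get componentstatuses' is still available in this Kubernetes release):

kubectl get componentstatuses                  # scheduler, controller-manager and etcd-0 should report Healthy
kubectl get pods -n kube-system | grep etcd    # on this cluster the etcd pod appears as etcd-master-node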

 
source

Kubernetes Components:

 i) Master Components
Master components provide the cluster’s control plane. They make global decisions about the cluster (for example, scheduling) and detect and respond to cluster events (such as starting a new pod when a replication controller’s ‘replicas’ field is unsatisfied). The master stores the state and configuration data for the entire cluster in etcd, a persistent and distributed key-value data store. Each node has access to etcd and, through it, nodes learn how to maintain the configurations of the containers they’re running. You can run etcd on the Kubernetes master or in standalone configurations. The main master components are listed below; a quick way to see where they run in this setup is sketched after the list.


  • kube-apiserver
  • etcd
  • kube-scheduler
  • kube-controller-manager
  • cloud-controller-manager
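In the kubeadm-based install below, most of these control-plane components run as static pods whose manifests kubeadm writes to /etc/kubernetes/manifests (see the Step 9 output). A quick look, as a sketch:

ls /etc/kubernetes/manifests              # typically etcd.yaml, kube-apiserver.yaml, kube-controller-manager.yaml, kube-scheduler.yaml
kubectl get pods -n kube-system -o wide   # the control-plane pods are listed here once the cluster is up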

ii) Node Components
Node components run on every node, maintaining running pods and providing the Kubernetes runtime environment. All nodes in a Kubernetes cluster must be configured with a container runtime, which is typically Docker. The container runtime starts and manages the containers as they’re deployed to nodes in the cluster by Kubernetes. A quick way to verify these components is sketched after the list below.
  • kubelet
  • kube-proxy
  • Container Runtime
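A minimal verification sketch for the node components (assuming the systemd-managed kubelet and Docker runtime used in this setup, plus the kube-proxy DaemonSet that kubeadm deploys):

systemctl status kubelet                           # the node agent should be active (running)
docker ps                                          # the container runtime should be up and running containers
kubectl get daemonset kube-proxy -n kube-system    # run from the master: one kube-proxy pod per node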

----------------------------------------------------------------------------

Installation Steps of Kubernetes 1.13 on RHEL 

---------------------------------------------------------------------------


Let's do the installation on two x86 nodes installed with RHEL 7.5:
1) master-node
2) worker-node

Step 1: On master-node, disable SELinux and set up firewall rules
  1. setenforce 0
  2. sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux

Set the following firewall rules and other configuration details.


  1.  firewall-cmd --permanent --add-port=6443/tcp
  2.  firewall-cmd --permanent --add-port=2379-2380/tcp
  3.  firewall-cmd --permanent --add-port=10250/tcp
  4.  firewall-cmd --permanent --add-port=10251/tcp
  5.  firewall-cmd --permanent --add-port=10252/tcp
  6.  firewall-cmd --permanent --add-port=10255/tcp
  7.  firewall-cmd --reload
  8.  modprobe br_netfilter
  9.  echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables

NOTE: You MUST disable swap in order for the kubelet to work properly.
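A minimal sketch of how this is typically done on RHEL (the exact swap entry in /etc/fstab differs per system, so treat the sed pattern as an example):

swapoff -a                                                                # turn swap off immediately
sed -i '/ swap / s/^/#/' /etc/fstab                                       # comment out the swap entry so it stays off after reboot
echo 'net.bridge.bridge-nf-call-iptables = 1' > /etc/sysctl.d/k8s.conf    # persist the bridge-nf-call-iptables setting from the list above
sysctl --system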

Step 2: Configure Kubernetes Repository
cat /etc/yum.repos.d/kubernetes.repo
-----------------------------------------------------------
[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
----------------------------------------------------------
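The transcript does not show the package installation itself, but with this repository enabled the Kubernetes tools are installed roughly as follows (a sketch; this is required before the kubelet is started in Step 5):

yum install -y kubelet kubeadm kubectl     # pulls the packages from the kubernetes repo configured above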


Step 3: Configure the Docker repository - validated version (18.06) for Kubernetes

cat /etc/yum.repos.d/docker-main.repo
-------------------------------------------------------
[docker-main-repo]
name=Docker main Repository
baseurl=https://yum.dockerproject.org/repo/main/centos/7
enabled=1
gpgcheck=1
gpgkey=https://yum.dockerproject.org/gpg
-------------------------------------------------------

Step 4: Install docker

[root@master-node ]# yum install docker-ce-18.06*
Loaded plugins: langpacks, product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Resolving Dependencies
--> Running transaction check
---> Package docker-ce.x86_64 0:18.06.3.ce-3.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=======================================================================================================================================
 Package                      Arch                      Version                              Repository                           Size
=======================================================================================================================================
Installing:
 docker-ce                    x86_64                    18.06.3.ce-3.el7                     docker-ce-stable                     41 M

Transaction Summary
=======================================================================================================================================
Install  1 Package

Total size: 41 M
Installed size: 168 M
Is this ok [y/d/N]: y
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Installing : docker-ce-18.06.3.ce-3.el7.x86_64                                                                                   1/1
  Verifying  : docker-ce-18.06.3.ce-3.el7.x86_64                                                                                   1/1

Installed:
  docker-ce.x86_64 0:18.06.3.ce-3.el7

Complete!
[root@master-node ]#
-------------------------------------------------------------------

Step 5: Start and enable the kubelet

[root@master-node ~]# systemctl  start kubelet
[root@master-node ~]# systemctl  status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           +-10-kubeadm.conf
   Active: active (running) since Thu 2019-02-21 01:00:56 EST; 1min 50s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 9551 (kubelet)
    Tasks: 69
   Memory: 56.3M
   CGroup: /system.slice/kubelet.service
           +-9551 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubele...

Feb 21 01:02:22 master-node kubelet[9551]: W0221 01:02:22.527780    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:22 master-node kubelet[9551]: E0221 01:02:22.528025    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Feb 21 01:02:27 master-node kubelet[9551]: W0221 01:02:27.528986    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:27 master-node kubelet[9551]: E0221 01:02:27.529196    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Feb 21 01:02:32 master-node kubelet[9551]: W0221 01:02:32.530265    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:32 master-node kubelet[9551]: E0221 01:02:32.530448    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Feb 21 01:02:37 master-node kubelet[9551]: W0221 01:02:37.531526    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:37 master-node kubelet[9551]: E0221 01:02:37.531645    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Feb 21 01:02:42 master-node kubelet[9551]: W0221 01:02:42.532552    9551 cni.go:203] Unable to update cni config: No networks fo...i/net.d
Feb 21 01:02:42 master-node kubelet[9551]: E0221 01:02:42.532683    9551 kubelet.go:2192] Container runtime network not ready: N...ialized
Hint: Some lines were ellipsized, use -l to show in full.
[root@master-node ~]#

[root@master-node ]# systemctl enable kubelet
Created symlink from /etc/systemd/system/multi-user.target.wants/kubelet.service to /etc/systemd/system/kubelet.service.
[root@master-node]#
-----------------------------------

Step 6: Start and enable Docker

[root@master-node]# systemctl restart docker
[root@master-node]# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-02-21 00:41:14 EST; 7s ago
     Docs: https://docs.docker.com
 Main PID: 30271 (dockerd)
    Tasks: 161
   Memory: 90.8M
   CGroup: /system.slice/docker.service
           +-30271 /usr/bin/dockerd
           +-30284 docker-containerd --config /var/run/docker/containerd/containerd.toml
           +-30490 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30523 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30541 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30639 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30667 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30747 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30764 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30810 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...
           +-30938 docker-containerd-shim -namespace moby -workdir /var/lib/docker/containerd/daemon/io.containerd.runtime.v1.linux/...

Feb 21 00:41:15 master-node dockerd[30271]: time="2019-02-21T00:41:15-05:00" level=info msg="shim docker-containerd-shim started...d=30541
Feb 21 00:41:15 master-node dockerd[30271]: time="2019-02-21T00:41:15-05:00" level=info msg="shim docker-containerd-shim started...d=30639
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30667
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30747
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30764
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30810
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30938
Feb 21 00:41:16 master-node dockerd[30271]: time="2019-02-21T00:41:16-05:00" level=info msg="shim docker-containerd-shim started...d=30983
Feb 21 00:41:17 master-node dockerd[30271]: time="2019-02-21T00:41:17-05:00" level=info msg="shim reaped" id=cc95fdfdc1d7d0d6104...62d4d91
Feb 21 00:41:17 master-node dockerd[30271]: time="2019-02-21T00:41:17.122817061-05:00" level=info msg="ignoring event" module=li...Delete"
Hint: Some lines were ellipsized, use -l to show in full.
[root@master-node]# systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
[root@master-node]#


----------------------------------------
Step 7: Check the version of kubeadm installed on the master node:
[root@master-node ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.3", GitCommit:"721bfa751924da8d1680787490c54b9179b1fed0", GitTreeState:"clean", BuildDate:"2019-02-01T20:05:53Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}
[root@master-node ~]#

----------------------------------------------

Step 8: Check the version of Docker installed:

[root@master-node]# docker version
Client:
 Version:           18.06.3-ce
 API version:       1.38
 Go version:        go1.10.3
 Git commit:        d7080c1
 Built:             Wed Feb 20 02:26:51 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.3-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.3
  Git commit:       d7080c1
  Built:            Wed Feb 20 02:28:17 2019
  OS/Arch:          linux/amd64
  Experimental:     false


---------------------------------------------------------------

Step 9: Initialize Kubernetes Master with ‘kubeadm init’


[root@master-node ~]# kubeadm init
[init] Using Kubernetes version: v1.13.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master-node localhost] and IPs [IP_ADDRESS_master-node 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master-node localhost] and IPs [IP_ADDRESS_master-node 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master-node kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 IP_ADDRESS_master-node]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 20.502302 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "master-node" as an annotation
[mark-control-plane] Marking the node master-node as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master-node as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: od9n1d.rltj6quqmm2kojd7
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join IP_ADDRESS_master-node:6443 --token od9n1d.rltj6quqmm2kojd7 --discovery-token-ca-cert-hash sha256:9ea1e1163550080fb9f5f63738fbf094f065de12cd38f493ec4e7c67c735fc7b

[root@master-node ~]#
-----------------------------------------------
If you get a 'port already in use' error, run 'kubeadm reset' and then re-run 'kubeadm init'.
The Kubernetes master has been initialized successfully, as shown above.
------------------------------------------------------------------------------------------------------

Step 10: Set up the cluster config for the root user.

[root@master-node ~]# mkdir -p $HOME/.kube
[root@master-node ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@master-node ~]# chown $(id -u):$(id -g) $HOME/.kube/config
[root@master-node ~]#


------------------------

Step 11: Check the status on the master node (note: the pod listing below was captured after the Weave Net add-on from Step 12 had been applied, which is why weave-net pods already appear)

--------------------------
[root@master-node ~]# kubectl get nodes
NAME       STATUS     ROLES    AGE   VERSION
master-node   NotReady   master   12m   v1.13.3
[root@master-node ~]#


[root@master-node]# kubectl get pods --all-namespaces
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
default       nginx                              0/1     Pending   0          125m
kube-system   coredns-86c58d9df4-d4j4x           1/1     Running   0          3h49m
kube-system   coredns-86c58d9df4-sg8tk           1/1     Running   0          3h49m
kube-system   etcd-master-node                      1/1     Running   0          3h48m
kube-system   kube-apiserver-master-node            1/1     Running   0          3h48m
kube-system   kube-controller-manager-master-node   1/1     Running   0          3h48m
kube-system   kube-proxy-b6wcd                   1/1     Running   0          159m
kube-system   kube-proxy-qfdhq                   1/1     Running   0          3h49m
kube-system   kube-scheduler-master-node            1/1     Running   0          3h48m
kube-system   weave-net-5c46g                    2/2     Running   0          159m
kube-system   weave-net-7qsnj                    2/2     Running   0          3h35m
[root@master-node]#

-------------------


Step 12: Deploy the pod network

The Weave Net addon for Kubernetes comes with a Network Policy Controller that automatically monitors Kubernetes for any NetworkPolicy annotations on all namespaces and configures iptables rules to allow or block traffic as directed by the policies.


[root@master-node ~]#  export kubever=$(kubectl version | base64 | tr -d '\n')
[root@master-node ~]#  kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$kubever"
serviceaccount/weave-net created
clusterrole.rbac.authorization.k8s.io/weave-net created
clusterrolebinding.rbac.authorization.k8s.io/weave-net created
role.rbac.authorization.k8s.io/weave-net created
rolebinding.rbac.authorization.k8s.io/weave-net created
daemonset.extensions/weave-net created
[root@master-node ~]#
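A quick verification sketch (the Weave Net manifest above labels its pods with name=weave-net, so that label can be used to filter):

kubectl get pods -n kube-system -l name=weave-net -o wide    # one weave-net pod per node, STATUS Running
kubectl get daemonset weave-net -n kube-system               # DESIRED and READY should match the node count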

------------------------

Step 13: Check the status on the master node

[root@master-node ~]# kubectl get nodes
NAME       STATUS   ROLES    AGE   VERSION
master-node   Ready    master   14m   v1.13.3
[root@master-node ~]#



-----------------------------
Perform the following steps on each worker node

Step 14: Disable SELinux and apply the other configuration details

  1. setenforce 0
  2. sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
  3. firewall-cmd --permanent --add-port=10250/tcp
  4. firewall-cmd --permanent --add-port=10255/tcp
  5. firewall-cmd --permanent --add-port=30000-32767/tcp
  6. firewall-cmd --permanent --add-port=6783/tcp
  7. firewall-cmd  --reload
  8. echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables

NOTE: You MUST disable swap in order for the kubelet to work properly (the same swapoff/fstab approach sketched in Step 1 applies here).
----------------------------
Step 15: Configure the Kubernetes and Docker repositories on the worker node (same as the steps above)

-------------
Step 16: Install Docker

--------------
Step 17: Start and enable the Docker service
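Since Steps 15-17 mirror what was already done on the master, here is a consolidated sketch for the worker node (assuming the same kubernetes.repo and docker-main.repo files from Steps 2-3 and the same validated Docker version):

yum install -y docker-ce-18.06* kubelet kubeadm    # container runtime plus the Kubernetes node tools
systemctl enable docker && systemctl start docker
systemctl enable kubelet                           # kubeadm join in Step 18 will activate it with the right config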

-----------------

Step 18: Join the worker node to the master node

When the Kubernetes master is initialized, its output includes a 'kubeadm join' command with a token (see Step 9). Copy that command and run it on the worker node:

[root@worker-node ~]#  kubeadm join IP_ADDRESS_master-node:6443 --token od9n1d.rltj6quqmm2kojd7 --discovery-token-ca-cert-hash sha256:9ea1e1163550080fb9f5f63738fbf094f065de12cd38f493ec4e7c67c735fc7b
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "IP_ADDRESS_master-node:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://IP_ADDRESS_master-node:6443"
[discovery] Requesting info from "https://IP_ADDRESS_master-node:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "IP_ADDRESS_master-node:6443"
[discovery] Successfully established connection with API Server "IP_ADDRESS_master-node:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "worker-node" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

[root@worker-node ~]#

This will activate the required services on the worker node.
-------------------------

Step 19: Verify the node status from the master node using the kubectl command


[root@master-node]# kubectl get nodes
NAME       STATUS   ROLES    AGE    VERSION
master-node   Ready    master   119m   v1.13.3
worker-node   Ready    <none>   49m    v1.13.3
[root@master-node ]#


As we can see, both the master and worker nodes are in Ready status. This confirms that Kubernetes 1.13 has been installed successfully and that the worker node has joined the cluster. Now we can create pods and services, as in the quick smoke test below.
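A minimal smoke-test sketch (the nginx image and port are illustrative; the NodePort assigned by the cluster will differ):

kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=NodePort
kubectl get pods -o wide                    # the nginx pod should be scheduled on the worker node
kubectl get svc nginx                       # note the assigned NodePort, e.g. 80:3xxxx/TCP
curl http://worker-node:<NodePort>          # should return the nginx welcome page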


---------------------------------   oooooooooooooooooo -------------------------------------------

Reference:
1) https://docs.google.com/presentation/d/1mbjjxNlPzgZIH1ciyprMRoIAYiEZuFQlG7ElXUvP1wg/edit#slide=id.g3d4e7af7b7_2_52
2) https://github.com/kubernetes-sigs/kube-batch
3) https://github.com/intel/multus-cni
4) https://kubernetes.io/docs/tutorials/kubernetes-basics
5) http://www.developintelligence.com/blog/2017/02/kubernetes-actually-use
6) https://kubernetes.io/docs/setup/independent/install-kubeadm/
7) https://www.linuxtechi.com/install-kubernetes-1-7-centos7-rhel7/
8) https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_atomic_host/7/html/getting_started_with_kubernetes/get_started_orchestrating_containers_with_kubernetes
9) https://github.com/kubernetes/kubeadm/issues/339
10) https://kubernetes.io/docs
11) https://thenewstack.io/kubernetes-an-overview/
12) https://blog.newrelic.com/engineering/what-is-kubernetes/