Sunday, August 22, 2021

Spectrum LSF 10.1 Installation and Applying Patch | FP | interim FIX on Linux Platform

IBM Spectrum LSF (originally Platform Load Sharing Facility) is a workload management platform and job scheduler for distributed high-performance computing (HPC). IBM acquired Platform Computing in January 2012, and the product is now called IBM® Spectrum LSF.

IBM® Spectrum LSF is a complete workload management solution for demanding HPC environments that takes your job requirements, finds the best resources to run the job, and monitors its progress. Jobs always run according to host load and site policies.

LSF cluster

  • A cluster is a group of computers (hosts) running LSF that work together as a single unit, combining computing power, workload, and resources. A cluster provides a single-system image for a network of computing resources. Hosts can be grouped into a cluster in a number of ways; for example, a cluster can contain (1) all the hosts in a single administrative group or (2) all the hosts on a subnetwork.
  • A job is a unit of work that runs in the LSF system: a command or set of commands submitted to LSF for execution. LSF schedules, controls, and tracks the job according to configured policies.
  • A queue is a cluster-wide container for jobs. All jobs wait in queues until they are scheduled and dispatched to hosts.
  • Resources are the objects in your cluster that are available to run work.

Spectrum LSF 10.1 base installation and applying FP/PTF/interim fixes

Plan your installation and install a new production IBM Spectrum LSF cluster on UNIX or Linux hosts. The following diagram illustrates an example directory structure after the LSF installation is complete.


Plan your installation to determine the required parameters for the install.config file. Choose one of the following LSF installer script packages:

a) lsf10.1_lsfinstall.tar.Z

The standard installer package. Use this package in a heterogeneous cluster with a mix of systems other than x86-64. Requires approximately 1 GB of free space.

b) lsf10.1_lsfinstall_linux_x86_64.tar.Z

A smaller installer package. Use it in a homogeneous x86-64 cluster, or use the corresponding package in a homogeneous ppc cluster.


Get the LSF distribution packages for all the host types you need and put them in the same directory as the extracted LSF installer script. Copy the packages to the LSF_TARDIR path that you configure in Step 3.

For example:

For Linux kernel 2.6 with glibc 2.3 on x86-64, the distribution package is lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z.

For Linux kernel 3.10 with glibc 2.17 on ppc64le, the distribution package is lsf10.1_lnx310-lib217-ppc64le.tar.Z.


LSF uses entitlement files to determine which feature set is enabled or disabled based on the edition of the product. Copy the entitlement configuration file to the LSF_ENTITLEMENT_FILE path that you configure in Step 3.

The following LSF entitlement configuration files are available for each edition:

LSF Standard Edition  ===>  lsf_std_entitlement.dat

LSF Express Edition   ===>  lsf_exp_entitlement.dat

LSF Advanced Edition  ===>  lsf_adv_entitlement.dat


Step 1 : Get the LSF installer script package that you selected and extract it.

# zcat lsf10.1_lsfinstall_linux_x86_64.tar.Z | tar xvf -

Step 2 :  Go to extracted directory :

 cd lsf10.1_lsfinstall

Step 3 : Configure install.config as per the plan

 cat install.config
  LSF_ADD_SERVERS="myhost1 myhost2 myhost3 myhost4 myhost5 myhost6 myhost7 myhost8"
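A fuller install.config usually sets more than LSF_ADD_SERVERS. The fragment below is only a sketch under stated assumptions, not the file used for this installation: the parameter names are standard install.config keys, the LSF_TOP and LSF_TARDIR paths, the cluster name, the master host and the administrator are taken from the installer log and the lsid output shown in this post, and the location of the entitlement file is hypothetical.

```shell
# install.config sketch: values are illustrative, adjust them to your own plan
LSF_TOP="/nfs_shared_dir/LSF_HOME"                  # shared installation directory
LSF_ADMINS="lsfadmin"                               # primary LSF administrator
LSF_CLUSTER_NAME="CLUSTER2"                         # cluster name reported by lsid
LSF_MASTER_LIST="myhost1"                           # master host candidate(s)
LSF_TARDIR="/nfs_shared_dir/conf_lsf/lsf_distrib"   # directory with the distribution tar files
# Hypothetical location of the Standard Edition entitlement file
LSF_ENTITLEMENT_FILE="/nfs_shared_dir/conf_lsf/lsf_std_entitlement.dat"
LSF_ADD_SERVERS="myhost1 myhost2 myhost3 myhost4 myhost5 myhost6 myhost7 myhost8"
```

The file uses plain NAME="value" assignments, one per line, so it can be reviewed with any text editor before running lsfinstall.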


Step 4: Start the LSF 10.1 base installation

 ./lsfinstall -f install.config

Logging installation sequence in /root/LSF_new/lsf10.1_lsfinstall/Install.log
International Program License Agreement
Press Enter to continue viewing the license agreement, or
enter "1" to accept the agreement, "2" to decline it, "3"
to print it, "4" to read non-IBM terms, or "99" to go back
to the previous screen.
Checking the LSF TOP directory /nfs_shared_dir/LSF_HOME ...
... Done checking the LSF TOP directory /nfs_shared_dir/LSF_HOME ...
You are installing IBM Spectrum LSF - 10.1 Standard Edition
Searching LSF 10.1 distribution tar files in /nfs_shared_dir/conf_lsf/lsf_distrib Please wait ...
  1) linux3.10-glibc2.17-x86_64
Press 1 or Enter to install this host type: 1
Installing linux3.10-glibc2.17-x86_64 ...
Please wait, extracting lsf10.1_lnx310-lib217-x86_64 may take up to a few minutes ...
lsfinstall is done.
After installation, remember to bring your cluster up to date by applying the latest updates and bug fixes.

NOTE: You can also install LSF as a non-root user. The procedure is similar, with one extra prompt asking whether this is a multi-node cluster (yes/no).

Step 5: This step is required only if the installation was done by root.

 chown -R lsfadmin:lsfadmin $LSF_TOP

Step 6: Check the binary files

 cd $LSF_TOP/10.1/linux3.10-glibc2.17-x86_64/bin

Step 7: By default, only root can start the LSF daemons, while any user can submit jobs to your cluster. To make the cluster available to other users, you must manually change the owner of the lsadmin and badmin binary files to root and set their file permission mode to -rwsr-xr-x (4755), so that the setuid bit is set for the owner. Do the same for the eauth binary in $LSF_SERVERDIR.

 chown root lsadmin
 chown root badmin
 chmod 4755 lsadmin
 chmod 4755 badmin
 ls -alsrt lsadmin
 ls -alsrt badmin

chown root  $LSF_SERVERDIR/eauth  

chmod u+s $LSF_SERVERDIR/eauth 
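After changing ownership and mode, it can help to confirm that the setuid bit is really in place rather than reading the ls output by eye. The helper below is a minimal sketch; the function name check_setuid is hypothetical, and it checks only the setuid bit, not the owner.

```shell
# Report whether each given file has the setuid bit set.
# Prints "<file>: setuid" or "<file>: NOT setuid" for every argument and
# returns non-zero if at least one file lacks the bit.
check_setuid() {
    rc=0
    for f in "$@"; do
        if [ -u "$f" ]; then
            echo "$f: setuid"
        else
            echo "$f: NOT setuid"
            rc=1
        fi
    done
    return $rc
}
```

For example, run check_setuid lsadmin badmin "$LSF_SERVERDIR/eauth" from the bin directory used in Step 6.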

Step 8 : Configure  /etc/lsf.sudoers 

[root@myhost1]# cat /etc/lsf.sudoers

NOTE: The lsf.sudoers file is not installed by default; you create it in /etc. It is used to set the LSF_EAUTH_KEY parameter, which configures the key that eauth uses to encrypt and decrypt user authentication data. All the nodes/hosts must have this file. In a multi-cluster setup, LSF_EAUTH_KEY must be configured in /etc/lsf.sudoers on each side of the multi-cluster.

Step 9: Check $LSF_SERVERDIR/eauth and copy lsf.sudoers to all hosts in the cluster

 ls $LSF_TOP/10.1/linux3.10-glibc2.17-x86_64/etc/

scp /etc/lsf.sudoers myhost02:/etc/lsf.sudoers
scp /etc/lsf.sudoers myhost03:/etc/lsf.sudoers
scp /etc/lsf.sudoers myhost04:/etc/lsf.sudoers
scp /etc/lsf.sudoers myhost05:/etc/lsf.sudoers
scp /etc/lsf.sudoers myhost06:/etc/lsf.sudoers
scp /etc/lsf.sudoers myhost07:/etc/lsf.sudoers
scp /etc/lsf.sudoers myhost08:/etc/lsf.sudoers
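The repeated scp lines above can also be generated from a host list. The function below is a small sketch that only prints the commands so they can be reviewed first; the name print_sudoers_copy_cmds is hypothetical and the host names are passed as arguments.

```shell
# Print one scp command per host so the list can be reviewed before running it.
# Host names are passed as arguments; this function itself copies nothing.
print_sudoers_copy_cmds() {
    for host in "$@"; do
        echo "scp /etc/lsf.sudoers ${host}:/etc/lsf.sudoers"
    done
}
```

For example, print_sudoers_copy_cmds myhost02 myhost03 prints two scp commands; once they look right, the output can be piped to sh to run them.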

Step 10: Start LSF as lsfadmin and check the base installation using the lsid command.

Step 11 : Check binary type with  lsid -V

$ lsid -V
IBM Spectrum LSF build 403338, May 27 2016
Copyright International Business Machines Corp. 1992, 2016.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

binary type: linux3.10-glibc2.17-x86_64

NOTE:  Download required FP and interim fixes from 

Step 12: Before applying PTF12 and the interim patches, deactivate the queues and bring down the LSF daemons.

Deactivate all queues first, while mbatchd is still running, to make sure that no new jobs can be dispatched during the upgrade:

 badmin qinact all

Then use the following commands to shut down the original LSF daemons:

 badmin hshutdown all
 lsadmin resshutdown all
 lsadmin limshutdown all

Step 13: Become root to apply FP12 and the interim patches.

Set the LSF environment by sourcing $LSF_TOP/conf/profile.lsf:

 . /nfs_shared_dir/LSF_HOME/conf/profile.lsf

Step 14: Apply FP12 on the LSF base installation. The patchinstall script is in the $LSF_TOP/10.1/install directory.

# cd $LSF_TOP/10.1/install

It is recommended to check a patch before installing it. Running patchinstall with the -c option performs the check:

$ patchinstall -c

 ./patchinstall /root/PTF12_x86_2versions/lsf10.1_lnx310-lib217-x86_64-600488.tar.Z

[root@myhost7 install]# ./patchinstall /root/PTF12_x86_2versions/lsf10.1_lnx310-lib217-x86_64-600488.tar.Z
Logging patch installation sequence in /nfs_shared_dir/LSF_HOME/10.1/install/patch.log
Checking the LSF installation directory /nfs_shared_dir/LSF_HOME ...
Done checking the LSF installation directory /nfs_shared_dir/LSF_HOME.
Checking the patch history directory ...
Done checking the patch history directory /nfs_shared_dir/LSF_HOME/patch.
Checking the backup directory ...
Done checking the backup directory /nfs_shared_dir/LSF_HOME/patch/backup.
Installing package "/root/PTF12_x86_2versions/lsf10.1_lnx310-lib217-x86_64-600488.tar.Z"...
Checking the package definition for /root/PTF12_x86_2versions/lsf10.1_lnx310-lib217-x86_64-600488.tar.Z ...
Done checking the package definition for /root/PTF12_x86_2versions/lsf10.1_lnx310-lib217-x86_64-600488.tar.Z.
Finished backing up files to "/nfs_shared_dir/LSF_HOME/patch/backup/LSF_linux3.10-glibc2.17-x86_64_600488".
Done installing /root/PTF12_x86_2versions/lsf10.1_lnx310-lib217-x86_64-600488.tar.Z.

Step 15: Apply  interim fix1

./patchinstall /root/LSF_patch1/lsf10.1_lnx310-lib217-x86_64-600505.tar.Z

Logging patch installation sequence in /nfs_shared_dir/LSF_HOME/10.1/install/patch.log 
Installing package "/root/LSF_patch1/lsf10.1_lnx310-lib217-x86_64-600505.tar.Z"...
Checking the package definition for /root/LSF_patch1/lsf10.1_lnx310-lib217-x86_64-600505.tar.Z ...
Are you sure you want to update your cluster with this patch? (y/n) [y] y
Backing up existing files ...
Finished backing up files to "/nfs_shared_dir/LSF_HOME/patch/backup/LSF_linux3.10-glibc2.17-x86_64_600505".
Done installing /root/LSF_patch1/lsf10.1_lnx310-lib217-x86_64-600505.tar.Z.

Step 16: Apply interim fix2

 ./patchinstall /root/LSF_patch2/lsf10.1_lnx310-lib217-x86_64-600625.tar.Z

[root@myhost7 install]# ./patchinstall /root/LSF_patch2/lsf10.1_lnx310-lib217-x86_64-600625.tar.Z
Installing package "/root/LSF_patch2/lsf10.1_lnx310-lib217-x86_64-600625.tar.Z"...
Checking the package definition for /root/LSF_patch2/lsf10.1_lnx310-lib217-x86_64-600625.tar.Z ...
Backing up existing files ...
Finished backing up files to "/nfs_shared_dir/LSF_HOME/patch/backup/LSF_linux3.10-glibc2.17-x86_64_600625".
Done installing /root/LSF_patch2/lsf10.1_lnx310-lib217-x86_64-600625.tar.Z.

Step 17: As root, set the setuid bit on the new bctrld command

  cd $LSF_TOP/10.1/linux3.10-glibc2.17-x86_64/bin
  chown root bctrld
  chmod 4755 bctrld

Step 18: Check the lsf.shared file for the multi-cluster setup.

Begin Cluster
ClusterName      Servers
CLUSTER1       (cloudhost)
CLUSTER2       (myhost1)
CLUSTER3       (remotehost2)
End Cluster

Step 19: Switch back to the lsfadmin user. Use the following commands to start LSF using the newer daemons:

 lsadmin limstartup all
 lsadmin resstartup all
 badmin hstartup all

Use the following command to reactivate all LSF queues after upgrading:

 badmin qact all

Step 20: Modify the configuration files as required (add queues, clusters, and so on). Then run badmin reconfig or lsadmin reconfig as explained in the LSF configuration section below. Restart LSF as the "lsfadmin" user.

$ lsid
IBM Spectrum LSF Standard, Jun 10 2021
Copyright International Business Machines Corp. 1992, 2016.
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
My cluster name is CLUSTER2
My master name is myhost1
$ lsclusters -w
CLUSTER1            ok       cloudhost            lsfadmin        7        7
CLUSTER2            ok       myhost1              lsfadmin        8        8
CLUSTER3            ok       remotehost2          lsfadmin        8        8
$ bhosts
myhost1 ok - 20 0 0 0 0 0
myhost2 ok - 20 0 0 0 0 0
myhost3 ok - 19 0 0 0 0 0
myhost4 ok - 44 4 4 0 0 0
myhost5 ok - 44 4 4 0 0 0
myhost6 ok - 20 0 0 0 0 0
myhost7 ok - 20 0 0 0 0 0
myhost8 ok - 19 0 0 0 0 0
The Spectrum LSF cluster installation and the FP12 upgrade completed successfully, as the output above shows.
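On a larger cluster it is easier to filter the bhosts output than to read it line by line. The function below is a minimal sketch, assuming the host name is the first column and the status the second, as in the output above; the name hosts_not_ok is hypothetical.

```shell
# Read bhosts-style output on stdin and print the hosts whose status is not "ok".
# Assumes field 1 is the host name and field 2 is the status; a header line
# starting with HOST_NAME, if present, is skipped.
hosts_not_ok() {
    awk '$1 != "HOST_NAME" && NF >= 2 && $2 != "ok" { print $1 }'
}
```

For example, bhosts | hosts_not_ok prints nothing when every host is ok, and prints one host name per line otherwise.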

To have the system scripts start and stop the LSF daemons automatically at system startup and shutdown, run hostsetup as root with the --boot="y" option. The default is --boot="n".

1. Log on to each LSF server host as root. Start with the LSF master host.

2. Run hostsetup on each LSF server host. For example:

# cd $LSF_TOP/10.1/install

# ./hostsetup --top="$LSF_TOP" --boot="y"

NOTE: For more details on hostsetup usage, enter hostsetup -h.

In a multi-cluster environment, reinstalling the master cluster can leave the connection status as disc in the output of the bclusters command:

[smpici@c656f7n06 ~]$ bclusters
[Job Forwarding Information ]

            Queue1              send                     CLUSTER1          disc
            Queue2              send                     CLUSTER2          disc
            Queue3              send                     CLUSTER3          disc

where status=disc means communication between the two clusters is not established. The disc status might occur because no jobs are waiting to be dispatched, or because the remote master cannot be located.

A possible solution is to clean up all the LSF daemons on all clusters. Note: lsfshutdown leaves some of the daemons running on the master node, so you need to manually kill all the remaining LSF daemons on all master nodes.

Later,  bclusters should show the status as shown below:

[smpici@c656f7n06 ~]$ bclusters
[Job Forwarding Information ]

Queue1                              send        CLUSTER1                     ok
Queue2                              send        CLUSTER2                     ok
Queue3                              send        CLUSTER3                     ok


======================= LSF configuration section ===========================

After you change any configuration file, use the lsadmin reconfig and badmin reconfig commands to reconfigure your cluster. Log on to the host as root or as the LSF administrator (in our case "lsfadmin").

Run lsadmin reconfig to restart LIM and check for configuration errors. If no errors are found, you are prompted either to restart the lim daemon on management host candidates only, or to confirm that you want to restart the lim daemon on all hosts. If unrecoverable errors are found, reconfiguration is canceled. Run the badmin reconfig command to reconfigure the mbatchd daemon and check for configuration errors.

  • lsadmin reconfig to reconfigure the lim daemon
  • badmin reconfig to reconfigure the mbatchd daemon without restarting
  • badmin mbdrestart to restart the mbatchd daemon
  • bctrld restart sbd to restart the sbatchd daemon

More details about cluster reconfiguration commands as shown in the table copied below :

Friday, July 23, 2021

Spectrum Scale: High-performance storage GPFS cluster installation and setup

IBM Spectrum Scale (formerly GPFS) is a scale-out, high-performance global parallel file system (cluster file system) that provides concurrent access to a single file system or set of file systems from multiple nodes.

Enterprises and organizations are creating, analyzing and keeping more data than ever before. Islands of data are being created all over the organization and in the cloud, which adds complexity, makes systems difficult to manage and increases costs. Those that can deliver insights faster while managing rapid infrastructure growth are the leaders in their industry. To deliver those insights, an organization's underlying information architecture must support hybrid cloud, big data and artificial intelligence (AI) workloads along with traditional applications, while ensuring security, reliability, data efficiency and high performance. IBM Spectrum Scale™ meets these challenges as a parallel high-performance solution with global file and object data access for managing data at scale, with the distinctive ability to perform archive and analytics in place.

Manually installing the IBM Spectrum Scale software packages on POWER nodes myhost1, myhost2 and myhost3

The following packages are required for IBM Spectrum Scale Standard Edition on Red Hat Enterprise Linux:

  1. gpfs.base*.rpm
  2. gpfs.gpl*.noarch.rpm
  3. gpfs.msg.en_US*.noarch.rpm
  4. gpfs.gskit*.rpm
  5. gpfs.license*.rpm

Step 1: Download the Spectrum Scale Standard Edition package from Fix Central and install the RPM packages on all nodes:

 rpm -ivh gpfs.base*.rpm gpfs.gpl*rpm gpfs.license.std*.rpm gpfs.gskit*rpm gpfs.msg*rpm

Step 2 : Verify installed GPFS packages

 [root@myhost1 ]# rpm -qa | grep gpfs

Step 3: Build the GPL (GPFS portability layer) module by running the mmbuildgpl command on all nodes in the cluster.

Step 4: Verify that the GPFS packages are installed on all nodes and that the GPL module was built properly. Export the path for the GPFS commands:

 export PATH=$PATH:/usr/lpp/mmfs/bin

Step 5 : Use the mmcrcluster command to create a GPFS cluster

mmcrcluster -N NodeFile -C smpi_gpfs_power8

where NodeFile has the following entries:

#cat NodeFile
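The original NodeFile contents are not shown here. As a hedged illustration only, a node file consistent with the mmlscluster output in Step 17 could be written as below, using one NodeName:Designation entry per line. The helper name write_node_file is hypothetical, and the actual file used for this cluster may have differed.

```shell
# Write an example node file to the path given as $1.
# One node per line in the form NodeName:Designation; the host names and
# roles mirror the mmlscluster output shown in Step 17 of this post.
write_node_file() {
    cat > "$1" <<'EOF'
myhost1:quorum-manager
myhost2:quorum
myhost3:quorum-manager
EOF
}
```

For example, write_node_file NodeFile creates the file in the current directory, ready to be passed to mmcrcluster -N.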

Step 6: Use the mmchlicense command to designate licenses as needed. This command controls the type of GPFS license associated with the nodes in the cluster. The --accept option indicates that you accept the applicable licensing terms.

 mmchlicense server --accept -N serverLicense

Step 7: Use the mmgetstate command to display the state of the GPFS™ daemon on one or more nodes.

 mmgetstate -a

Step 8: Use the mmlslicense command to display information about the IBM Spectrum Scale node licensing designation or about disk and cluster capacity.

 mmlslicense -L

Step 9: The mmcrnsd command is used to create cluster-wide names for NSDs used by GPFS. This is the first GPFS step in preparing disks for use by a GPFS file system.

 mmcrnsd -F NSD_Stanza_smpi_gpfs_power -v no

 where NSD_Stanza_smpi_gpfs_power has

#cat NSD_Stanza_smpi_gpfs_power
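The stanza file contents are not shown in this post. The sketch below writes an example in the %nsd stanza format that mmcrnsd reads. The NSD names, failure group and pool match the mmlsdisk output in Step 17, while the device paths and the NSD-to-server mapping are hypothetical placeholders, as is the helper name write_nsd_stanza.

```shell
# Write an example NSD stanza file to the path given as $1.
# nsd names, failureGroup and pool follow the mmlsdisk output in Step 17;
# the device paths and the server assignments are hypothetical.
write_nsd_stanza() {
    cat > "$1" <<'EOF'
%nsd: device=/dev/sdb
  nsd=nsd1
  servers=myhost1
  usage=dataAndMetadata
  failureGroup=-1
  pool=system

%nsd: device=/dev/sdb
  nsd=nsd2
  servers=myhost2
  usage=dataAndMetadata
  failureGroup=-1
  pool=system

%nsd: device=/dev/sdb
  nsd=nsd3
  servers=myhost3
  usage=dataAndMetadata
  failureGroup=-1
  pool=system
EOF
}
```

Each stanza describes one disk; replace the device and servers values with the real block devices and the nodes that can access them before running mmcrnsd.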



Step 10: Use the mmlsnsd command to display the current information for the NSDs belonging to the GPFS cluster.

 mmlsnsd -X

Step 11: Use the mmcrfs command to create a GPFS file system

 mmcrfs smpi_gpfs -F NSD_Stanza_smpi_gpfs_power

Step 12: The mmmount command mounts the specified GPFS file system on one or more nodes in the cluster.

 mmmount smpi_gpfs -a

Step 13 : Use the mmlsfs command to list the attributes of a file system.

 mmlsfs all

Step 14: The mmlsmount command reports if a file system is in use at the time the command is issued.

 mmlsmount all

Step 15: Change the mount point from /gpfs to /my_gpfs

 mmchfs gpfs -T /my_gpfs

Step 16:  GPFS auto start and auto mount  setup

[root@myhost1 ~]#  systemctl status gpfs.service
● gpfs.service - General Parallel File System
   Loaded: loaded (/usr/lib/systemd/system/gpfs.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2021-07-20 03:27:04 EDT; 3 days ago
  Process: 96622 ExecStart=/usr/lpp/mmfs/bin/mmremote startSubsys systemd $STARTSUBSYS_ARGS (code=exited, status=0/SUCCESS)
 Main PID: 96656 (runmmfs)
   CGroup: /system.slice/gpfs.service
           ├─96656 /usr/lpp/mmfs/bin/mmksh /usr/lpp/mmfs/bin/runmmfs
           └─97093 /usr/lpp/mmfs/bin/mmfsd

[root@myhost1 ~]# systemctl is-active gpfs.service
[root@myhost1 ~]#  systemctl is-enabled gpfs.service
[root@myhost1 ~]# systemctl is-failed gpfs.service
[root@myhost1 ~]# systemctl enable  gpfs.service
Created symlink from /etc/systemd/system/ to /usr/lib/systemd/system/gpfs.service.
[root@myhost1 ~]# systemctl is-enabled gpfs.service
[root@myhost1 ~]# ls -alsrt /etc/systemd/system/
0 lrwxrwxrwx 1 root root 36 Jul 23 05:43 /etc/systemd/system/ -> /usr/lib/systemd/system/gpfs.service

[root@myhost1 ~]# mmgetstate -a
 Node number  Node name        GPFS state
       1                myhost2        active
       2                myhost1        active
       3                myhost3        active

[root@myhost1 ~]# mmchfs smpi_gpfs -A yes
mmchfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

[root@myhost1 ~]# mmlsfs smpi_gpfs  -A
flag                value                    description
------------------- ------------------------ -----------------------------------
 -A                 yes                      Automatic mount option

[root@myhost1 ~]# mmchconfig autoload=yes
mmchconfig: Command successfully completed
mmchconfig: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.
[root@myhost1 ~]#

Step 17: Troubleshoot when a GPFS node goes into an inactive state or when a disk goes down.

[root@myhost1 ~]# mmlscluster
GPFS cluster information
GPFS cluster name:         my_spectrumScale_cluster
GPFS cluster id:           9784093264651231821
GPFS UID domain:          my_spectrumScale_cluster
Remote shell command:      /usr/bin/ssh
Remote file copy command:  /usr/bin/scp
Repository type:           CCR

Node  Daemon node name  IP address     Admin node name  Designation
1   myhost2         10.x.y.1  myhost2        quorum
2   myhost1         10.x.y.2  myhost1        quorum-manager
3   myhost3         10.x.y.3  myhost3        quorum-manager

[root@myhost1 ~]# mmgetstate -a
Node number  Node name        GPFS state
1                      myhost1         active
2                      myhost2        down
3                      myhost3        active

[root@myhost1 ~]# mmstartup -a
Tue Jul 20 03:27:03 EDT 2021: mmstartup: Starting GPFS ...
myhost2:  The GPFS subsystem is already active.
myhost3:  The GPFS subsystem is already active.

[root@myhost1 ~]# mmgetstate -a

Node number  Node name        GPFS state
1                      myhost1        active
2                      myhost2        active
3                      myhost3        active
[root@myhost1 ~]#

[root@myhost1 ~]# mmunmount smpi_gpfs -a
Tue Jul 20 04:12:04 EDT 2021: mmunmount: Unmounting file systems ...
[root@myhost1 ~]# 

[root@myhost1 ~]#  mmlsdisk smpi_gpfs
disk         driver   sector     failure holds    holds                            storage
name         type       size       group metadata data  status        availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
nsd1         nsd         512          -1 Yes      Yes   ready         up              system
nsd2         nsd         512          -1 Yes      Yes   ready         down         system
nsd3         nsd         512          -1 Yes      Yes   ready         up              system
[root@myhost1 ~]#

[root@myhost1 ~]#  mmchdisk smpi_gpfs start -d nsd2
mmnsddiscover:  Attempting to rediscover the disks.  This may take a while ...
mmnsddiscover:  Finished.
myhost1:  Rediscovered nsd server access to nsd2.
Scanning file system metadata, phase 1 ...
100 % complete on Tue Jul 20 04:24:14 2021
Scan completed successfully.
Scanning file system metadata, phase 2 ...
100 % complete on Tue Jul 20 04:24:14 2021
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
100 % complete on Tue Jul 20 04:24:14 2021
Scan completed successfully.
Scanning file system metadata, phase 5 ...
100 % complete on Tue Jul 20 04:24:14 2021
Scan completed successfully.
Scanning user file metadata ...
100.00 % complete on Tue Jul 20 04:24:25 2021  (    500736 inodes with total      26921 MB data processed)
Scan completed successfully.

[root@myhost1 ~]#  mmmount  smpi_gpfs  -a
Tue Jul 20 04:24:42 EDT 2021: mmmount: Mounting file systems ...
[root@myhost1 ~]#

[root@myhost1 ~]# mmlsdisk smpi_gpfs
disk         driver   sector     failure holds    holds                            storage
name         type       size       group metadata data  status        availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
nsd1         nsd         512          -1 Yes      Yes   ready         up           system
nsd2         nsd         512          -1 Yes      Yes   ready         up           system
nsd3         nsd         512          -1 Yes      Yes   ready         up           system
[root@myhost1 ~]#

[root@myhost1 ~]# mmgetstate -a
Node number  Node name        GPFS state
1                       myhost1        active
2                       myhost2        active
3                       myhost3        active
[root@myhost1 ~]#
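The two checks used in this troubleshooting session (which node is down, which disk is down) can be scripted by filtering the command output. The functions below are a minimal sketch that assumes the column layout shown in the transcripts above; the names nodes_not_active and disks_not_up are hypothetical.

```shell
# Print the names of nodes whose GPFS state is not "active".
# Expects mmgetstate -a output on stdin; data rows are recognised by a
# numeric first field (node number, node name, state).
nodes_not_active() {
    awk '$1 ~ /^[0-9]+$/ && $3 != "active" { print $2 }'
}

# Print the names of disks whose availability is not "up".
# Expects mmlsdisk output on stdin; data rows are recognised by "nsd" in the
# driver type column (field 2), and availability is field 8.
disks_not_up() {
    awk '$2 == "nsd" && $8 != "up" { print $1 }'
}
```

For example, mmgetstate -a | nodes_not_active would have printed myhost2 before mmstartup was run, and mmlsdisk smpi_gpfs | disks_not_up would have printed nsd2 before mmchdisk was run.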

This Quick Start automatically deploys a highly available IBM Spectrum Scale cluster on the Amazon Web Services (AWS) Cloud. This Quick Start deploys IBM Spectrum Scale into a virtual private cloud (VPC) that spans two Availability Zones in your AWS account. You can build a new VPC for IBM Spectrum Scale, or deploy the software into your existing VPC. The deployment and configuration tasks are automated by AWS CloudFormation templates that you can customize during launch.

IBM's container-native storage solution for OpenShift is designed for enterprise customers who need global hybrid cloud data access. These storage services meet the strict requirements for mission critical data. IBM Spectrum® Fusion provides a streamlined way for organizations to discover, secure, protect and manage data from the edge, to the core data center, to the public cloud.

Spectrum Fusion

IBM launched a containerized derivative of its Spectrum Scale parallel file system called Spectrum Fusion. The rationale is that customers need to store and analyze more data at edge sites, while operating in a hybrid and multi-cloud world that requires data availability across all these locations. The ESS arrays provide Edge storage capacity and a containerized Spectrum Fusion can run in any of the locations mentioned. It’s clear that to build, deploy and manage applications requires advanced capabilities that help provide rapid availability to data across the entire enterprise – from the edge to the data center to the cloud. 

Spectrum Fusion combines Spectrum Scale functionality with unspecified IBM data protection software. It will appear first in a hyperconverged infrastructure (HCI) system that integrates compute, storage and networking. This will be equipped with Red Hat OpenShift to support virtual machine and containerized workloads for cloud, edge and containerized data centres.

Spectrum Fusion will integrate with Red Hat Advanced Cluster Manager (ACM) for managing multiple Red Hat OpenShift clusters, and it will support tiering. Spectrum Fusion provides customers with a streamlined way to discover data from across the enterprise, as it has a global index of the data it stores. It will manage a single copy of data only, so there is no need to create duplicate data when moving application workloads across the enterprise. Spectrum Fusion will integrate with IBM's Cloud Satellite, a managed distributed cloud that deploys and runs apps across on-premises, edge and cloud environments.


Monday, December 7, 2020

Overview Of MPI Reduction Operations in HPC cluster

Message Passing Interface (MPI) is a de facto standard framework for distributed computing in many HPC applications. MPI collective operations involve a group of processes communicating by message passing in an isolated context, known as a communicator. Each process is identified by its rank, an integer ranging from 0 to P − 1, where P is the size of the communicator. All processes make the same call (SPMD fashion, i.e. Single Program Multiple Data), and what each one contributes depends on the process.

MPI reductions are among the most useful MPI operations and form an important class of computational operations. The operation can be either user-specified or chosen from the list of predefined operations. Usually, the predefined operations are sufficient for any application.
Consider a system with N processes, where the goal is to compute the dot product of two N-vectors in parallel. The dot product of two vectors u and v is u⋅v = u1v1 + u2v2 + ... + uNvN. As you can imagine, this is highly parallelizable: with N processes, each process i can compute the intermediate value ui×vi. The program then needs a way to sum all of these values, and this is where the reduction comes into play. We can ask MPI to sum all those values and either store the result on only one process (for instance process 0) or redistribute it to every process.
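As a small worked instance of the dot product reduction described above, take N = 3 with made-up vectors (the values are purely illustrative). Each rank computes one product, and a sum reduction combines them:

```latex
\[
u = (1, 2, 3), \qquad v = (4, 5, 6), \qquad N = 3
\]
\[
\underbrace{u_1 v_1}_{\text{rank } 0} = 4, \qquad
\underbrace{u_2 v_2}_{\text{rank } 1} = 10, \qquad
\underbrace{u_3 v_3}_{\text{rank } 2} = 18
\]
\[
u \cdot v \;=\; \sum_{i=1}^{N} u_i v_i \;=\; 4 + 10 + 18 \;=\; 32
\]
```

With a reduce to a single root, only process 0 ends up holding 32; with an all-reduce, every process does.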
MPI reduction operations fall into three categories:
1) Global reduction operations:
  • MPI_REDUCE and
  • MPI_ALLREDUCE
2) Combined reduction and scatter operations:
  • MPI_REDUCE_SCATTER
3) Scan operations:
  • MPI_SCAN and
  • MPI_EXSCAN
The primary idea of these operations is to collectively compute on a set of input data elements to generate a combined output. MPI_REDUCE is a collective function where each process provides some input data (e.g., an array of double-precision floating-point numbers). This input data is combined through an MPI operation, as specified by the "op" parameter. Most applications use MPI predefined operations such as summation or maximum value identification, although some applications also use reductions based on user-defined function handlers.
The MPI operator "op" is always assumed to be associative. All predefined operations are also assumed to be commutative. Applications, however, may define their own operations that are associative but not commutative. The "canonical" evaluation order of a reduction is determined by the ranks of the processes in the group. However, an MPI implementation can take advantage of associativity, or of associativity and commutativity, to change the order of evaluation. Doing so may change the result of the reduction for operations that are not strictly associative and commutative, such as floating-point addition.
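The effect of evaluation order on floating-point addition can be seen without MPI at all. The sketch below uses awk, which computes in double precision, to add the same three values in two groupings; the function name fp_order_demo and the chosen values are illustrative only, and this is not an MPI program.

```shell
# Add the same three double-precision values in two different groupings.
# Prints the results of (a+b)+c and a+(b+c) for a=1e16, b=-1e16, c=1.
fp_order_demo() {
    awk 'BEGIN {
        a = 1e16; b = -1e16; c = 1
        left  = (a + b) + c   # a and b cancel first, then c is added
        right = a + (b + c)   # c is lost when added to b, then a cancels b
        print left, right
    }'
}
```

Running fp_order_demo prints 1 0: the two groupings give different sums, which is why a reduction whose evaluation order changes between runs or implementations can return slightly different floating-point results.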
The following predefined operations are supplied for MPI_REDUCE and the related functions MPI_ALLREDUCE, MPI_REDUCE_SCATTER, and MPI_SCAN. They are invoked by passing one of the following names as op:
  • MPI_MAX: maximum
  • MPI_MIN: minimum
  • MPI_SUM: sum
  • MPI_PROD: product
  • MPI_LAND: logical and
  • MPI_BAND: bit-wise and
  • MPI_LOR: logical or
  • MPI_BOR: bit-wise or
  • MPI_LXOR: logical xor
  • MPI_BXOR: bit-wise xor
  • MPI_MAXLOC: max value and location
  • MPI_MINLOC: min value and location


 Example 1: Get the memory on each node and perform MPI_SUM operation to calculate average Memory on the cluster.

Tuesday, October 20, 2020

Ansible Concepts: Run first Command and Playbook on Linux cluster

Ansible is a configuration management and orchestration tool that automates cloud provisioning, configuration management, application deployment, intra-service orchestration, and many other IT needs. The open source product is maintained by Ansible Inc. It was first released in 2012, and Red Hat acquired Ansible in 2015. Red Hat Ansible Engine and Red Hat Ansible Tower are commercial products.

Ansible can be run directly from the command line without setting up any configuration files. You only need to install Ansible on the control server or node; it communicates with the managed nodes and performs the required tasks over SSH, and no other installation is required. This differs from other orchestration tools such as Chef and Puppet, where you have to install software on both the control and client nodes. Ansible uses no agents and no additional custom security infrastructure, so it is easy to deploy. Most importantly, it uses a very simple language: a series of tasks is described in configuration files called playbooks, written in YAML, which lets you describe your automation jobs in a way that approaches plain English.



 The Ansible Automation engine consists of:

Control node

Any machine with Ansible installed. You can run commands and playbooks, invoking /usr/bin/ansible or /usr/bin/ansible-playbook, from any control node. You can use any computer that has Python installed on it as a control node - laptops, shared desktops, and servers can all run Ansible. However, you cannot use a Windows machine as a control node. You can have multiple control nodes.

Managed nodes

The network devices (and/or servers) you manage with Ansible. Managed nodes are also sometimes called “hosts”. Ansible is not installed on managed nodes.


Inventory

A list of managed nodes. An inventory file is also sometimes called a "hostfile". Your inventory can specify information such as the IP address of each managed node. An inventory can also organize managed nodes, creating and nesting groups for easier scaling. Inventories are of two types, static and dynamic; dynamic inventories are a more advanced topic.


Modules

The units of code Ansible executes. Each module has a particular use, from administering users on a specific type of database to managing VLAN interfaces on a specific type of network device. You can invoke a single module with a task, or invoke several different modules in a playbook.


Tasks

The units of action in Ansible. You can execute a single task once with an ad-hoc command.


Playbooks

Ordered lists of tasks, saved so you can run those tasks in that order repeatedly. Playbooks can include variables as well as tasks. Playbooks are written in YAML and are easy to read, write, share and understand.

CMDB (Configuration Management Database):

A repository that acts as a data warehouse for IT installations. It holds data about a collection of IT assets, commonly referred to as configuration items (CIs), and describes the relationships between those assets.

Cloud:

A network of remote servers, hosted on the internet, on which you can store, manage, and process your data instead of using local servers. You launch your resources and instances in the cloud, connect to them, and operate your tasks remotely.


Ansible works by connecting to your nodes and pushing out small programs, called "Ansible modules" to them. These programs are written to be resource models of the desired state of the system. Ansible then executes these modules (over SSH by default), and removes them when finished.

Your library of modules can reside on any machine, and there are no servers, daemons, or databases required. Typically you'll work with your favorite terminal program, a text editor, and probably a version control system to keep track of changes to your content. Passwords are supported, but SSH keys with ssh-agent are one of the best ways to use Ansible. 

By default, Ansible represents the machines it manages using a very simple INI file that puts all of your managed machines in groups of your own choosing. The Ansible inventory file defines the hosts and groups of hosts upon which commands, modules, and tasks in a playbook operate. The default inventory file is /etc/ansible/hosts. If necessary, you can also create project-specific inventory files in alternate locations.
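As an illustration of the INI inventory format described above, here is a minimal sketch. The file name inventory.ini, the group names, and the example.com hostnames are placeholders chosen for this example, not values from a real setup.

```shell
# Write a small INI inventory into the current directory.
# All hostnames and group names below are placeholders.
cat > inventory.ini <<'EOF'
[webservers]
web1.example.com
web2.example.com

[loadbalancer]
lb1.example.com

# A parent group that nests the two groups above.
[site:children]
webservers
loadbalancer
EOF

# Show the group tree Ansible builds from the file (only if Ansible is installed).
if command -v ansible-inventory >/dev/null 2>&1; then
  ansible-inventory -i inventory.ini --graph
fi
```

A project-specific inventory like this one is passed with the -i option, for example: ansible -i inventory.ini webservers -m ping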

How to install Ansible on RHEL8 machine:

Install instructions for Ansible Engine on RHEL on IBM Power (little endian).
subscription-manager repos --enable="ansible-2.9-for-rhel-8-ppc64le-rpms"
yum install ansible 

Verify installed version of ansible :

[root@myhost123 example]# ansible --version
ansible 2.10.2
  config file = /root/sachin/example/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /root/.local/lib/python3.6/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.6.8 (default, Dec  5 2019, 16:11:43) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]

As explained in the section above, the Ansible engine consists of the inventory, API, modules, and plugins. A user writes playbooks, that is, sets of tasks; Ansible then scans the inventory for the hosts or IP addresses listed in the playbook, where the tasks must be executed. Ansible copies the required modules to the managed nodes and completes the given tasks using Python API calls and plugins. Once the tasks have been executed, the modules are removed from the managed nodes. On Linux, Ansible executes the modules on managed hosts over SSH.

How to use  ANSIBLE for ad-hoc parallel task execution:
Once you have an instance available, you can talk to it right away, without any additional setup:

ansible 'hosts' -m module_name

Eg:  ansible 'localhost' -m shell -a 'id'

ansible all -m ping
ansible all -m yum -a "name=httpd state=installed"
ansible all -a "/usr/sbin/reboot"


CASE 1: [root@myhost123 example]# ansible all -m ping
myhost123 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": false,
    "ping": "pong"
}

CASE 2: # ansible 'localhost' -m shell -a 'id'
localhost | CHANGED | rc=0 >>
uid=0(root) gid=0(root) groups=0(root) context=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

CASE 3: # ansible myhost123 -m yum -a "name=httpd state=installed"

myhost123 | CHANGED => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    },
    "changed": true,
    "msg": "",
    "rc": 0,
    "results": [
        "Installed: mod_http2-1.11.3-3.module+el8.2.0+7758+84b4ca3e.1.ppc64le",
        "Installed: httpd-2.4.37-21.module+el8.2.0+5008+cca404a3.ppc64le",
        "Installed: httpd-filesystem-2.4.37-21.module+el8.2.0+5008+cca404a3.noarch",
        "Installed: apr-util-1.6.1-6.el8.ppc64le",
        "Installed: apr-util-openssl-1.6.1-6.el8.ppc64le",
        "Installed: apr-1.6.3-9.el8.ppc64le",
        "Installed: redhat-logos-httpd-81.1-1.el8.noarch",
        "Installed: httpd-tools-2.4.37-21.module+el8.2.0+5008+cca404a3.ppc64le",
        "Installed: apr-util-bdb-1.6.1-6.el8.ppc64le"
    ]
}


How To Setup Ansible Master-Slave and Install Apache Web Server
Let’s see the capabilities of Ansible in this example of simple web server setup. We will have the following components:

  1. Control Node – It is the node that will have Ansible installed and it will control the other nodes.
  2. Load Balancer  – An nginx-based load balancer will be installed on this node.
  3. Web Server 1 and Server 2  – These nodes will have Apache installed with a simple hello world web page. The load balancer will alternate traffic between these two nodes.

We will first install Ansible on the control node. Then, we will use the control node to set up the load balancer and application nodes.
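The setup described above could be sketched as a two-play playbook. This is only an outline under stated assumptions: the inventory is assumed to define groups named webservers and loadbalancer, the file name site.yml and the page text are made up for the example, and the nginx upstream configuration that actually alternates traffic between the web servers is left out.

```shell
# Write a two-play playbook into the current directory.
# Group names, file name, and page text are illustrative assumptions.
cat > site.yml <<'EOF'
---
- name: Install Apache with a hello world page
  hosts: webservers
  become: true
  tasks:
    - name: Install httpd
      yum:
        name: httpd
        state: present

    - name: Publish a simple hello world page
      copy:
        content: "Hello world from {{ inventory_hostname }}\n"
        dest: /var/www/html/index.html

    - name: Start and enable httpd
      service:
        name: httpd
        state: started
        enabled: true

- name: Install nginx on the load balancer
  hosts: loadbalancer
  become: true
  tasks:
    - name: Install nginx
      yum:
        name: nginx
        state: present

    - name: Start and enable nginx
      service:
        name: nginx
        state: started
        enabled: true
EOF

# Print the generated playbook for review.
cat site.yml
```

On the control node, the playbook can be checked with ansible-playbook --syntax-check site.yml before it is run against a real inventory.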


How to  create playbooks ?   Example Hello world

 [root@myhost123 example]# cat HelloWorld.yml
- name: This is a hello-world example
  hosts: all
  tasks:
    - name: Create a file called '/tmp/output.txt' with the content 'hello world'.
      copy:
        content: hello world
        dest: /tmp/output.txt

[root@myhost123 example]#

Run Playbook:

[root@myhost123 example]# ansible-playbook  HelloWorld.yml
PLAY [This is a hello-world example] ********************************************************************
TASK [Gathering Facts] **********************************************************************************
ok: [myhost123]
TASK [Create a file called '/tmp/output.txt' with the content 'hello world'.] *************************
ok: [myhost123]
PLAY RECAP **********************************************************************************************
myhost123                   : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
[root@myhost123 example]#

Verify  output:

[root@myhost123 example]# cat /tmp/output.txt
hello world


All YAML files (whether or not they are associated with Ansible) can optionally begin with "---" and end with "...". These markers are part of the YAML format and indicate the start and end of a document.


Sunday, May 24, 2020

RHEL8 - Next generation of Linux container Capabilities - podman, buildah, skopeo .....!

Container technology has been creating a lot of buzz recently. As people move from virtualization to containers, many enterprises have adopted containers for deploying cloud applications. Containers depend on key Linux kernel features such as control groups, namespaces, and SELinux to manage resources and isolate the applications that are running inside the containers. It’s not just containers that generally work best with Linux, but also the tools used to manage their lifecycles. Today, Kubernetes is the leading container orchestration platform, and it was built on Linux concepts and uses Linux tooling and application programming interfaces (APIs) to manage the containers.

Red Hat OpenShift is a leading hybrid cloud, enterprise Kubernetes application platform, trusted by 1,700+ organizations. It is much easier to use, and it even has a web interface for configuration. Red Hat developed container tools for single hosts and for clusters, standardizing on Kubernetes. Alternatives include managed Kubernetes services such as AWS EKS (Amazon Elastic Kubernetes Service)/Fargate, Azure AKS, and Google Cloud Platform’s GKE, as well as other platforms and tools such as Apache Mesos, Docker Swarm, Nomad, OpenStack, Rancher, and Docker Compose.

For RHEL 8, the Docker package is not included and not supported by Red Hat. The docker package has been replaced by the new suite of tools in the Container Tools module, as listed below:

  •     The podman container engine replaced docker engine
  •     The buildah utility replaced docker build
  •     The skopeo utility replaced docker push

Red Hat Quay is a distributed, highly available container registry for the entire enterprise. Unlike other container tool implementations, the tools described here do not center around the monolithic Docker container engine and docker command. Instead, they provide a set of command-line tools that can operate without a container engine. These include:

  • podman - client tool for directly managing pods and container images (run, stop, start, ps, attach, exec, and so on)
  • buildah - client tool for building, pushing and signing container images
  • skopeo - client tool for copying, inspecting, deleting, and signing images
  • runc - Container runtime client for providing container run and build features to podman and buildah with OCI format containers
  • crictl - For troubleshooting and working directly with CRI-O container engines
Because these tools are compatible with the Open Container Initiative (OCI), they can be used to manage the same Linux containers that are produced and managed by Docker and other OCI-compatible container engines. However, they are especially suited to run directly on Red Hat Enterprise Linux, in single-node use cases. Each tool in this scenario can be more light-weight and focused on a subset of features. And with no need for a daemon process running to implement a container engine, these tools can run without the overhead of having to work with a daemon process.

For a multi-node container platform, there is OpenShift. Instead of relying on the single-node, daemonless tools, OpenShift requires a daemon-based container engine like the CRI-O container engine. Also, podman stores its data in the same directory structure used by Buildah, Skopeo, and CRI-O, which will allow podman to eventually work with containers being actively managed by CRI-O in OpenShift.

In a nutshell, you get Podman with RHEL in a single node use case (orchestrate yourself) and CRI-O as part of the highly automated OpenShift 4 software stack as shown in diagram.

What is CRI-O? 
CRI-O is an implementation of the Kubernetes CRI (Container Runtime Interface) to enable using OCI (Open Container Initiative) compatible runtimes. It allows Kubernetes to use any OCI-compliant runtime as the container runtime for running pods. Today it supports runc and Kata Containers as the container runtimes, but in principle any OCI-conformant runtime can be plugged in. CRI-O supports OCI container images and can pull from any container registry. It is a lightweight alternative to using Docker, Moby or rkt as the runtime for Kubernetes.

Why CRI-O ?
CRI-O is an open source, community-driven container engine. Its primary goal is to replace the Docker service as the container engine for Kubernetes implementations, such as OpenShift Container Platform.  The CRI-O container engine provides a stable, more secure, and performant platform for running Open Container Initiative (OCI) compatible runtimes. You can use the CRI-O container engine to launch containers and pods by engaging OCI-compliant runtimes like runc [the default OCI runtime] or Kata Containers.

CRI-O is not supported as a stand-alone container engine. You must use CRI-O as a container engine for a Kubernetes installation, such as OpenShift Container Platform. To run containers without Kubernetes or OpenShift Container Platform, use podman. CRI-O’s purpose is to be the container engine that implements the Kubernetes Container Runtime Interface (CRI) for OpenShift Container Platform and Kubernetes, replacing the Docker service.  The scope of CRI-O is tied to the Container Runtime Interface (CRI). CRI extracted and standardized exactly what a Kubernetes service (kubelet) needed from its container engine. There is little need for direct command-line contact with CRI-O. A set of container-related command-line tools are available to provide full access to CRI-O for testing and monitoring - crictl, runc, podman, buildah, skopeo. Some Docker features are included in other tools instead of in CRI-O. For example, podman offers exact command-line compatibility with many docker command features and extends those features to managing pods as well. No container engine is needed to run containers or pods with podman. Features for building, pushing, and signing container images, which are also not required in a container engine, are available in the buildah command.
Kubernetes and CRI-O process
The following are the components of CRI-O :
  • OCI compatible runtime – Default is runC; other OCI-compliant runtimes, e.g. Kata Containers, are supported as well.
  • containers/storage – Library used for managing layers and creating root file-systems for the containers in a pod.
  • containers/image – Library is used for pulling images from registries.
  • networking (CNI) – Used for setting up networking for the pods. Flannel, Weave and OpenShift-SDN CNI plugins have been tested.
  • container monitoring (conmon) – Utility within CRI-O that is used to monitor the containers.
  • security is provided by several core Linux capabilities
Runtime in Kubernetes
where:
  • The OCI runtime works as the low-level runtime.
  • The high-level runtime provides inputs to the OCI runtime as per the OCI specs.

How do Podman, CRI-O and Kata Containers relate to this ecosystem?

An OCI runtime is relatively simple. You give it the root filesystem of the container and a json file describing core properties of the container, and the runtime spins up the container and connects it to an existing network using a pre-start hook.

The actions listed below are the job of a high-level container runtime. On top of this, the high-level container runtime implements the CRI so that Kubernetes has an easy way to drive the runtime.
  •     Actually creating the network of a container.
  •     Managing container images.
  •     Preparing the environment of a container.
  •     Managing local/persistent storage.
runc is the default for most tools such as Docker and Podman.

What CRI-O isn’t:

Building images, for example, is out of scope for CRI-O and that’s left to tools like Docker’s build command, Buildah, or OpenShift’s Source-to-Image (S2I). Once an image is built, CRI-O will happily consume it, but the building of images is left to other tools.
What is Podman?
Podman is a daemonless container engine, developed by Red Hat, for developing, managing, and running OCI containers on your Linux system. Its engineers paid special attention to using the same nomenclature as Docker when naming Podman commands. Containers can be run either as root or in rootless mode. Podman is a replacement for Docker for local development of containerized applications: Podman commands map one-to-one to Docker commands, including their arguments. You could alias docker to podman and never notice that a completely different tool is managing your local containers. The Podman approach is simply to interact directly with the image registry, with the container and image storage, and with the Linux kernel through the runC container runtime process (not a daemon). Podman allows you to run all of the Docker commands without the daemon dependency.

Podman workflow

One of the core features of Podman is its focus on security. There is no daemon involved in using Podman. It uses the traditional fork-exec model instead and makes heavy use of user namespaces and network namespaces. As a result, Podman is a bit more isolated and in general more secure to use than Docker. You can even be root in a container without granting the container or Podman any root privileges on the host, and a user in a container won't be able to do any root-level tasks on the host machine. Rootless Podman and Buildah can do most things people want to do with containers, but there are times when root is still required. The nicest feature is running Podman and containers as a non-root user. This means you never have to give a user root privileges on the host, while in the client/server model (like Docker employs), you must open a socket to a privileged daemon running as root to launch the containers. There you are at the mercy of the security mechanisms implemented in the daemon rather than the security mechanisms implemented in the host operating system, which is a dangerous proposition.

How containers run with container Engine ?
Podman can now ease the transition to Kubernetes and CRI-O :
On a basic level, Kubernetes is often viewed as the application that runs your containers, but Kubernetes really is a huge bundle of utilities or APIs that describe how a group of microservices running in containers on a group of servers can coordinate, work together, and share services and resources. Kubernetes only supplies the APIs for orchestration, scheduling, and resource management. To have a complete container orchestration platform, you’ll need the OS underneath, a container registry, container networking, container storage, logging and monitoring, and a way to integrate continuous integration/continuous delivery (CI/CD). Red Hat OpenShift is a supported Kubernetes for cloud-native applications with enterprise security in multi-cloud environments.
A group of seals is called a pod :) and Podman manages pods. The pod concept was introduced by Kubernetes, and Podman pods are similar to the Kubernetes definition. Podman can now capture the YAML description of local pods and containers and then help users transition to a more sophisticated orchestration environment like Kubernetes. Check this developer and user workflow:
  • Create containers/pods locally using Podman on the command line.
  • Verify these containers/pods locally or in a localized container runtime (on a different physical machine).
  • Snapshot the container and pod descriptions using Podman and help users re-create them in Kubernetes.
  • Users add sophistication and orchestration (where Podman cannot) to the snapshot descriptions and leverage advanced functions of Kubernetes.
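The workflow above can be sketched with a few podman commands. This is a rough illustration, not a definitive procedure: the pod name demo-pod, the container name, and the UBI 8 image are assumptions made for the example. By default the script only prints the commands; set DRY_RUN=0 to execute them on a host with podman installed.

```shell
# Print each command; run it for real only when DRY_RUN=0.
run() { echo "+ $*"; [ "${DRY_RUN:-1}" = "1" ] || "$@"; }

# Create a pod locally, snapshot it as Kubernetes YAML, and replay it.
# Pod, container, image, and file names are illustrative assumptions.
pod_to_kube() {
  pod="$1"
  run podman pod create --name "${pod}"
  run podman run -d --pod "${pod}" --name "${pod}-app" registry.access.redhat.com/ubi8/ubi sleep infinity
  # Capture the pod and its containers as a Kubernetes YAML description.
  run podman generate kube -f "${pod}.yaml" "${pod}"
  # Remove the local pod, then re-create it from the YAML description.
  run podman pod rm -f "${pod}"
  run podman play kube "${pod}.yaml"
}

pod_to_kube demo-pod
```

The generated YAML file is the piece that users would then extend and hand to a Kubernetes cluster.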
How containers run in kubernetes cluster?

This container stack within Red Hat Enterprise Linux and Red Hat Enterprise Linux CoreOS serves as part of the foundation for OpenShift. As can be seen in the drawing below, the CRI-O stack in OpenShift shares many of its underlying components with Podman. This allows Red Hat engineers to leverage knowledge gained in experiments conducted in Podman for new capabilities in OpenShift.

Pod-Architecture source

Every Podman pod includes an “infra” container. This container does nothing but sleep. Its purpose is to hold the namespaces associated with the pod and allow podman to connect other containers to the pod. This allows you to start and stop containers within the pod while the pod stays running, whereas this would not be possible if the primary container controlled the pod. Most of the attributes that make up the pod are actually assigned to the “infra” container: port bindings, cgroup-parent values, and kernel namespaces. This is critical to understand, because once the pod is created these attributes are assigned to the “infra” container and cannot be changed.

In the above diagram, notice the box above each container: conmon, the container monitor. It is a small C program whose job is to watch the primary process of the container and, if the container dies, save the exit code. It also holds open the tty of the container so that it can be attached to later. This is what allows podman to run in detached mode (backgrounded): podman can exit while conmon continues to run. Each container has its own instance of conmon.

Buildah : The buildah command allows you to build container images either from the command line or using Dockerfiles. These images can then be pushed to any container registry and can be used by any container engine, including Podman, CRI-O, and Docker. Buildah specializes in building OCI images. Buildah’s commands replicate all of the commands that are found in a Dockerfile. Buildah’s goal is also to provide a lower-level coreutils interface for building container images, allowing people to build containers without requiring a Dockerfile. Its other goal is to allow you to use other scripting languages to build container images without requiring a daemon.

The buildah command can be used as a separate command, but it is incorporated into other tools as well. For example, the podman build command uses Buildah code to build container images. Buildah is also often used to securely build containers while running inside a locked-down container managed by a tool like Podman, OpenShift/Kubernetes, or Docker. Buildah allows you to have a Kubernetes cluster without any Docker daemon for both runtime and builds.

So when should you use Buildah and when should you use Podman? With Podman you can run, build (it calls Buildah under the covers for this), modify, and troubleshoot containers in your Kubernetes cluster. With the two projects together, you have a well-rounded solution for your OCI container image and container needs. Buildah and Podman can be installed with yum install buildah podman.

A quick and easy way to summarize the difference between the two projects is the buildah run command emulates the RUN command in a Dockerfile while the podman run command emulates the docker run command in functionality. Buildah is an efficient way to create OCI images while Podman allows you to manage and maintain those images and containers in a production environment using familiar container CLI commands. Together they form a strong foundation to support your OCI container image and container needs.
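To make the buildah run versus podman run distinction concrete, here is a small sketch that builds an image without a Dockerfile and then starts a container from it. The base image, the working-container name demo-build, and the image name demo-httpd are assumptions for illustration. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
# Print each command; run it for real only when DRY_RUN=0.
run() { echo "+ $*"; [ "${DRY_RUN:-1}" = "1" ] || "$@"; }

# Build an image step by step without a Dockerfile, then run it.
# Base image, container name, and image name are illustrative assumptions.
build_httpd_image() {
  # Start a working container from a base image (like FROM in a Dockerfile).
  run buildah from --name demo-build registry.access.redhat.com/ubi8/ubi
  # buildah run executes a command inside the working container (like RUN).
  run buildah run demo-build -- yum -y install httpd
  # Set image metadata (like EXPOSE and CMD).
  run buildah config --port 80 --cmd "/usr/sbin/httpd -DFOREGROUND" demo-build
  # Save the working container as an image.
  run buildah commit demo-build demo-httpd
  # podman run starts a container from the finished image (like docker run).
  run podman run -d -p 8080:80 demo-httpd
}

build_httpd_image
```

Because each step is an ordinary shell command, the build can be driven from any script, which is the "no Dockerfile required" point made above.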

skopeo: The skopeo command is a tool for copying containers and images between different types of container storage. It can copy containers from one container registry to another. It can copy images to and from a host, as well as to other container environments and registries. Skopeo can inspect images from container image registries, get images and image layers, and use signatures to create and verify images. 
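A brief sketch of the two most common skopeo operations mentioned above, inspecting a remote image and copying it to local storage. The UBI 8 image reference and the target directory /tmp/ubi8-image are assumptions for the example. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
# Print each command; run it for real only when DRY_RUN=0.
run() { echo "+ $*"; [ "${DRY_RUN:-1}" = "1" ] || "$@"; }

# Inspect a remote image, then copy it into a local directory.
# The image reference and target directory are illustrative assumptions.
inspect_and_copy() {
  src="docker://registry.access.redhat.com/ubi8/ubi:latest"
  # Read image metadata from the registry without pulling the image.
  run skopeo inspect "${src}"
  # Copy the image layers and manifest into a plain directory.
  run mkdir -p /tmp/ubi8-image
  run skopeo copy "${src}" dir:/tmp/ubi8-image
}

inspect_and_copy
```

The same skopeo copy form works between two registries by using a docker:// reference on both sides.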

Running containers as root or rootless :

Running the container tools such as podman, skopeo, or buildah as a user with superuser privilege (root user) is the best way to ensure that your containers have full access to any feature available on your system. However, with the feature called "Rootless Containers," generally available as of RHEL 8.1, you can work with containers as a regular user.

Although container engines, such as Docker, let you run docker commands as a regular (non-root) user, the docker daemon that carries out those requests runs as root. So, effectively, regular users can make requests through their containers that harm the system, without there being clarity about who made those requests. By setting up rootless container users, system administrators limit potentially damaging container activities from regular users, while still allowing those users to safely run many container features under their own accounts.
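For a quick look at how rootless containers are set up for a regular user, the following sketch may help. It assumes the commands are run as a non-root user with podman installed, and the UBI 8 image is only an example. The script prints the commands by default; set DRY_RUN=0 to execute them.

```shell
# Print each command; run it for real only when DRY_RUN=0.
run() { echo "+ $*"; [ "${DRY_RUN:-1}" = "1" ] || "$@"; }

# Inspect the rootless setup of the current (non-root) user.
# The image name is an illustrative assumption.
check_rootless() {
  user="$(id -un)"
  # Subordinate UID/GID ranges that rootless containers map into.
  run grep "^${user}:" /etc/subuid /etc/subgid
  # Show how UIDs inside the user namespace map to host UIDs.
  run podman unshare cat /proc/self/uid_map
  # Print the identity a process sees inside a container started by this user.
  run podman run --rm registry.access.redhat.com/ubi8/ubi id
}

check_rootless
```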
Also, note that Docker is a daemon-based container engine that allows us to deploy applications inside containers, as shown in the docker-workflow diagram. With the release of RHEL 8 and CentOS 8, the docker package has been removed from the default package repositories and replaced with podman and buildah. If you are comfortable with Docker, deploy most of your applications inside Docker containers, and do not want to switch to Podman, you can install the community version of Docker on CentOS 8 and RHEL 8 by using the official Docker repository for CentOS 7/RHEL 7, which is compatible.
Docker workflow

NOTE: Technology Preview features provide early access to upcoming product innovations, enabling you to test functionality and provide feedback during the development process. RHEL 8.2 provides access to technology previews of containerized versions of Buildah, a tool for building container images that comply with the Open Container Initiative (OCI) specification, and Skopeo, a tool that facilitates the movement of container images. Red Hat is adding Udica, a tool that makes it easier to create customized, container-centric SELinux security policies that reduce the risk that a process might “break out” of a container. RHEL 8.2 also introduces enhancements to the Red Hat Universal Base Image, which now supports OpenJDK and .NET 3.0, in addition to making it easier to access source code associated with a given image via a single command. The release also adds management and monitoring capabilities via updates to Red Hat Insights, which makes it easier to define and monitor policies created by the IT organization and to reduce any drift from the baselines initially defined by the IT team.
Podman installation on RHEL and small demo to illustrate with DB application:
Step 1: yum -y install podman
This command will install Podman and also its dependencies: atomic-registries, runC, skopeo-containers, and SELinux policies. Check this as shown below :
[root@IBMPOWER_sachin]# rpm -qa | grep podman
[root@IBMPOWER_sachin]# rpm -qa | grep skopeo
[root@IBMPOWER_sachin]# rpm -qa | grep runc

Step 2 : Command-line examples to create container and run RHEL container 
[root@IBMPOWER_sachin script]# podman run -it rhel sh
Trying to pull
Getting image source signatures
Copying blob feaa73091cc9 done
Copying blob e20f387c7bf5 done
Copying config 1a9b6d0a58 done
Writing manifest to image destination
Storing signatures

[root@IBMPOWER_sachin ~]# podman images
REPOSITORY                        TAG      IMAGE ID       CREATED       SIZE
                                  latest   1a9b6d0a58f8   2 weeks ago   215 MB
[root@IBMPOWER_sachin ~]#

Step 3 : Install a containerized service for setting up a MariaDB database :
Run a MariaDB persistent container - MariaDB 10.2 with some custom variables and try to let its “data” be persistent.
podman pull
Trying to pull
Getting image source signatures
Copying blob 8574a8f8c7e5 done
Copying blob f60299098adf done
Copying blob 82a8f4ea76cb done
Copying blob a3ac36470b00 done
Copying config 66a314da15 done
Writing manifest to image destination
Storing signatures
[root@IBMPOWER_sachin ~]#
[root@IBMPOWER_sachin ~]# podman images
REPOSITORY                                           TAG      IMAGE ID       CREATED       SIZE
                                                     latest   66a314da15d6   11 days ago   453 MB
                                                     latest   1a9b6d0a58f8   2 weeks ago   215 MB
[root@IBMPOWER_sachin ~]#

After you pull an image to your local system and before you run it, it is a good idea to investigate that image. Reasons for investigating an image before you run it include:
  •  Understanding what the image does
  •  Checking what software is inside the image
Example: Get information about the user ID running inside the container, the exposed ports ("ExposedPorts"), and the persistent volume location to attach, as shown here:
podman inspect  | grep User
podman inspect | grep -A1 ExposedPorts
podman inspect | grep -A1 Volume


Step 4 : Set up a folder that will handle MariaDB’s data once we start our container:
[root@IBMPOWER_sachin ~]# mkdir /root/mysql-data
[root@IBMPOWER_sachin ~]# chown 27:27 /root/mysql-data
Step 5: Run the container
[root@IBMPOWER_sachin ~]# podman run -d -v /root/mysql-data:/var/lib/mysql/data:Z -e MYSQL_USER=user -e MYSQL_PASSWORD=pass -e MYSQL_DATABASE=db -p 3306:3306
[root@IBMPOWER_sachin ~]# podman container list
CONTAINER ID  IMAGE                                                      COMMAND     CREATED        STATUS            PORTS                   NAMES
fd2d30f8ec72  run-mysqld  9 seconds ago  Up 9 seconds ago>3306/tcp  wizardly_jang
[root@IBMPOWER_sachin ~]#
Step 6:  check logs
[root@ ]# podman logs fd2d30f8ec72 | head
=> sourcing ...
=> sourcing ...
=> sourcing ...
---> 11:03:27     Processing basic MySQL configuration files ...
=> sourcing ...
=> sourcing ...
---> 11:03:27     Processing additional arbitrary  MySQL configuration provided by s2i ...
=> sourcing 40-paas.cnf ...
=> sourcing 50-my-tuning.cnf ...
---> 11:03:27     Initializing database ...
Step 7: The container started and initialized its database. Let's create a table and check:
[root@IBMPOWER_sachin ~]# podman exec -it fd2d30f8ec72 /bin/bash
bash-4.2$ mysql --user=user --password=pass -h -P 3306 -t
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.2.22-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| db                 |
| information_schema |
| test               |
+--------------------+
3 rows in set (0.00 sec)

MariaDB [(none)]> use test;
Database changed
MariaDB [test]> show tables;
Empty set (0.00 sec)

MariaDB [test]> CREATE TABLE hpc_team (username VARCHAR(20), date DATETIME);
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO hpc_team (username, date) VALUES ('Aboorva', Now());
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> INSERT INTO hpc_team (username, date) VALUES ('Nysal', Now());
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> INSERT INTO hpc_team (username, date) VALUES ('Sachin', Now());
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> select * from hpc_team;
+----------+---------------------+
| username | date                |
+----------+---------------------+
| Aboorva  | 2020-05-26 11:12:41 |
| Nysal    | 2020-05-26 11:12:55 |
| Sachin   | 2020-05-26 11:13:08 |
+----------+---------------------+
3 rows in set (0.00 sec)

MariaDB [test]> quit
bash-4.2$ ls
aria_log.00000001  db                ib_buffer_pool  ib_logfile1  ibtmp1             mysql               performance_schema  test
aria_log_control  ib_logfile0     ibdata1  mysql_upgrade_info  tc.log
bash-4.2$ cd test/
bash-4.2$ ls -alsrt
total 108
 4 drwxr-xr-x 6 mysql mysql  4096 May 26 11:03 ..
 4 -rw-rw---- 1 mysql mysql   483 May 26 11:12 hpc_team.frm
 4 drwx------ 2 mysql mysql  4096 May 26 11:12 .
96 -rw-rw---- 1 mysql mysql 98304 May 26 11:13 hpc_team.ibd
Step 8: Check DB folder from host machine :
[root@IBMPOWER_sachin mysql-data]# cd test/
[root@IBMPOWER_sachin test]# ls -alsrt
total 108
 4 drwxr-xr-x 6 27 27  4096 May 26 07:03 ..
 4 -rw-rw---- 1 27 27   483 May 26 07:12 hpc_team.frm
 4 drwx------ 2 27 27  4096 May 26 07:12 .
96 -rw-rw---- 1 27 27 98304 May 26 07:13 hpc_team.ibd
[root@IBMPOWER_sachin test]#

Step 9: We can set up our systemd unit file for handling the database. We’ll use a unit file as shown below:
cat /etc/systemd/system/mariadb-service.service
[Unit]
Description=Custom MariaDB Podman Container

[Service]
ExecStartPre=-/usr/bin/podman rm "mariadb-service"
ExecStart=/usr/bin/podman run --name mariadb-service -v /root/mysql-data:/var/lib/mysql/data:Z -e MYSQL_USER=user -e MYSQL_PASSWORD=pass -e MYSQL_DATABASE=db -p 3306:3306 --net host
ExecReload=-/usr/bin/podman stop "mariadb-service"
ExecReload=-/usr/bin/podman rm "mariadb-service"
ExecStop=-/usr/bin/podman stop "mariadb-service"