Wednesday, April 19, 2023

Kubernetes - decommissioning a node from the cluster

A Kubernetes cluster is a group of nodes used to run containerized applications and services. The cluster consists of a control plane, which manages the overall state of the cluster, and worker nodes, which run the containerized applications.

The control plane is responsible for managing the configuration and deployment of applications on the cluster, as well as monitoring and scaling the cluster as needed. It includes components such as the Kubernetes API server, the etcd datastore, the kube-scheduler, and the kube-controller-manager.

The worker nodes are responsible for running the containerized applications and services. Each node typically runs a container runtime, such as Docker or containerd, as well as a kubelet process that communicates with the control plane to manage the containers running on the node.

In a Kubernetes cluster, applications are deployed as pods, which are the smallest deployable units in Kubernetes. Pods contain one or more containers, and each pod runs on a single node in the cluster. Kubernetes manages the deployment and scaling of the pods across the cluster, ensuring that the workload is evenly distributed and resources are utilized efficiently.
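As a quick illustration (a minimal sketch; the pod name demo-nginx and the nginx image are placeholders, not taken from the cluster shown below), you can create a single-container pod and then see which node the scheduler placed it on:

kubectl run demo-nginx --image=nginx       # demo-nginx is a placeholder pod name
kubectl get pod demo-nginx -o wide         # the NODE column shows the node the pod was scheduled onto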

In Kubernetes, the native scheduler is a built-in component responsible for scheduling pods onto worker nodes in the cluster. When a new pod is created, the scheduler evaluates the resource requirements of the pod, along with any constraints or preferences specified in the pod's definition, and selects a node in the cluster where the pod can be scheduled. The native scheduler uses a combination of heuristics and policies to determine the best node for each pod. It considers factors such as the available resources on each node, the affinity and anti-affinity requirements of the pod, any node selectors or taints on the nodes, and the current state of the cluster. The native scheduler in Kubernetes is highly configurable and can be customized to meet the specific needs of different workloads. For example, you can configure the scheduler to prioritize certain nodes in the cluster over others, or to balance the workload evenly across all available nodes.
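For example (a hedged sketch; the disktype label and the pod name demo-ssd-pod are hypothetical), you can steer the scheduler by labelling a node and adding a matching nodeSelector to a pod spec:

# label a node, then create a pod that can only land on nodes with that label
kubectl label nodes remotenode01 disktype=ssd
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: demo-ssd-pod
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - name: app
    image: nginx
EOF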

[sachinpb@remotenode18 ~]$ kubectl get pods -n kube-system | grep kube-scheduler
kube-scheduler-remotenode18                       1/1     Running            11                  398d

kubectl cordon is a command in Kubernetes that is used to mark a node as unschedulable. This means that Kubernetes will no longer schedule any new pods on the node, but will continue to run any existing pods on the node.

The kubectl cordon command is useful when you need to take a node offline for maintenance or other reasons, but you want to ensure that the existing pods on the node continue to run until they can be safely moved to other nodes in the cluster. By marking the node as unschedulable, you can prevent Kubernetes from scheduling any new pods on the node, which helps to ensure that the overall health and stability of the cluster is maintained.

[sachinpb@remotenode18 ~]$ kubectl get nodes
NAME          STATUS                      ROLES                   AGE    VERSION
remotenode01   Ready                      worker                  270d    v1.23.4
remotenode02   Ready                      worker                  270d    v1.23.4
remotenode03   Ready                      worker                  270d    v1.23.4
remotenode04   Ready                      worker                  81d      v1.23.4
remotenode07   Ready                      worker                  389d    v1.23.4
remotenode08   Ready                      worker                  389d    v1.23.4
remotenode09   Ready                      worker                  389d    v1.23.4
remotenode14   Ready                      worker                  396d    v1.23.4
remotenode15   Ready                      worker                  81d     v1.23.4
remotenode16   Ready                      worker                 396d    v1.23.4
remotenode17   Ready                      worker                 396d    v1.23.4
remotenode18   Ready                      control-plane,master    398d   v1.23.4

[sachinpb@remotenode18 ~]$ kubectl cordon remotenode16
node/remotenode16 cordoned
[sachinpb@remotenode18 ~]$  kubectl uncordon remotenode16
node/remotenode16 uncordoned

[sachinpb@remotenode18 ~]$ kubectl cordon remotenode16
node/remotenode16 cordoned
[sachinpb@remotenode18 ~]$ kubectl get nodes
NAME           STATUS                     ROLES                   AGE    VERSION
remotenode01   Ready                      worker                  270d   v1.23.4
remotenode02   Ready                      worker                  270d   v1.23.4
remotenode03   Ready                      worker                  270d   v1.23.4
remotenode04   Ready                      worker                  81d    v1.23.4
remotenode07   Ready                      worker                  389d   v1.23.4
remotenode08   Ready                      worker                  389d   v1.23.4
remotenode09   Ready                      worker                  389d   v1.23.4
remotenode14   Ready                      worker                  396d   v1.23.4
remotenode15   Ready                      worker                  81d    v1.23.4
remotenode16   Ready,SchedulingDisabled   worker                  396d   v1.23.4
remotenode17   Ready                      worker                  396d   v1.23.4
remotenode18   Ready                      control-plane,master    398d   v1.23.4

[sachinpb@remotenode18 ~]$ 

After the node has been cordoned off, you can use the kubectl drain command to safely and gracefully terminate any running pods on the node and reschedule them onto other available nodes in the cluster. Once all the pods have been moved, the node can then be safely removed from the cluster.

kubectl drain is a command in Kubernetes that is used to gracefully remove a node from a cluster. This is typically used when performing maintenance on a node, such as upgrading or replacing hardware, or when decommissioning a node from the cluster.


[sachinpb@remotenode18 ~]$ kubectl drain --ignore-daemonsets remotenode16
node/remotenode16 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-j749l, kube-system/fuse-device-plugin-daemonset-59lrp, kube-system/kube-proxy-v26k2, kube-system/nvidia-device-plugin-daemonset-w2k57, kube-system/rdma-shared-dp-ds-zdpfw, sys-monitor/prometheus-op-prometheus-node-exporter-rh4db
node/remotenode16 drained
[sachinpb@remotenode18 ~]$

By default, kubectl drain is non-destructive; you have to override these defaults to change that behaviour. It runs with the following defaults:

  --delete-local-data=false
  --force=false
  --grace-period=-1 (Period of time in seconds given to each pod to terminate gracefully. If negative, the default value specified in the pod will be used.)
  --ignore-daemonsets=false
  --timeout=0s

Each of these safeguards deals with a different category of potential destruction (local data, bare pods, graceful termination, DaemonSets). Drain also respects PodDisruptionBudgets to preserve workload availability. Any non-bare pod will be recreated on a new node by its respective controller (e.g. the DaemonSet controller or a replication controller). It's up to you whether you want to override that behaviour. For example, you might have a bare pod if you are running a Jenkins job; if you override by setting --force=true, drain will delete that pod and it won't be recreated. If you don't override it, the drain will wait indefinitely (--timeout=0s).
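For example (a hedged illustration; the grace period and timeout values are arbitrary), a more forceful drain that also evicts bare pods and deletes local (emptyDir) data might look like the following. Note that recent kubectl releases rename --delete-local-data to --delete-emptydir-data:

kubectl drain remotenode16 --ignore-daemonsets --force --delete-local-data --grace-period=60 --timeout=300s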


When a node is drained, Kubernetes will automatically reschedule any running pods onto other available nodes in the cluster, ensuring that the workload is not interrupted. The kubectl drain command ensures that the node is cordoned off, meaning no new pods will be scheduled on it, and then gracefully terminates any running pods on the node. This helps to ensure that the pods are shut down cleanly, allowing them to complete any in-progress tasks and save any data before they are terminated.

After the pods have been rescheduled, the node can then be safely removed from the cluster. This helps to ensure that the overall health and stability of the cluster is maintained, even when individual nodes need to be taken offline for maintenance or other reasons.
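To confirm that only DaemonSet-managed pods remain on a drained node, a check along these lines can be used (a sketch, reusing the node name from this walkthrough):

kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=remotenode16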

When kubectl drain returns successfully, that indicates that all of the pods have been safely evicted. It is then safe to bring down the node. After maintenance work we can use kubectl uncordon to tell Kubernetes that it can resume scheduling new pods onto the node.

[sachinpb@remotenode18 ~]$  kubectl uncordon remotenode16
node/remotenode16 uncordoned

Let's try all the above steps and see:

1) Retrieve information from a Kubernetes cluster

[sachinpb@remotenode18 ~]$ kubectl get nodes
NAME          STATUS                      ROLES                   AGE    VERSION
remotenode01   Ready                      worker                  270d    v1.23.4
remotenode02   Ready                      worker                  270d    v1.23.4
remotenode03   Ready                      worker                  270d    v1.23.4
remotenode04   Ready                      worker                  81d      v1.23.4
remotenode07   Ready                      worker                  389d    v1.23.4
remotenode08   Ready                      worker                  389d    v1.23.4
remotenode09   Ready                      worker                  389d    v1.23.4
remotenode14   Ready                      worker                  396d    v1.23.4
remotenode15   Ready                      worker                  81d     v1.23.4
remotenode16   Ready                      worker                 396d    v1.23.4
remotenode17   Ready                      worker                 396d    v1.23.4
remotenode18   Ready                      control-plane,master    398d   v1.23.4

--------------------------------

2) Kubernetes cordon is an operation that marks or taints a node in your existing node pool as unschedulable.

[sachinpb@remotenode18 ~]$ kubectl cordon remotenode16
node/remotenode16 cordoned
[sachinpb@remotenode18 ~]$

[sachinpb@remotenode18 ~]$ kubectl get nodes
NAME          STATUS                      ROLES                   AGE    VERSION
remotenode01   Ready                      worker                  270d    v1.23.4
remotenode02   Ready                      worker                  270d    v1.23.4
remotenode03   Ready                      worker                  270d    v1.23.4
remotenode04   Ready                      worker                  81d      v1.23.4
remotenode07   Ready                      worker                  389d    v1.23.4
remotenode08   Ready                      worker                  389d    v1.23.4
remotenode09   Ready                      worker                  389d    v1.23.4
remotenode14   Ready                      worker                  396d    v1.23.4
remotenode15   Ready                      worker                  81d     v1.23.4
remotenode16   Ready,SchedulingDisabled   worker                 396d    v1.23.4
remotenode17   Ready                      worker                 396d    v1.23.4
remotenode18   Ready                      control-plane,master    398d   v1.23.4

3) Drain node in preparation for maintenance. The given node will be marked unschedulable to prevent new pods from arriving. Drain then evicts all pods except mirror pods and DaemonSet-managed pods (see the NOTE below).


[sachinpb@remotenode18 ~]$ kubectl drain remotenode16 --grace-period=2400
node/remotenode16 already cordoned
error: unable to drain node "remotenode16" due to error:cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-j749l, kube-system/fuse-device-plugin-daemonset-59lrp, kube-system/kube-proxy-v26k2, kube-system/nvidia-device-plugin-daemonset-w2k57, kube-system/rdma-shared-dp-ds-zdpfw, sys-monitor/prometheus-op-prometheus-node-exporter-rh4db, continuing command...
There are pending nodes to be drained:
 remotenode16
cannot delete DaemonSet-managed Pods (use --ignore-daemonsets to ignore): kube-system/calico-node-j749l, kube-system/fuse-device-plugin-daemonset-59lrp, kube-system/kube-proxy-v26k2, kube-system/nvidia-device-plugin-daemonset-w2k57, kube-system/rdma-shared-dp-ds-zdpfw, sys-monitor/prometheus-op-prometheus-node-exporter-rh4db
[sachinpb@remotenode18 ~]$

NOTE:

The given node will be marked unschedulable to prevent new pods from arriving. Then drain deletes all pods except mirror pods (which cannot be deleted through the API server). If there are DaemonSet-managed pods, drain will not proceed without --ignore-daemonsets, and regardless it will not delete any DaemonSet-managed pods, because those pods would be immediately replaced by the DaemonSet controller, which ignores unschedulable markings. If there are any pods that are neither mirror pods nor managed by a ReplicationController, DaemonSet, or Job, then drain will not delete any pods unless you use --force.
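If the node also hosted bare pods (pods not managed by any controller), the drain would additionally need the force flag, for example (a hedged sketch reusing the node from this walkthrough):

kubectl drain remotenode16 --ignore-daemonsets --force --grace-period=2400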

----------------------------

4) Drain node with --ignore-daemonsets

[sachinpb@remotenode18 ~]$ kubectl drain --ignore-daemonsets remotenode16 --grace-period=2400
node/remotenode16 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-j749l, kube-system/fuse-device-plugin-daemonset-59lrp, kube-system/kube-proxy-v26k2, kube-system/nvidia-device-plugin-daemonset-w2k57, kube-system/rdma-shared-dp-ds-zdpfw, sys-monitor/prometheus-op-prometheus-node-exporter-rh4db
node/remotenode16 drained

----------------------

5) Uncordon will mark the node as schedulable.

[sachinpb@remotenode18 ~]$ kubectl uncordon remotenode16
node/remotenode16 uncordoned
[sachinpb@remotenode18 ~]$

-----------------

6) Retrieve information from a Kubernetes cluster

[sachinpb@remotenode18 ~]$ kubectl get nodes
NAME          STATUS                      ROLES                   AGE    VERSION
remotenode01   Ready                      worker                  270d    v1.23.4
remotenode02   Ready                      worker                  270d    v1.23.4
remotenode03   Ready                      worker                  270d    v1.23.4
remotenode04   Ready                      worker                  81d      v1.23.4
remotenode07   Ready                      worker                  389d    v1.23.4
remotenode08   Ready                      worker                  389d    v1.23.4
remotenode09   Ready                      worker                  389d    v1.23.4
remotenode14   Ready                      worker                  396d    v1.23.4
remotenode15   Ready                      worker                  81d     v1.23.4
remotenode16   Ready                      worker                 396d    v1.23.4
remotenode17   Ready                      worker                 396d    v1.23.4
remotenode18   Ready                      control-plane,master    398d   v1.23.4

How to automate the above process by creating a Jenkins pipeline job that cordons, drains, and uncordons the nodes with the help of a Groovy script:

-------------------------Sample groovy script--------------------------------

node("Kubernetes-master-node") {
    stage("1") {
        sh 'hostname'
        sh 'cat $SACHIN_HOME/manual//hostfile'
        k8s_cordon_drain()
        k8s_uncordon()      
    }    
}

/*
* CI -Kubernetes cluster : This function will cordon/drain the worker nodes in hostfile 

*/
def k8s_cordon_drain() {
  def maxTries = 3 // the maximum number of times to retry the kubectl commands
  def sleepTime = 5 * 1000 // the amount of time to wait between retries (in milliseconds)
  def filename = "${env.SACHIN_HOME}/manual/hostfile" // double quotes so the environment variable is expanded
  def content = readFile(filename)
  def hosts = content.readLines().collect { it.split()[0] }
  println "List of Hostnames to be cordoned from K8s cluster: ${hosts}"
  hosts.each { host ->
    def command1 = "kubectl cordon $host"
    def command2 = "kubectl drain --ignore-daemonsets --grace-period=2400 $host"
    def tries = 0
    def result1 = null
    def result2 = null
    while (tries < maxTries) {
      result1 = sh(script: command1, returnStatus: true)
      if (result1 == 0) {
        println "Successfully cordoned $host"
        break
      } else {
        tries++
        println "Failed to cordoned $host (attempt $tries/$maxTries), retrying in ${sleepTime/1000} seconds..."
        sleep(sleepTime)
      }
    }
    if (result1 == 0) {
      tries = 0
      while (tries < maxTries) {
        result2 = sh(script: command2, returnStatus: true)
        if (result2 == 0) {
          println "Successfully drained $host"
          break
        } else {
          tries++
          println "Failed to drain $host (attempt $tries/$maxTries), retrying in ${sleepTime/1000} seconds..."
          sleep(sleepTime)
        }
      }
      if (result2 != 0) {
        println "Failed to drain $host after $maxTries attempts"
      }
    } else {
      println "Failed to cordon $host after $maxTries attempts"
    }
  }
}

/*
* CI - Kubernetes cluster : This function will uncordon the worker nodes in hostfile 

*/
def k8s_uncordon() {
  def maxTries = 3 // the maximum number of times to retry the kubectl commands
  def sleepTime = 5 * 1000 // the amount of time to wait between retries (in milliseconds)
  def filename = "${env.SACHIN_HOME}/manual/hostfile" // double quotes so the environment variable is expanded
  def content = readFile(filename)
  def hosts = content.readLines().collect { it.split()[0] }
  println "List of Hostnames to be uncordoned from K8s cluster: ${hosts}"
  hosts.each { host ->
    def command1 = "kubectl uncordon $host"
    def tries = 0
    def result1 = null
    while (tries < maxTries) {
      result1 = sh(script: command1, returnStatus: true)
      if (result1 == 0) {
        println "Successfully cordoned $host"
        break
      } else {
        tries++
        println "Failed to uncordon $host (attempt $tries/$maxTries), retrying in ${sleepTime/1000} seconds..."
        sleep(sleepTime)
      }
    }
    if (result1 != 0) {
      println "Failed to uncordon $host after $maxTries attempts"
    }
  }
}

------------------Jenkins Console output for pipeline job -----------------

Started by user jenkins-admin
[Pipeline] Start of Pipeline
[Pipeline] node
Running on Kubernetes-master-node in $SACHIN_HOME/workspace/test_sample4_cordon_drain
[Pipeline] {
[Pipeline] stage
[Pipeline] { (1)
[Pipeline] sh
+ hostname
kubernetes-master-node
[Pipeline] sh
+ cat $SACHIN_HOME/manual//hostfile
Remotenode16 slots=4
Remotenode17 slots=4
[Pipeline] readFile
[Pipeline] echo
List of Hostnames to be cordoned from K8s cluster: [Remotenode16, Remotenode17]
[Pipeline] sh
+ kubectl cordon Remotenode16
node/Remotenode16 cordoned
[Pipeline] echo
Successfully cordoned Remotenode16
[Pipeline] sh
+ kubectl drain --ignore-daemonsets --grace-period=2400 Remotenode16
node/Remotenode16 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-j749l, kube-system/fuse-device-plugin-daemonset-59lrp, kube-system/kube-proxy-v26k2, kube-system/nvidia-device-plugin-daemonset-w2k57, kube-system/rdma-shared-dp-ds-zdpfw, sys-monitor/prometheus-op-prometheus-node-exporter-rh4db
node/Remotenode16 drained
[Pipeline] echo
Successfully drained Remotenode16
[Pipeline] sh
+ kubectl cordon Remotenode17
node/Remotenode17 cordoned
[Pipeline] echo
Successfully cordoned Remotenode17
[Pipeline] sh
+ kubectl drain --ignore-daemonsets --grace-period=2400 Remotenode17
node/Remotenode17 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-hz5zh, kube-system/fuse-device-plugin-daemonset-dj72m, kube-system/kube-proxy-g87dc, kube-system/nvidia-device-plugin-daemonset-tk5x8, kube-system/rdma-shared-dp-ds-n4g5w, sys-monitor/prometheus-op-prometheus-node-exporter-gczmz
node/Remotenode17 drained
[Pipeline] echo
Successfully drained Remotenode17
[Pipeline] readFile
[Pipeline] echo
List of Hostnames to be uncordoned from K8s cluster: [Remotenode16, Remotenode17]
[Pipeline] sh
+ kubectl uncordon Remotenode16
node/Remotenode16 uncordoned
[Pipeline] echo
Successfully uncordoned Remotenode16
[Pipeline] sh
+ kubectl uncordon Remotenode17
node/Remotenode17 uncordoned
[Pipeline] echo
Successfully uncordoned Remotenode17
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[Pipeline] // node
[Pipeline] End of Pipeline
Finished: SUCCESS

-----------------------------------------------------------------

Reference:

https://kubernetes.io/docs/home/

Thursday, April 13, 2023

IBM Spectrum Symphony and LSF with Apache Hadoop

IBM Spectrum Symphony (formerly known as IBM Platform Symphony) is a high-performance computing (HPC) and grid computing software platform that enables organizations to process large amounts of data and run compute-intensive applications at scale. It provides a distributed computing infrastructure that can be used for a wide range of data-intensive workloads, such as scientific simulations, financial modeling, and big data analytics. IBM Spectrum Symphony is parallel services middleware and a cluster manager. It is widely used in banks for risk analytics and data analytics in a shared, multi-user, multi-application, multi-job environment. IBM Spectrum Symphony also works with IBM Spectrum LSF (for batch workloads) in the same cluster to allow both batch and parallel services workloads to share the same cluster.

Some of the key features of IBM Spectrum Symphony include:

1. Distributed computing: The platform allows organizations to distribute computing workloads across a large number of nodes, which can be located in different data centers or cloud environments.
2. Resource management: IBM Spectrum Symphony provides a resource management framework that allows organizations to allocate and manage compute, storage, and network resources more efficiently.
3. High availability: The platform is designed to provide high availability and fault tolerance, ensuring that applications can continue to run even if individual nodes or components fail.
4. Performance optimization: IBM Spectrum Symphony includes a range of performance optimization features, such as load balancing and data caching, which can help organizations to achieve faster processing times and better overall performance.
5. Support for multiple programming languages: The platform supports a wide range of programming languages, including Java, Python, and C++, which makes it easy for developers to build and deploy applications on the platform.

IBM Spectrum LSF (Load Sharing Facility) is another software platform that is often used in conjunction with IBM Spectrum Symphony to manage and optimize workloads in a distributed computing environment. LSF provides a range of features for resource management, workload scheduling, and job prioritization, which can help organizations to improve performance and efficiency.

When used together, IBM Spectrum Symphony and IBM Spectrum LSF can provide a comprehensive solution for managing and optimizing large-scale distributed computing environments. IBM Spectrum Symphony provides the distributed computing infrastructure and application management capabilities, while IBM Spectrum LSF provides the workload management and optimization features.

Some of the key features of LSF that complement IBM Spectrum Symphony include:
1. Advanced job scheduling: LSF provides sophisticated job scheduling capabilities, allowing organizations to prioritize and schedule jobs based on a wide range of criteria, such as resource availability, job dependencies, and user priorities.
2. Resource allocation: LSF can manage the allocation of resources, ensuring that jobs are run on the most appropriate nodes and that resources are used efficiently.
3. Job monitoring: LSF provides real-time monitoring of job progress and resource usage, allowing organizations to quickly identify and resolve issues that may impact performance.
4. Integration with other tools: LSF can be integrated with a wide range of other HPC tools and applications, including IBM Spectrum Symphony, providing a seamless workflow for managing complex computing workloads.

Integrating LSF with Hadoop can help organizations to optimize the use of their resources and achieve better performance when running Hadoop workloads.

Apache Hadoop ("Hadoop") is a framework for large-scale distributed data storage and processing on computer clusters that uses the Hadoop Distributed File System ("HDFS") for data storage and the MapReduce programming model for data processing. Because MapReduce workloads might represent only a small fraction of the overall workload, yet typically require their own standalone environment, MapReduce is difficult to support within traditional HPC clusters. However, HPC clusters typically use parallel file systems that are sufficient for initial MapReduce workloads, so you can run MapReduce workloads as regular parallel jobs in an HPC cluster environment. Use the IBM Spectrum LSF integration with Apache Hadoop to submit Hadoop MapReduce workloads as regular LSF parallel jobs.

To run your Hadoop application through LSF, submit it as an LSF job. Once the LSF job starts to run, the Hadoop connector script (lsfhadoop.sh) automatically provisions an open source Hadoop cluster within the LSF allocated resources, then submits the actual MapReduce workloads into this Hadoop cluster. Since each LSF Hadoop job has its own resource (cluster), the integration provides a multi-tenancy environment that allows multiple users to share the common pool of HPC cluster resources. LSF is able to collect resource usage of MapReduce workloads as normal LSF parallel jobs and has full control of the job life cycle. After the job is complete, LSF shuts down the Hadoop cluster.
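A submission through the connector might look something like the following (a hedged sketch, not taken from the original post; the slot count, output file, jar path, and directories are illustrative, and the exact lsfhadoop.sh invocation depends on your LSF installation):

# submit a MapReduce wordcount job through the Hadoop connector (paths are placeholders)
bsub -n 8 -o hadoop_wordcount.%J.out lsfhadoop.sh hadoop jar hadoop-mapreduce-examples.jar wordcount /shared/input /shared/output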

By default, the Apache Hadoop integration configures the Hadoop cluster with direct access to shared file systems and does not require HDFS. This allows you to use existing file systems in your HPC cluster without having to immediately invest in a new file system. Through the existing shared file system, data can be stored in common share locations, which avoids the typical data stage-in and stage-out steps with HDFS.

The general steps to integrate LSF with Hadoop:
1. Install and configure LSF: The first step is to install and configure LSF on the Hadoop cluster. This involves setting up LSF daemons on the cluster nodes and configuring LSF to work with the Hadoop Distributed File System (HDFS).
2. Configure Hadoop for LSF: Hadoop needs to be configured to use LSF as its resource manager. This involves setting the yarn.resourcemanager.scheduler.class property in the Hadoop configuration file to com.ibm.platform.lsf.yarn.LSFYarnScheduler (see the sample snippet after this list).
3. Configure LSF for Hadoop: LSF needs to be configured to work with Hadoop by setting up the necessary environment variables and resource limits. This includes setting the LSF_SERVERDIR and LSF_LIBDIR environment variables to the LSF installation directory and configuring LSF resource limits to ensure that Hadoop jobs have access to the necessary resources.
4. Submit Hadoop jobs to LSF: Hadoop jobs can be submitted to LSF using the yarn command-line tool with the -Dmapreduce.job.submithostname and -Dmapreduce.job.queuename options set to the LSF submit host and queue, respectively.
5. Monitor Hadoop jobs in LSF: LSF provides a web-based user interface and command-line tools for monitoring and managing Hadoop jobs running on the cluster. This allows users to monitor job progress, resource usage, and other metrics, and to take corrective action if necessary.
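As an illustration of step 2 (a hedged sketch; the class name is taken from the text above, but check yarn-site.xml and the IBM documentation for the exact value shipped with your release), the setting would go into the YARN configuration roughly as:

<!-- yarn-site.xml: hand scheduling over to the LSF integration -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>com.ibm.platform.lsf.yarn.LSFYarnScheduler</value>
</property>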

LSF can be used as standalone workload management software for Hadoop clusters, without the need for IBM Spectrum Symphony. LSF provides advanced job scheduling and resource management capabilities, which can be used to manage and optimize Hadoop workloads running on large HPC clusters. By integrating LSF with Hadoop, organizations can ensure that Hadoop jobs have access to the necessary resources and are scheduled and managed efficiently, improving overall performance and resource utilization.

In addition, IBM Spectrum Symphony provides additional capabilities beyond workload management, such as distributed computing infrastructure, data movement, and integration with other data center software. If an organization requires these additional capabilities, they may choose to use IBM Spectrum Symphony alongside LSF for even greater benefits. But LSF can be used independently as a workload manager for Hadoop clusters.

Submitting LSF jobs to a Hadoop cluster involves creating an LSF job script that launches the Hadoop job and then submitting the job to LSF using the bsub command. LSF will then schedule the job to run on the cluster. To submit LSF jobs to a Hadoop cluster, you need to follow these general steps:
1. Write the Hadoop job: First, you need to write the Hadoop job that you want to run on the cluster. This can be done using any of the Hadoop APIs, such as MapReduce, Spark, or Hive.
2. Create the LSF job script: Next, you need to create an LSF job script that will launch the Hadoop job on the cluster. This script will typically include the Hadoop command to run the job, along with any necessary environment variables, resource requirements, and other LSF-specific settings (see the sample script after this list).
3. Submit the LSF job: Once the job script is ready, you can submit it to LSF using the bsub command. This will add the job to the LSF queue and wait for available resources to run the job.
4. Monitor the job: LSF provides several tools for monitoring and managing jobs running on the cluster, such as the bjobs command and the LSF web interface. You can use these tools to track the job's progress and take corrective action if necessary.
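For step 2, a minimal job script might look like the following (a hedged sketch, not from the original post; the job name, queue, slot count, and jar path are placeholders):

#!/bin/bash
#BSUB -J hadoop_wordcount          # job name (placeholder)
#BSUB -q hadoop_queue              # LSF queue to submit to (placeholder)
#BSUB -n 8                         # number of slots requested
#BSUB -o hadoop_wordcount.%J.out   # standard output file (%J expands to the job ID)
#BSUB -e hadoop_wordcount.%J.err   # standard error file

hadoop jar my_hadoop_job.jar wordcount /shared/input /shared/output

The script would then be submitted with bsub < wordcount_job.sh.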
Example 1: bsub command that can be used to submit a Hadoop job to an LSF-managed Hadoop cluster:

bsub -J my_hadoop_job -oo my_hadoop_job.out -eo my_hadoop_job.err -R "rusage[mem=4096]" -q hadoop_queue hadoop jar my_hadoop_job.jar input_dir output_dir

where:
-J: Specifies a name for the job. In this case, we're using "my_hadoop_job" as the job name.

-oo: Redirects the standard output of the job to a file. In this case, we're using "my_hadoop_job.out" as the output file.

-eo: Redirects the standard error of the job to a file. In this case, we're using "my_hadoop_job.err" as the error file.

-R: Specifies resource requirements for the job. In this case, we're requesting 4 GB of memory (mem=4096) for the job.

-q: Specifies the LSF queue to submit the job to. In this case, we're using the "hadoop_queue" LSF queue.

After the bsub command options, we specify the Hadoop command to run the job (hadoop jar my_hadoop_job.jar) and the input and output directories for the job (input_dir and output_dir). This will submit the Hadoop job to LSF, which will then schedule and manage the job on the Hadoop cluster. For more details, please refer to these links.
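After submission, the job can be tracked with standard LSF commands, for example (a brief illustration reusing the job name from Example 1):

bjobs -J my_hadoop_job        # show the status of the job by name
bpeek -J my_hadoop_job        # peek at the job's output while it is running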

Example 2: How to submit a Hadoop job using bsub command with LSF?

bsub -q hadoop -J "Hadoop Job" -n 10 -o hadoop.log -hadoop /path/to/hadoop/bin/hadoop jar /path/to/hadoop/examples.jar pi 10 1000

This command will submit a Hadoop job to the LSF scheduler and allocate resources as necessary based on the job's requirements.

where:
-q hadoop specifies that the job should be submitted to the Hadoop queue.
-J "Hadoop Job" specifies a name for the job.
-n 10 specifies the number of cores to use for the job.
-o hadoop.log specifies the name of the output log file.
-hadoop specifies that the command that follows should be executed on a Hadoop cluster.
/path/to/hadoop/bin/hadoop specifies the path to the Hadoop executable.
jar /path/to/hadoop/examples.jar pi 10 1000 specifies the command to run the Hadoop job, which in this case is the pi example program with 10 mappers and 1000 samples.

Example 3: How to submit a wordcount MapReduce job using bsub with LSF?

                                                                                                                                                                                                                                    bsub -q hadoop -J "MapReduce Job" -n 10 -o mapreduce.log -hadoop /path/to/hadoop/bin/hadoop jar /path/to/hadoop/examples.jar wordcount /input/data /output/data

                                                                                                                                                                                                                                    where:
                                                                                                                                                                                                                                    -q hadoop specifies that the job should be submitted to the Hadoop queue.
                                                                                                                                                                                                                                    -J "MapReduce Job" specifies a name for the job.
                                                                                                                                                                                                                                    -n 10 specifies the number of cores to use for the job.
                                                                                                                                                                                                                                    -o mapreduce.log specifies the name of the output log file.
                                                                                                                                                                                                                                    -hadoop specifies that the command that follows should be executed on a Hadoop cluster.
                                                                                                                                                                                                                                    /path/to/hadoop/bin/hadoop specifies the path to the Hadoop executable.
                                                                                                                                                                                                                                    jar /path/to/hadoop/examples.jar wordcount /input/data /output/data specifies the command to run the MapReduce job, which in this case is the wordcount example program with input data in /input/data and output data in /output/data.
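The wordcount job expects the input directory to already exist in HDFS. As a minimal sketch (the paths are illustrative), sample input could be staged with standard HDFS shell commands before submitting the job:

/path/to/hadoop/bin/hadoop fs -mkdir -p /input/data
/path/to/hadoop/bin/hadoop fs -put ./sample.txt /input/data/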

                                                                                                                                                                                                                                    Example 4: How to submit a terasort MapReduce job using bsub with LSF?

                                                                                                                                                                                                                                    bsub -q hadoop -J "MapReduce Job" -n 20 -o mapreduce.log -hadoop /path/to/hadoop/bin/hadoop jar /path/to/hadoop/examples.jar terasort -Dmapred.map.tasks=100 -Dmapred.reduce.tasks=50 /input/data /output/data
                                                                                                                                                                                                                                    where:
                                                                                                                                                                                                                                    -q hadoop specifies that the job should be submitted to the Hadoop queue.
                                                                                                                                                                                                                                    -J "MapReduce Job" specifies a name for the job.
                                                                                                                                                                                                                                    -n 20 specifies the number of cores to use for the job.
                                                                                                                                                                                                                                    -o mapreduce.log specifies the name of the output log file.
                                                                                                                                                                                                                                    -hadoop specifies that the command that follows should be executed on a Hadoop cluster.
                                                                                                                                                                                                                                    /path/to/hadoop/bin/hadoop specifies the path to the Hadoop executable.
                                                                                                                                                                                                                                    jar /path/to/hadoop/examples.jar terasort -Dmapred.map.tasks=100 -Dmapred.reduce.tasks=50 /input/data /output/data specifies the command to run the MapReduce job, which in this case is the terasort example program with input data in /input/data and output data in /output/data, and specific configuration parameters to control the number of map and reduce tasks.
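Note that terasort reads input in the format produced by the teragen example, so the input directory is normally generated first. A hedged sketch, reusing this document's bsub pattern (the row count and paths are assumptions):

bsub -q hadoop -J "TeraGen" -n 10 -o teragen.log -hadoop /path/to/hadoop/bin/hadoop jar /path/to/hadoop/examples.jar teragen 10000000 /input/data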

                                                                                                                                                                                                                                    Example 5: How to submit a grep MapReduce job using bsub with LSF?

bsub -q hadoop -J "MapReduce Job" -n 30 -o mapreduce.log -hadoop /path/to/hadoop/bin/hadoop jar /path/to/hadoop/examples.jar grep /input/data /output/data "example.*"
                                                                                                                                                                                                                                    where:
                                                                                                                                                                                                                                    -q hadoop specifies that the job should be submitted to the Hadoop queue.
                                                                                                                                                                                                                                    -J "MapReduce Job" specifies a name for the job.
                                                                                                                                                                                                                                    -n 30 specifies the number of cores to use for the job.
                                                                                                                                                                                                                                    -o mapreduce.log specifies the name of the output log file.
                                                                                                                                                                                                                                    -hadoop specifies that the command that follows should be executed on a Hadoop cluster.
                                                                                                                                                                                                                                    /path/to/hadoop/bin/hadoop specifies the path to the Hadoop executable.
jar /path/to/hadoop/examples.jar grep /input/data /output/data "example.*" specifies the command to run the MapReduce job, which in this case is the grep example program; it takes the input directory (/input/data), the output directory (/output/data), and the regular expression to search for ("example.*") as positional arguments.

Example 6: How to submit a non-MapReduce Hadoop job using bsub with LSF?

                                                                                                                                                                                                                                    bsub -q hadoop -J "Hadoop Job" -n 10 -o hadoopjob.log -hadoop /path/to/hadoop/bin/hadoop fs -rm -r /path/to/hdfs/directory

                                                                                                                                                                                                                                    where:
                                                                                                                                                                                                                                    -q hadoop specifies that the job should be submitted to the Hadoop queue.
                                                                                                                                                                                                                                    -J "Hadoop Job" specifies a name for the job.
                                                                                                                                                                                                                                    -n 10 specifies the number of cores to use for the job.
                                                                                                                                                                                                                                    -o hadoopjob.log specifies the name of the output log file.
                                                                                                                                                                                                                                    -hadoop specifies that the command that follows should be executed on a Hadoop cluster.
                                                                                                                                                                                                                                    /path/to/hadoop/bin/hadoop fs -rm -r /path/to/hdfs/directory specifies the command to run the Hadoop job, which in this case is to remove a directory in HDFS at /path/to/hdfs/directory.
This command will submit a non-MapReduce Hadoop job to the LSF scheduler and allocate resources as necessary based on the job's requirements.


Example 7: If you have a Hadoop cluster with YARN and Spark installed, you can submit Spark jobs to the cluster using bsub, as shown below.

                                                                                                                                                                                                                                    bsub -q normal -J "Spark Job" -n 20 -o sparkjob.log /path/to/spark/bin/spark-submit --class com.example.MyApp --master yarn --deploy-mode cluster /path/to/my/app.jar arg1 arg2
                                                                                                                                                                                                                                    where:
                                                                                                                                                                                                                                    -q normal specifies that the job should be submitted to the normal queue.
                                                                                                                                                                                                                                    -J "Spark Job" specifies a name for the job.
                                                                                                                                                                                                                                    -n 20 specifies the number of cores to use for the job.
                                                                                                                                                                                                                                    -o sparkjob.log specifies the name of the output log file.
                                                                                                                                                                                                                                    /path/to/spark/bin/spark-submit specifies the path to the spark-submit script.
                                                                                                                                                                                                                                    --class com.example.MyApp specifies the main class of the Spark application.
--master yarn --deploy-mode cluster specifies that the application should run on YARN in cluster deploy mode.
                                                                                                                                                                                                                                    /path/to/my/app.jar arg1 arg2 specifies the path to the application jar file and its arguments.

                                                                                                                                                                                                                                    The above example does not explicitly require Hadoop to be installed or used. However, it assumes that the Spark cluster is running in YARN mode, which is typically used in a Hadoop cluster. In general, Spark can be run in various modes, including standalone, YARN, and Mesos. There are various other parameters and configurations that can be specified. Some examples include:
                                                                                                                                                                                                                                    --num-executors: Specifies the number of executor processes to use for the job.
                                                                                                                                                                                                                                    --executor-cores: Specifies the number of cores to allocate per executor.
                                                                                                                                                                                                                                    --executor-memory: Specifies the amount of memory to allocate per executor.
                                                                                                                                                                                                                                    --driver-memory: Specifies the amount of memory to allocate for the driver process.
                                                                                                                                                                                                                                    --queue: Specifies the YARN queue to submit the job to.
                                                                                                                                                                                                                                    --files: Specifies a comma-separated list of files to be distributed with the job.
                                                                                                                                                                                                                                    --archives: Specifies a comma-separated list of archives to be distributed with the job.

These parameters can be used to fine-tune the resource allocation and performance of Spark jobs in a Hadoop cluster. Additionally, there are other options that can be used to configure the behavior of the Spark application itself, such as --conf to specify Spark configuration options and --jars to specify external JAR files to be used by the application.
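For instance, Example 7 could be extended with explicit resource tuning. This is only a sketch; the queue names, sizes, and paths are assumptions, not recommended values:

bsub -q normal -J "Spark Job" -n 20 -o sparkjob.log /path/to/spark/bin/spark-submit --class com.example.MyApp --master yarn --deploy-mode cluster --num-executors 10 --executor-cores 2 --executor-memory 4g --driver-memory 2g --queue default --conf spark.serializer=org.apache.spark.serializer.KryoSerializer /path/to/my/app.jar arg1 arg2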

                                                                                                                                                                                                                                    Here is an example LSF configuration file (lsf.conf) that includes settings for running Spark applications:
                                                                                                                                                                                                                                    # LSF Configuration File
                                                                                                                                                                                                                                    # Spark settings
                                                                                                                                                                                                                                    LSB_JOB_REPORT_MAIL=N
                                                                                                                                                                                                                                    LSB_DEFAULTGROUP=spark
                                                                                                                                                                                                                                    LSB_DEFAULTJOBGROUP=spark
                                                                                                                                                                                                                                    LSB_JOB_ACCOUNTING_INTERVAL=60
                                                                                                                                                                                                                                    LSB_SUB_LOGLEVEL=3
                                                                                                                                                                                                                                    LSB_JOB_PROLOGUE="/opt/spark/current/bin/load-spark-env.sh"
                                                                                                                                                                                                                                    LSB_JOB_WRAPPER="mpirun -n 1 $LSF_BINDIR/lsb.wrapper $LSB_BINARY_NAME"
                                                                                                                                                                                                                                    LSB_HOSTS_TASK_MODEL=cpu


Here is an example Spark configuration file (spark-defaults.conf) with settings for running Spark applications submitted through LSF:
                                                                                                                                                                                                                                    # Spark Configuration File
                                                                                                                                                                                                                                    # LSF settings
                                                                                                                                                                                                                                    spark.master=yarn
                                                                                                                                                                                                                                    spark.submit.deployMode=cluster
                                                                                                                                                                                                                                    spark.yarn.queue=default
                                                                                                                                                                                                                                    spark.executor.instances=2
                                                                                                                                                                                                                                    spark.executor.memory=2g
                                                                                                                                                                                                                                    spark.executor.cores=2
                                                                                                                                                                                                                                    spark.driver.memory=1g
                                                                                                                                                                                                                                    spark.driver.cores=1
                                                                                                                                                                                                                                    spark.yarn.am.memory=1g
                                                                                                                                                                                                                                    spark.yarn.am.cores=1
                                                                                                                                                                                                                                    spark.yarn.maxAppAttempts=2
                                                                                                                                                                                                                                    spark.eventLog.enabled=true
                                                                                                                                                                                                                                    spark.eventLog.dir=hdfs://namenode:8020/spark-event-logs
                                                                                                                                                                                                                                    spark.history.fs.logDirectory=hdfs://namenode:8020/spark-event-logs
                                                                                                                                                                                                                                    spark.scheduler.mode=FAIR
                                                                                                                                                                                                                                    spark.serializer=org.apache.spark.serializer.KryoSerializer

                                                                                                                                                                                                                                    This configuration file sets several parameters for running Spark applications on a YARN cluster managed by LSF, including specifying the number of executor instances, executor memory, and executor cores, as well as setting the queue and memory allocation for the Spark ApplicationMaster.



                                                                                                                                                                                                                                    Using LSF as the scheduler for Hadoop can provide better resource utilization, job scheduling, queuing, integration with other workloads, and monitoring and management capabilities than the built-in YARN scheduler. This can help improve the performance, scalability, and efficiency of Hadoop clusters, especially in large, complex environments.
                                                                                                                                                                                                                                    1. Better resource utilization: LSF has advanced resource allocation and scheduling algorithms that can improve resource utilization in Hadoop clusters. This can lead to better performance and reduced infrastructure costs.
                                                                                                                                                                                                                                    2. Better job scheduling: LSF has more advanced job scheduling features than YARN, such as support for job dependencies, job preemption, and priority-based job scheduling. This can help optimize job execution and reduce waiting times.
3. Advanced queuing: LSF allows for more flexible and advanced queuing mechanisms, including job prioritization and preemption, multiple queues with different priorities, and customizable scheduling policies (see the queue configuration sketch after this list).
                                                                                                                                                                                                                                    4. Integration with other workloads: LSF is a general-purpose job scheduler that can be used to manage a wide range of workloads, including Hadoop, MPI, and other distributed computing frameworks. This allows for better integration and coordination of workloads on the same infrastructure.
                                                                                                                                                                                                                                    5. Advanced monitoring and management: LSF provides more advanced monitoring and management tools than YARN, including web-based interfaces, command-line tools, and APIs for job management, resource monitoring, and performance analysis.
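As an illustration of point 3, queue priorities and preemption are defined in LSF's lsb.queues file. The following fragment is only a sketch; the queue names and values are assumptions, not a recommended configuration:

Begin Queue
QUEUE_NAME   = hadoop
PRIORITY     = 40
DESCRIPTION  = batch Hadoop/MapReduce jobs
End Queue

Begin Queue
QUEUE_NAME   = realtime
PRIORITY     = 70
PREEMPTION   = PREEMPTIVE[hadoop]
DESCRIPTION  = latency-sensitive jobs that may preempt the hadoop queue
End Queue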
LSF is a versatile job scheduler that can handle a wide range of workloads, including both batch and real-time scheduling. While LSF is most often used for batch workloads, it can also support real-time workloads such as Apache Kafka, thanks to its advanced scheduling features and its ability to integrate with other distributed computing frameworks.

                                                                                                                                                                                                                                    LSF has advanced scheduling capabilities that can help optimize the allocation of resources for real-time workloads, including support for job prioritization, preemption, and multiple queues with different priorities. This can help ensure that real-time workloads are allocated the necessary resources in a timely and efficient manner.

                                                                                                                                                                                                                                    Furthermore, LSF has integration capabilities with other distributed computing frameworks like Apache Kafka. For example, LSF can be used to manage the resource allocation and scheduling of Kafka brokers, consumers, and producers. This can help optimize the performance and scalability of Kafka clusters.
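As a hedged example (the paths, queue name, and resource figures are assumptions), a Kafka broker could be launched as an LSF job so that LSF controls where it runs and how much memory and CPU it receives:

bsub -q realtime -J "kafka-broker-1" -n 4 -R "rusage[mem=8192]" -o kafka-broker.log /path/to/kafka/bin/kafka-server-start.sh /path/to/kafka/config/server.properties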

                                                                                                                                                                                                                                    Examples for applications with real time scheduling:
                                                                                                                                                                                                                                    1. A major financial services company uses Hadoop and LSF to process real-time financial data. LSF is used to manage the allocation of compute resources for Hadoop, including managing the cluster's memory, CPU, and disk resources. This setup enables the company to process real-time financial data with low latency and high throughput.
                                                                                                                                                                                                                                    2. A large e-commerce company uses Hadoop and LSF to process large volumes of customer data in real-time. LSF is used to schedule and manage jobs across multiple Hadoop clusters, optimizing the allocation of resources to ensure that real-time processing is prioritized. This setup enables the company to personalize customer experiences and deliver targeted marketing campaigns in real-time.
                                                                                                                                                                                                                                    3. A global telecommunications company uses Hadoop and LSF to process real-time data from its network infrastructure. LSF is used to manage job scheduling and resource allocation, ensuring that data is processed quickly and efficiently. This setup enables the company to monitor and optimize network performance in real-time, providing a better customer experience.

Overall, the combination of Hadoop and LSF can provide a powerful and flexible platform for processing both historical and real-time data in production environments. By leveraging the advanced resource management and scheduling capabilities of LSF, organizations can optimize performance, reduce latency, and improve the overall efficiency of their Hadoop clusters.

                                                                                                                                                                                                                                    Reference:

                                                                                                                                                                                                                                    Tuesday, April 4, 2023

                                                                                                                                                                                                                                    Linux Test Harness : avocado and op-test framework

                                                                                                                                                                                                                                    A Test Harness, also known as a testing framework or testing tool, is a software tool or library that provides a set of functions, APIs, or interfaces for writing, organizing, and executing tests. Test harnesses provide a structured way to write tests and automate the testing process. 

The Linux avocado test framework and the Linux op-test framework are both open-source testing frameworks designed for testing and validating Linux-based systems. Both frameworks are widely used in the Linux community and have a strong user base.

                                                                                                                                                                                                                                    The Linux avocado test framework is a modular and extensible testing framework that allows users to write and run tests for different levels of the Linux stack, including the kernel, user space, and applications. It provides a wide range of plugins and tools for testing, including functional, performance, and integration testing. The framework is easy to install and use and supports multiple test runners and reporting formats.

                                                                                                                                                                                                                                    On the other hand, the Linux op-test framework is a set of Python libraries and utilities that automate the testing of hardware and firmware components in Linux-based systems. It provides a high-level Python API for interacting with hardware and firmware interfaces, as well as a set of pre-built tests for validating various hardware components such as CPU, memory, and storage. The framework is highly flexible and customizable, allowing users to create their own tests and integrate with other testing tools and frameworks.

                                                                                                                                                                                                                                    While both frameworks are designed for testing Linux-based systems, the Linux avocado test framework provides a broad range of testing capabilities across different levels of the Linux stack, while the Linux op-test framework focuses specifically on automating hardware and firmware testing. The choice between the two depends on the specific testing needs and requirements of the user.

                                                                                                                                                                                                                                    The Linux avocado test framework provides a plugin called "avocado-vt" which can be used to run tests that require a reboot between different test stages. This plugin enables the framework to run destructive tests, like kernel crash dump (kdump) testing, that require the system to be rebooted multiple times.

                                                                                                                                                                                                                                    Similarly, the Linux op-test framework also provides support for testing scenarios that require system reboot. The framework includes a "reboot" library that allows users to reboot the system under test and wait for it to come back up before continuing with the test. This library can be used to test scenarios like kdump and fadump that require system reboot.

The community-maintained avocado tests repository:

                                                                                                                                                                                                                                    Avocado is a set of tools and libraries to help with automated testing. One can call it a test framework with benefits. Native tests are written in Python and they follow the unittest pattern, but any executable can serve as a test.
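As an illustration of that pattern, a minimal native test is simply a Python class deriving from avocado.Test. This sketch mirrors the sleeptest example from the Avocado documentation:

import time

from avocado import Test


class SleepTest(Test):

    def test(self):
        # 'sleep_length' can be overridden via test parameters; defaults to 1 second
        sleep_length = self.params.get('sleep_length', default=1)
        self.log.debug("Sleeping for %.2f seconds", sleep_length)
        time.sleep(sleep_length)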

                                                                                                                                                                                                                                    This repository contains a collection of miscellaneous tests and plugins for the Linux Avocado test framework that cover a wide range of functional, performance, and integration testing scenarios. The tests are designed to be modular and easy to use, and can be integrated with the Avocado test framework to extend its capabilities.

                                                                                                                                                                                                                                    https://github.com/avocado-framework-tests/avocado-misc-tests

How to run avocado-misc-tests:

                                                                                                                                                                                                                                    To run the Avocado Misc Tests, you first need to install the Linux Avocado test framework on your system. Once you have installed the framework, you can clone the Avocado Misc Tests repository from GitHub by running the following command in a terminal:

git clone https://github.com/avocado-framework-tests/avocado-misc-tests.git

or, if you have SSH access to GitHub configured:

git clone git@github.com:avocado-framework-tests/avocado-misc-tests.git

                                                                                                                                                                                                                                    # git clone git@github.com:avocado-framework-tests/avocado-misc-tests.git
                                                                                                                                                                                                                                    Cloning into 'avocado-misc-tests'...
                                                                                                                                                                                                                                    remote: Enumerating objects: 18087, done.
                                                                                                                                                                                                                                    remote: Counting objects: 100% (451/451), done.
                                                                                                                                                                                                                                    remote: Compressing objects: 100% (239/239), done.
                                                                                                                                                                                                                                    remote: Total 18087 (delta 242), reused 368 (delta 208), pack-reused 17636
                                                                                                                                                                                                                                    Receiving objects: 100% (18087/18087), 6.15 MiB | 16.67 MiB/s, done.
                                                                                                                                                                                                                                    Resolving deltas: 100% (11833/11833), done.
                                                                                                                                                                                                                                    #

This repository is dedicated to hosting any tests written using the Avocado API. It was initially populated with tests ported from the autotest client tests repository, but it is not limited to that.

                                                                                                                                                                                                                                    After cloning the repository, you can navigate to the avocado-misc-tests directory and run the tests using the avocado run command. For example, to run all the tests in the network category, you can run the following command:

                                                                                                                                                                                                                                    cd avocado-misc-tests
                                                                                                                                                                                                                                    avocado run network/

                                                                                                                                                                                                                                    This will run all the tests in the network category. You can also run individual tests by specifying the path to the test file, like this:

                                                                                                                                                                                                                                    avocado run network/test_network_ping.py

                                                                                                                                                                                                                                    This will run the test_network_ping.py test in the network category.

                                                                                                                                                                                                                                    Before running the tests, you may need to configure the Avocado framework to use the appropriate test runner, test environment, and plugins for your system. You can find more information on how to configure and use the Avocado framework in the official documentation: 

                                                                                                                                                                                                                                    https://avocado-framework.readthedocs.io/en/latest/
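For reference, the Avocado framework itself is typically installed from PyPI (assuming Python 3 and pip are available on the system):

pip3 install --user avocado-framework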

                                                                                                                                                                                                                                    $ avocado run  avocado-misc-tests/generic/stress.py
                                                                                                                                                                                                                                    JOB ID     : 0018adbc07c5d90d242dd6b341c87972b8f77a0b
                                                                                                                                                                                                                                    JOB LOG    : $HOME/avocado/job-results/job-2016-01-18T15.32-0018adb/job.log
                                                                                                                                                                                                                                    TESTS      : 1
                                                                                                                                                                                                                                     (1/1) avocado-misc-tests/generic/stress.py:Stress.test: PASS (62.67 s)
                                                                                                                                                                                                                                    RESULTS    : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0
                                                                                                                                                                                                                                    JOB HTML   : $HOME/avocado/job-results/job-2016-01-18T15.32-0018adb/html/results.html
                                                                                                                                                                                                                                    TIME       : 62.67 s

                                                                                                                                                                                                                                    There are a few more interesting things about the Avocado test framework and its usability and use cases:

                                                                                                                                                                                                                                    1. Flexible test design: The Avocado test framework is designed to be flexible and adaptable to a wide range of testing scenarios. It supports various test types, including functional, integration, performance, and stress tests, and can be used to test software at different levels of abstraction, from system-level to individual components. Avocado also provides a wide range of plugins and interfaces for integrating with other tools and frameworks, making it easy to customize and extend its capabilities.
                                                                                                                                                                                                                                    2. Easy to use: Avocado is designed to be easy to use, even for users who are new to testing or have limited programming experience. It uses a simple YAML-based syntax for defining tests and test plans, and provides a user-friendly command-line interface for running tests and viewing results. Avocado also includes detailed documentation and tutorials to help users get started quickly.
                                                                                                                                                                                                                                    3. Scalability and distributed testing: Avocado supports distributed testing across multiple systems, making it easy to scale up testing to handle large workloads. It includes a built-in job scheduler for managing test execution across multiple systems, and can be integrated with various cloud-based services for running tests in the cloud.
                                                                                                                                                                                                                                    4. Community support: Avocado is an open-source project maintained by a vibrant community of developers and testers. The community provides regular updates and bug fixes, and is actively involved in improving the usability and functionality of the framework. The Avocado community also provides support through various channels, including GitHub, mailing lists, and IRC.
                                                                                                                                                                                                                                    5. Use cases: Avocado is used by various organizations and companies for testing different types of software, including operating systems, virtualization platforms, container platforms, and cloud services. It is particularly well-suited for testing complex, distributed systems that require a high degree of automation and scalability. Some of the organizations that use Avocado include Red Hat, IBM, Intel, and Huawei.

                                                                                                                                                                                                                                    License

                                                                                                                                                                                                                                    Except where otherwise indicated in a given source file, all original contributions to Avocado are licensed under the GNU General Public License version 2 (GPLv2) or any later version. By contributing you agree that these contributions are your own (or approved by your employer) and you grant a full, complete, irrevocable copyright license to all users and developers of the Avocado project, present and future, pursuant to the license of the project.

                                                                                                                                                                                                                                    ================

The community-maintained op-test repository:

                                                                                                                                                                                                                                    https://github.com/open-power/op-test

                                                                                                                                                                                                                                    git clone git@github.com:open-power/op-test.git

                                                                                                                                                                                                                                    # git clone git@github.com:open-power/op-test.git
                                                                                                                                                                                                                                    Cloning into 'op-test'...
                                                                                                                                                                                                                                    remote: Enumerating objects: 8716, done.
                                                                                                                                                                                                                                    remote: Counting objects: 100% (623/623), done.
                                                                                                                                                                                                                                    remote: Compressing objects: 100% (275/275), done.
                                                                                                                                                                                                                                    remote: Total 8716 (delta 416), reused 480 (delta 347), pack-reused 8093
                                                                                                                                                                                                                                    Receiving objects: 100% (8716/8716), 23.89 MiB | 23.39 MiB/s, done.
                                                                                                                                                                                                                                    Resolving deltas: 100% (6488/6488), done.
                                                                                                                                                                                                                                    #

                                                                                                                                                                                                                                                   ./op-test -c machine.conf --run testcases.RunHostTest --host-cmd ls

                                                                                                                                                                                                                                                  Testcase: https://github.com/open-power/op-test/blob/master/testcases/RunHostTest.py

where machine.conf contains:

                                                                                                                                                                                                                                                  [op-test]
bmc_type=OpenBMC / EBMC_PHYP / FSP_PHYP
bmc_ip=w39
bmc_username=root
bmc_password=0penBmc
                                                                                                                                                                                                                                                  hmc_ip=a.b.c.d
                                                                                                                                                                                                                                                  hmc_username=hmcuser
                                                                                                                                                                                                                                                  hmc_password=hmcpasswd123
                                                                                                                                                                                                                                                  host_ip=x.y.x.k
                                                                                                                                                                                                                                                  host_user=hostuser
                                                                                                                                                                                                                                                  host_password=hostpasswd123
                                                                                                                                                                                                                                                  system_name=power10
                                                                                                                                                                                                                                                  lpar_name=lpar_name_1
                                                                                                                                                                                                                                                  lpar_prof=default_profile

Prerequisites for op-test:
                                                                                                                                                                                                                                                  1) yum install sshpass 
                                                                                                                                                                                                                                                  2) pip3 install pexpect
                                                                                                                                                                                                                                                  3) echo "set enable-bracketed-paste off" > .inputrc ; export INPUTRC=$PWD/.inputrc

Listed below are some interesting things about the op-test framework and its use cases:
                                                                                                                                                                                                                                                  1. Testing hardware systems: The op-test framework is designed for testing hardware systems, particularly servers, using the OpenPOWER architecture. It includes a wide range of tests that cover different aspects of hardware functionality, such as power management, CPU, memory, and I/O.
                                                                                                                                                                                                                                                  2. Integration with OpenBMC: The op-test framework integrates with the OpenBMC project, an open-source implementation of the Baseboard Management Controller (BMC) firmware that provides out-of-band management capabilities for servers. This integration allows users to control and monitor server hardware using the OpenBMC interface, and to run tests on the hardware using the op-test framework.
                                                                                                                                                                                                                                                  3. UEFI and firmware testing: The op-test framework includes support for testing UEFI firmware and other low-level system components, such as the Hostboot bootloader. This allows users to test the system firmware and ensure that it is functioning correctly.
                                                                                                                                                                                                                                                  4. Easy to use: The op-test framework is designed to be easy to use, even for users who are not familiar with hardware testing. It uses a simple command-line interface and provides detailed documentation and tutorials to help users get started quickly.
                                                                                                                                                                                                                                                  5. Scalability: The op-test framework is designed to be scalable and can be used to test multiple systems in parallel. This makes it suitable for testing large server farms and data centers.
                                                                                                                                                                                                                                                  6. Community support: The op-test framework is an open-source project with an active community of developers and testers. The community provides regular updates and bug fixes, and is actively involved in improving the usability and functionality of the framework. The op-test community also provides support through various channels, including GitHub, mailing lists, and IRC.
                                                                                                                                                                                                                                                  7. Use cases: The op-test framework is used by various organizations and companies for testing hardware systems, including server manufacturers, data center operators, and cloud service providers. Some of the organizations that use the op-test framework include IBM, Google, and Rackspace.
How to contribute to the op-test open-source community:

                                                                                                                                                                                                                                                  1) mkdir kdump_xive_off_check

                                                                                                                                                                                                                                                  2) cd kdump_xive_off_check

                                                                                                                                                                                                                                                  3) git clone git@github.com:SACHIN-PB/op-test.git

First fork the repository from https://github.com/open-power/op-test into your own account, then clone your fork as shown in step 3.

NOTE: Forking a repository means creating a copy of the original repository under your own GitHub account.
This is typically done when you want to contribute to an open-source project or collaborate with other developers.

                                                                                                                                                                                                                                                  4) git config user.email

                                                                                                                                                                                                                                                  5) git config user.name

NOTE: To set the correct username and email, add the following configuration to the .gitconfig file in your home directory (here /root):
                                                                                                                                                                                                                                                  # cat .gitconfig
                                                                                                                                                                                                                                                  [user]
                                                                                                                                                                                                                                                          email = sachin@linux.XYZ.com
                                                                                                                                                                                                                                                          name = Sachin P B
                                                                                                                                                                                                                                                  #

                                                                                                                                                                                                                                                  6) git branch

                                                                                                                                                                                                                                                  7) git remote  -v
                                                                                                                                                                                                                                                      origin  git@github.com:SACHIN-PB/op-test.git (fetch)
                                                                                                                                                                                                                                                      origin  git@github.com:SACHIN-PB/op-test.git (push)

                                                                                                                                                                                                                                                  8) git remote add upstream git@github.com:open-power/op-test.git

                                                                                                                                                                                                                                                  9) git remote  -v
                                                                                                                                                                                                                                                        origin  git@github.com:SACHIN-PB/op-test.git (fetch)
                                                                                                                                                                                                                                                        origin  git@github.com:SACHIN-PB/op-test.git (push)
                                                                                                                                                                                                                                                        upstream        git@github.com:open-power/op-test.git (fetch)
                                                                                                                                                                                                                                                        upstream        git@github.com:open-power/op-test.git (push)

                                                                                                                                                                                                                                                  10) git checkout -b "kdump_xive_off_check"

                                                                                                                                                                                                                                                  11) git branch

                                                                                                                                                                                                                                                  12) vi testcases/PowerNVDump.py

                                                                                                                                                                                                                                                  13) git diff

                                                                                                                                                                                                                                                  14) git status

                                                                                                                                                                                                                                                  15) git add testcases/PowerNVDump.py

                                                                                                                                                                                                                                                  16) git status

                                                                                                                                                                                                                                                  17) git commit -s

                                                                                                                                                                                                                                                  18) git branch

                                                                                                                                                                                                                                                  19) git push origin kdump_xive_off_check
                                                                                                                                                                                                                                                  Enumerating objects: 7, done.
                                                                                                                                                                                                                                                  Counting objects: 100% (7/7), done.
                                                                                                                                                                                                                                                  Delta compression using up to 16 threads
                                                                                                                                                                                                                                                  Compressing objects: 100% (4/4), done.
                                                                                                                                                                                                                                                  Writing objects: 100% (4/4), 880 bytes | 880.00 KiB/s, done.
                                                                                                                                                                                                                                                  Total 4 (delta 3), reused 0 (delta 0), pack-reused 0
                                                                                                                                                                                                                                                  remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
                                                                                                                                                                                                                                                  remote:
                                                                                                                                                                                                                                                  remote: Create a pull request for 'kdump_xive_off_check' on GitHub by visiting:
                                                                                                                                                                                                                                                  remote:      https://github.com/SACHIN-PB/op-test/pull/new/kdump_xive_off_check
                                                                                                                                                                                                                                                  remote:
                                                                                                                                                                                                                                                  To github.com:SACHIN-PB/op-test.git
                                                                                                                                                                                                                                                   * [new branch]      kdump_xive_off_check -> kdump_xive_off_check
                                                                                                                                                                                                                                                  #

20) Create a PR using the link printed in step 19 and request a review.
Example: https://github.com/open-power/op-test/pull/7XYZ4

                                                                                                                                                                                                                                                  21) You can update your PR by running these commands 
                                                                                                                                                                                                                                                  git commit --amend
                                                                                                                                                                                                                                                  git push -f origin kdump_xive_off_check
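If upstream moves ahead while the PR is under review, the upstream remote added in step 8 can be used to refresh the branch before force-pushing again (a sketch, assuming upstream's default branch is master and that the kdump_xive_off_check branch is checked out):

git fetch upstream
git rebase upstream/master
git push -f origin kdump_xive_off_check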
                                                                                                                                                                                                                                                  ======================

                                                                                                                                                                                                                                                  Reference:
                                                                                                                                                                                                                                                  1)  https://github.com/open-power/op-test/blob/master/testcases/RunHostTest.py
                                                                                                                                                                                                                                                  2)  https://github.com/avocado-framework-tests/avocado-misc-tests
                                                                                                                                                                                                                                                  3)  https://avocado-framework.readthedocs.io/en/latest/

                                                                                                                                                                                                                                                  Wednesday, March 29, 2023

                                                                                                                                                                                                                                                  Linux security and kernel Lockdown - kernel image access prevention feature

Linux has a long history of security-focused development and has been used in many high-security environments, such as military and government organizations. Linux is highly customizable, which allows administrators to tailor security configurations to their specific needs. For example, security modules like SELinux and AppArmor can be configured to enforce highly granular access control policies. Many Linux distributions include security-focused features, such as hardening patches and secure boot support, by default. The open-source nature of Linux allows for community-driven development and auditing, which can help to uncover security vulnerabilities and improve the overall security of the system. Container platforms such as Docker and Kubernetes have become increasingly popular in recent years and offer a lightweight, isolated alternative to traditional virtualization solutions. Linux is widely used in cloud environments and has many built-in features for secure cloud deployments, such as network isolation and encryption. It is constantly being updated and improved with new security features and bug fixes, making it one of the most secure operating systems available.

The Kernel Lockdown feature is designed to prevent both direct and indirect access to a running kernel image. It attempts to protect against unauthorized modification of the kernel image and to prevent access to security and cryptographic data located in kernel memory, whilst still permitting driver modules to be loaded. Lockdown is implemented as a Linux Security Module (LSM, nicknamed "lockdown") and promises to bring additional security to one of the most widely used and hardened kernels on the market. Its aim is to restrict various pieces of kernel functionality. Two modes are available: Integrity and Confidentiality. In Integrity mode, kernel features that would allow userland code to modify the running kernel are disabled. In Confidentiality mode, kernel features that would allow userland code to extract confidential information from the kernel are disabled as well. First, lockdown restricts access to kernel features that may allow arbitrary code execution by way of code supplied by any application or service outside of the kernel (aka "userland"). It also blocks processes from reading or writing /dev/mem and /dev/kmem, and blocks opening /dev/port (as a means to prevent raw ioport access). Other features include:
1. Enforcing kernel module signatures.
2. Preventing even the root account from modifying the running kernel code.
3. Restricting kexec reboot, so that a kexec'd kernel cannot be used to drop out of Secure Boot mode.
4. Lockdown of hardware that could potentially generate direct memory access (DMA).
5. Lockdown of the KDADDIO, KDDELIO, KDENABIO and KDDISABIO console ioctls.
where
• The KDADDIO, KDDELIO, KDENABIO and KDDISABIO console ioctls (defined in <linux/kd.h>) deal with raw I/O port access from user space, which is exactly the kind of direct hardware access lockdown restricts:
• KDADDIO / KDDELIO: add or delete an I/O port as valid for access.
• KDENABIO / KDDISABIO: enable or disable access to I/O ports (historically used by X servers and svgalib to program video hardware directly).
NOTE: The "KD" in these console ioctls stands for Keyboard Display. The term "keyboard display" refers to the console on a computer system, i.e. the keyboard and screen used to interact with the system.

                                                                                                                                                                                                                                                  If a prohibited or restricted feature is accessed or used, the kernel will emit a message that looks like:

                                                                                                                                                                                                                                                          Lockdown: X: Y is restricted, see man kernel_lockdown.7

                                                                                                                                                                                                                                                  where X indicates the process name and Y indicates what is restricted. On an EFI-enabled x86 or arm64 machine, lockdown will be automatically enabled if the system boots in EFI Secure Boot mode.
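A quick way to inspect or raise the lockdown level at runtime (a sketch, assuming a kernel built with CONFIG_SECURITY_LOCKDOWN_LSM and securityfs mounted at /sys/kernel/security):

# cat /sys/kernel/security/lockdown                  # the active mode is shown in brackets, e.g. [none] integrity confidentiality
# echo integrity > /sys/kernel/security/lockdown     # raise the level at runtime; it cannot be lowered again without a reboot

The mode can also be selected at boot by adding lockdown=integrity or lockdown=confidentiality to the kernel command line.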

                                                                                                                                                                                                                                                  Coverage: When lockdown is in effect, a number of features are disabled or have their use restricted.  This includes special device files and kernel services that allow direct access of the kernel image:

                                                                                                                                                                                                                                                                /dev/mem
                                                                                                                                                                                                                                                                /dev/kmem
                                                                                                                                                                                                                                                                /dev/kcore
                                                                                                                                                                                                                                                                /dev/ioports
                                                                                                                                                                                                                                                                BPF
                                                                                                                                                                                                                                                                kprobes

and the ability to directly configure and control devices, so as to prevent the use of a device to access or modify a kernel image. This includes the use of module parameters that directly specify hardware parameters to drivers, either on the kernel command line or when loading a module.
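As a rough way to see these restrictions in action (hedged: exact behaviour depends on the kernel configuration and the lockdown mode in effect), attempting to open one of the restricted device files should fail and leave a corresponding message in the kernel log:

# dd if=/dev/mem of=/dev/null bs=1 count=1           # expected to fail with "Operation not permitted" when lockdown is active
# dmesg | grep -i lockdown                           # look for the "Lockdown: X: Y is restricted" messages described above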

                                                                                                                                                                                                                                                  The term "lockdown" refers to a set of security features in the Linux kernel that are designed to prevent even privileged users, such as the root user, from bypassing certain security restrictions. These features are intended to provide an additional layer of protection against malicious software and unauthorized access to sensitive information.

There are two main modes to the lockdown feature:

Integrity: This mode prevents changes to the running kernel and its security settings, such as disabling Secure Boot protections or loading unsigned kernel modules, even by users with root privileges.

Confidentiality: This mode additionally prevents user space processes from extracting sensitive information from the kernel, such as kernel memory contents or hardware secrets, even if the processes are running with root privileges.

                                                                                                                                                                                                                                                  The lockdown feature is a powerful tool for enhancing the security of Linux systems, particularly in high-security environments or those where data privacy is a top concern. However, it can also limit the flexibility of the system, so it's important to carefully consider the trade-offs before enabling this feature.

                                                                                                                                                                                                                                                  ---------------------------------------------------------

                                                                                                                                                                                                                                                  The lockdown feature and SELinux are both security features in the Linux kernel, but they serve different purposes and work independently of each other.

                                                                                                                                                                                                                                                  SELinux is a mandatory access control (MAC) system that enforces a set of security policies to determine what processes and users can access specific resources, such as files or network ports. It operates by labeling resources with a security context and assigning labels to users and processes. The security policies defined in SELinux are enforced by the kernel and can prevent unauthorized access and other security breaches.

                                                                                                                                                                                                                                                  The lockdown feature, on the other hand, is designed to prevent even privileged users, including those with root privileges, from bypassing certain security restrictions. It achieves this by restricting access to certain kernel features and preventing modifications to the kernel's security settings.

                                                                                                                                                                                                                                                  When both SELinux and the lockdown feature are enabled, they work together to provide a comprehensive security solution. SELinux enforces mandatory access controls to restrict access to resources, while the lockdown feature ensures that even privileged users cannot bypass certain security restrictions. This can help to prevent security breaches caused by malicious software or unauthorized access to sensitive information.

                                                                                                                                                                                                                                                  The combination of SELinux and the lockdown feature provides a powerful security solution for Linux systems. It's important to carefully configure and manage these features to ensure that they do not interfere with normal system operations or cause unintended consequences.
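To confirm that both mechanisms are active on a given system, a couple of quick checks (assuming an SELinux-enabled distribution with the usual userspace tools such as libselinux-utils/policycoreutils installed):

# getenforce                                         # prints Enforcing, Permissive, or Disabled
# sestatus                                           # detailed SELinux status, including the loaded policy
# cat /sys/kernel/security/lockdown                  # current kernel lockdown mode, if the lockdown LSM is built in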

The idea of effectively rendering the root account less capable of working with a system (on a kernel level) might be considered by some a disservice to Linux (and Linux administrators). However, in the realm of business, absolute security is a necessity, especially on machines that house sensitive business/customer data. When the root account is under a form of strict lockdown, it becomes significantly more challenging for malicious code to run rampant on a system. This could lead to fewer data breaches. And because the kernel developers have made the lockdown feature optional, enterprise admins can enable it on production machines that store such sensitive data, while on standard desktop machines (or developer machines) the feature can remain disabled.

The Linux kernel has several security features built in to protect against various types of security threats. Some of these additional security features include the following (a few quick command-line checks follow the list):

                                                                                                                                                                                                                                                  1) AppArmor: AppArmor is a mandatory access control (MAC) system that restricts the capabilities of individual applications or processes. It can be used to enforce security policies that limit the actions of individual applications, such as restricting access to certain files or network resources.

                                                                                                                                                                                                                                                  2) Control Groups (cgroups): cgroups provide a way to organize and manage system resources, such as CPU, memory, and I/O bandwidth, among different processes. This helps to prevent individual processes from monopolizing system resources, which can improve system performance and stability.

3) Kernel Samepage Merging (KSM): KSM allows multiple identical memory pages to be merged into a single copy-on-write page, reducing memory usage and improving system performance. However, this feature also presents a potential security risk: because a write to a merged page is measurably slower (it triggers a copy-on-write), an attacker can use timing side channels to infer whether another process or virtual machine holds a particular page in memory.

                                                                                                                                                                                                                                                  4) Executable Space Protection: Executable Space Protection is a security feature that prevents execution of code from memory pages that are marked as data or stack. This helps to prevent buffer overflow and other types of attacks that rely on executing code in memory regions that are not intended for code execution.

                                                                                                                                                                                                                                                  5) Secure Boot: Secure Boot is a security feature that ensures that only trusted software is executed during system boot-up. It uses cryptographic signatures to verify the authenticity of boot loaders and other critical components of the system, preventing unauthorized or malicious software from running at boot time.

                                                                                                                                                                                                                                                  6) Address Space Layout Randomization (ASLR): This feature randomizes the memory layout of user space programs, making it more difficult for attackers to exploit vulnerabilities in the program's code.

                                                                                                                                                                                                                                                  7) Seccomp: This feature provides a mechanism for filtering system calls that can be made by a process, allowing administrators to restrict the system calls that can be made by certain programs.

                                                                                                                                                                                                                                                  8) Trusted Platform Module (TPM): This is a hardware-based security feature that provides a secure storage area for cryptographic keys and other sensitive data. It can be used to enhance the security of system booting, disk encryption, and other security-related functions.

9) SELinux: similar to AppArmor, SELinux is a security module that provides mandatory access control (MAC) enforcement in the Linux kernel. Both use security policies to define which resources, such as files and network ports, can be accessed by which processes and users.
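As a rough illustration, several of the features above can be inspected from a shell. These are hedged examples: tool names and paths vary by distribution (aa-status comes from apparmor-utils, mokutil from the mokutil package), and the cgroup lines assume a cgroup v2 hierarchy mounted at /sys/fs/cgroup with the memory controller enabled:

# aa-status                                          # AppArmor: list loaded profiles and their modes
# mkdir /sys/fs/cgroup/demo
# echo 100M > /sys/fs/cgroup/demo/memory.max         # cgroups: cap the memory of a new group
# echo $$ > /sys/fs/cgroup/demo/cgroup.procs         # move the current shell into the capped group
# readelf -l /bin/ls | grep -A1 GNU_STACK            # executable space protection: stack flags should be RW, not RWE
# mokutil --sb-state                                 # Secure Boot state on EFI systems
# cat /proc/sys/kernel/randomize_va_space            # ASLR: 2 = full randomization, 1 = partial, 0 = disabled
# grep Seccomp /proc/self/status                     # seccomp mode of the current process (0 = no filtering)
# ls /dev/tpm*                                       # presence of a TPM character device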

                                                                                                                                                                                                                                                  -----------------------------

The Linux kernel includes a variety of cryptography algorithms that can be used to provide secure communication, storage, and other security-related functions. Here are some of the key cryptography algorithms and features in the Linux kernel (a couple of quick examples follow this list):

                                                                                                                                                                                                                                                  1) Advanced Encryption Standard (AES): AES is a symmetric encryption algorithm that is widely used for data encryption. The Linux kernel includes an implementation of AES that can be used by applications and other kernel subsystems.

                                                                                                                                                                                                                                                  2) RSA: RSA is an asymmetric encryption algorithm that is used for digital signatures and key exchange. The Linux kernel includes an implementation of RSA that can be used by applications and other kernel subsystems.

                                                                                                                                                                                                                                                  3) SHA: SHA (Secure Hash Algorithm) is a family of cryptographic hash functions that are used for digital signatures, data integrity checking, and other security-related functions. The Linux kernel includes implementations of several SHA algorithms, including SHA-1, SHA-256, and SHA-512.

                                                                                                                                                                                                                                                  4) Random Number Generation: Random number generation is a critical component of many cryptographic functions. The Linux kernel includes several sources of entropy that are used to generate high-quality random numbers for use in cryptography algorithms.

                                                                                                                                                                                                                                                  5) Cryptographic API: The Linux kernel includes a Cryptographic API that provides a standard interface for using cryptographic functions in kernel modules and applications. The API includes support for a wide range of cryptographic algorithms and features, including those listed above.

                                                                                                                                                                                                                                                  6) Filesystem Encryption: The Linux kernel includes support for encrypting filesystems using the dm-crypt subsystem. This allows for encrypted storage of sensitive data and can be used to protect against data theft in the event of a system breach.

                                                                                                                                                                                                                                                  7) IPSec: IPSec is a protocol suite for securing IP communications, including VPNs and other types of network connections. The Linux kernel includes support for IPSec, which can be used to secure network communications between Linux systems.
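A couple of hedged examples of exercising these facilities (assuming the cryptsetup package is installed, and that /dev/sdX below is a hypothetical spare block device whose contents can be destroyed):

# grep -A3 '^name.*aes' /proc/crypto                 # Cryptographic API: list registered AES implementations
# head -c 16 /dev/urandom | od -An -tx1              # kernel random number generator: 16 random bytes in hex
# cryptsetup luksFormat /dev/sdX                     # dm-crypt/LUKS: encrypt the block device (destroys its contents)
# cryptsetup open /dev/sdX securedata                # map it as /dev/mapper/securedata, ready for a filesystem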

The Linux kernel has a wide range of built-in security features that can help to protect against various types of security threats. Linux-based security offers many benefits, including customizability, community-driven development, and a long history of use in high-security environments. These factors have helped to make Linux a popular choice for organizations looking to enhance the security of their systems and data.

                                                                                                                                                                                                                                                  Reference:

                                                                                                                                                                                                                                                  https://man7.org/linux/man-pages/man7/kernel_lockdown.7.html