Friday, June 28, 2019

Enable the pre- and post-execution processing feature of Spectrum LSF in a large-scale cluster

An HPC cluster consists of hundreds or thousands of compute servers that are networked together. InfiniBand is pervasively used in high-performance computing (HPC) to remove data exchange bottlenecks, delivering very high throughput and very low latency. As HPC becomes more mainstream and is embraced by enterprise users, there is a need for assurances that performance is optimized.

Users schedule their jobs to run on an HPC cluster by submitting them through Spectrum LSF. IBM® Spectrum LSF (formerly IBM® Platform™ LSF®) is a complete workload management solution for demanding HPC environments.

One of the important tasks before executing an application is preparing the job so that it gets the best performance on the given HPC cluster. Select the most appropriate queue for each job and provide accurate wall-clock times in your job script; this helps the scheduler fit your job into the earliest possible run opportunity. Note the system's usable memory and configure your job script to maximize performance. Next, prepare and tune the nodes (or servers) to the desired values. This can be done with pre-execution processing: for example, statically set the CPU to its highest frequency (or apply any other requirement, for that matter). After the LSF job finishes, post-execution processing can set the values back to what they were before.
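For instance, a pre-execution script could pin the CPU frequency governor to "performance" and a post-execution script could restore the original setting. The sketch below is only illustrative: it assumes the Linux cpufreq sysfs interface is available on the compute nodes, and the backup-file path is a made-up example.

#!/bin/bash
# Illustrative pre-execution sketch: save the current governor of every core,
# then switch all cores to the "performance" governor for the duration of the job.
BACKUP=/tmp/governor_backup.$(hostname)
: > "$BACKUP"
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo "$gov $(cat "$gov")" >> "$BACKUP"
    echo performance | sudo tee "$gov" > /dev/null
done

A matching post-execution script would read the backup file and write the saved governor values back to the same sysfs entries.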

Configuration to enable pre- and post-execution processing:



The pre- and post-execution processing feature is enabled by defining at least one of the parameters in the list below at the application or queue level, or by using the -E option of the bsub command to specify a pre-execution command. In some situations, specifying a queue-level or application-level pre-execution command can have advantages over requiring users to use bsub -E. For example, license checking can be set up at the queue or application level so that users do not have to enter a pre-execution command every time they submit a job.
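For instance, a user could attach a license check at submission time with -E; the script path and application below are hypothetical examples:

# Run the license check on the first execution host; the job itself starts
# only if the pre-execution command exits with status 0, otherwise LSF requeues it.
bsub -E "/shared/scripts/check_license.sh" -q normal ./my_app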

The following example illustrates how job-based pre- and post-execution processing works at the queue or application level for setting the environment prior to job execution and for transferring resulting files after the job runs.

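A minimal sketch of such a job-based, queue-level configuration in lsb.queues (the queue name, script names, and paths are illustrative assumptions):
------------------------------------------------------------------------------

Begin Queue
QUEUE_NAME   = staging_queue
PRE_EXEC     = /shared/scripts/setup_env.sh          # prepare the environment before the job starts
POST_EXEC    = /shared/scripts/transfer_results.sh   # copy result files to shared storage after the job finishes
DESCRIPTION  = Queue with job-based pre- and post-execution processing
End Queue

------------------------------------------------------------------------------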
Host-based pre- and post-execution processing is different from job-based pre- and post-execution processing in that it is intended for parallel jobs (you can also use this feature for sequential jobs) and is executed on all execution hosts, as opposed to only the first execution host. The purpose is to set up the execution hosts (or servers) before job-based pre-execution and any other pre-processing that depends on host-based preparation, and to clean up the execution hosts after job-based post-execution and other post-processing.

There are two ways to enable host-based pre- and post-execution processing for a job:
  • Configure HOST_PRE_EXEC and HOST_POST_EXEC in lsb.queues.
  • Configure HOST_PRE_EXEC and HOST_POST_EXEC in lsb.applications.
     
Let's take the example of a queue-level configuration with HOST_PRE_EXEC/HOST_POST_EXEC:

The LSF queue (QUEUE_NAME=Queue_pre_post) is configured with HOST_PRE_EXEC and HOST_POST_EXEC, where HOST_PRE_EXEC points to "pre_setup_perf.sh" (which tunes each server before the job runs) and HOST_POST_EXEC points to "post_setup_perf.sh" (which sets the previous values back on each server after the performance testing).

Modify the configuration file "lsb.queues". For example:
------------------------------------------------------------------------------

Begin Queue
QUEUE_NAME   = Queue_pre_post
PRIORITY     = 40
INTERACTIVE  = NO
FAIRSHARE    = USER_SHARES[[default,1]]
HOSTS        = server1 server2          # hosts on which jobs in this queue can run
EXCLUSIVE    = Y
HOST_PRE_EXEC     = /home/sachinpb/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC    = /home/sachinpb/post_setup_perf.sh >> /tmp/post.out
DESCRIPTION  = For P8 performance jobs, running only if hosts are lightly loaded.
End Queue

----------------------------------------------------------------------------------

After modification, run the following command to reconfigure LSF:


badmin reconfig
-------------------------------------------------------------------------------


Check the queue status:


[sachinpb@server1 ~]$ bqueues -l Queue_pre_post
QUEUE: Queue_pre_post
  -- For P8 performance jobs, running only if hosts are lightly loaded.
PARAMETERS/STATISTICS
PRIO NICE STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN SSUSP USUSP  RSV PJOBS
 40    0  Open:Active       -    -    -    -     0     0     0     0     0    0     0
Interval for a host to accept two jobs is 0 seconds
SCHEDULING PARAMETERS
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -
SCHEDULING POLICIES:  FAIRSHARE  EXCLUSIVE  NO_INTERACTIVE
USER_SHARES:  [default, 1]
SHARE_INFO_FOR: Queue_pre_post/

USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME   ADJUST
sachinpb          1       0.326      0        0       350.7        0       0.000
USERS: all
HOSTS:  server1 server2
HOST_PRE_EXEC:  /home/sachinpb/sachin/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC:  /home/sachinpb/sachin/post_setup_perf.sh >> /tmp/post.out

[sachinpb@server1 ~]$
---------------------------------------------------------------------------------

HOST_PRE_EXEC=command (in lsb.queues):
  •     Enables host-based pre-execution processing at the queue level.
  •     The pre-execution command runs on all execution hosts before the job starts.
  •     If the HOST_PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue (see the sketch after this list).
  •     The HOST_PRE_EXEC command uses the same environment variable values as the job.
  •     The HOST_PRE_EXEC command can only be used for host-based pre- and post-execution processing.
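For example, a host-based pre-execution script can fail fast when an execution host is not ready, so that LSF requeues the job instead of running it on a bad host. The mount point below is a hypothetical example:

#!/bin/bash
# Illustrative host-based pre-execution check: make sure the shared file
# system is mounted on this execution host before the job is allowed to run.
if ! mountpoint -q /shared; then
    echo "ERROR: /shared is not mounted on $(hostname)" >&2
    exit 1   # non-zero exit code -> LSF requeues the job to the front of the queue
fi
exit 0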

HOST_POST_EXEC=command (in lsb.queues):
  •     Enables host-based post-execution processing at the queue level.
  •     The HOST_POST_EXEC command uses the same environment variable values as the job.
  •     The post-execution command for the queue remains associated with the job. The original post-execution command runs even if the job is requeued or if the post-execution command for the queue is changed after job submission.
  •     Before the post-execution command runs, LSB_JOBEXIT_STAT is set to the exit status of the job (see the sketch after this list). The success or failure of the post-execution command has no effect on LSB_JOBEXIT_STAT.
  •     The post-execution command runs after the job finishes, even if the job fails.
  •     Specify the environment variable $USER_POSTEXEC to allow UNIX users to define their own post-execution commands.
  •     The HOST_POST_EXEC command can only be used for host-based pre- and post-execution processing.
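For example, a post-execution command can branch on the job's exit status via LSB_JOBEXIT_STAT. This is only a sketch; the log path is a hypothetical example:

#!/bin/bash
# Illustrative post-execution sketch: LSF sets LSB_JOBEXIT_STAT before this runs.
if [ "$LSB_JOBEXIT_STAT" -eq 0 ]; then
    echo "Job $LSB_JOBID completed successfully on $(hostname)" >> /tmp/postexec.log
else
    echo "Job $LSB_JOBID exited with status $LSB_JOBEXIT_STAT on $(hostname)" >> /tmp/postexec.log
fi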

-------------------------------------------------------
Now submit the LSF job as shown:

[sachinpb@server1 sachin]$ bsub  -q Queue_pre_post -n 8 -R "span[ptile=4]" < myjob.script
bsub> Job <19940> is submitted to queue <Queue_pre_post>.
[sachinpb@server1 ]$ bjobs
JOBID   USER    STAT  QUEUE          FROM_HOST   EXEC_HOST    JOB_NAME     SUBMIT_TIME
19940 sachinpb  RUN   Queue_pre_post  server1   server2     myjob.script  Jun 28 05:16
                                                                                  server2
                                                                                  server2
                                                                                  server2
                                                                                  server1
                                                                                  server1
                                                                                  server1
                                                                                  server1
[sachinpb@server1 ]$
---------------------------------------------------------------------------

Logs from each server are available at "$SACHIN_HOME/logs" for both pre- and post-execution processing, as shown below. Check for the log files after completion of LSF job ID 19940:

There should be 4 log files:
  • 2 pre-check logs, one executed on server1 and one on server2
  • Similarly, 2 post-check logs, one executed on server1 and one on server2

Example:

[sachinpb@server1 logs]$ ls -alsrt
 8 -rw-rw-r-- 1 sachinpb sachinpb 4657 Jun 28 05:16 preScript_28-Jun-05_16_server2.out
 8 -rw-r--r-- 1 sachinpb sachinpb 4657 Jun 28 05:16 preScript_28-Jun-05_16_server1.out
 8 -rw-r--r-- 1 sachinpb sachinpb 4653 Jun 28 05:17 postScript_28-Jun-05_17_server1.out
 8 -rw-rw-r-- 1 sachinpb sachinpb 4653 Jun 28 05:17 postScript_28-Jun-05_17_server2.out
[sachinpb@server1 logs]$


---------------------------------------------------------------------------

pre-exec script:

[sachinpb@server1]$ cat pre_setup_perf.sh
#!/bin/bash
echo "Start Pre-execution script on $(hostname)"
HOST=`hostname`
DATE=$(date +%d-%b-%H_%M)
sudo /home/sachinpb/sachin/tune_this_server.sh pre_check | tee $SACHIN_HOME/logs/preScript_${DATE}_${HOST}.out

echo "End of Pre-execution script on $(hostname)"
[sachinpb@server1 ]$

----------------------------------------------------------------------------

post-exec script:

[sachinpb@server1 ]$ cat post_setup_perf.sh
#!/bin/bash
echo "Start Post-execution script on $(hostname)"
HOST=`hostname`
DATE=$(date +%d-%b-%H_%M)
sudo /home/sachinpb/sachin/tune_this_server.sh post_check | tee $SACHIN_HOME/logs/postScript_${DATE}_${HOST}.out
echo "End of Post-execution script on $(hostname)"
[sachinpb@server1 ]$

-----------------------------------------------------------------------------

NOTE: Similarly, you could configure PRE_EXEC/POST_EXEC=command (in lsb.applications and lsb.queues) and HOST_PRE_EXEC/HOST_POST_EXEC=command (in lsb.applications) as per the application requirements.
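As a rough sketch, an equivalent application profile in lsb.applications might look like the following (the application name is an illustrative assumption; the script paths reuse the ones from the queue example above):

------------------------------------------------------------------------------

Begin Application
NAME             = perf_app
HOST_PRE_EXEC    = /home/sachinpb/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC   = /home/sachinpb/post_setup_perf.sh >> /tmp/post.out
DESCRIPTION      = Application profile with host-based pre- and post-execution processing
End Application

------------------------------------------------------------------------------

A job would then pick up this profile with "bsub -app perf_app", in addition to (or instead of) the queue-level settings.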

I hope this blog helped in understanding how to configure the pre- and post-execution processing feature of Spectrum LSF.

Saturday, June 8, 2019

Hewlett Packard Enterprise to Acquire Supercomputer Pioneer Cray



Supercomputers have long been a mainstay of military and intelligence agencies, used for chores ranging from cracking codes to designing nuclear weapons. They have many civilian uses as well, like predicting weather, creating new drugs, and simulating the effect of crashes on auto designs. Computing firms are racing to reach exascale performance, or a quintillion operations per second.

Hewlett Packard Enterprise will pay about $1.3 billion to acquire Cray, which has designed some of the most powerful computer systems in use. The deal will also enable HPE to start selling supercomputer components to corporate clients and others. HPE will integrate Cray’s supercomputer technology into its product portfolio and build HPC-as-a-service and AI cloud services on HPE GreenLake. The acquisition is synergistic: it gives HPE access to one of the only differentiated stacks in the market, so it can now compete more aggressively in key vertical markets. The combination of Cray and HPE creates an industry leader in the fast-growing high-performance computing (HPC) and AI markets and creates a number of opportunities that neither company would likely be able to capture on its own.

Interestingly, Cray is actually the second supercomputer manufacturer acquired by HPE over its lifetime; the company also picked up the remaining assets of Silicon Graphics back in 2016. On November 1, 2016, Hewlett Packard Enterprise completed its acquisition of SGI (formerly Silicon Graphics) for $275 million.

HPE is already the HPC server market leader, but by adding Cray into the fold, HPE strengthens its position against its two chief rivals, Dell EMC and IBM. Currently, IBM is the maker of the world’s two fastest supercomputers, Summit and Sierra. In 2018, HPE ranked first in HPC server sales with 34.8% market share, followed by Dell EMC with 20.8% and IBM with 7.1% of the market. All other HPC players, including Lenovo, Atos, Inspur, and Sugon, have single-digit market share, including Cray with 2.3%.

Cray, based in Seattle, traces its lineage to a company founded in 1972 in Minnesota by the computer designer Seymour Cray, "the father of supercomputing." That company was bought in 1996 by Silicon Graphics; it was sold in 2000 to Tera Computer, which adopted the Cray name. Cray was impressively successful for a small company, but competing against much bigger players is difficult as a standalone business. Cray has morphed into an integrator and scale-out specialist, combining processors from the likes of Intel, AMD, and NVIDIA into supercomputers and applying its own software, I/O, and interconnect technologies. The company is currently working with AMD on a US government-backed project to build the world's first exascale supercomputer at Oak Ridge National Laboratory. The system will be capable of a quintillion calculations per second.

Frontier supercomputer at ORNL, in partnership with AMD


Cray is currently contracted to build two of the world’s fastest supercomputers for two US Department of Energy labs: Oak Ridge National Laboratory and Argonne National Laboratory. Both systems, one called Frontier being built in partnership with AMD and one called Aurora with Intel, are promised to bring so-called “exascale” performance, with raw performance in excess of 1.5 exaflops, or more than a quintillion calculations per second. Both systems will support the converged use of analytics, AI, and HPC at extreme scale, using Cray’s new Shasta system architecture, software stack, and Slingshot interconnect.

Aurora supercomputer at ANL, in partnership with Intel

What worries some officials in the United States is the rapid rise of suppliers based in China. One of them, Lenovo, which bought former IBM hardware operations, led the rankings with 140 supercomputers installed. Two others, Inspur and Sugon, were second with 84 and third with 54, respectively. With the acquisition, HPE will get Cray’s current generation XC and CS supercomputers and the next-generation Shasta supercomputer that features a new high-speed interconnect, code-named Slingshot. Cray’s products also include high-performance storage, a full HPC system software stack, and data analytics and AI solutions that are fully integrated.


Most HPC hardware vendors today sell commoditized cluster stacks comprising InfiniBand- or Ethernet-based networking. But IBM and HPE are the only HPC vendors that can differentiate themselves: IBM with its Power9 processors, and now HPE with Cray’s Slingshot interconnect and ClusterStor HPC storage hardware. Cray gives HPE a unique technological edge: Slingshot provides HPE with an extremely high-bandwidth network for HPC and AI workloads and enables new network topologies that are required for extreme-scale HPC and AI. Supercomputers of this scale can be massively beneficial to data-intensive fields like astronomy, climate science, medicine, neuroscience, and physics. Increasingly, these systems can be used in artificial intelligence research, which, in turn, can help accelerate many other areas of scientific inquiry. That said, a supercomputer like Aurora or Frontier tends only to be built and financed by the government, at least initially for military applications.

Thankfully, over time, these upcoming exascale supercomputers will likely be freed from the military apparatus and put to work divining new insights from data. According to HPE, the acquisition of Cray is primarily to help it gain an edge in AI research and the hardware required to train ever-larger neural nets. Cray was also trying to make further inroads into the enterprise market, and HPE brings experience and enterprise customers at a time when many businesses are recognizing the need to invest in HPC for AI, data analytics, and other business operations such as marketing and ERP. The requirements in the commercial markets are pushing up into the HPC space. Cray was trying to get into the commercial market too, but it didn’t have the experience to do that, and now it is part of a company with a big position in the commercial markets. For a lot of enterprises this is a new type of compute, but they have to start embracing it because workloads increasingly require MPC (Massively Parallel Computing). You can’t compete anymore without doing AI, big data analytics, and, increasingly, simulation and modeling. Enterprises don’t necessarily want to purchase the hardware and put all that together themselves; it’s complex. So if you can get it with a cloud-like consumption model, as HPE GreenLake offers, then you turn it into an OpEx investment.


Broadly speaking, major acquisitions and mergers in the supercomputing space are rare events. Due to their ever-increasing price tags, only a small number of world-class supercomputers are sold each year, and because of those prices the buyers are often governments, which inevitably gives supercomputer construction a nationalistic element. Nonetheless, costs keep climbing: Frontier is the US’s most expensive system yet, at over $500M for the system alone. HPE’s plans include using Cray’s technologies to improve HPE GreenLake, the company’s HPC-as-a-Service offering.