Friday, June 28, 2019

Enabling the pre- and post-execution processing feature of Spectrum LSF in a large-scale cluster

An HPC cluster consists of hundreds or thousands of compute servers that are networked together. InfiniBand is pervasively used in high-performance computing (HPC) to remove data exchange bottlenecks, delivering very high throughput and very low latency. As HPC becomes more mainstream and is embraced by enterprise users, there is a growing need for assurance that performance is optimized.

Users schedule their jobs to run on an HPC cluster by submitting them through Spectrum LSF. IBM® Spectrum LSF (formerly IBM® Platform™ LSF®) is a complete workload management solution for demanding HPC environments.

One of the important tasks before running an application is preparing the job in order to get the best performance on the given HPC cluster. Select the most appropriate queue for each job and provide accurate wall-clock times in your job script; this helps the scheduler fit your job into the earliest possible run opportunity. Note the system's usable memory and configure your job script to maximize performance. Next, prepare and tune the nodes (or servers) to the desired values. This can be done by pre-execution processing: for example, set the CPU statically to the highest frequency (or apply any other requirement). After the LSF job has run, you can restore the previous values through post-execution processing.

Configuration to enable pre- and post-execution processing:



The pre- and post-execution processing feature is enabled by defining at least one of the pre- or post-execution parameters (such as PRE_EXEC, POST_EXEC, HOST_PRE_EXEC, or HOST_POST_EXEC) at the application or queue level, or by using the -E option of the bsub command to specify a pre-execution command. In some situations, specifying a queue-level or application-level pre-execution command has advantages over requiring users to use bsub -E. For example, license checking can be set up at the queue or application level so that users do not have to enter a pre-execution command every time they submit a job.

The following example illustrates how job-based pre- and post-execution processing works at the queue or application level for setting the environment prior to job execution and for transferring resulting files after the job runs.

Host-based pre- and post-execution processing differs from job-based pre- and post-execution processing in that it is intended for parallel jobs (you can also use it for sequential jobs) and is executed on all execution hosts, as opposed to only the first execution host. Its purpose is to set up the execution hosts (or servers) before any job-based pre-execution and other pre-processing that depends on host-based preparation, and to clean up the execution hosts after job-based post-execution and other post-processing.

There are two ways to enable host-based pre- and post-execution processing for a job:
  • Configure HOST_PRE_EXEC and HOST_POST_EXEC in lsb.queues.
  • Configure HOST_PRE_EXEC and HOST_POST_EXEC in lsb.applications.
     
Let's take the example of a queue-level configuration with HOST_PRE_EXEC/HOST_POST_EXEC:

An LSF queue (QUEUE_NAME=Queue_pre_post) is set up with HOST_PRE_EXEC and HOST_POST_EXEC.

Here HOST_PRE_EXEC points to "pre_setup_perf.sh". Similarly, HOST_POST_EXEC points to "post_setup_perf.sh", which sets the previous values back on the server after performance testing.

Modify the configuration file "lsb.queues". For example:
------------------------------------------------------------------------------

Begin Queue
QUEUE_NAME   = Queue_pre_post
PRIORITY     = 40
INTERACTIVE  = NO
FAIRSHARE    = USER_SHARES[[default,1]]
HOSTS        = server1 server2          # hosts on which jobs in this queue can run
EXCLUSIVE    = Y
HOST_PRE_EXEC     = /home/sachinpb/sachin/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC    = /home/sachinpb/sachin/post_setup_perf.sh >> /tmp/post.out
DESCRIPTION  = For P8 performance jobs, running only if hosts are lightly loaded.
End Queue

----------------------------------------------------------------------------------

After modifying the file, run the following command to apply the change:


badmin reconfigure
-------------------------------------------------------------------------------


Check the queue status:


[sachinpb@server1 ~]$ bqueues -l Queue_pre_post
QUEUE: Queue_pre_post
  -- For P8 performance jobs, running only if hosts are lightly loaded.
PARAMETERS/STATISTICS
PRIO NICE STATUS          MAX JL/U JL/P JL/H NJOBS  PEND   RUN SSUSP USUSP  RSV PJOBS
 40    0  Open:Active       -    -    -    -     0     0     0     0     0    0     0
Interval for a host to accept two jobs is 0 seconds
SCHEDULING PARAMETERS
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -
SCHEDULING POLICIES:  FAIRSHARE  EXCLUSIVE  NO_INTERACTIVE
USER_SHARES:  [default, 1]
SHARE_INFO_FOR: Queue_pre_post/

USER/GROUP   SHARES  PRIORITY  STARTED  RESERVED  CPU_TIME  RUN_TIME   ADJUST
sachinpb          1       0.326      0        0       350.7        0       0.000
USERS: all
HOSTS:  server1 server2
HOST_PRE_EXEC:  /home/sachinpb/sachin/pre_setup_perf.sh >> /tmp/pre.out
HOST_POST_EXEC:  /home/sachinpb/sachin/post_setup_perf.sh >> /tmp/post.out

[sachinpb@server1 ~]$
---------------------------------------------------------------------------------

HOST_PRE_EXEC=command (in lsb.queues):
  •     Enables host-based pre-execution processing at the queue level.
  •     The pre-execution command runs on all execution hosts before the job starts.
  •     If the HOST_PRE_EXEC command exits with a non-zero exit code, LSF requeues the job to the front of the queue.
  •     The HOST_PRE_EXEC command uses the same environment variable values as the job.
  •     The HOST_PRE_EXEC command can only be used for host-based pre- and post-execution processing.
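
Because a non-zero exit code from HOST_PRE_EXEC sends the job back to the queue, a pre-execution script can act as a readiness gate for each host. The following is only a minimal sketch of that idea, assuming a Linux host where /proc/loadavg is readable; the function name load_ok, the --check flag and the threshold of 1.0 are hypothetical choices for illustration, not LSF settings.

```shell
#!/bin/bash
# Hypothetical readiness check for use from a host-based pre-execution script.

# load_ok MAX [FILE]: succeed only if the 1-minute load average in FILE
# (default /proc/loadavg, a Linux-specific assumption) is <= MAX.
load_ok() {
    local max="$1" file="${2:-/proc/loadavg}"
    awk -v max="$max" 'NR == 1 { ok = ($1 <= max) } END { exit !ok }' "$file"
}

# Script entry point (illustrative): a non-zero exit tells LSF that this
# host is not ready, so the job is requeued instead of started.
if [ "${1:-}" = "--check" ]; then
    if ! load_ok "${2:-1.0}"; then
        echo "$(hostname): load too high, refusing job" >&2
        exit 1
    fi
fi
```

A script like this could be called from pre_setup_perf.sh before the tuning step, so that a busy host never reaches the performance run.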

HOST_POST_EXEC=command (in lsb.queues):
  •     Enables host-based post-execution processing at the queue level.
  •     The HOST_POST_EXEC command uses the same environment variable values as the job.
  •     The post-execution command for the queue remains associated with the job. The original post-execution command runs even if the job is requeued or if the post-execution command for the queue is changed after job submission.
  •     Before the post-execution command runs, LSB_JOBEXIT_STAT is set to the exit status of the job. The success or failure of the post-execution command has no effect on LSB_JOBEXIT_STAT.
  •     The post-execution command runs after the job finishes, even if the job fails.
  •     Specify the environment variable $USER_POSTEXEC to allow UNIX users to define their own post-execution commands.
  •     The HOST_POST_EXEC command can only be used for host-based pre- and post-execution processing.
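
As a small illustration of how a post-execution script might use LSB_JOBEXIT_STAT, the sketch below only distinguishes zero from non-zero and makes no assumption about how the value is encoded. The function name job_outcome is hypothetical; LSB_JOBID is the job ID variable that LSF sets in the job environment.

```shell
#!/bin/bash
# Hypothetical helper for a post-execution script.

# job_outcome: print "success" if LSB_JOBEXIT_STAT is 0, "failure" for any
# other value, and "unknown" if the variable is not set (for example when
# the script is run by hand outside LSF).
job_outcome() {
    local stat="${LSB_JOBEXIT_STAT:-}"
    if [ -z "$stat" ]; then
        echo unknown
    elif [ "$stat" -eq 0 ]; then
        echo success
    else
        echo failure
    fi
}

# Record the outcome next to the host name, e.g. in the post-exec log.
echo "Job ${LSB_JOBID:-?} on $(hostname): $(job_outcome)"
```

The clean-up itself should still run in every case, since the post-execution command is invoked even when the job fails.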

-------------------------------------------------------
Now submit the LSF job as shown:

[sachinpb@server1 sachin]$ bsub  -q Queue_pre_post -n 8 -R "span[ptile=4]" < myjob.script
bsub> Job <19940> is submitted to queue <Queue_pre_post>.
[sachinpb@server1 ]$ bjobs
JOBID   USER      STAT  QUEUE           FROM_HOST   EXEC_HOST   JOB_NAME      SUBMIT_TIME
19940   sachinpb  RUN   Queue_pre_post  server1     server2     myjob.script  Jun 28 05:16
                                                    server2
                                                    server2
                                                    server2
                                                    server1
                                                    server1
                                                    server1
                                                    server1
[sachinpb@server1 ]$
---------------------------------------------------------------------------

Logs from each server are available under "$SACHIN_HOME/logs" for both pre- and post-execution processing, as shown below. Check for the log files after LSF job 19940 completes:

There should be 4 log files:
  • 2 pre-check logs, one each from server1 and server2
  • Similarly, 2 post-check logs, one each from server1 and server2

Example:

[sachinpb@server1 logs]$ ls -alsrt
 8 -rw-rw-r-- 1  sachinpb sachinpb  4657 Jun 28 05:16 preScript_28-Jun-05_16_server2.out
 8 -rw-r--r-- 1    sachinpb sachinpb  4657 Jun 28 05:16 preScript_28-Jun-05_16_server1.out
 8 -rw-r--r-- 1   sachinpb sachinpb  4653 Jun 28 05:17 postScript_28-Jun-05_17_server1.out
 8 -rw-rw-r-- 1  sachinpb sachinpb  4653 Jun 28 05:17 postScript_28-Jun-05_17_server2.out
[sachinpb@server1 logs]$
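
On a large cluster, checking the log directory by eye does not scale, so the same check can be scripted. This is a rough sketch that assumes the preScript_<date>_<host>.out and postScript_<date>_<host>.out naming used above; the function names count_logs and check_hosts are hypothetical.

```shell
#!/bin/bash
# Hypothetical log check: one pre and one post log expected per host.

# count_logs DIR PREFIX HOST: print how many files in DIR are named
# PREFIX_<anything>_HOST.out.
count_logs() {
    local dir="$1" prefix="$2" host="$3" n=0 f
    for f in "$dir/${prefix}"_*_"${host}".out; do
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# check_hosts DIR HOST...: report every host that lacks a pre or post log
# and return non-zero if any log is missing.
check_hosts() {
    local dir="$1" host prefix rc=0
    shift
    for host in "$@"; do
        for prefix in preScript postScript; do
            if [ "$(count_logs "$dir" "$prefix" "$host")" -eq 0 ]; then
                echo "missing $prefix log for $host" >&2
                rc=1
            fi
        done
    done
    return "$rc"
}
```

For the job above, the call would be: check_hosts "$SACHIN_HOME/logs" server1 server2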


---------------------------------------------------------------------------

pre-exec script:

[sachinpb@server1]$ cat pre_setup_perf.sh
#!/bin/bash
echo "Start Pre-execution script on $(hostname)"
HOST=`hostname`
DATE=$(date +%d-%b-%H_%M)
sudo /home/sachinpb/sachin/tune_this_server.sh pre_check | tee $SACHIN_HOME/logs/preScript_${DATE}_${HOST}.out

echo "End of Pre-execution script on $(hostname)"
[sachinpb@server1 ]$

----------------------------------------------------------------------------

post-exec script:

[sachinpb@server1 ]$ cat post_setup_perf.sh
#!/bin/bash
echo "Start Post-execution script on $(hostname)"
HOST=`hostname`
DATE=$(date +%d-%b-%H_%M)
sudo /home/sachinpb/sachin/tune_this_server.sh post_check | tee $SACHIN_HOME/logs/postScript_${DATE}_${HOST}.out
echo "End of Post-execution script on $(hostname)"
[sachinpb@server1 ]$
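
The tune_this_server.sh helper called by both scripts is not shown in this post. As a rough illustration of the kind of tuning mentioned at the start (running the CPU at its highest frequency for the job and restoring the earlier setting afterwards), here is a minimal sketch. It assumes a Linux host that exposes the cpufreq governor under /sys/devices/system/cpu and offers the "performance" governor; the names SYSFS_ROOT, STATE_FILE, pre_tune and post_restore are hypothetical and are not part of LSF or of the original helper.

```shell
#!/bin/bash
# Hypothetical sketch: save and restore the CPU frequency governor.
# SYSFS_ROOT and STATE_FILE are illustrative parameters, not LSF settings.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/devices/system/cpu}"
STATE_FILE="${STATE_FILE:-/tmp/governor.saved}"

# Pre-execution step: record the current governor of every CPU,
# then switch each CPU to the "performance" governor.
pre_tune() {
    local f
    : > "$STATE_FILE"
    for f in "$SYSFS_ROOT"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue
        printf '%s %s\n' "$(cat "$f")" "$f" >> "$STATE_FILE"
        echo performance > "$f"
    done
}

# Post-execution step: write back the governors recorded by pre_tune.
post_restore() {
    local gov f
    [ -f "$STATE_FILE" ] || return 0
    while read -r gov f; do
        [ -e "$f" ] && echo "$gov" > "$f"
    done < "$STATE_FILE"
    rm -f "$STATE_FILE"
}

# Illustrative dispatch: "pre" before the job, "post" after it.
case "${1:-}" in
    pre)  pre_tune ;;
    post) post_restore ;;
esac
```

Writing to sysfs needs root privileges, which is why the scripts above call their helper through sudo.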

-----------------------------------------------------------------------------

NOTE: Similarly, you could do this with PRE_EXEC/POST_EXEC=command (in lsb.applications, lsb.queues) and HOST_PRE_EXEC/HOST_POST_EXEC=command (in lsb.applications), as per the application requirements.

I hope this blog helped in understanding how to configure the pre- and post-execution processing feature of Spectrum LSF.
