Wednesday, January 25, 2017

Spectrum LSF: Accounting Logs for Resource Usage Statistics





The bacct command uses the current lsb.acct file for its output. It displays a summary of accounting statistics for all finished jobs (with a DONE or EXIT status) submitted by the user who invoked the command, on all hosts, projects, and queues in the LSF system. bacct reports on all jobs logged in the current Platform LSF accounting log file: LSB_SHAREDIR/cluster_name/logdir/lsb.acct.
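
bacct also accepts filters; for example, to restrict the summary to a particular user and queue (options per the bacct command reference linked below):

bacct -u lsfadmin -q normal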

The lsb.acct file is the batch job log file of LSF. The master batch daemon
mbatchd generates a record for each job completion or failure and appends it
to the job log file lsb.acct.

The file is located at LSB_SHAREDIR/cluster_name/logdir.

[root@c61f4s20 conf]# bacct
/lsf_home/work/CI_cluster1/logdir/lsb.acct: No such file or directory
[root@c61f4s20 conf]#

NOTE: Create a file "lsb.acct" at LSB_SHAREDIR/cluster_name/logdir
and restart the LSF daemons for the modification to take effect, for example as sketched below.
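
A minimal sketch, run as root or the LSF administrator (the path is the LSB_SHAREDIR/cluster_name/logdir location shown in the error above):

touch /lsf_home/work/CI_cluster1/logdir/lsb.acct
chown lsfadmin:lsfadmin /lsf_home/work/CI_cluster1/logdir/lsb.acct
badmin mbdrestart     # restart mbatchd so it reopens the accounting log
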
----------------------------------------
[lsfadmin@localhost logdir]$ pwd
/lsf_home/work/CI_cluster1/logdir
[lsfadmin@localhost logdir]$ ls -alsrt lsb.acct
44 -rw-r--r-- 1 lsfadmin lsfadmin 41182 Jan 25 01:32 lsb.acct
[lsfadmin@localhost logdir]$
----------------------------------------


[lsfadmin@localhost logdir]$ head -2 lsb.acct
"JOB_FINISH" "10.1" 1484136455 106 1002 33554438 1 1484136355 0 0 1484136355 "lsfadmin" "normal" "" "" "" "c61f4s20" "/lsf_home/conf" "" "" "" "1484136355.106" 1 "c712f6n07" 1 "c712f6n07" 64 250.0 "" "sleep 100" 0.505916 0.029839 9088 0 -1 0 0 931 0 0 0 256 -1 0 0 0 18 7 -1 "" "default" 0 1 "" "" 0 12288 352256 "" "" "" "" 0 "" 0 "" -1 "/lsfadmin" "" "" "" -1 "" "" 5136 "" 1484136355 "" "" 2 1032 "0" 1033 "0" 0 -1 0 12288 "select[type == any] order[r15s:pg] " "" -1 "" -1 0 "" 0 0 "" 100 "/lsf_home/conf" 0 "" 0.000000 0.00 0.00 0.00 0.00 1 "c712f6n07" -1 0 0 0
"JOB_FINISH" "10.1" 1484136493 107 1002 33554438 1 1484136393 0 0 1484136393 "lsfadmin" "normal" "" "" "" "c61f4s20" "/lsf_home/conf" "" "" "" "1484136393.107" 1 "localhost" 1 "localhost" 64 250.0 "" "sleep 100" 0.516205 0.018902 9088 0 -1 0 0 932 0 0 0 256 -1 0 0 0 18 7 -1 "" "default" 0 1 "" "" 0 12288 352256 "" "" "" "" 0 "" 0 "" -1 "/lsfadmin" "" "" "" -1 "" "" 5136 "" 1484136393 "" "" 2 1032 "0" 1033 "0" 0 -1 0 12288 "select[type == any] order[r15s:pg] " "" -1 "" -1 0 "" 0 0 "" 100 "/lsf_home/conf" 0 "" 0.000000 0.00 0.00 0.00 0.00 1 "localhost" -1 0 0 0
[lsfadmin@localhost logdir]$

-------------------------------
1) Submit a  job:

[lsfadmin@localhost ~]$ bsub
bsub> sleep 400
bsub> Job <232> is submitted to default queue <normal>.

2) Check status

[lsfadmin@localhost ~]$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
232     lsfadmi RUN   normal     localhost   c712f6n07   sleep 400  Jan 25 01:25

3) Verify the statistics for jobs submitted by the user lsfadmin by using the bacct command:
[lsfadmin@localhost ~]$ bacct

Accounting information about jobs that are:
  - submitted by users lsfadmin,
  - accounted on all projects.
  - completed normally or exited
  - executed on all hosts.
  - submitted to all queues.
  - accounted on all service classes.
------------------------------------------------------------------------------

SUMMARY:      ( time unit: second )
 Total number of done jobs:      69      Total number of exited jobs:    26
 Total CPU time consumed:     146.0      Average CPU time consumed:     1.5
 Maximum CPU time of a job:     5.0      Minimum CPU time of a job:     0.0
 Total wait time in queues: 1181375.0
 Average wait time in queue:12435.5
 Maximum wait time in queue:681365.0      Minimum wait time in queue:    0.0
 Average turnaround time:     12738 (seconds/job)
 Maximum turnaround time:    681365      Minimum turnaround time:         2
 Average hog factor of a job:  0.00 ( cpu time / turnaround time )
 Maximum hog factor of a job:  0.02      Minimum hog factor of a job:  0.00
 Average expansion factor of a job:  12435.08 ( turnaround time / run time )
 Maximum expansion factor of a job:  681365.00
 Minimum expansion factor of a job:  1.00
 Total Run time consumed:     28773      Average Run time consumed:     302
 Maximum Run time of a job:    1000      Minimum Run time of a job:       0
 Total throughput:             0.29 (jobs/hour)  during  330.25 hours
 Beginning time:       Jan 11 07:07      Ending time:          Jan 25 01:22

[lsfadmin@localhost ~]$
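
Once job 232 finishes (DONE or EXIT), its individual record can also be inspected; for example, the long format for a single job (job ID taken from the submission above):

[lsfadmin@localhost ~]$ bacct -l 232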

NOTE: Custom statistics that bacct does not report but that are of interest to individual system administrators can be generated by processing the lsb.acct file directly with awk or perl.
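
For example, a minimal awk sketch that counts JOB_FINISH records per queue; the field positions (field 1 is the record type, field 13 is the queue name) are inferred from the sample records shown above and may shift in other LSF versions:

[lsfadmin@localhost logdir]$ awk '$1 == "\"JOB_FINISH\"" { q[$13]++ } END { for (i in q) print i, q[i] }' lsb.acct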

Reference:
1) http://www.ibm.com/support/knowledgecenter/SSJSMF_9.1.2/lsf/workbookhelp/wb_usecase_lsf.html
2) https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.3/lsf_command_ref/bacct.1.html
3) https://www.ibm.com/support/knowledgecenter/SSETD4_9.1.2/lsf_admin/job_exit_info_view_logged.html
4) http://www-01.ibm.com/support/docview.wss?uid=isg3T1013490

Tuesday, January 24, 2017

Spectrum LSF 10.1 Installation and Job Submission


Load Sharing Facility (or simply LSF) is a workload management platform and job scheduler for distributed HPC environments. It can be used to execute batch jobs on networked Unix and Windows systems on many different architectures. LSF was based on the Utopia research project at the University of Toronto.

IBM Platform Computing has been renamed IBM Spectrum Computing to complement IBM’s Spectrum Storage family of software-defined offerings. The IBM Platform LSF product is now IBM Spectrum LSF.
IBM Spectrum LSF 10.1 is available as the following offering packages: 
1) IBM Spectrum LSF Community Edition 10.1, 
2) IBM Spectrum LSF Suite for Workgroups 10.1, and 
3) IBM Spectrum LSF Suite for HPC 10.1.

LSF provides a resource management framework that takes your job requirements, finds the best resources to run the job, and monitors its progress. Jobs always run according to host load and site policies.


LSF daemons and processes

Multiple LSF processes run on each host in the cluster. The type and number of processes that are running depend on whether the host is a master host or a compute host.


LSF hosts run various daemon processes, depending on their role in the cluster.


Daemon     Role
mbatchd    Job requests and dispatch
mbschd     Job scheduling
sbatchd    Job execution
res        Job execution
lim        Host information
pim        Job process information
elim       Dynamic load indexes


Installation Steps :

Step 1 : Create installation directory: 
[root@localhost LSF_installation_files]# pwd
/root/LSF_installation_files
[root@localhost LSF_installation_files]#

Step 2 : Untar the package 
----------------
[root@localhost LSF_installation_files]# ls
lsfce10.1-x86_64  lsfce10.1-x86_64.tar.gz
[root@localhost LSF_installation_files]#


[root@localhost LSF_installation_files]# gunzip -c lsfce10.1-x86_64.tar.gz | tar xvf -
lsfce10.1-x86_64/
lsfce10.1-x86_64/pmpi/
lsfce10.1-x86_64/pmpi/platform_mpi-09.01.02.00u.x64.bin
lsfce10.1-x86_64/lsf/
lsfce10.1-x86_64/lsf/lsf10.1_lsfinstall_linux_x86_64.tar.Z
lsfce10.1-x86_64/lsf/lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z
lsfce10.1-x86_64/pac/
lsfce10.1-x86_64/pac/pac10.1_basic_linux-x64.tar.Z
[root@localhost LSF_installation_files]#

Change to /root/LSF_installation_files/lsfce10.1-x86_64/lsf/ and extract the installer:
[root@localhost lsf]# ls
lsf10.1_linux2.6-glibc2.3-x86_64.tar.Z  lsf10.1_lsfinstall_linux_x86_64.tar.Z
[root@localhost lsf]# zcat lsf10.1_lsfinstall_linux_x86_64.tar.Z | tar xvf -

----------------------

[root@localhost lsf10.1_lsfinstall]# pwd
/root/LSF_installation_files/lsfce10.1-x86_64/lsf/lsf10.1_lsfinstall
[root@localhost lsf10.1_lsfinstall]# ls
conf_tmpl  install.config  lap         lsf_unix_install.pdf  patchlib   README      rpm      slave.config
hostsetup  instlib         lsfinstall  patchinstall          pversions  rhostsetup  scripts
[root@localhost lsf10.1_lsfinstall]#

---------------------------
=========================
2. Use lsfinstall
========================
The installation program for IBM Spectrum LSF Version 10.1 is lsfinstall.
Use the lsfinstall script to install a new LSF Version 10.1 cluster.

------------------------
2.1 Steps
------------------------
1. Edit lsf10.1_lsfinstall/install.config to specify the options
   for your cluster. Uncomment the options you want and replace the
   example values with your own settings (a minimal sketch follows this list).
2. Run lsf10.1_lsfinstall/lsfinstall -f install.config
3. Read the following files generated by lsfinstall:
   o  lsf10.1_lsfinstall/lsf_getting_started.html to find out how
      to set up your LSF hosts, start LSF, and test your new LSF cluster
   o  lsf10.1_lsfinstall/lsf_quick_admin.html to learn more about
      your new LSF cluster
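
A minimal install.config sketch (the parameter names come from the install.config template; the values below are placeholders inferred from paths used elsewhere in this post, adjust them for your site):

LSF_TOP="/lsf_home"
LSF_ADMINS="lsfadmin"
LSF_CLUSTER_NAME="CI_cluster1"
LSF_MASTER_LIST="c61f4s20"
LSF_TARDIR="/root/LSF_installation_files/lsfce10.1-x86_64/lsf/"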

--------------------------------------------------------------------------
Start the install script:


[root@localhost lsf10.1_lsfinstall]# ./lsfinstall -f install.config
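
After lsfinstall finishes, the usual next steps (summarized from the lsf_getting_started.html file that lsfinstall generates; the LSF_TOP path below is the same placeholder as in the install.config sketch above) are roughly:

# set up the LSF environment in the current shell
. /lsf_home/conf/profile.lsf
# start the LSF daemons on this host
lsadmin limstartup
lsadmin resstartup
badmin hstartup
# verify the cluster and submit a test job
lsid
bhosts
bsub sleep 60
bjobs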


Cron scheduler - cron.daily, cron.weekly, cron.monthly

Cron is a daemon that can be used to schedule the execution of recurring tasks according to a combination of the minute, hour, day of the month, month, and day of the week.

To check the installed RPM packages for cron:


[root@ ~]# rpm -qa | grep cron
cronie-anacron-1.4.11-14.el7_2.1.ppc64le
crontabs-1.11-6.20121102git.el7.noarch
cronie-1.4.11-14.el7_2.1.ppc64le

[root@~]#
-------------------------------------------------

To check the status of cron daemon:

[root@~]# /sbin/service crond status
Redirecting to /bin/systemctl status  crond.service
● crond.service - Command Scheduler
   Loaded: loaded (/usr/lib/systemd/system/crond.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2016-11-30 15:05:45 EST; 1 months 24 days ago
 Main PID: 4633 (crond)
   CGroup: /system.slice/crond.service
           └─4633 /usr/sbin/crond -n

Nov 30 15:05:45 hostname systemd[1]: Started Command Scheduler.
Nov 30 15:05:45 hostname systemd[1]: Starting Command Scheduler...
Nov 30 15:05:45 hostname crond[4633]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 71% if used.)
Nov 30 15:05:45 hostname  crond[4633]: (CRON) INFO (running with inotify support)
[root@ ~]#

------------------------------------------------------
The main configuration file for cron, /etc/crontab, contains the following lines:

[root@~]# cat /etc/crontab
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  * user-name  command to be executed


[root@~]#
-------------------------------------------------------------
Each line in the /etc/crontab file represents a task and has the following format:

minute   hour   day   month   dayofweek   command


  • minute — any integer from 0 to 59
  • hour — any integer from 0 to 23
  • day — any integer from 1 to 31 (must be a valid day if a month is specified)
  • month — any integer from 1 to 12 (or the short name of the month such as jan or feb)
  • dayofweek — any integer from 0 to 7, where 0 or 7 represents Sunday (or the short name of the day such as sun or mon)
  • command — the command to execute (the command can either be a command such as ls /proc >> /tmp/proc or the command to execute a custom script)
The run-parts script is used to execute the scripts in the /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly directories on an hourly, daily, weekly, or monthly basis respectively (on RHEL 7 the daily, weekly, and monthly directories are driven by anacron, as shown in /etc/anacrontab below). The files in these directories should be shell scripts.

# record the memory usage of the system every monday 
# at 3:30AM in the file /tmp/meminfo
30 3 * * mon cat /proc/meminfo >> /tmp/meminfo
# run custom script the first day of every month at 4:10AM
10 4 1 * * /root/scripts/backup.sh
If a cron task needs to be executed on a schedule other than hourly, daily, weekly, or monthly, it can be added to the /etc/cron.d directory.
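
For example, a hypothetical drop-in file /etc/cron.d/diskreport; files in /etc/cron.d use the same format as /etc/crontab, including the user-name field:

# /etc/cron.d/diskreport - record disk usage every 15 minutes as root
*/15 * * * * root df -h >> /tmp/disk_usage.log
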
------------------------------------

How to test your cron jobs instantly :


[root@ jenkins]# crontab -l

* * * * * /etc/cron.weekly/test.sh &>>/tmp/cron_debug_log.log

[root@ jenkins]#

This entry runs your command every minute and appends its output to the log file. Watch it with:

 tail -f /tmp/cron_debug_log.log

---------------------------------
You can also run the scripts in a cron directory immediately with run-parts:

[root@ ]# run-parts /etc/cron.weekly -v
/etc/cron.weekly/test.sh:
[root@ ]#
------------------------------------------

RHEL / CentOS: find out the cron timings for /etc/cron.{daily,weekly,monthly}/
-----------------------------------------------------------------------------------------------
 cat /etc/anacrontab
# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1       5       cron.daily              nice run-parts /etc/cron.daily
7       25      cron.weekly             nice run-parts /etc/cron.weekly
@monthly 45     cron.monthly            nice run-parts /etc/cron.monthly
------------------------------------------------------------------------------------------------

Cron also offers some special strings, which can be used in place of the five time-and-date fields:

  • @reboot - run once, at startup
  • @yearly (or @annually) - run once a year, equivalent to 0 0 1 1 *
  • @monthly - run once a month, equivalent to 0 0 1 * *
  • @weekly - run once a week, equivalent to 0 0 * * 0
  • @daily (or @midnight) - run once a day, equivalent to 0 0 * * *
  • @hourly - run once an hour, equivalent to 0 * * * *
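
For example, a crontab entry that runs the backup script from the earlier example once a month (at midnight on the first of the month, which is what @monthly expands to):

@monthly /root/scripts/backup.sh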

Reference:

1) https://www.cyberciti.biz/faq/linux-when-does-cron-daily-weekly-monthly-run/
2) https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/3/html/System_Administration_Guide/ch-autotasks.html
3) https://help.ubuntu.com/community/CronHowto
4) https://www.digitalocean.com/community/tutorials/how-to-schedule-routine-tasks-with-cron-and-anacron-on-a-vps