Friday, October 18, 2019

How to use SystemTap - Who killed my process?




In computing, SystemTap (stap) is a scripting language and tool for dynamically instrumenting running production Linux kernel-based operating systems. System administrators can use SystemTap to extract, filter and summarize data in order to enable diagnosis of complex performance or functional problems.

SystemTap consists of free and open-source software and includes contributions from Red Hat, IBM, Intel, Hitachi, Oracle, and other community members.


Installation: yum install systemtap systemtap-runtime
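Kernel probes also need the kernel-devel and kernel debuginfo packages that match the running kernel; the stap-prep helper that ships with SystemTap can fetch them. The following is a minimal sketch of a pre-flight check, assuming a RHEL/CentOS-style layout where kernel-devel installs under /usr/src/kernels. The function name check_stap_prereqs is made up for this post.

```shell
#!/bin/bash
# Report whether the pieces SystemTap usually needs are present.
# check_stap_prereqs is a hypothetical helper, not part of SystemTap.
check_stap_prereqs() {
  local kver
  kver=$(uname -r)

  if command -v stap >/dev/null 2>&1; then
    echo "stap: found"
  else
    echo "stap: missing (install the systemtap package)"
  fi

  # Assumption: RHEL/CentOS layout, kernel-devel lives in /usr/src/kernels.
  if [ -d "/usr/src/kernels/$kver" ]; then
    echo "kernel-devel for $kver: found"
  else
    echo "kernel-devel for $kver: not found"
  fi
}

check_stap_prereqs
```

Once both lines report "found", the smoke test from the SystemTap Beginners Guide is a quick way to confirm that scripts compile and load: stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'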
 


To determine which process is sending a signal to an application, it is necessary to trace signals as they pass through the Linux kernel.
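A short sketch of why the application usually cannot answer this question itself: a shell trap handler can catch SIGTERM but is given no information about the sender, and SIGKILL cannot be trapped at all. The victim function below is a hypothetical stand-in for an application, not the real myApp_mtt.

```shell
#!/bin/bash
# Hypothetical stand-in for an application that tries to log its own death.
victim() {
  # The handler runs, but nothing in it identifies who sent the signal.
  trap 'echo "caught SIGTERM, sender unknown"; exit 0' TERM
  while :; do sleep 1; done
}

log=$(mktemp)
victim > "$log" &
victim_pid=$!

sleep 1                     # give the trap time to be installed
kill -TERM "$victim_pid"    # with -KILL instead, the handler would never run
wait "$victim_pid"
cat "$log"
```

This is the gap that tracing from the kernel side fills: the kernel sees both ends of every signal delivery.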

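Script 1: An example script that monitors SIGKILL and SIGTERM signals sent to the myApp_mtt process. The signal.send probe fires in the context of the sending process, so execname(), pid() and ppid() describe the sender, while pid_name and sig_pid describe the target.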

cat my-systemtap_SIGKILL_SIGTERM.stp
--------------------------------------------------------------------- 
#! /usr/bin/env stap
#
# This SystemTap script monitors SIGKILL and SIGTERM signals sent to
# a process named "myApp_mtt". It also shows the process tree of the
# process that tried to kill "myApp_mtt".
#

probe signal.send {
  if ((sig_name == "SIGKILL" || sig_name == "SIGTERM") && pid_name == "myApp_mtt") {
    printf("%10d   %-34s   %-10s   %5d   %-7s   %s pid: %d, tid:%d uid:%d ppid:%d\n",
             gettimeofday_s(), tz_ctime(gettimeofday_s()), pid_name, sig_pid, sig_name, execname(), pid(), tid(), uid(), ppid());

    cur_proc = task_current();
    parent_pid = task_pid(task_parent (cur_proc));

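    # Walk up the process tree of the sender, printing name (pid),uid,gid
    # for each ancestor until the parent PID is 0.
    while (parent_pid != 0) {
        printf ("%s (%d),%d,%d -> ", task_execname(cur_proc), task_pid(cur_proc), task_uid(cur_proc),task_gid (cur_proc));
        cur_proc = task_parent(cur_proc);
        parent_pid = task_pid(task_parent (cur_proc));
    }
    # End the chain with a newline so the next event starts on its own line.
    printf("\n");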
  }
}

probe begin {
  printf ("\nSACHIN P B: Investigating a murder mistery of Mr. myApp_mtt\n");
  printf("systemtap script started at: %s\n\n", tz_ctime(gettimeofday_s()));
  printf("%50s%-18s\n",
    "",  "Signaled Process");
  printf("%-10s   %-34s   %-10s   %5s   %-7s   %s\n",
    "Epoch", "Time of Signal", "Name", "PID", "Signal", "Signaling Process Name");
  printf("---------------------------------------------------------------");
  printf("---------------------------------------------------------------");
  printf("\n");
}

probe end {
  printf("\n");
}
----------------------------------------------------------
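For longer investigations it can be convenient to send the probe output to a file rather than the terminal, using stap's -o option. The wrapper below is only a sketch: start_tracer, the STAP variable and the log path are names invented for this post, and STAP can be overridden (for example with echo) to check the command line without root access.

```shell
#!/bin/bash
# STAP is overridable so the command line can be inspected without loading
# a kernel module. start_tracer and the file names are placeholders.
STAP=${STAP:-stap}

start_tracer() {
  local script=$1 logfile=$2
  # -o sends the script's output to the named file instead of stdout.
  "$STAP" -o "$logfile" "$script"
}

# Dry run: print the arguments that would be passed to stap.
STAP=echo start_tracer my-systemtap_SIGKILL_SIGTERM.stp /tmp/who-killed-myApp.log
```

For a real run, call start_tracer as root without the STAP=echo prefix and follow the log with tail -f.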
Script 2: Sample shell script that sends SIGTERM or SIGKILL

cat I_am_killer-007.sh
#!/bin/bash
echo "I am going to kill Mr.myApp_mtt sooner.....wait and watch"
sleep 20
pkill -SIGTERM myApp_mtt      # CASE 1
# pkill -SIGKILL myApp_mtt    # CASE 2: use this line instead of the one above
echo "Done !!!.......Catch me if you can !"
-----------------------------------------------------
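The post does not show what myApp_mtt itself does. To reproduce the experiment without the real application, any long-running process with that name will do, because pkill and the probe's pid_name both match on the kernel's process name. The sketch below creates a hypothetical stand-in: an idle bash script saved under the name myApp_mtt in a temporary directory. It assumes a Linux system with /proc mounted.

```shell
#!/bin/bash
# Create a hypothetical stand-in for myApp_mtt: an idle script whose
# kernel process name is taken from the script's file name.
workdir=$(mktemp -d)
cat > "$workdir/myApp_mtt" <<'EOF'
#!/bin/bash
# Idle until a signal arrives.
while :; do sleep 1; done
EOF
chmod +x "$workdir/myApp_mtt"

"$workdir/myApp_mtt" >/dev/null 2>&1 &
app_pid=$!

sleep 1   # let the child exec before inspecting it
echo "stand-in started: PID $app_pid, name $(cat "/proc/$app_pid/comm")"
```

The stand-in keeps running until it is signalled, so it can be used as the target in both test cases.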
CASE 1: Test SIGTERM
Step 1: Let's start SystemTap as shown below:
[root@myhostname sachin]# stap my-systemtap_SIGKILL_SIGTERM.stp
SACHIN P B: Investigating a murder mistery of Mr. myApp_mtt
systemtap script started at: Thu Oct 17 18:34:54 2019 EDT

                                                  Signaled Process
Epoch        Time of Signal                       Name           PID   Signal    Signaling Process Name
------------------------------------------------------------------------------------------------------------------------------
(stap waits here and prints a line whenever a SIGTERM or SIGKILL sent to myApp_mtt is caught)

+++++++++++++++++++++++++++++++++++
Step 2: Let's start our application myApp_mtt
[root@myhostname sachin]# ./myApp_mtt &
[1] 114583
[root@myhostname sachin]#

[root@myhostname sachin]#  ps -ef | grep myApp_mtt | grep -v grep
root     114583  80054  0 19:04 pts/8    00:00:00 ./myApp_mtt
[root@myhostname sachin]#
++++++++++++++++++++++++++++++++++
Step 3: Let's kill this application by sending SIGTERM

[root@myhostname sachin]# ./I_am_killer-007.sh
I am going to kill Mr.myApp_mtt sooner.....wait and watch
+++++++++++++++++++++++++++++++++++++
Step 4: While the script is still sleeping, verify the PID/PPID of the process that will send SIGTERM
[root@myhostname sachin]#  ps -ef | grep I_am_killer-007.sh | grep -v grep
root     122566  79450  0 19:05 pts/7    00:00:00 /bin/bash ./I_am_killer-007.sh
[root@myhostname sachin]#
+++++++++++++++++++++++++++++++++++++
Step 5: Check for completion:
[root@myhostname sachin]# ./I_am_killer-007.sh
I am going to kill Mr.myApp_mtt sooner.....wait and watch
Done !!!.......Catch me if you can !
[root@myhostname sachin]#
[root@myhostname sachin]#  ps -ef | grep I_am_killer-007.sh | grep -v grep
root     122566  79450  0 19:05 pts/7    00:00:00 /bin/bash ./I_am_killer-007.sh
[root@myhostname sachin]#
[1]+  Terminated              ./myApp_mtt
[root@myhostname sachin]#
++++++++++++++++++++++++++++++++++++++
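Step 6: Check the SystemTap output. The ppid reported for pkill should match the PID of the killer script (122566). The script appears as "I_am_killer-007" without ".sh" because the kernel limits process names to 15 characters.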
[root@myhostname sachin]# stap my-systemtap_SIGKILL_SIGTERM.stp
SACHIN P B: Investigating a murder mistery of Mr. myApp_mtt
systemtap script started at: Thu Oct 17 19:03:53 2019 EDT

                                                  Signaled Process
Epoch        Time of Signal                       Name           PID   Signal    Signaling Process Name
------------------------------------------------------------------------------------------------------------------------------
1571353566   Thu Oct 17 19:06:06 2019 EDT         myApp_mtt       114583   SIGTERM   pkill pid: 124080, tid:124080 uid:0 ppid:122566
pkill (124080),0,0 -> I_am_killer-007 (122566),0,0 -> bash (79450),0,0 -> su (79449),0,0 -> sudo (78656),0,0 -> bash (78202),560045,100 -> sshd (78200),560045,100 -> sshd (77624),0,0 -> sshd (11405),0,0 ->
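The ancestry line is easier to read with one process per row. A small sketch, assuming the "name (pid),uid,gid -> " layout printed by Script 1; parse_chain is a helper name made up for this post, and the sample line is copied from the output above.

```shell
#!/bin/bash
# Split a "name (pid),uid,gid -> ..." chain into one row per process.
# parse_chain is a hypothetical helper for reading Script 1's output.
parse_chain() {
  awk '{
    sub(/ *-> *$/, "")                       # drop the trailing arrow
    n = split($0, hop, / -> /)
    for (i = 1; i <= n; i++) {
      name = hop[i]; sub(/ \([0-9]+\),.*$/, "", name)
      rest = hop[i]; sub(/^.* \(/, "", rest)  # leaves "pid),uid,gid"
      split(rest, f, /[),]+/)                 # f[1]=pid f[2]=uid f[3]=gid
      printf "%-16s pid=%s uid=%s gid=%s\n", name, f[1], f[2], f[3]
    }
  }'
}

chain='pkill (124080),0,0 -> I_am_killer-007 (122566),0,0 -> bash (79450),0,0 -> su (79449),0,0 -> sudo (78656),0,0 -> bash (78202),560045,100 -> sshd (78200),560045,100 -> sshd (77624),0,0 -> sshd (11405),0,0 ->'
printf '%s\n' "$chain" | parse_chain
```

The first row is the process that called kill, the second is its parent (the killer script), and the remaining rows lead back to the SSH session it was started from.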



+++++++++++++++++++++++++++++++++++++++++++
CASE 2 : Test SIGKILL
Step 1: Now change the script so that it sends SIGKILL to myApp_mtt.
[root@myhostname sachin]# cat I_am_killer-007.sh
#!/bin/bash
echo "I am going to kill Mr.myApp_mtt sooner.....wait and watch"
sleep 20
pkill -SIGKILL myApp_mtt
echo "Done !!!.......Catch me if you can !"
++++++++++++++++++++++++++++++++++++++++++++

Step 2: Start myApp_mtt again, run the killer script, and verify the PID/PPID of the process that sends SIGKILL
[root@myhostname sachin]#  ./myApp_mtt &
[2] 151008
[root@myhostname sachin]# ps -ef | grep myApp_mtt | grep -v grep
root     151008 150421  0 03:07 pts/45   00:00:00 ./myApp_mtt
[root@myhostname sachin]#  ps -ef | grep I_am_killer-007.sh | grep -v grep
root     151027 150627  0 03:07 pts/4    00:00:00 /bin/bash ./I_am_killer-007.sh
[root@myhostname sachin]#
[root@myhostname sachin]# ./I_am_killer-007.sh
I am going to kill Mr.myApp_mtt sooner.....wait and watch
Done !!!.......Catch me if you can !
[root@myhostname sachin]#
[1]   Killed                  ./myApp_mtt
[root@myhostname sachin]#
+++++++++++++++++++++++++++++++++++++++++++
Step 3: Check the SystemTap output for the SIGKILL signal to find the process that killed myApp_mtt
[root@myhostname sachin]# stap my-systemtap_SIGKILL_SIGTERM.stp
SACHIN P B: Investigating a murder mistery of Mr. myApp_mtt
systemtap script started at: Fri Oct 18 03:07:34 2019 EDT

                                                  Signaled Process
Epoch        Time of Signal                       Name           PID   Signal    Signaling Process Name
------------------------------------------------------------------------------------------------------------------------------
1571382496   Fri Oct 18 03:08:16 2019 EDT         myApp_mtt       151008   SIGKILL   pkill pid: 151049, tid:151049 uid:0 ppid:151027
pkill (151049),0,0 -> I_am_killer-007 (151027),0,0 -> bash (150627),0,0 -> su (150626),0,0 -> sudo (150623),0,0 -> bash (150589),560045,100 -> sshd (150588),560045,100 -> sshd (150582),0,0 -> sshd (11405),0,0 ->
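Since everything near the bottom of the chain ran as root (uid 0), the more interesting question is often which login account was behind it. One way to read that from the chain is to look for the first ancestor with a non-zero uid. The sketch below does this for the line above; first_non_root is a hypothetical helper name, and the interpretation (uid 560045 is the account that logged in over SSH and then used sudo and su) is inferred from the chain, not something the script reports directly.

```shell
#!/bin/bash
# Print the first process in the chain that is not running as uid 0.
# first_non_root is a hypothetical helper for reading Script 1's output.
first_non_root() {
  awk '{
    sub(/ *-> *$/, "")                 # drop the trailing arrow
    n = split($0, hop, / -> /)
    for (i = 1; i <= n; i++) {
      split(hop[i], f, ",")            # f[1]="name (pid)", f[2]=uid, f[3]=gid
      if (f[2] != 0) { print hop[i]; exit }
    }
  }'
}

chain='pkill (151049),0,0 -> I_am_killer-007 (151027),0,0 -> bash (150627),0,0 -> su (150626),0,0 -> sudo (150623),0,0 -> bash (150589),560045,100 -> sshd (150588),560045,100 -> sshd (150582),0,0 -> sshd (11405),0,0 ->'
printf '%s\n' "$chain" | first_non_root
```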


Conclusion: We caught the killer (I_am_killer-007) that sent the signal (SIGTERM/SIGKILL) to the application, along with its full chain of parent processes.

++++++++++++++++++++++++++++++++++++++++++++++++++++++
Reference:
1) https://sourceware.org/systemtap/SystemTap_Beginners_Guide/
2) https://www.thegeekdiary.com/how-to-find-which-process-is-killing-myApp_mtt-with-sigkill-or-sigterm-on-linux/
3) https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html-single/systemtap_language_reference/index
4) http://epic-alfa.kavli.tudelft.nl/share/doc/systemtap-client-2.7/examples/network/connect_stat.stp