==============================================================
Linux Kernel State Tracer (LKST)
--------------------------------

Date:   June. 28, 2002

Version: 1.2

Authors: Kazuyoshi Serizawa <serizawa@sdl.hitachi.co.jp>
         Tetsuhito Nakamura <tetsu@crl.hitachi.co.jp>
         Keisuke Hatasaki   <keisukeh@crl.hitachi.co.jp>
         Takahiro Tsurukubo <tsurukubo.t@jp.fujitsu.com>
         Yasunori Goto      <y-goto@jp.fujitsu.com>
         Satoshi Sakamoto   <s-sakamoto@jp.fujitsu.com>
         Takao Indoh        <indou.takao@jp.fujitsu.com>
        
==============================================================
COPYRIGHT (C) HITACHI,LTD. 2001-2002 ALL RIGHTS RESERVED. 
COPYRIGHT (C) FUJITSU,LTD. 2001-2002 ALL RIGHTS RESERVED. 
==============================================================

Introduction
-------------
    Linux Kernel State Tracer(LKST) records information as trace data
    about events in the Linux Kernel. It records various events like 
    process context switch, send signal, exception, memory allocation,
    send packet, and so on.
    
    This program is a facility for analyzing trouble of the Linux Kernel.
    It becomes possible that developers investigate troubles by this facility
    without stopping important(used for Mission Critical purposes) server.
    
    Functions of present Linux could not offer enough information for 
    developers to solve the trouble of the system. 
    
       - Standard Kernel:When the kernel is panic, it displays only the
                         registers and stack information. If developers
                         want more information, it is necessary to insert 
                         printk() in the kernel and recompile it. 
       - LKCD           :LKCD store a dump when the system crash.
                         But, if developers use it only, they cannot 
                         investigate the sequence of events till kernel panic.
         (LKCD is a.k.a. Linux Kernel Crash Dumps)
    
    Event information of the kernel is always recorded as trace data 
    on the server under running by using LKST. So, these problems can be
    solved. 
        
    Moreover, LKST has some features. 
      - It is possible to change dynamically which events are recorded.
        Developers can obtain information about the events which they concern
        only. And, It reduces the overhead of components which is not related
        with trouble.
      - It is possible to change each handlers related each events.
        A default handler of all events is just recording the events.
        But, if it is necessary, default handler can be changed.
        This function can be used as following,
          +Notify user processes when the handler detects abnormal status.
          +Change a maskset,which controls what type of events should be
           recorded, dynamically by the handler.
           ->  The system can usually run with a few events trace for the cause
               of good performance. And, when the handler detects abnormal
               status, it can change a maskset to get more detail information.


Release Notes
-------------
   - Performance improvement of default handler.
   - Performance improvement of log format tool.
   - lkstdump script.
   - Support devfs to make /dev/lkst.
   - Update commands.
   - Change some APIs spec.
   - Fix some bugs.

Installation
------------
   See file of "Install" or "Install_rpm".


How To Use
----------
   Please refer the online manual for details of each command.

  1) Display the present kernel trace data
      It is necessary to execute 2 commands.
      
      a. Get a log buffer from kernel
         Enter the command as follows. Then LKST log buffer is saved to
          <logfile> file.
         
             % lkstbuf read -f logfile
         
      b. Display the trace data
         Enter the command as follows. Trace data is written to stdout.
         This command can be specified various options to filter the traced
         record.

             % lkstbuf print -f logfile

        <output example>
        -------------------------------------------------------------------
        event_type=context_switch
                cpu=02, pid=00001008
                time=Tue Mar 26 18:48:53.143274134 2002
                arg1=0xd68a4000 0x00000000 : pointer to task_struct(prev)
                arg2=0xf2dde000 0x00000000 : pointer to task_struct(next)
                arg3=0x00000001 0x0000000a : process state/process count

        event_type=mem_get_freepage
                cpu=02, pid=00001008
                time=Tue Mar 26 18:48:53.143270345 2002
                arg1=0xecc0f000 0x00000000 : pointer to page
                arg2=0x000001f0 0x00000000 : type of page(gfp_mask)
                arg3=0x00000000 0x00000000 : the number of page(order)
                arg4=0xc0168197 0x00000000 : call address

        event_type=mem_malloc
                cpu=02, pid=00001008
                time=Tue Mar 26 18:48:53.143268419 2002
                arg1=0x00000018 0x00000000 : size
                arg2=0x000001f0 0x00000000 : flags
                arg3=0xdfe3e5a0 0x00000000 : objp
                arg4=0xc0168578 0x00000000 : call address
        ...
        ...
        -------------------------------------------------------------------

             
  2) Change which events are recorded.
     The traced event types can be changed by doing as follows.
     
     a.Get a maskset file.
         At first, please get a maskset file which control what event types
         should be recorded by LKST.
         
           % lkstm read -m id -d | grep 0x > maskset_file
         
         "id" is the decimal number which is used as an identifier of the
          maskset. Id 0, 1, and 2 are the reserved maskset; Id 0 does not
          record any event types, id 1 records all of the event types, and 
          id 2 records the default event types.
         
     b.Edit the maskset file.
         Next, edit the saved maskset_file.
         In default, event of spinlock() is not recorded, so id of
         event handler for event 0x080 is 0x0ff.
         (Event type number of spin_lock() is 0x080.)

            :               :
          0x07f           0x001
          0x080           0x0ff               <<<
          0x081           0x0ff               
            :               :

         To record the event, please change the value "0x0ff" to "0x001"
         at right side of the line which contains its event number.

            :               :
          0x07f           0x001
          0x080           0x001               <- change
          0x081           0x0ff               
            :               :
             
         
         Oppositely, If one event is not wanted to record, 
         please delete the line which is written it's event number.
         ex) event type of system call entry ( number = 0x030).
         
            :               :
          0x02f           0x001
          0x030           0x0ff               <- change
          0x031           0x001               
            :               :
         
         All the of Event type numbers are written in lkst-events-1.05.pdf
	 that is available from download page.
         
     c.Write the new maskset.
         The new maskset is written when the next command is executed.
         
           % lkstm write -m id -f maskset_file
         
         "id" can be specified from 4 to 254. 
         Please repeat the operation from a. to c., if you want to make more 
         masksets. 
         
         If you want to confirm the state of the maskset, maskset_read 
         have to be executed alone. 
         
           % lkstm read -m id
         
     d.Change maskset
         New maskset becomes effective if maskset_set command is executed,
         and the traced event types are changed. 
         
           % lkstm set -m id 

     e.Confirm which maskset is currently selected.
         If you want to know which maskset has been selected,
         use lkst_status command. 

           % lkst status

          <output example>
        -------------------------------------------------------------------
        <Current status>
        number of masksets      : 4
        number of event-handlers: 4
        number of buffers       : 5
        current_maskset_id      : 2
        current_buffer_id (cpu: 000): 1
        current_buffer_id (cpu: 001): 2
        current_buffer_id (cpu: 002): 3
        current_buffer_id (cpu: 003): 4
        -------------------------------------------------------------------

         ID of the selected maskset is shown to the right of 
         "current_maskset_id" in the forth line. In this example, it is 2.
         And the number of right of "number of masksets" is total number of
         the masksets. In this example, it is 4.
         cf. Other information means as follows.
            "number of event-handlers" :number of registered event handlers
            "number of buffers"        :number of created buffers
            "current_buffer_id"        :ID of currently selected buffers for
                                        all CPUs


  3) Invoke LKST logging daemon
     LKST logging daemon continues to output the containts of event
     buffer to log files.
     Please refer the online manual of lkstlogd(8) for details.
     Name of the files are /var/log/lkst/sebufCC.S, CC represents
     cpu id corresponds to the file and S represents a sequence
     number (described later).
     If directory /var/log/lkst does not exist, please make it as follows.

           % mkdir /var/log/lkst

     This is an example of command to invoke the daemon,

           % lkstlogd -a -b 2097152 -l 10485760 -n 2

     LKST logging daemon will stop and restart to write the file
     when the daemon receives a signal SIGUSR1.
     To send the signal, this is convenient,

           % kill -USR1 `cat /var/run/lkstlogd.pid`

     The size of each file are limited as specified by -l option,
     so old records would be overwritten. To avoid overwitten,
     LKST logging daemon can change files. Said sequence number
     will be increased on this change and to discriminate new 
     set of files and old files. To change file, 

           % kill -USR2 `cat /var/run/lkstlogd.pid`

     examples/lkst_watch_daemon is an example of this feature.
     This script watch unexpected termination of another daemon,
     then change the files.


  4) Dump logs in panic()
     Please do "3) Install addon tools" in "Install".
     And just do,

                 % insmod lkst_mod_panicdump.o maskset_id_base=2

     A parameter "maskset_id_base" specifies which maskset to
     be used as a template of the new maskset.
     For testing, this module can call panic(). See 6)-c below.


  5) How to display stack trace on each event
     a.Insmod the stack trace module.
	  Please do "3) Install addon tools" in "Install".
	  And do,

                 % insmod lkst_mod_stacktrace maskset_id_base=2

       A parameter "maskset_id_base" specifies a template.

     b.Execute special formatter
          Execute lkst_mod_stacktrace_read.pl. 
	  
                 % lkstbuf read > logfile
                 % perl lkst_mod_stacktrace_read.pl \
		     -s /boot/System.map-`uname -r` -k /proc/ksyms -d logfile

	  All of the options can be omitted. The default value for -s and -k
	  are the same as those values of above command line. If option -d
	  is omitted, this formatter reads from current event buffers.

          <output example>
        -------------------------------------------------------------------
        0x00057f24: 0000269005494693 [mc], \
         type:        INT_HARDWARE_ENTRY(0x0010), pid: 0000000000, cpu: 000 \
          0x00000000 0x00000000 0x00000000 0x00000001 \
          0x00000000 0x00000000 0x00000000 0x00000000 \
            <LKST_ETYPE_INT_HARDWARE_ENTRY_HEADER+98> <default_idle+0> <defau
        lt_idle+0> <call_do_IRQ+5> <default_idle+0> <default_idle+0> <default
	_idle+44> <cpu_idle+82> 
        
        0x00057f26: 0000269005511255 [mc], \
         type:       INT_TASKLETHI_ENTRY(0x0012), pid: 0000000000, cpu: 000 \
          0x00000000 0xc01257d0 0x00000000 0x00000000 \
          0x00000000 0x00000000 0x00000000 0x00000000 \
            <LKST_ETYPE_INT_TASKLETHI_ENTRY_HEADER+60> <bh_action+0> <do_soft
        irq_Rsmp_f0a529b7+123> <LKST_ETYPE_INT_HARDWARE_ENTRY_HEADER+459> <de
        fault_idle+0> <default_idle+0> <call_do_IRQ+5> <default_idle+0> 
        
        0x00057f28: 0000269005513862 [mc], \
         type:              INT_BH_ENTRY(0x0016), pid: 0000000000, cpu: 000 \
          0x00000000 0x00000000 0x00000000 0xc012a390 \
          0x00000000 0x00000000 0x00000000 0x00000000 \
            <LKST_ETYPE_INT_BH_ENTRY_HEADER+71> <timer_bh+0> <LKST_ETYPE_INT_
	TASKLETHI_ENTRY_HEADER+74> <do_softirq_Rsmp_f0a529b7+123> <LKST_ETYPE
	_INT_HARDWARE_ENTRY_HEADER+459> <default_idle+0> <default_idle+0> <ca
	ll_do_IRQ+5> 
        -------------------------------------------------------------------

	NOTE: Contents of stack trace might include some unrelated address.


  6) Change event handlers
     As described on the "Introduction", users can register additional 
     event handler in order to add function more than recording the events.
     Please refer addons/lkst_mod_panicdump.c as an example.

     a.Register new event-handler.
     b.Change maskset.
          When lkst_mod_panicdump.o is loaded, the initialiser in
	  the module will do them automatically.
	  The initialiser will register lkst_evhandler_panicdump()
	  as a new eventhandler that ID is 128, and create a 
	  new maskset named "maskset_panicdump".

     c.Control event handler.
	  The control function of the event handler can invoke panic() 
	  for testing.

                 % echo panic | lksteh ctrl -e 128 -f -

	  The control function compares the argument and "panic",
	  if correspond, the call panic("LKST PANIC TEST");
                
  7) Add user buffer
     If a new user buffer is added, data can be recorded in the added 
     buffer.
     It becomes possible that a lot of data can be recorded without 
     overflowing the standard buffer.

     Please refer exapmles/lkst_test_run.
     This is an example of switching the buffer where data is recorded
     when the specified command is executed. 

           % cd examples
           % sh lkst_test_run 65536 ps aux

     The first argument is the size of new buffer, and latter arguments
     are command to execute.
     
  8) How to use LKST as an event counter
     a.Insmod the counter module.
          Make addons/lkst_mod_event_count.o, and insmod it.
     b.Execute command.
          Execute addons/lkst_mod_event_count_read, and then
          the total of each event counter is displayed.

     NOTE:
       After insmod, lkst_mod_event_count_read can be executed only once.

  9) How to insert trace point
     a.Define new event type in lkst_events.h as follows.

         LKST_ETYPE_DEF(event-ID, priority, hook-type, event-name, description)

           event-ID    --- Value of the event type(0x000..0xfff)

                              0x000-0x0ff: preset
                              0x100-0x1ff: user
                              0x200-0xeff: reserved
                              0xf00-0xfff: system

           priority    --- Priority of the event type(0x00..0xff)

           hook-type   --- Type of hook header for Kernel Hooks
                           Specify either the following according to the insertion
                           location of the HOOK macro.

                           - NORMAL: If you insert HOOK macro in the kernel,
                                     use this type.

                           - MODULE: If you insert HOOK macro in the module,
                                     use this type.

                           - INLINE: If you insert HOOK macro in the in-line
                                     function of the kernel, use this type.
                                     If you insert the same HOOK macro
                                     in the two or more places, use this.

                           NOTE: If you insert HOOK macro in the in-line
                                 function of the module, use MODULE type.
                                                 ~~~~~~

           event-name  --- Mnemonic of the event type
           description --- Description of the event type

          (example)
              LKST_ETYPE_DEF(0x100, 0x0A, NORMAL, NEW_EVENT, NEW_EVENT)


     b.Insert following HOOK macro where you want to add the trace point.

         Add following sentence to the file to be added the trace point.
         #include <linux/lkst.h>

         - LKST_HOOK_INLINE(event-name, argument1, argument2, argument3, argument4)

             The case of inserting HOOK macro in the in-line function or the
             macro function. If you insert the same HOOK macros in the two or more
             places, use this.

         - LKST_HOOK(event-name, argument1, argument2, argument3, argument4)

             The case except the above-mentioned. 

          * event-name   : It should be the same as what defined by the 
                           LKST_ETYPE_DEF macro.
          * argument1..4 : argument for HOOK.


     c.When the trace point is in the module, the macro for declaration of HOOK
       header is put in it. And the macro for the initialization and 
       termination of HOOK is inserted respectively in the function of the 
       module-initialization and module-termination.

         Declaration macro for HOOK header
           LKST_HOOK_DECLARE(event-name)        -- Declaration for LKST_HOOK
           LKST_HOOK_DECLARE_INLINE(event-name) -- Declaration for LKST_HOOK_INLINE

         Initialization macro
           LKST_HOOK_INIT_MODULE(event-name, ret-variable)

         Termination macro
           LKST_HOOK_TERMINATE_MODULE(event-name, ret-variable)

          * event-name   : It should be the same as what defined by the 
                           LKST_ETYPE_DEF macro.
          * ret-valiable : variable which receives return value.

       See sample program "examples/lkst_mod_trc.c".

  10) How to extract LKST trace data from LKCD dump file
     a.Create dump file.
       See LKCD documents which can be downloaded at
       http://lkcd.sourceforge.net/.

       <example>
          (i) Save system memory to dump device.
              Enter the command as follows.

            % lkcd config
            % echo 1 > /proc/sys/kernel/sysrq

            Hit Sysrq+C, then start saving system memory to dump device,
            and rebooting automatically. 

            NOTE:
            At default, system memory is saved at swap disk.
            So if swapon runs in booting your system, system memory saved at
            swap disk is collapsed.
            Please save system memory at another disk except swap disk,
            or comment out the line of swapon in the rc.sysinit.

          (ii) Crate dump file.
              Enter the command as follows. Dump file is created at
              /var/log/dump/N (N is an integer of 0 or more.).

            % lkcd save

     b.Run lcrash command
          (i) Enter the command as follows. lcrash command starts and prompt is
              shown.

            % cd /var/log/dump/N
            (N is an integer of 0 or more.)
            % lcrash -n N
            map = map.N, dump = dump.N, outfile = stdout, kerntypes = kerntypes
            .N

            Please wait...
                    Initializing vmdump access ... Done.
                    Loading system map ................................. Done.
                    Loading type info (Kerntypes) ... Done.
                    Initializing arch specific data ... Done.
                    Loading ksyms from dump ....... Done.
            lkstdumplib being initialized
            >>

          (ii) Run lkstdump sub-command
               Enter the command as follows. Trace data is written to file
               <logfile>.

            >> lkstdump logfile
            buffer[0] write(1024 records)
            buffer[1] write(1024 records)
            buffer[2] write(1024 records)
            buffer[3] write(1024 records)
            buffer[4] write(1024 records)

              <logfile>'s file format is the same as the file created by
              "lkstbuf read", so you can see <logfile> by using "lsktbuf print".


Known Bugs
---------
     - GCC 2.96, which is a kernel compiler in RedHat, with -O2 option generates
       wrong code around Kernel Hooks that caused the kernel panic. To avoid this
       problem, you use the GCC 2.95, which is a standard kernel compiler.

     - The time data in the event records written by 
       IOCTL(LKST_IOC_BUFFER_READ) and lkst_buffer_read do not conform to the
       draft of POSIX standard 1003.25. 

     - logformat.pl needs large amount of time for sorting large log file.
       (logformat.pl will be deleted. Please use "lsktbuf print")

     - A lot of overwritten events may be read when using lkstlogd or
       lkstbuf read command.

     - Buffer link structure is left in ring structure when lkstlogd is
       terminated by using SIGTERM signal while writing log files.

     - Deleting buffers or masksets before stopping lkst event tracing may
       occur memory violation.

Future plans
------------
     - Performance improvement.
     - Adding more extensions for analyzing various faults.
     - Add connectivity to Event Logging (http://evlog.sourceforge.net/).


Reporting Problems and Discussing LKST
--------------------------------------
email:  lkst-develop@lists.sourceforge.net
visit:  https://sourceforge.net/projects/lkst/
        or http://oss.hitachi.co.jp/sdl/english/kstracer_main.html


Licence
--------
COPYRIGHT (C) HITACHI,LTD. 2001-2002 ALL RIGHTS RESERVED. 
COPYRIGHT (C) FUJITSU,LTD. 2001-2002 ALL RIGHTS RESERVED. 

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.


Change History
--------------
1.2	28th of June, 2002	Release v1.2
                                  Performance improvement of default handler.
                                  Performance improvement of log format tool.
                                  lkstdump script.
                                  Support devfs to make /dev/lkst.
                                  Update commands.
                                  Change some APIs spec.
                                  Fix some bugs.
1.1	14th of May, 2002	Release v1.1
				  Replace LKCD patch with the official one.
1.01	26th of April, 2002	Release v1.01
				  Update lkstlogd command.
				  Add trace data of exception events.
				  Fix some bugs.
1.0	12th of April, 2002	Release v1.0
				  Add kernel crash dump function by using LKCD.
				  Update some command.
				  Add log header to log file.
				  Fix some bugs.
0.9	31st of March, 2002	Release v0.9 for some enhancements.
				  Add LKST logging daemon,
				  Replace trace point hook to use Kernel Hooks,
				  Add some add-ons and exaples.
0.1a	18th of January, 2002	Fix bug of the racing between 
				  etrc_evhandler_default() and 
				  etrc_buffer_{delete,create}().
				Add some minor modification, 
				  clearfy some comments.
0.1	14th of December, 2001	First public drop
