Tracing Kernel Problems

When a machine misbehaves, a good first place to look is near the end of the file:
    /var/log/messages
which may contain clues about why the system went down.  If the machine crashed, you may also be able to find this type of message by connecting a video monitor to the machine and checking whether the message is displayed on the screen (it is generally wise to look for these things before rebooting).  Here is an interesting entry:



May 24 10:36:25 n038 kernel: Unable to handle kernel paging request at virtual address 4857525042000000
May 24 10:36:25 n038 kernel: swapper(0): Oops 1
May 24 10:36:25 n038 kernel: pc = [<fffffc000031c7dc>] ps = 0007
May 24 10:36:25 n038 kernel: rp = [<fffffc000031c594>] sp = fffffc0000303e48
May 24 10:36:25 n038 kernel: r0=0 r1=fffffc0000000000 r2=4857525042000000 r3=0
May 24 10:36:25 n038 kernel: r8=7
May 24 10:36:25 n038 kernel: r16=7 r17=1 r18=0 r19=0
May 24 10:36:25 n038 kernel: r20=fffffc0005befe58 r21=200 r22=1f r23=18d42f606e
May 24 10:36:25 n038 kernel: r24=3000000000000000 r25=a r26=fffffc000031c594 r27=5
May 24 10:36:25 n038 kernel: r28=fffffc000046ddf8 r29=fffffc0000494fe0 r30=fffffc0000303e48
May 24 10:36:25 n038 kernel: Code: e4600001  b4430008  e4400001 <b4620000> b7e10008  b7e10000  47ff041c  479c0410  00000035
May 24 10:36:25 n038 kernel: Aiee, killing interrupt handler
May 24 10:36:25 n038 kernel: kfree of non-kmalloced memory: fffffc000046b3d8, next= 0000000000000000, order=0
May 24 10:36:25 n038 kernel: kfree of non-kmalloced memory: fffffc000046b3c0, next= 0000000000000000, order=0
May 24 10:36:25 n038 kernel: kfree of non-kmalloced memory: fffffc000046bce0, next= 0000000000000000, order=0
May 24 10:36:25 n038 kernel: idle task may not sleep
May 24 10:36:25 n038 last message repeated 4 times
May 24 10:36:25 n038 kernel:   CIA machine check: reason for machine-check unknown (0x311708)


Entries of this type can usually be traced.  The message above shows that the process swapper (process number 0) made an invalid paging request, and it records the information needed for a traceback.  In particular, it shows various registers at the time the invalid request was made, such as the program counter (pc), the stack pointer (sp), etc. [The method of tracking these down to a particular line of code is documented in the file /usr/src/linux/Documentation/oops-tracing.txt.]
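
As a small illustration, the register values can be pulled out of the log text before looking them up. The Python sketch below is only an example under stated assumptions: the function name extract_registers and the regular expression are hypothetical, and the pattern is written against the two sample lines quoted above rather than against every oops format.

```python
import re

# Matches fields such as "pc = [<fffffc000031c7dc>]" or "sp = fffffc0000303e48".
# Assumption: addresses are hexadecimal, optionally wrapped in "[<" and ">]".
FIELD = re.compile(r"\b(pc|rp|sp)\s*=\s*(?:\[<)?([0-9a-fA-F]+)(?:>\])?")

def extract_registers(log_text):
    """Return a dict mapping 'pc', 'rp' and 'sp' to integer addresses."""
    found = {}
    for name, value in FIELD.findall(log_text):
        # Keep only the first occurrence of each register.
        found.setdefault(name, int(value, 16))
    return found

sample = """\
May 24 10:36:25 n038 kernel: pc = [<fffffc000031c7dc>] ps = 0007
May 24 10:36:25 n038 kernel: rp = [<fffffc000031c594>] sp = fffffc0000303e48
"""

regs = extract_registers(sample)
for name in ("pc", "rp", "sp"):
    print(name, hex(regs[name]))
```

The pc value is the address of the faulting instruction, and rp is the return address, so those two are usually the most useful ones to look up.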

Here is how:
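
One common approach, sketched here and not necessarily the exact procedure the page author had in mind, is to find the kernel symbol whose address is the closest one at or below the pc value. The kernel build produces a System.map file with one "address type name" entry per line, and it must come from the same kernel that crashed. In the Python sketch below, the function names and the three map entries are hypothetical, invented only for illustration.

```python
import bisect

def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def nearest_symbol(symbols, address):
    """Return (name, offset) for the last symbol at or below address, else None."""
    addresses = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addresses, address) - 1
    if i < 0:
        return None
    base, name = symbols[i]
    return name, address - base

# Hypothetical map entries; a real System.map has thousands of lines.
example_map = [
    "fffffc000031c500 T example_func_a",
    "fffffc000031c700 T example_func_b",
    "fffffc000031c900 T example_func_c",
]

symbols = load_system_map(example_map)
name, offset = nearest_symbol(symbols, 0xfffffc000031c7dc)  # pc from the log above
print(f"{name}+{offset:#x}")
```

With a real map, the lines would be read from the System.map file of the running kernel (its location varies between installations) and passed to load_system_map in the same way. The resulting function name and offset narrow the fault down to a few instructions, which can then be matched against a disassembly of that function.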

 

Page author: Bruce Allen