Tracing Kernel Problems
When a machine misbehaves, a good place to look is near the end of the
file:
/var/log/messages
which may contain clues about why the system went down. If the
machine crashed, you may also be able to find this type of message by connecting
a video monitor to the machine in question and checking whether it is displayed
on the screen (it's generally smart to look for these things before rebooting
the machine). Here is an interesting entry:
May 24 10:36:25 n038 kernel: Unable to handle kernel paging request at virtual address 4857525042000000
May 24 10:36:25 n038 kernel: swapper(0): Oops 1
May 24 10:36:25 n038 kernel: pc = [<fffffc000031c7dc>] ps = 0007
May 24 10:36:25 n038 kernel: rp = [<fffffc000031c594>] sp = fffffc0000303e48
May 24 10:36:25 n038 kernel: r0=0 r1=fffffc0000000000 r2=4857525042000000 r3=0
May 24 10:36:25 n038 kernel: r8=7
May 24 10:36:25 n038 kernel: r16=7 r17=1 r18=0 r19=0
May 24 10:36:25 n038 kernel: r20=fffffc0005befe58 r21=200 r22=1f r23=18d42f606e
May 24 10:36:25 n038 kernel: r24=3000000000000000 r25=a r26=fffffc000031c594 r27=5
May 24 10:36:25 n038 kernel: r28=fffffc000046ddf8 r29=fffffc0000494fe0 r30=fffffc0000303e48
May 24 10:36:25 n038 kernel: Code: e4600001 b4430008 e4400001 <b4620000> b7e10008 b7e10000 47ff041c 479c0410 00000035
May 24 10:36:25 n038 kernel: Aiee, killing interrupt handler
May 24 10:36:25 n038 kernel: kfree of non-kmalloced memory: fffffc000046b3d8, next= 0000000000000000, order=0
May 24 10:36:25 n038 kernel: kfree of non-kmalloced memory: fffffc000046b3c0, next= 0000000000000000, order=0
May 24 10:36:25 n038 kernel: kfree of non-kmalloced memory: fffffc000046bce0, next= 0000000000000000, order=0
May 24 10:36:25 n038 kernel: idle task may not sleep
May 24 10:36:25 n038 last message repeated 4 times
May 24 10:36:25 n038 kernel: CIA machine check: reason for machine-check unknown (0x311708)
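If you keep a copy of the log, a short script can pull the register values out of an oops like the one above instead of copying them by hand. The following is a minimal Python sketch, assuming the Alpha message layout shown here (`pc = [<...>]`, `rp = [<...>]`, `sp = ...`, `ps = ...` and `rN=value` pairs after the `kernel:` tag). The function name `parse_oops_registers` is hypothetical, and other kernel versions or architectures format the dump differently.

```python
import re

# "pc = [<hex>]" and "rp = [<hex>]": bracketed code addresses
BRACKETED = re.compile(r"\b(pc|rp) = \[<([0-9a-f]+)>\]")
# "sp = hex" and "ps = hex"
SPACED = re.compile(r"\b(sp|ps) = ([0-9a-f]+)")
# general registers such as "r26=fffffc000031c594"
GENERAL = re.compile(r"\b(r\d+)=([0-9a-f]+)")


def parse_oops_registers(log_text):
    """Map register names to integer values (hypothetical helper).

    Assumes the Alpha oops layout shown in the log above; lines
    without a "kernel:" tag are ignored.
    """
    regs = {}
    for line in log_text.splitlines():
        # Keep only the message text after the syslog "kernel:" tag.
        _, sep, msg = line.partition("kernel:")
        if not sep:
            continue
        for pattern in (BRACKETED, SPACED, GENERAL):
            for name, value in pattern.findall(msg):
                regs[name] = int(value, 16)
    return regs


sample = """\
May 24 10:36:25 n038 kernel: pc = [<fffffc000031c7dc>] ps = 0007
May 24 10:36:25 n038 kernel: rp = [<fffffc000031c594>] sp = fffffc0000303e48
May 24 10:36:25 n038 kernel: r24=3000000000000000 r25=a r26=fffffc000031c594 r27=5
May 24 10:36:25 n038 kernel: r28=fffffc000046ddf8 r29=fffffc0000494fe0 r30=fffffc0000303e48
"""
regs = parse_oops_registers(sample)
print(hex(regs["pc"]))               # 0xfffffc000031c7dc
print(regs["rp"] == regs["r26"])     # True: rp is the return-address register
print(regs["sp"] == regs["r30"])     # True: r30 is the stack pointer
```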
These types of entries may be easily traced. The message above
shows that the program swapper (process number 0, the idle task) made an invalid
paging request, and it dumps the processor state at the moment of the fault. In particular
it shows the registers at the time that the invalid request was made,
such as the program counter (pc), the return address (rp, the same value as r26)
and the stack pointer (sp, the same value as r30). [The method of tracking
these down to a particular line of code is documented in the file
/usr/src/linux/Documentation/oops-tracing.txt.]
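One way to cross-check the register dump is to decode the word marked with < > on the "Code:" line, which is the instruction at pc. The Python sketch below assumes the standard Alpha memory-format encoding (6-bit opcode, 5-bit Ra, 5-bit Rb, 16-bit signed displacement) and only names the four load/store opcodes it needs; any other opcode is reported as a raw number. The helper names `faulting_word` and `decode_memory_insn` are hypothetical.

```python
# Alpha memory format: opcode[31:26] Ra[25:21] Rb[20:16] disp[15:0]
MEMORY_OPCODES = {0x28: "ldl", 0x29: "ldq", 0x2C: "stl", 0x2D: "stq"}


def faulting_word(code_line):
    """Return the instruction word enclosed in < > on an oops Code: line."""
    for token in code_line.split():
        if token.startswith("<") and token.endswith(">"):
            return int(token[1:-1], 16)
    raise ValueError("no <...> word on this Code: line")


def decode_memory_insn(word):
    """Split a memory-format word into (mnemonic, ra, rb, displacement)."""
    opcode = (word >> 26) & 0x3F
    ra = (word >> 21) & 0x1F
    rb = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:            # displacement is a signed 16-bit value
        disp -= 0x10000
    name = MEMORY_OPCODES.get(opcode, f"opcode {opcode:#x}")
    return name, ra, rb, disp


code = ("Code: e4600001 b4430008 e4400001 <b4620000> "
        "b7e10008 b7e10000 47ff041c 479c0410 00000035")
name, ra, rb, disp = decode_memory_insn(faulting_word(code))
print(f"{name} r{ra},{disp}(r{rb})")   # stq r3,0(r2)

# Effective address = contents of Rb + disp; r2 is taken from the dump.
r2 = 0x4857525042000000
print(hex(r2 + disp))                  # 0x4857525042000000
```

The faulting instruction decodes as a quadword store of r3 through r2 with zero displacement, and r2 holds exactly the virtual address reported in the "Unable to handle kernel paging request" line, so the register dump and the Code line agree about what went wrong.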
Here is how:
-
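Make note of the addresses involved in the fault. In our case the program
counter, pc = fffffc000031c7dc, is the address of the faulting instruction, and
rp = fffffc000031c594 (the same value as r26, the return-address register) is the
return address, which often points into the calling function. The stack pointer,
sp = fffffc0000303e48, is a data address rather than a code address. The "Code"
line is not a stack trace: it lists the machine-code words around the faulting
instruction, with the word at pc enclosed in < >.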
-
You should verify that the kernel which generated the fault is the same
one that we have access to for debugging. I have kept a copy of the current
kernel with the symbol table still attached, in /home/vmlinux, with
the stripped and compressed copy in /home/vmlinux.gz. On the problem
machine, do:
diff /boot/vmlinux.gz /home/vmlinux.gz
(If diff reports no differences, then /boot/vmlinux.gz is identical to the saved
copy, so, provided the machine booted from /boot/vmlinux.gz, /home/vmlinux is
the same kernel that was running.)
-
You need to find the function names associated with these addresses in
the kernel.
To do this, first get a listing of the kernel functions/addresses by
using the nm command:
nm /home/vmlinux | sort | less
scroll through this output until you find the symbols on either side of the addresses in question:
...
fffffc0000300000 A swapper_pg_dir
fffffc0000310000 T __start
...
fffffc000031c410 t timer_bh
fffffc000031c888 T do_timer
...
-
Now you can disassemble the problem-causing function using gdb
on the unstripped vmlinux:
gdb /home/vmlinux
(gdb) disassemble do_timer
Dump of assembler code for function do_timer:
0xfffffc000031c888 <do_timer>:     27bb0018   ldah gp,24(t12)
0xfffffc000031c88c <do_timer+4>:   23bdeea0   lda gp,-4448(gp)
0xfffffc000031c890 <do_timer+8>:   23defff0   lda sp,-16(sp)
At this point, you can read the source code for the do_timer function
and locate the source line that caused the problem. I would
document that here, except that the problem above is a "fake" one; I
will illustrate it with a real problem when one occurs.
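Scrolling through the sorted nm output works, but the lookup itself is mechanical: an address belongs to the last text symbol whose start address is less than or equal to it. The Python sketch below does that with `bisect`, using the symbols from the nm excerpt in the steps above. It assumes nm's default three-column output (address, type letter, name) and keeps only `t`/`T` (text) symbols. It cannot tell when an address lies past the real end of a function, so treat the answer as a starting point. The helper names are hypothetical.

```python
import bisect

# Symbols copied from the "nm /home/vmlinux | sort" excerpt in the steps above
NM_EXCERPT = """\
fffffc0000300000 A swapper_pg_dir
fffffc0000310000 T __start
fffffc000031c410 t timer_bh
fffffc000031c888 T do_timer
"""


def load_text_symbols(nm_output):
    """Return a sorted list of (address, name) for text (t/T) symbols."""
    symbols = []
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] in ("t", "T"):
            symbols.append((int(fields[0], 16), fields[2]))
    symbols.sort()
    return symbols


def symbolize(symbols, address):
    """Return "name+0xoffset" for the last symbol at or below address."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None                      # below the first text symbol
    start, name = symbols[i]
    return f"{name}+{address - start:#x}"


syms = load_text_symbols(NM_EXCERPT)
print(symbolize(syms, 0xfffffc000031c7dc))   # pc: timer_bh+0x3cc
print(symbolize(syms, 0xfffffc000031c594))   # rp: timer_bh+0x184
```

With just the symbols in that excerpt, both pc and rp map into timer_bh rather than do_timer, so on a real fault timer_bh would be the function to disassemble; adding the printed offset to the function's start address gives the line to look for in gdb's disassembly.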
Page author: Bruce Allen