Message-Id: <200712202047.56745.m.kozlowski@tuxland.pl>
Date: Thu, 20 Dec 2007 20:47:55 +0100
From: Mariusz Kozlowski <m.kozlowski@...land.pl>
To: Matt Mackall <mpm@...enic.com>
Cc: David Miller <davem@...emloft.net>, akpm@...ux-foundation.org,
linux-kernel@...r.kernel.org, sparclinux@...r.kernel.org
Subject: Re: 2.6.24-rc5-mm1: problems with cat /proc/kpageflags
Hello,
> > > Actually, you may only need these two:
> > >
> > > > maps4-add-proc-kpagecount-interface.patch
> > > > maps4-add-proc-kpageflags-interface.patch
> >
> > Yes these two were enough, and exporting fs/proc/base.c's
> > mem_lseek().
> >
> > As hard as I try, I can't reproduce this at all. I tried
> > both on my workstation and my niagara boxes.
>
> That's good to know, I was having a very hard time imagining how the
> kpagecount code could be going south.
>
> > It must be other needle in the 30MB+ -mm haystack. :-(
I'm afraid you are wrong. Earlier kernels are affected as well. When I read your mail I was
thinking of applying those two patches to 2.6.24-rc5 and bisecting the rest of the -mm series.
Unfortunately, a clean 2.6.24-rc5 with these two patches is affected as well (new processes
get stuck in D state, etc.). So I tried vanilla 2.6.23 with these two patches applied (plus the
mem_lseek export from fs/proc/base.c). This time I at least got a trace from 'cat /proc/kpagecount',
which you can find below. Also, despite the oops, the box doesn't lock up (as it does with -mm)
and is still usable.
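For anyone trying to narrow this down: cat just reads the file sequentially, so reading a small PFN range at a time might help locate the entries that trigger the oops. Below is a minimal sketch of such a targeted read. It assumes that /proc/kpagecount holds one native-endian 64-bit count per PFN, indexed by PFN, as the later mainline documentation describes. `read_page_counts` is a hypothetical helper and is not part of the patches. Reading the real file normally needs root, and on an affected kernel it may oops exactly like cat.

```python
import struct
from typing import BinaryIO, List

# Assumption: one native-endian unsigned 64-bit entry per PFN, indexed by PFN.
ENTRY = struct.Struct("=Q")


def read_page_counts(f: BinaryIO, first_pfn: int, n: int) -> List[int]:
    """Hypothetical helper: return the counts for PFNs first_pfn .. first_pfn + n - 1.

    Fewer than n values are returned if the file ends early; a trailing
    partial entry is ignored.
    """
    f.seek(first_pfn * ENTRY.size)  # byte offset of the first wanted entry
    data = f.read(n * ENTRY.size)
    usable = len(data) - len(data) % ENTRY.size
    return [count for (count,) in ENTRY.iter_unpack(data[:usable])]
```

Usage would be along the lines of `with open("/proc/kpagecount", "rb") as f: read_page_counts(f, 0, 16)`, changing `first_pfn` and `n` between runs to bisect the PFN range.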
[ 126.060976] TSTATE: 0000009980009603 TPC: 0000000000428a84 TNPC: 0000000000428a88 Y: 00000000 Not tainted
[ 126.063486] TPC: <cpu_idle+0x2c/0xe0>
[ 126.065986] g0: 0000000000000009 g1: 0000048000004000 g2: 000000000000000f g3: 00000000007204c0
[ 126.068636] g4: 00000000007244c0 g5: fffff8007f878000 g6: 00000000007204c0 g7: 0000000000724958
[ 126.071232] o0: 0000000000000001 o1: 00000000007204c8 o2: 0000000000000001 o3: 0000000000000000
[ 126.073924] o4: 6000000000000000 o5: 000000000078f140 sp: 00000000007239b1 ret_pc: 0000000000428a78
[ 126.076569] RPC: <cpu_idle+0x20/0xe0>
[ 126.079185] l0: 0000000000720000 l1: 0000000000000002 l2: 0000000000000001 l3: 000000000075d400
[ 126.081934] l4: 000000000075d400 l5: fffff80080015b10 l6: fffff80080005b08 l7: 0000000000000001
[ 126.084637] i0: 0000000000000001 i1: 0000000000720094 i2: 0000000000000000 i3: 0000000000000000
[ 126.087375] i4: 00000000007204c0 i5: 0000000000000002 i6: 0000000000723a71 i7: 0000000000665a24
[ 126.090135] I7: <rest_init+0x6c/0x80>
[ 145.121228] Unable to handle kernel NULL pointer dereference
[ 145.124515] tsk->{mm,active_mm}->context = 0000000000000d41
[ 145.127778] tsk->{mm,active_mm}->pgd = fffff800bd8d2000
[ 145.127801] \|/ ____ \|/
[ 145.127808] "@'/ .. \`@"
[ 145.127815] /_| \__/ |_\
[ 145.127821] \__U_/
[ 145.127831] cat(3111): Oops [#1]
[ 145.127849]
[ 145.127853] =================================
[ 145.127861] [ INFO: inconsistent lock state ]
[ 145.127873] 2.6.23 #1
[ 145.127880] ---------------------------------
[ 145.127891] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
[ 145.127906] cat/3111 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 145.127918] (regdump_lock){+...}, at: [<00000000004281d0>] __show_regs+0x18/0x320
[ 145.127951] {in-hardirq-W} state was registered at:
[ 145.127960] [<0000000000669780>] _spin_lock+0x28/0x40
[ 145.127983] [<00000000004281d0>] __show_regs+0x18/0x320
[ 145.128000] [<00000000004284e4>] show_regs+0xc/0x20
[ 145.128016] [<00000000005ac9d8>] sysrq_handle_showregs+0x20/0x40
[ 145.128041] [<00000000005ac7fc>] __handle_sysrq+0x84/0x160
[ 145.128060] [<00000000005ac8f8>] handle_sysrq+0x20/0x40
[ 145.128078] [<00000000005a4f08>] kbd_event+0x670/0xb60
[ 145.128110] [<00000000005ea0c0>] input_event+0x1e8/0x560
[ 145.128140] [<00000000005efa2c>] sunkbd_interrupt+0x114/0x140
[ 145.128167] [<00000000005e6270>] serio_interrupt+0x38/0xa0
[ 145.128186] [<00000000005b2e58>] sunsu_kbd_ms_interrupt+0xa0/0x140
[ 145.128212] [<000000000049f6f8>] handle_IRQ_event+0x20/0x80
[ 145.128251] [<000000000049f808>] __do_IRQ+0xb0/0x140
[ 145.128268] [<000000000042f48c>] handler_irq+0x94/0xc0
[ 145.128306] [<0000000000426f30>] sunos_sys_table+0x560/0x728
[ 145.128324] [<0000000000428a78>] cpu_idle+0x20/0xe0
[ 145.128341] [<0000000000665a24>] rest_init+0x6c/0x80
[ 145.128375] [<000000000076ec24>] start_kernel+0x2ec/0x340
[ 145.128405] [<000000000066599c>] tlb_fixup_done+0xa0/0xbc
[ 145.128425] [<0000000000000000>] 0x8
[ 145.128443] irq event stamp: 1209
[ 145.128451] hardirqs last enabled at (1209): [<0000000000404b74>] __handle_softirq_continue+0x20/0x24
[ 145.128480] hardirqs last disabled at (1207): [<0000000000474494>] __do_softirq+0xbc/0x140
[ 145.128506] softirqs last enabled at (1208): [<00000000004744dc>] __do_softirq+0x104/0x140
[ 145.128526] softirqs last disabled at (1203): [<00000000004745a0>] do_softirq+0x88/0xa0
[ 145.128546]
[ 145.128551] other info that might help us debug this:
[ 145.128562] no locks held by cat/3111.
[ 145.128570]
[ 145.128574] stack backtrace:
[ 145.128582] Call Trace:
[ 145.128590] [00000000004907a0] print_usage_bug+0x148/0x160
[ 145.128624] [00000000004917f4] mark_lock+0x6dc/0x780
[ 145.128641] [000000000049286c] __lock_acquire+0x734/0x12a0
[ 145.128659] [0000000000493430] lock_acquire+0x58/0x80
[ 145.128676] [0000000000669780] _spin_lock+0x28/0x40
[ 145.128691] [00000000004281d0] __show_regs+0x18/0x320
[ 145.128706] [0000000000429ba0] die_if_kernel+0x68/0x2c0
[ 145.128722] [0000000000452ab0] unhandled_fault+0x78/0xe0
[ 145.128749] [0000000000452d14] do_sparc64_fault+0x17c/0x620
[ 145.128765] [000000000040798c] sparc64_realfault_common+0x18/0x20
[ 145.128787] [fffff800bdca3e80] 0xfffff800bdca3e88
[ 145.128799] [000000000050affc] proc_reg_read+0x64/0xa0
[ 145.128828] [00000000004ccb4c] vfs_read+0x74/0x120
[ 145.128856] [00000000004ccf4c] sys_read+0x34/0x60
[ 145.128872] [0000000000406314] linux_sparc_syscall+0x3c/0x44
[ 145.128898] [0000000000012ff4] 0x12ffc
[ 145.128915] TSTATE: 0000004411009603 TPC: 00000000005119ac TNPC: 00000000005119b0 Y: 00000000 Not tainted
[ 145.128940] TPC: <kpagecount_read+0x94/0xe0>
[ 145.128951] g0: 0000000000000000 g1: 0000000000000058 g2: 0000000000000000 g3: 0000000000028008
[ 145.128966] g4: fffff800bfc3a460 g5: fffff8007f878000 g6: fffff800bdca0000 g7: 0000000000000000
[ 145.128982] o0: 0000000000000001 o1: 0000000000000001 o2: 000000000050afe4 o3: 0000000000000000
[ 145.128997] o4: 0000000000000002 o5: 0000000000b80320 sp: fffff800bdca3391 ret_pc: fffff800bdca3e80
[ 145.129013] RPC: <0xfffff800bdca3e88>
[ 145.129023] l0: fffff800bfc3a460 l1: 0000000000669d3c l2: 0000000000000001 l3: 000000000075d400
[ 145.129039] l4: 000000000075d400 l5: fffff80080015b10 l6: fffff80080005b08 l7: 0000000000000001
[ 145.129054] i0: 0000000000028010 i1: 0000000000028000 i2: 0000000000001ff8 i3: 0000000000000002
[ 145.129070] i4: 0000000000000058 i5: 0000000000000000 i6: fffff800bdca3451 i7: 000000000050affc
[ 145.129088] I7: <proc_reg_read+0x64/0xa0>
[ 145.129119] Caller[000000000050affc]: proc_reg_read+0x64/0xa0
[ 145.129139] Caller[00000000004ccb4c]: vfs_read+0x74/0x120
[ 145.129156] Caller[00000000004ccf4c]: sys_read+0x34/0x60
[ 145.129173] Caller[0000000000406314]: linux_sparc_syscall+0x3c/0x44
[ 145.129193] Caller[0000000000012ff4]: 0x12ffc
[ 145.129205] Instruction DUMP: 82070002 02c04003 86063ff8 <ce406008> cef0e000 82100000 8610001b b406bff8 80a06000
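The faulting instruction is the one in angle brackets in the Instruction DUMP, `<ce406008>`, at TPC kpagecount_read+0x94. The small sketch below decodes SPARC V9 format-3 (load/store) words and shows what that instruction does. It covers only the plain load/store opcodes it needs. `decode_format3_mem` and `effective_address` are illustrative helpers, not kernel or binutils code.

```python
# Plain (non-alternate-space) load/store op3 values from SPARC V9, op = 0b11.
MEM_OP3 = {
    0x00: "lduw", 0x01: "ldub", 0x02: "lduh", 0x03: "ldd",
    0x04: "stw", 0x05: "stb", 0x06: "sth", 0x07: "std",
    0x08: "ldsw", 0x09: "ldsb", 0x0A: "ldsh", 0x0B: "ldx",
    0x0E: "stx",
}


def reg_name(n: int) -> str:
    """Map an integer register number 0..31 to %g/%o/%l/%i names."""
    return "%" + "goli"[n // 8] + str(n % 8)


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_format3_mem(word: int) -> str:
    """Disassemble a format-3 load/store word into assembler-like text."""
    if word >> 30 != 0b11:
        raise ValueError("op field is not 3: not a load/store instruction")
    rd = (word >> 25) & 0x1F
    op3 = (word >> 19) & 0x3F
    rs1 = (word >> 14) & 0x1F
    mnemonic = MEM_OP3.get(op3, f"op3={op3:#x}")
    if (word >> 13) & 1:  # i = 1: rs1 + sign-extended 13-bit immediate
        simm13 = sign_extend(word & 0x1FFF, 13)
        sign = "+" if simm13 >= 0 else "-"
        addr = f"[{reg_name(rs1)} {sign} {abs(simm13)}]"
    else:  # i = 0: rs1 + rs2
        addr = f"[{reg_name(rs1)} + {reg_name(word & 0x1F)}]"
    if mnemonic.startswith("st"):
        return f"{mnemonic} {reg_name(rd)}, {addr}"
    return f"{mnemonic} {addr}, {reg_name(rd)}"


def effective_address(word: int, rs1_value: int) -> int:
    """Load/store address for the immediate form, wrapped to 64 bits."""
    if not (word >> 13) & 1:
        raise ValueError("register + register form also needs rs2's value")
    return (rs1_value + sign_extend(word & 0x1FFF, 13)) % 2**64


print(decode_format3_mem(0xCE406008))            # ldsw [%g1 + 8], %g7
print(hex(effective_address(0xCE406008, 0x58)))  # 0x60 (g1 = 0x58 in the oops)
```

The instruction decodes as `ldsw [%g1 + 8], %g7`, a 32-bit signed load from offset 8 of whatever %g1 points to. Since g1 is 0x0000000000000058 in the kpagecount_read register dump, the load address is 0x60. That near-zero address fits the "Unable to handle kernel NULL pointer dereference" message. The trace alone does not show which structure field sits at offset 8.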
> Have we seen a config for the broken machine? Perhaps that'll help us
> make a guess..
Please find it attached (version 2.6.23).
The box is a Sun Ultra 60 with 2 CPUs.
# lspci
0000:00:00.0 Host bridge: Sun Microsystems Computer Corp. Psycho PCI Bus Module
0000:00:01.0 Bridge: Sun Microsystems Computer Corp. EBUS (rev 01)
0000:00:01.1 Ethernet controller: Sun Microsystems Computer Corp. Happy Meal (rev 01)
0000:00:03.0 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0000:00:03.1 SCSI storage controller: LSI Logic / Symbios Logic 53c875 (rev 14)
0001:00:00.0 Host bridge: Sun Microsystems Computer Corp. Psycho PCI Bus Module
# cat /proc/cpuinfo
cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
prom : OBP 3.17.0 1998/10/23 11:26
type : sun4u
ncpus probed : 2
ncpus active : 2
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0ClkTck : 000000001ad1c43b
Cpu2ClkTck : 000000001ad1c43b
MMU Type : Spitfire
State:
CPU0: online
CPU2: online
# cat /proc/meminfo
MemTotal: 1015648 kB
MemFree: 961840 kB
Buffers: 5680 kB
Cached: 18096 kB
SwapCached: 0 kB
Active: 22440 kB
Inactive: 10288 kB
SwapTotal: 497992 kB
SwapFree: 497992 kB
Dirty: 32 kB
Writeback: 0 kB
AnonPages: 9168 kB
Mapped: 4288 kB
Slab: 10368 kB
SReclaimable: 4008 kB
SUnreclaim: 6360 kB
PageTables: 424 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 1005816 kB
Committed_AS: 27272 kB
VmallocTotal: 4194304 kB
VmallocUsed: 136 kB
VmallocChunk: 4194168 kB
# cat /proc/interrupts
CPU0 CPU2
0: 24567 16248 <NULL> timer
1: 0 0 sun4u PSYCHO_PCIERR
2: 0 0 sun4u PSYCHO_UE
3: 0 0 sun4u PSYCHO_CE
8: 291 0 sun4u su(kbd)
9: 0 0 sun4u su(mouse)
14: 1061 0 sun4u eth0
15: 2034 0 sun4u sym53c8xx
16: 0 30 sun4u sym53c8xx
17: 0 0 sun4u PSYCHO_PCIERR
I'll try earlier kernels and see what happens.
Regards,
Mariusz
View attachment "config-sparc64-2.6.23" of type "text/plain" (19369 bytes)