lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20100705085550.GA26775@elte.hu>
Date:	Mon, 5 Jul 2010 10:55:50 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jens Axboe <axboe@...nel.dk>,
	Christoph Hellwig <hch@...radead.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Rafael J. Wysocki" <rjw@...k.pl>
Subject: [regression] Crash in wb_clear_pending() (was: Linux 2.6.35-rc4)


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> [...] And filesystem writeback performance issues, etc. It's all good.

> Christoph Hellwig (11):
>       writeback: fix writeback completion notifications
>       writeback: queue work on stack in writeback_inodes_sb
>       writeback: enforce s_umount locking in writeback_inodes_sb
>       writeback: fix writeback_inodes_wb from writeback_inodes_sb
>       writeback: simplify wakeup_flusher_threads
>       writeback: simplify and split bdi_start_writeback
>       writeback: add missing requeue_io in writeback_inodes_wb
>       writeback: fix pin_sb_for_writeback

Hm, those writeback patches are not looking overly good here - the kernel 
crashes quite a lot during bootup:

BUG: unable to handle kernel paging request at 000000000528aa7e
IP: [<ffffffff8112d514>] wb_clear_pending+0x3b/0x66
PGD 0 
Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
last sysfs file: /sys/class/net/eth0/broadcast
CPU 1 
Modules linked in:

Pid: 1637, comm: flush-8:0 Not tainted 2.6.35-rc4-tip-01075-g0c1880a-dirty #15306 A8N-E/System Product Name
RIP: 0010:[<ffffffff8112d514>]  [<ffffffff8112d514>] wb_clear_pending+0x3b/0x66
RSP: 0018:ffff88003b899d90  EFLAGS: 00010286
RAX: 000000000528aa7e RBX: ffff880038e3d5f8 RCX: ffff88003b952000
RDX: ffff88003dd8c468 RSI: ffff88003dd8c420 RDI: ffff88003dd8c408
RBP: ffff88003b899da0 R08: ffffffff825bf870 R09: 0000000000000000
R10: ffff88003b899cf0 R11: 0000000000000002 R12: ffff88003dd8c408
R13: 0000000000000000 R14: ffff880038e3d5f8 R15: ffff88003dd8c468
FS:  00007f2d706b36f0(0000) GS:ffff880003200000(0000) knlGS:00000000f77bb6c0
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000528aa7e CR3: 0000000001d1e000 CR4: 00000000000006a0
DR0: ffffffff82ceebbc DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
Process flush-8:0 (pid: 1637, threadinfo ffff88003b898000, task ffff88003b952000)
Stack:
 ffff88003dd8c3a8 0000000000000000 ffff88003b899e30 ffffffff8112d5d7
<0> ffffffff8112d5de ffffffff823ae2a0 0000000000000000 ffff88003b899de0
<0> 00000000000009c4 ffff88003b899fd8 000000000000000a 0000000000000000
Call Trace:
 [<ffffffff8112d5d7>] wb_do_writeback+0x98/0x1d7
 [<ffffffff8112d5de>] ? wb_do_writeback+0x9f/0x1d7
 [<ffffffff8112d755>] bdi_writeback_task+0x3f/0xfc
 [<ffffffff810809a7>] ? bit_waitqueue+0x17/0xa9
 [<ffffffff810ee2d0>] ? bdi_start_fn+0x0/0xda
 [<ffffffff810ee346>] bdi_start_fn+0x76/0xda
 [<ffffffff810ee2d0>] ? bdi_start_fn+0x0/0xda
 [<ffffffff810805e9>] kthread+0x7f/0x87
 [<ffffffff8102ac04>] kernel_thread_helper+0x4/0x10
 [<ffffffff8105af79>] ? finish_task_switch+0x70/0xdc
 [<ffffffff818674c5>] ? _raw_spin_unlock_irq+0x4f/0x6c
 [<ffffffff81867994>] ? restore_args+0x0/0x30
 [<ffffffff8108056a>] ? kthread+0x0/0x87
 [<ffffffff8102ac00>] ? kernel_thread_helper+0x0/0x10
Code: ff 4e 28 0f 94 c0 84 c0 74 47 4c 8b 67 10 49 81 c4 10 02 00 00 4c 89 e7 e8 d5 92 73 00 48 8b 43 08 48 8b 13 4c 89 e7 48 89 42 08 <48> 89 10 48 b8 00 02 20 00 00 00 ad de 48 89 43 08 e8 4d a0 73 
RIP  [<ffffffff8112d514>] wb_clear_pending+0x3b/0x66

It appears to be a race of some sorts and is not reproducible consistently: it 
occurs with an about 2-10% likelyhood in my test runs and i've seen about ~5 
similar crashes so far. They appeared after you pulled those writeback 
patches. I had a long and successful testrun of 1000+ kernels with no crashes, 
shortly before those wb patches went upstream.

Given the small linear chance for the crash [i'd have to do a series of 100 
reboots per bisection point to be reasonably sure about goodness/badness] i'd 
rather not try to bisect it - but can try fix patches or can try specific 
reverts as well.

Config and full crashlog attached. (Note, the log is from 
tip-2.6.35-rc4-tip-01075-g0c1880a but the crash clearly came in from upstream, 
the patterns are consistent and nothing remotely related happened in -tip in 
that timeframe.)

Thanks,

	Ingo

View attachment "crash.log" of type "text/plain" (167493 bytes)

View attachment "config" of type "text/plain" (65831 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ