linux-kernel - Re: 2.6.25-$sha1: RIP call_for_each_cic+0x25/0x50


Message-ID: <20080430221250.GA10150@martell.zuzino.mipt.ru>
Date:	Thu, 1 May 2008 02:12:50 +0400
From:	Alexey Dobriyan <adobriyan@...il.com>
To:	Jens Axboe <jens.axboe@...cle.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: 2.6.25-$sha1: RIP call_for_each_cic+0x25/0x50

On Tue, Apr 29, 2008 at 11:06:05AM +0200, Jens Axboe wrote:
> On Tue, Apr 29 2008, Alexey Dobriyan wrote:
> > On Mon, Apr 28, 2008 at 11:55:09PM +0400, Alexey Dobriyan wrote:
> > > On Mon, Apr 28, 2008 at 02:04:13PM +0200, Jens Axboe wrote:
> > > > On Mon, Apr 28 2008, Andrew Morton wrote:
> > > > > On Mon, 28 Apr 2008 02:55:53 +0400 Alexey Dobriyan <adobriyan@...il.com> wrote:
> > > > > 
> > > > > > This happened while ~90 cross-compile jobs were running in parallel on
> > > > > > an ext2/noatime partition (slowly -- much debugging was on).
> > > > > > 
> > > > > > 
> > > > > > general protection fault: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
> > > > > > CPU 0 
> > > > > > Modules linked in: ext2 nf_conntrack_irc ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables usblp uhci_hcd ehci_hcd usbcore sr_mod cdrom
> > > > > > Pid: 16483, comm: as Not tainted 2.6.25-c3bf9bc243092c53946fd6d8ebd6dc2f4e572d48 #1
> > > > > > RIP: 0010:[<ffffffff80307525>]  [<ffffffff80307525>] call_for_each_cic+0x25/0x50
> > > > > > RSP: 0018:ffff810170811e58  EFLAGS: 00010202
> > > > > > RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
> > > > > > RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff81010ff92000
> > > > > > RBP: ffff810170811e78 R08: 0000000000000001 R09: 0000000000000000
> > > > > > R10: 0000000000000000 R11: ffff8100010069d8 R12: ffff810138ada300
> > > > > > R13: ffffffff803075b0 R14: ffff81017fcd2000 R15: ffff81010ff92168
> > > > > > FS:  00002ac3462426f0(0000) GS:ffffffff805d0000(0000) knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > > > > > CR2: 00002ab602550000 CR3: 000000013609d000 CR4: 0000000000000660
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > > > > Process as (pid: 16483, threadinfo ffff810170810000, task ffff81010ff92000)
> > > > > > Stack:  ffff810170811e88 ffff810138ada300 0000000000000010 ffff81010ff92100
> > > > > >  ffff810170811e88 ffffffff80307580 ffff810170811ea8 ffffffff80302a55
> > > > > >  ffff81010ff92100 ffff810138ada300 ffff810170811ec8 ffffffff80302b1f
> > > > > > Call Trace:
> > > > > >  [<ffffffff80307580>] cfq_free_io_context+0x10/0x20
> > > > > >  [<ffffffff80302a55>] put_io_context+0x85/0x90
> > > > > >  [<ffffffff80302b1f>] exit_io_context+0x8f/0xb0
> > > > > >  [<ffffffff80235d19>] do_exit+0x549/0x780
> > > > > >  [<ffffffff80235f8e>] do_group_exit+0x3e/0xb0
> > > > > >  [<ffffffff80236012>] sys_exit_group+0x12/0x20
> > > > > >  [<ffffffff8020b6db>] system_call_after_swapgs+0x7b/0x80
> > > > > > 
> > > > > > 
> > > > > > Code: 84 00 00 00 00 00 55 48 89 e5 41 55 49 89 f5 41 54 49 89 fc 53 48 83 ec 08 e8 18 e1 f5 ff 49 8b 44 24 68 48 85 c0 74 1e 48 89 c3 <48> 8b 03 48 8d 73 88 4c 89 e7 0f 18 08 41 ff d5 48 8b 03 48 85 
> > > > > > RIP  [<ffffffff80307525>] call_for_each_cic+0x25/0x50
> > > > > >  RSP <ffff810170811e58>
> > > > > > ---[ end trace ca143223eefdc828 ]---
> > > > > > Fixing recursive fault but reboot is needed!
> > > > > cfq-iosched.c hasn't been altered (yet) so it might not be a regression.
> > 
> > 
> > > > It's not a regression; it's definitely in 2.6.25 as well. So that's a
> > > > bit scary. I've been looking over this stuff this morning but haven't
> > > > pinpointed anything yet.
> > > > 
> > > > Alexey, is this something that reproduces for you?
> > > 
> > > Not yet; the second run of the same workload went fine, and I've never
> > > seen such oopses before.
> > 
> > And it oopses the very same way on the third run. as(1) again.
> > So if there are any debugging patches, let me know.
> 
> There seems to be a small race in the destructor path. Can you see if
> this makes a difference?
> 
> diff --git a/block/blk-ioc.c b/block/blk-ioc.c
> index e34df7c..012f065 100644
> --- a/block/blk-ioc.c
> +++ b/block/blk-ioc.c
> @@ -41,8 +41,8 @@ int put_io_context(struct io_context *ioc)
>  		rcu_read_lock();
>  		if (ioc->aic && ioc->aic->dtor)
>  			ioc->aic->dtor(ioc->aic);
> -		rcu_read_unlock();
>  		cfq_dtor(ioc);
> +		rcu_read_unlock();
>  
>  		kmem_cache_free(iocontext_cachep, ioc);
>  		return 1;

This helps, in the sense that three bulk cross-compile runs have now finished to the end.
You'll hear from me if another such oops resurfaces.
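The quoted patch moves cfq_dtor() inside the rcu_read_lock()/rcu_read_unlock() pair. One plausible reading, consistent with RAX and RBX in the oops both holding 0x6b6b6b6b6b6b6b6b (0x6b is the POISON_FREE byte that slab debugging writes into freed objects), is a use-after-free: once the read-side section ends, objects still being walked may already be reclaimed. Below is a minimal single-threaded toy model of RCU-style deferred reclamation, not kernel code. struct cic, toy_rcu_read_lock(), toy_call_rcu_free(), toy_for_each_cic() and simulate_dtor() are all hypothetical, and the model assumes the worst case: reclamation happens the instant the last reader leaves, whereas a real grace period ends at some unspecified later time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: NOT the kernel's RCU or cfq code. All names are hypothetical. */
#define POISON_FREE 0x6b   /* byte value slab debugging writes into freed objects */
#define PENDING_MAX 8

struct cic {               /* hypothetical stand-in for a cfq_io_context */
    struct cic *next;
    int id;
};

static int readers;                        /* active toy read-side sections */
static struct cic *pending[PENDING_MAX];   /* objects waiting to be reclaimed */
static int npending;

static void toy_rcu_read_lock(void) { readers++; }

/* Defer reclamation of c until no reader is active (in the spirit of call_rcu()). */
static void toy_call_rcu_free(struct cic *c)
{
    if (npending < PENDING_MAX)            /* toy: requests beyond capacity are dropped */
        pending[npending++] = c;
}

/* Worst case: the "grace period" ends the moment the last reader leaves.
 * Reclaimed objects are poisoned rather than freed, so later reads stay defined. */
static void toy_rcu_read_unlock(void)
{
    if (--readers > 0)
        return;
    for (int i = 0; i < npending; i++)
        memset(pending[i], POISON_FREE, sizeof *pending[i]);
    npending = 0;
}

/* True if p has the all-0x6b bit pattern seen in RAX/RBX in the oops. */
static int is_poison_ptr(const struct cic *p)
{
    uintptr_t pattern;
    memset(&pattern, POISON_FREE, sizeof pattern);
    return (uintptr_t)p == pattern;
}

/* Walk the list (loosely like call_for_each_cic()); -1 if a poisoned link is hit. */
static int toy_for_each_cic(const struct cic *head)
{
    int n = 0;
    for (const struct cic *c = head; c != NULL; c = c->next) {
        if (is_poison_ptr(c))
            return -1;
        n++;
    }
    return n;
}

/* Assumption: some other path has queued both objects for deferred freeing
 * while this reader holds the toy read lock (simulated inline here). */
static int simulate_dtor(int walk_inside_section)
{
    static struct cic a, b;
    int result;

    a = (struct cic){ .next = &b, .id = 1 };
    b = (struct cic){ .next = NULL, .id = 2 };

    toy_rcu_read_lock();
    toy_call_rcu_free(&a);
    toy_call_rcu_free(&b);
    if (walk_inside_section) {          /* patched order: walk, then unlock */
        result = toy_for_each_cic(&a);
        toy_rcu_read_unlock();
    } else {                            /* original order: unlock, then walk */
        toy_rcu_read_unlock();          /* objects are reclaimed here ... */
        result = toy_for_each_cic(&a);  /* ... so the walk reads poison */
    }
    return result;
}
```

In the toy, the patched order visits both objects and returns 2, while the unlock-first order returns -1 because the first link it follows already carries the poison pattern. With real RCU the grace period can end at any later point, so the unpatched order would fail only when the timing lines up, which would fit the oops showing up on some runs and not others.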
