linux-kernel - Re: [CRED bug?] 2.6.29-rc3 don't survive on stress workload

Message-ID: <19f34abd0902101956t2af01f9cifeab655c1f6625eb@mail.gmail.com>
Date:	Wed, 11 Feb 2009 04:56:14 +0100
From:	Vegard Nossum <vegard.nossum@...il.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	David Howells <dhowells@...hat.com>,
	Serge Hallyn <serue@...ibm.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Lee Schermerhorn <Lee.Schermerhorn@...com>
Subject: Re: [CRED bug?] 2.6.29-rc3 don't survive on stress workload

On Tue, Feb 10, 2009 at 8:28 AM, KOSAKI Motohiro
<kosaki.motohiro@...fujitsu.com> wrote:
>> That stack trace looks somewhat similar to the one in
>> http://lkml.org/lkml/2009/2/6/136
>>
>> If this is reproducible, maybe a patch like the one attached can help
>> pinpoint it?
>
> Thanks. I'll try it.
> please wait one night, it need to reproduce.

Wow, it seems that I was able to reproduce it (somewhat, somehow) too:

[13359.131495] ------------[ cut here ]------------
[13359.133489] kernel BUG at mm/slub.c:2750!
[13359.133489] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[13359.133489] last sysfs file: /sys/devices/pnp0/00:0d/id
[13359.133489] CPU 1
[13359.133489] Modules linked in:
[13359.133489] Pid: 917, comm: udevd Not tainted 2.6.29-rc3 #223
[13359.133489] RIP: 0010:[<ffffffff810b99c9>]  [<ffffffff810b99c9>] kfree+0x29/7
[13359.133489] RSP: 0000:ffff88003f187e28  EFLAGS: 00010246
[13359.133489] RAX: 0100000000000400 RBX: ffffffff8171fe00 RCX: 0000000000000086
[13359.133489] RDX: ffffe20000050ec8 RSI: 0000000000000085 RDI: ffffe20000050ec8
[13359.133489] RBP: ffff88003f187e38 R08: 0000000000000585 R09: ffffffff81819cb0
[13359.133489] R10: ffff88003e457b40 R11: ffff88003f187e98 R12: ffffffff81072144
[13359.133489] R13: 0000000000000001 R14: ffffffff818b13e0 R15: 000000000000000a
[13359.133489] FS:  0000000000000000(0000) GS:ffff88003f156f80(0063) knlGS:00000
[13359.218474] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
[13359.218474] CR2: 0000000043d6a0ac CR3: 000000003e407000 CR4: 00000000000006a0
[13359.218474] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[13359.239494] ------------[ cut here ]------------
[13359.239498] WARNING: at lib/kref.c:43 kref_get+0x27/0x30()
[13359.239501] Hardware name: 945P-A
[13359.239503] Modules linked in:
[13359.239508] Pid: 2463, comm: a.out Not tainted 2.6.29-rc3 #223
[13359.239511] Call Trace:
[13359.239521]  [<ffffffff8103c93a>] warn_slowpath+0xb6/0xf2
[13359.239529]  [<ffffffff810b5452>] ? alloc_pages_current+0xbe/0xc7
[13359.239536]  [<ffffffff810b734e>] ? get_partial_node+0x22/0x87
[13359.239540]  [<ffffffff810b9705>] ? __slab_alloc+0xd6/0x371
[13359.239547]  [<ffffffff8103238d>] ? set_next_entity+0x8a/0xda
[13359.239553]  [<ffffffff811b2f9b>] kref_get+0x27/0x30
[13359.239560]  [<ffffffff810465ce>] alloc_uid+0xe0/0x1d5
[13359.239568]  [<ffffffff8104b501>] set_user+0x2f/0x88
[13359.239574]  [<ffffffff8104b842>] sys_setreuid+0xcd/0x133
[13359.239579]  [<ffffffff8102d398>] sysenter_dispatch+0x7/0x27
[13359.239582] ---[ end trace 41e0e7b4a6e4140a ]---
[13359.218474] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[13359.346481] Process udevd (pid: 917, threadinfo ffff88003e456000, task ffff8)
  Booting 'Fedora Core (2.6.20.9)'

(spontaneous reboot)

The second report (the WARNING at lib/kref.c:43) is the one from my patch:

        WARN_ON(atomic_read(&kref->refcount) <= 0);
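
To illustrate what that check catches, here is a minimal user-space sketch. It is not the kernel's lib/kref.c; the names kref_model, kref_model_get, kref_model_put and kref_model_warnings are hypothetical and exist only for this example. The idea is that taking a reference on an object whose count has already dropped to zero means the object may already have been freed.

```c
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical user-space stand-in for a reference-counted object. */
struct kref_model {
    atomic_int refcount;
};

/* Number of times the sanity check fired (illustration only). */
int kref_model_warnings;

void kref_model_init(struct kref_model *k)
{
    atomic_init(&k->refcount, 1);   /* the creator holds the first reference */
}

void kref_model_get(struct kref_model *k)
{
    /* Same condition as the WARN_ON above: a count <= 0 here means
     * somebody is taking a reference on an object that was released. */
    if (atomic_load(&k->refcount) <= 0) {
        kref_model_warnings++;
        fprintf(stderr, "warning: get on object with refcount <= 0\n");
    }
    atomic_fetch_add(&k->refcount, 1);
}

/* Returns 1 when the last reference was dropped, 0 otherwise. */
int kref_model_put(struct kref_model *k)
{
    return atomic_fetch_sub(&k->refcount, 1) == 1;
}
```

In this model, a put that drops the last reference followed by a get on the same object trips the check, which is the pattern the warning in alloc_uid() suggests.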

The test program forked and then called setreuid(0, 99999);
setreuid(99999, 0); in a loop (to alloc/free uids quickly).
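
A minimal sketch of such a stress program follows. It is a reconstruction from the description above, not the actual a.out; the helper names toggle_uids and run_workers, and the worker and iteration counts, are assumptions. It has to run as root for the setreuid() calls to succeed; without privileges the calls fail and the loop does nothing useful.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: flip the real/effective uids back and forth so
 * that a user_struct for uid 99999 is repeatedly looked up and released. */
static void toggle_uids(long iterations)
{
    for (long i = 0; i < iterations; i++) {
        if (setreuid(0, 99999) != 0)
            continue;               /* not permitted (e.g. not root): skip */
        if (setreuid(99999, 0) != 0)
            break;                  /* cannot get back to root: stop */
    }
}

/* Fork nworkers children that each run the loop; returns how many
 * children were reaped with exit status 0. */
int run_workers(int nworkers, long iterations)
{
    int started = 0, ok = 0;

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                  /* fork failed: wait for what we have */
        if (pid == 0) {
            toggle_uids(iterations);
            _exit(0);
        }
        started++;
    }

    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }
    return ok;
}
```

Calling run_workers() from main() with a handful of workers and a large iteration count approximates the workload; only run it on a machine that is allowed to crash.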

My theory is that the reference counting for 'struct user_struct' is
wrong in the case that CONFIG_USER_SCHED=y (check out free_user() in
the two cases), but I don't know that for sure. What is the setting of
this config variable in your configuration?

Will refine my test program to see if I can trigger this immediately
and accurately.


Vegard

-- 
"The animistic metaphor of the bug that maliciously sneaked in while
the programmer was not looking is intellectually dishonest as it
disguises that the error is the programmer's own creation."
	-- E. W. Dijkstra, EWD1036