linux-kernel - Re: v2.6.26-rc9: kernel BUG at kernel/sched.c:5858!

Message-ID: <b647ffbd0807101316o97fe6d4p5b2cbcda472f2ae1@mail.gmail.com>
Date:	Thu, 10 Jul 2008 22:16:06 +0200
From:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>
To:	"Vegard Nossum" <vegard.nossum@...il.com>
Cc:	"Pekka Enberg" <penberg@...helsinki.fi>,
	"Christoph Lameter" <clameter@....com>,
	Yanmin <yanmin_zhang@...ux.intel.com>,
	"Rusty Russell" <rusty@...tcorp.com.au>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>,
	"Dhaval Giani" <dhaval@...ux.vnet.ibm.com>,
	"Gautham R Shenoy" <ego@...ibm.com>,
	"Heiko Carstens" <heiko.carstens@...ibm.com>, miaox@...fujitsu.com,
	"Lai Jiangshan" <laijs@...fujitsu.com>,
	"Avi Kivity" <avi@...ranet.com>, linux-kernel@...r.kernel.org
Subject: Re: v2.6.26-rc9: kernel BUG at kernel/sched.c:5858!

2008/7/10 Vegard Nossum <vegard.nossum@...il.com>:
> Okay, some more info on this one...
>
> On Thu, Jul 10, 2008 at 4:16 PM, Vegard Nossum <vegard.nossum@...il.com> wrote:
>> BUG: unable to handle kernel paging request at da87d000
>> IP: [<c01991c7>] kmem_cache_alloc+0xc7/0xe0
>> *pde = 28180163 *pte = 1a87d160
>> Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
>> Pid: 3850, comm: grep Not tainted (2.6.26-rc9-00059-gb190333 #5)
>> EIP: 0060:[<c01991c7>] EFLAGS: 00210203 CPU: 0
>> EIP is at kmem_cache_alloc+0xc7/0xe0
>> EAX: 00000000 EBX: da87c100 ECX: 1adad71a EDX: 6b6b6b6b
>> ESI: 00200282 EDI: da87d000 EBP: f60bfe74 ESP: f60bfe54
>>  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
>
> The register %ecx looks innocent but is very important here. The disassembly:
>
> mov    %edx,%ecx
> shr    $0x2,%ecx
> rep stos %eax,%es:(%edi) <-- the fault
>
> So %ecx has been loaded from %edx... which is 0x6b6b6b6b/POISON_FREE.
> (0x6b6b6b6b >> 2 == 0x1adadada.)
>
> %ecx is the counter for the memset, from here:
>
>        memset(object, 0, c->objsize);
>
> i.e. %ecx was loaded from c->objsize, so "c" must have been freed.
> Where did "c" come from? Uh-oh...
>
>        c = get_cpu_slab(s, smp_processor_id());
>
> This looks like it has very much to do with CPU hotplug/unplug. Is
> there a race between SLUB/hotplug since the CPU slab is used after it
> has been freed?

Good analysis.
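
[ The register arithmetic quoted above can be double-checked in
userspace. This is only a sketch: the helper names are made up for
illustration, and treating EBX as the start of the object being zeroed
is an assumption read off the dump, not something the oops states. It
checks that 0x6b6b6b6b >> 2 gives the initial count, and that the
count left in ECX at the fault corresponds to EDI - EBX bytes already
written. ]

```c
#include <assert.h>
#include <stdint.h>

/* Four POISON_FREE (0x6b) bytes read back as one 32-bit word. */
#define POISON_FREE_WORD 0x6b6b6b6bu

/* Hypothetical helper: dword count handed to "rep stos" for nbytes. */
uint32_t stos_dwords(uint32_t nbytes)
{
	return nbytes >> 2;	/* mirrors "shr $0x2,%ecx" */
}

/* Hypothetical helper: bytes stored so far, given the remaining count. */
uint32_t bytes_written(uint32_t initial_dwords, uint32_t remaining_dwords)
{
	return (initial_dwords - remaining_dwords) * 4u;
}

/* Cross-check the values against the register dump in the oops. */
void check_oops_registers(void)
{
	const uint32_t ebx = 0xda87c100u;	/* EBX (assumed object start) */
	const uint32_t edi = 0xda87d000u;	/* EDI, the faulting address */
	const uint32_t ecx = 0x1adad71au;	/* ECX, count left at the fault */

	assert(stos_dwords(POISON_FREE_WORD) == 0x1adadadau);
	assert(bytes_written(0x1adadadau, ecx) == edi - ebx);
}
```

Calling check_oops_registers() from a test program should pass
silently, which fits the reading that the memset ran off the end of
the mapped page with a poisoned length.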

[ quick look ]

Yeah, it's possible that a caller of kmem_cache_alloc() ->
slab_alloc() can be migrated to another CPU right after
local_irq_restore() and before memset(). The initial CPU can go
offline in the meantime (or the migration is itself a consequence of
the CPU going offline), so its 'kmem_cache_cpu' structure gets freed
(slab_cpuup_callback).

At some point the caller continues on another CPU, holding an
obsolete pointer...

Does something like this help?

diff --git a/mm/slub.c b/mm/slub.c
index 1a427c0..315c392 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1628,9 +1628,11 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
        void **object;
        struct kmem_cache_cpu *c;
        unsigned long flags;
+       unsigned int objsize;

        local_irq_save(flags);
        c = get_cpu_slab(s, smp_processor_id());
+       objsize = c->objsize;
        if (unlikely(!c->freelist || !node_match(c, node)))

                object = __slab_alloc(s, gfpflags, node, addr, c);
@@ -1643,7 +1645,7 @@ static __always_inline void *slab_alloc(struct kmem_cache *s,
        local_irq_restore(flags);

        if (unlikely((gfpflags & __GFP_ZERO) && object))
-               memset(object, 0, c->objsize);
+               memset(object, 0, objsize);

        return object;
 }
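
[ A userspace sketch of why caching objsize helps. Everything here is
a hypothetical stand-in, not kernel code: 'struct fake_cpu_slab'
replaces 'struct kmem_cache_cpu', and fake_cpu_offline() models the
free by poisoning the structure in place with POISON_FREE bytes
instead of really freeing it, so the stale read stays well defined.
The expected value in the unpatched case assumes a 32-bit unsigned
int. ]

```c
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b

/* Hypothetical stand-in for struct kmem_cache_cpu; only objsize matters. */
struct fake_cpu_slab {
	unsigned int objsize;
};

/* Models the per-CPU structure being freed and poisoned on CPU offline. */
void fake_cpu_offline(struct fake_cpu_slab *c)
{
	memset(c, POISON_FREE, sizeof(*c));
}

/* Unpatched pattern: objsize is read through c after the offline event. */
size_t zero_len_unpatched(struct fake_cpu_slab *c)
{
	/* local_irq_restore() would sit here; migration is possible now */
	fake_cpu_offline(c);
	return c->objsize;	/* stale read, yields the poison pattern */
}

/* Patched pattern: objsize is copied while c is still known to be valid. */
size_t zero_len_patched(struct fake_cpu_slab *c)
{
	unsigned int objsize = c->objsize;	/* taken "with irqs off" */

	fake_cpu_offline(c);
	return objsize;		/* unaffected by the poisoning */
}
```

With objsize set to 64, zero_len_unpatched() returns 0x6b6b6b6b while
zero_len_patched() still returns 64, which is the length the memset
is supposed to get.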


>
>
> Vegard
>
> --
> "The animistic metaphor of the bug that maliciously sneaked in while
> the programmer was not looking is intellectually dishonest as it
> disguises that the error is the programmer's own creation."
>        -- E. W. Dijkstra, EWD1036
>



-- 
Best regards,
Dmitry Adamushko

View attachment "002-fix-slub-hotplug.patch" of type "text/x-diff" (730 bytes)