linux-kernel - Re: linux 2.4.32: kernel BUG in slab.c:1582

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <20061228044300.GM24090@1wt.eu>
Date:	Thu, 28 Dec 2006 05:43:00 +0100
From:	Willy Tarreau <w@....eu>
To:	Sebastian Kuzminsky <seb@...hlab.com>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: linux 2.4.32: kernel BUG in slab.c:1582

Hi !

On Wed, Dec 20, 2006 at 12:21:59PM -0700, Sebastian Kuzminsky wrote:
> Hi folks, I've got an old PC (Pentium 133 MHz, 64 MB RAM, no swap)
> running Linux 2.4.32, and lately I've been getting kernel BUGs like this:
> 
>     kernel: kernel BUG at slab.c:1582!
>     kernel: invalid operand: 0000
>     kernel: CPU:    0
>     kernel: EIP:    0010:[kmem_cache_free+105/624]    Not tainted
>     kernel: EFLAGS: 00010293
>     kernel: eax: c10b7354   ebx: 00003ef0   ecx: 0002b450   edx: c100001c
>     kernel: esi: 000ad140   edi: c3ef0634   ebp: 00000023   esp: c10c3f4c
>     kernel: ds: 0018   es: 0018   ss: 0018
>     kernel: Process kswapd (pid: 4, stackpage=c10c3000)
>     kernel: Stack: c10b7354 c2c13e3c c3ef0634 c036dd3c 00000023 c01426ca c10b7354 c3ef0634
>     kernel:        0000003c 000001d0 00000009 c0232f18 c01429b5 00000379 c012a7b8 00000006
>     kernel:        000001d0 00000000 00000000 c0232f18 00000001 c10c2000 00000000 c012a947
>     kernel: Call Trace:    [prune_dcache+266/320] [shrink_dcache_memory+37/64] [try_to_free_pages_zone+104/208] [kswapd_balance_pgdat+87/160] [kswapd_balance+22/48]
>     kernel:   [kswapd+143/176] [_stext+0/48] [arch_kernel_thread+35/48] [kswapd+0/176]
>     kernel:
>     kernel: Code: 0f 0b 2e 06 7a c0 20 c0 9c 8f 04 24 fa 3b 1d 20 82 28 c0 89
> 
> 
> The system is a bit tight on memory, but /proc/meminfo reports MemFree +
> Buffers + Cached > 10 MB.
> 
> After kswapd stepped on the BUG, it happened several more times by
> different processes and different code paths, but always ending with this:
> 
>     Call Trace:    [prune_dcache+266/320] [shrink_dcache_memory+37/64] [try_to_free_pages_zone+104/208] [balance_classzone+76/560] [__alloc_pages+363/624]
> 
> 
> The BUG is this one:
> 
>     /**
>      * kmem_cache_free - Deallocate an object
>      * @cachep: The cache the allocation was from.
>      * @objp: The previously allocated object.
>      *
>      * Free an object which was previously allocated from this
>      * cache.
>      */
>     void kmem_cache_free (kmem_cache_t *cachep, void *objp)
>     {
>             unsigned long flags;
>     #if DEBUG
>             CHECK_PAGE(virt_to_page(objp));
>             if (cachep != GET_PAGE_CACHE(virt_to_page(objp)))
>                     BUG();
>     #endif
> 
>             local_irq_save(flags);
>             __kmem_cache_free(cachep, objp);
>             local_irq_restore(flags);
>     }
> 
> 
> So prune_dcache() gets called to free up some memory, but then it hands
> kmem_cache_free an inconsistent object to free?  Is this indicative of
> memory corruption?

Could be. Especially on such a machine, it was the era of unreliable
memory busses between RAM, chipsets and CPUs, unreliable cache sticks,
unreliable RAM sticks, and unreliable voltage regulators causing
lockups and cache corruption. I don't think the problem is related to
low memory because you should get OOM messages if this was the case.

Have you tried running memtest on this machine ? I would bet that it
could find errors. I do still run a P133 which stayed up for 2 years
before the mains was cut, but I've encountered many of them showing
frequent lockups in the past, particularly on those with a small
cache card plugged into a special slot close to the CPU. You may try
to disable the cache in the BIOS, because I suspect that performance
is less of a problem for you on such a machine than reliability.
Slightly increasing the CPU voltage might also compensate for an aging
regulator.

Best regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/