linux-kernel - Re: kernel 2.6.37 : oops in cleanup


Date:	Wed, 02 Feb 2011 15:53:27 +0100
From:	Eric Dumazet <eric.dumazet@...il.com>
To:	Yann Dupont <Yann.Dupont@...v-nantes.fr>
Cc:	linux-kernel@...r.kernel.org, netdev <netdev@...r.kernel.org>
Subject: Re: kernel 2.6.37 : oops in cleanup_once

On Wednesday, 02 February 2011 at 14:08 +0100, Yann Dupont wrote:
> On 02/02/2011 12:24, Eric Dumazet wrote:
> > On Wednesday, 02 February 2011 at 11:52 +0100, Eric Dumazet wrote:
> >> On Wednesday, 02 February 2011 at 09:53 +0100, Yann Dupont wrote:
> >>> Hello.
> >>> We recently upgraded one machine to vanilla 2.6.37, and have experienced 2
> >>> kernel oopses since. Each oops came after ~1 week of uptime.
> >>> The last oops was last night, but we didn't get any trace.
> > oops, 2.6.37 "only"
> >
> >> Yes this is a known problem.
> >>
> >> Please try commit 3408404a4c2a4eead9d73b0bbbfe3f225b65f492
> >> (inetpeer: Use correct AVL tree base pointer in inet_getpeer())
> >>
> >> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3408404a4c2a4eead9d73b0bbbfe3f225b65f492
> >>
> >> I believe David will send it to stable team shortly, if not already
> >> done :)
> > Please ignore, this patch was for linux-2.6 tree, 2.6.37 was not
> > affected by the problem.
> >
> > So it's another problem... Is there anything particular you do on this
> > machine?
> >
> >
> >
> >
> Nothing really special there, we run a lot (20) of KVM guests (mainly
> Linux firewalls for lots of different VLANs), so we have a lot of
> bridges, VLANs & tun/tap devices.
> Oh, and CONFIG_BRIDGE_IGMP_SNOOPING is set to n (because of the other
> bug already sent to netdev - more to come in the next mail)
> 
> Hard to say if this BUG is new in 2.6.37. This host was running fine 
> with 2.6.34.2 since August 2010.
> Bisecting will be hard due to the time to trigger the bug (and the fact 
> that this machine is a production machine)
> 
> Anyway, I can test with a specific kernel version if you suspect something.
> 

I suspect memory corruption from another layer (not inetpeer).

Unfortunately many kmem caches share the "64 bytes" cache.

Could you please add "slub_nomerge" to your boot command line?


This way, we can separate corruptions on each cache.
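
As a rough illustration of the request above, the sketch below checks whether a boot parameter such as "slub_nomerge" is present in a kernel command line string, and groups slab caches that have been merged into one underlying cache. It assumes a SLUB kernel that exposes merged caches as symlinks under /sys/kernel/slab; that layout, and the helper names has_boot_param and merged_slab_caches, are assumptions made for this example and do not come from the original message.

```python
import os
from collections import defaultdict


def has_boot_param(cmdline: str, name: str) -> bool:
    """Return True if `name` appears as a bare flag or as name=value."""
    return any(
        token == name or token.startswith(name + "=")
        for token in cmdline.split()
    )


def merged_slab_caches(slab_dir: str = "/sys/kernel/slab") -> dict:
    """Group cache names that are symlinks to the same underlying cache.

    Assumption: merged (aliased) caches show up as symlinks pointing at a
    shared entry, so several names with one target means they share slabs.
    """
    groups = defaultdict(list)
    for name in sorted(os.listdir(slab_dir)):
        path = os.path.join(slab_dir, name)
        if os.path.islink(path):
            target = os.path.basename(os.readlink(path))
            groups[target].append(name)
    # Only targets shared by more than one cache name are of interest.
    return {t: names for t, names in groups.items() if len(names) > 1}
```

On a running system one would typically pass the contents of /proc/cmdline to has_boot_param, and call merged_slab_caches() before and after rebooting with "slub_nomerge" to see whether the groups disappear.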


In your crash, one inetpeer contains garbage in its unused_lists next/prev
pointers:

RCX: 0000000000000005
RDX: 0b000209f1beadde

Definitely something overwrote these values with non-pointer values.
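
To make the "non-pointer" observation concrete, here is a minimal sketch that tests the two register values against the x86-64 canonical address rule. It assumes 48-bit virtual addresses with kernel addresses in the upper half (at or above 0xffff800000000000); the function names and that boundary constant are choices made for this illustration, and the message does not say which register held next and which held prev.

```python
# Assumption: x86-64 with 48-bit virtual addresses, kernel in the upper half.
KERNEL_HALF_START = 0xFFFF800000000000


def is_canonical(addr: int) -> bool:
    """True if bits 63..47 are all zeros or all ones."""
    top = addr >> 47  # the 17 most significant bits of a 64-bit value
    return top == 0 or top == 0x1FFFF


def looks_like_kernel_pointer(addr: int) -> bool:
    """True if the value is canonical and lies in the upper (kernel) half."""
    return is_canonical(addr) and addr >= KERNEL_HALF_START


# Register values copied from the crash report quoted above.
for reg, value in (("RCX", 0x0000000000000005), ("RDX", 0x0B000209F1BEADDE)):
    print(
        f"{reg}: {value:#018x} "
        f"canonical={is_canonical(value)} "
        f"kernel_pointer={looks_like_kernel_pointer(value)}"
    )
```

Under these assumptions neither value passes: RCX is a tiny integer in the lower half, and RDX is not a canonical address at all, which is consistent with list pointers having been overwritten by unrelated data.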


