netdev - Re: [1/3] 2.6.21-rc6: known regressions

Message-ID: <20070414062143.GA12707@elte.hu>
Date:	Sat, 14 Apr 2007 08:21:43 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Adrian Bunk <bunk@...sta.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Jeff Garzik <jgarzik@...ox.com>, netdev@...r.kernel.org,
	e1000-devel@...ts.sourceforge.net,
	Ayaz Abdulla <aabdulla@...dia.com>,
	Dave Jones <davej@...hat.com>,
	"David S. Miller" <davem@...emloft.net>, Greg KH <greg@...ah.com>
Subject: Re: [1/3] 2.6.21-rc6: known regressions

* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> Note: Ingo also reports what looks like a memory corruption due to the 
> 6b6b6b6b pattern on presumably the same box.
> 
> The 6b6b6b6b pattern is POISON_FREE, implying some kind of slab 
> misuse, most likely a use-after-free, although possibly just due to 
> overrunning a slab into the next one or something like that.
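
For readers unfamiliar with the pattern: the sketch below is a userspace
illustration, not kernel code, of why a value read from a freed, poisoned
object shows up as 6b6b6b6b. The struct and the demo_* helpers are
hypothetical names invented for this example; only the 0x6b byte value
comes from the discussion above. It assumes 32-bit fields, matching the
32-bit addresses in the trace further down.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte written over freed objects when slab poisoning is enabled. */
#define POISON_FREE_BYTE 0x6b

/* Hypothetical stand-in for a slab object with pointer-sized fields. */
struct demo_obj {
    uint32_t owner;   /* plays the role of a 32-bit pointer field */
    uint32_t flags;
};

/* Simulate the poisoning step done at free time: fill with 0x6b. */
static void demo_poison(struct demo_obj *obj)
{
    assert(obj != NULL);
    memset(obj, POISON_FREE_BYTE, sizeof(*obj));
}

/* Return 1 if a 32-bit value read from an object is pure free poison. */
static int demo_looks_poisoned(uint32_t value)
{
    return value == 0x6b6b6b6bu;
}
```

Any code that keeps a stale pointer to such an object and later loads
one of its fields gets 0x6b6b6b6b back, and dereferencing that value as
an address is what typically produces the page fault.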

unfortunately, while being at -rc6 based kernel #445 meanwhile, this 
incident was the only time i saw this problem. Note: while it's a 
CONFIG_SMP kernel, in that bootup i was using maxcpus=1:

   WARNING: maxcpus limit of 1 reached. Processor ignored.

so it's a pure UP problem. Plus i used PREEMPT_NONE. So this really must 
be something fundamental.

> What I'm leading up to is that I'm wondering if these mysterious 
> network driver bugs aren't due to the network drivers themselves, but 
> due to some higher-level problem. I think the hangs that Ingo sees 
> with forcedeth were preceded by mysterious and "impossible" NULL 
> pointer oopses. Ingo?

hm. I would tend to exclude networking, because the oops happened right 
during bootup (i saw it happen real time on the serial console), 
possibly before networking was brought up. It was udevd that crashed, 
and rarely does udevd do anything after its initial /dev hierarchy setup 
frenzy. (But this testbox boots very fast so it might have been near 
network bringup.)

note that i can pretty much freely force the forcedeth problem to occur 
on -rt [but all the reports i sent about it were done on a vanilla 
kernel]. I triggered that problem at least a couple of dozen times, and 
it _never_ caused any other effect besides the skb NULL dereference - or 
lately (with the latest forcedeth.c version), a pure forcedeth interface 
hang. That doesn't exclude networking driver badness, but makes it less 
likely.

to me this crash has the feeling of being sysfs related: not just 
because the crash itself is within sysfs:

 EIP is at module_put+0x19/0x2d

 [<c0104c44>] show_trace_log_lvl+0x19/0x2e
 [<c0104cf4>] show_stack_log_lvl+0x9b/0xa3
 [<c0104fdd>] show_registers+0x1c8/0x29a
 [<c01052d0>] die+0x119/0x1f0
 [<c03cd075>] do_page_fault+0x4e3/0x5b8
 [<c03cb7a4>] error_code+0x7c/0x84
 [<c019e832>] sysfs_release+0x55/0x76
 [<c0167c7f>] __fput+0xb9/0x15e
 [<c0167d3b>] fput+0x17/0x19
 [<c01658b2>] filp_close+0x52/0x5a
 [<c01660a3>] sys_close+0x76/0xad
 [<c0103dc0>] syscall_call+0x7/0xb

but also because udevd itself is _very_ sysfs intense - and in fact on 
this bzImage kernel it's perhaps the _only_ true sysfs activity that 
happens. (there are no loadable modules whatsoever, all drivers are 
built in)
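
As a reading aid for the trace above: each entry has the form
[<address>] symbol+0xoffset/0xsize, where offset is the distance of the
address from the start of the function and size is the function's
length. A minimal sketch of a parser for that format follows; the
struct and function names are hypothetical and exist only for this
example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One decoded line of a call trace. */
struct trace_entry {
    unsigned long addr;     /* return address, e.g. 0xc019e832 */
    char symbol[64];        /* function name, e.g. "sysfs_release" */
    unsigned int offset;    /* byte offset of addr inside the function */
    unsigned int size;      /* total size of the function in bytes */
};

/* Parse " [<c019e832>] sysfs_release+0x55/0x76"; 0 on success, -1 on error. */
static int parse_trace_line(const char *line, struct trace_entry *out)
{
    assert(line != NULL && out != NULL);
    int n = sscanf(line, " [<%lx>] %63[^+]+0x%x/0x%x",
                   &out->addr, out->symbol, &out->offset, &out->size);
    return n == 4 ? 0 : -1;
}

/* Return 1 if the entry refers to the named function. */
static int trace_entry_is(const struct trace_entry *e, const char *name)
{
    return strcmp(e->symbol, name) == 0;
}
```

Applied to the sysfs_release line, this gives offset 0x55 in a function
of 0x76 bytes, i.e. the return address of the call made from
sysfs_release that leads to the faulting module_put.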

	Ingo