netdev - Re: 2.6.24-rc6-mm1

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <64bb37e0801060230x6b392542la9556d72a184f306@mail.gmail.com>
Date:	Sun, 6 Jan 2008 11:30:48 +0100
From:	"Torsten Kaiser" <just.for.lkml@...glemail.com>
To:	"Jarek Poplawski" <jarkao2@...il.com>
Cc:	"Herbert Xu" <herbert@...dor.apana.org.au>,
	"Andrew Morton" <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, "Neil Brown" <neilb@...e.de>,
	"J. Bruce Fields" <bfields@...ldses.org>, netdev@...r.kernel.org,
	"Tom Tucker" <tom@...ngridcomputing.com>
Subject: Re: 2.6.24-rc6-mm1

On Jan 6, 2008 9:27 AM, Jarek Poplawski <jarkao2@...il.com> wrote:
> On Sat, Jan 05, 2008 at 03:52:32PM +0100, Torsten Kaiser wrote:
> ...
> > So my personal conclusion would be, that someone is writing to memory
> > that he no longer owns. Most probably 0-bytes. (the complete_routine
> > got NULLed and the warning about dst->__refcnt being 0).
> >
> > Use-after-free or something else?
>
> I agree: your conclusion seems to be the most probable explanation for
> this. Then it could be really hard to solve this without bisection or
> something similar. But there is some probabability this something could
> try kfree later too, but simply this list debugging triggers earlier.

As for example in the case when it dies in ieee1394-thread the list is
so corrupted that it will die anyway.

But I might try this anyway, as I don't really have a better idee.

> > > > If you think some other slub_debug might catch it, I would try this...
>
> You can try to add "U" to these other slub_debug options. As a matter
> of fact, if your above diagnose is right, it seems you risk to damage
> your system or even the box with these tests, so if you want to
> continue, you should probably turn any possible debugging on (not in
> mm only).

I did not add U, because I thought that would only needed to trace memory leaks.
And I hoped that using P (poison) would catch any later use (after free).

> BTW, you've written that some debugging options seem to delay the bug.
> Since they often change sizes of some structures than such wrong
> writes could have some 'safer' offsets. So, this could really delay
> e.g. these list's bugs, but maybe this could also let to stay 'alive'
> to such wrong kfree?

I think this bug is highly timing dependent. Its not always the same
package that dies and as this is a SMP system I would guess two CPUs
using the same data will trigger this.
And using the poison-option will definitily slow the system down and
mess up the timings.

What also speaks against the 'safer' offsets is, that after adding my
notfreed-byte to skbuff the bug still triggered in the same way.

I'm currently looking at
http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12702.html
,trying to understand if this is relevant for me on x86_64.

Torsten
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html