[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <64bb37e0801060230x6b392542la9556d72a184f306@mail.gmail.com>
Date: Sun, 6 Jan 2008 11:30:48 +0100
From: "Torsten Kaiser" <just.for.lkml@...glemail.com>
To: "Jarek Poplawski" <jarkao2@...il.com>
Cc: "Herbert Xu" <herbert@...dor.apana.org.au>,
"Andrew Morton" <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, "Neil Brown" <neilb@...e.de>,
"J. Bruce Fields" <bfields@...ldses.org>, netdev@...r.kernel.org,
"Tom Tucker" <tom@...ngridcomputing.com>
Subject: Re: 2.6.24-rc6-mm1
On Jan 6, 2008 9:27 AM, Jarek Poplawski <jarkao2@...il.com> wrote:
> On Sat, Jan 05, 2008 at 03:52:32PM +0100, Torsten Kaiser wrote:
> ...
> > So my personal conclusion would be, that someone is writing to memory
> > that he no longer owns. Most probably 0-bytes. (the complete_routine
> > got NULLed and the warning about dst->__refcnt being 0).
> >
> > Use-after-free or something else?
>
> I agree: your conclusion seems to be the most probable explanation for
> this. Then it could be really hard to solve this without bisection or
> something similar. But there is some probabability this something could
> try kfree later too, but simply this list debugging triggers earlier.
As for example in the case when it dies in ieee1394-thread the list is
so corrupted that it will die anyway.
But I might try this anyway, as I don't really have a better idee.
> > > > If you think some other slub_debug might catch it, I would try this...
>
> You can try to add "U" to these other slub_debug options. As a matter
> of fact, if your above diagnose is right, it seems you risk to damage
> your system or even the box with these tests, so if you want to
> continue, you should probably turn any possible debugging on (not in
> mm only).
I did not add U, because I thought that would only needed to trace memory leaks.
And I hoped that using P (poison) would catch any later use (after free).
> BTW, you've written that some debugging options seem to delay the bug.
> Since they often change sizes of some structures than such wrong
> writes could have some 'safer' offsets. So, this could really delay
> e.g. these list's bugs, but maybe this could also let to stay 'alive'
> to such wrong kfree?
I think this bug is highly timing dependent. Its not always the same
package that dies and as this is a SMP system I would guess two CPUs
using the same data will trigger this.
And using the poison-option will definitily slow the system down and
mess up the timings.
What also speaks against the 'safer' offsets is, that after adding my
notfreed-byte to skbuff the bug still triggered in the same way.
I'm currently looking at
http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg12702.html
,trying to understand if this is relevant for me on x86_64.
Torsten
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists