netdev - Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

Message-ID: <459EB627.5040606@candelatech.com>
Date:	Fri, 05 Jan 2007 12:33:43 -0800
From:	Ben Greear <greearb@...delatech.com>
To:	David Miller <davem@...emloft.net>
CC:	herbert@...dor.apana.org.au, dlstevens@...ibm.com, jarkao2@...pl,
	netdev@...r.kernel.org
Subject: Re: BUG: soft lockup detected on CPU#0! (2.6.18.2 plus hacks)

David Miller wrote:
> From: Herbert Xu <herbert@...dor.apana.org.au>
> Date: Thu, 04 Jan 2007 17:26:27 +1100
> 
>> David Stevens <dlstevens@...ibm.com> wrote:
>>>        You're right, I don't know whether it'll fix the problem Ben saw
>>> or not, but it looks like the original code can do a receive before the
>>> in_device is fully initialized, and that, of course, is bad.
>>>        If the device for ip_rcv() is not the same one we were
>>> initializing when the receive interrupted, then the patch should have
>>> no effect either way -- I don't think it'll hide other problems.
>>>        If it's hard to reproduce (which I guess is true), then you're
>>> right, no soft lockup doesn't really tell us if it's fixed or not.
>> Actually I missed your point that the multicast locks aren't even
>> initialised at that point.  So this does explain the soft lock-up
>> and therefore your patch is clearly the correct solution.
> 
> I agree too, therefore I've added David's patch as below.
> 
> I'll push this to the -stable branches as well.  This fix is
> correct even if it does not entirely clear up the soft lockup
> bug being discussed in this thread, but I think it will :-)

We were able to reproduce the problem twice on the unpatched 2.6.18.2 kernel
in about 2 hours of our stress test yesterday.  I applied this patch (well,
the ipv4 part; the ipv6 part won't apply to 2.6.18.2), and the patched kernel
has run the stress test clean for a total of about 8 hours.

So, I do believe this was the problem we were hitting, and it seems fixed.

Thanks!
Ben

-- 
Ben Greear <greearb@...delatech.com>
Candela Technologies Inc  http://www.candelatech.com
