Message-ID: <4DDEAA3C.7020502@fb.com>
Date: Thu, 26 May 2011 12:30:04 -0700
From: Arun Sharma <asharma@...com>
To: Eric Dumazet <eric.dumazet@...il.com>
CC: Maximilian Engelhardt <maxi@...monizer.de>,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org,
StuStaNet Vorstand <vorstand@...sta.mhn.de>
Subject: Re: Kernel crash after using new Intel NIC (igb)

On 5/24/11 11:35 PM, Eric Dumazet wrote:
>> Another possibility is to do the list_empty() check twice. Once without
>> taking the lock and again with the spinlock held.
>>
>
> Why ?
>
Part of the problem is that I don't have a precise understanding of the
race condition that's causing the list to become corrupted.

All I know is that doing the check under the lock fixes it. If that slows
things down, we can do a first check outside the lock (since it's cheap).
But because that unlocked answer may be wrong, we verify it again under
the lock.
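
As a rough illustration of the check-twice idea above, here is a minimal
userspace sketch. It is not the kernel's inetpeer code: the list helpers
are simplified stand-ins for the kernel ones of the same name, and
`struct peer`, `unused_lock`, `spin_lock()`/`spin_unlock()` and
`unlink_from_unused()` are hypothetical names chosen for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *_Atomic next; /* may be read without the lock */
    struct list_head *prev;         /* only touched with the lock held */
};

void INIT_LIST_HEAD(struct list_head *h) { h->next = h; h->prev = h; }

bool list_empty(struct list_head *h) { return h->next == h; }

void list_add_tail(struct list_head *n, struct list_head *head)
{
    struct list_head *last = head->prev;

    n->prev = last;
    n->next = head;
    last->next = n;
    head->prev = n;
}

/* Unlink the entry and leave it self-linked, so a second call is harmless. */
void list_del_init(struct list_head *e)
{
    struct list_head *next = e->next, *prev = e->prev;

    prev->next = next;
    next->prev = prev;
    INIT_LIST_HEAD(e);
}

/* Hypothetical spinlock guarding the unused list (assumption of this sketch). */
atomic_flag unused_lock = ATOMIC_FLAG_INIT;

void spin_lock(void)
{
    while (atomic_flag_test_and_set_explicit(&unused_lock, memory_order_acquire))
        ; /* busy-wait until the holder clears the flag */
}

void spin_unlock(void)
{
    atomic_flag_clear_explicit(&unused_lock, memory_order_release);
}

struct peer {
    struct list_head unused; /* linkage on the unused list */
};

/* Returns true if this call actually unlinked the peer. */
bool unlink_from_unused(struct peer *p)
{
    bool removed = false;

    /* Cheap unlocked peek: possibly stale, so it is only a hint. */
    if (list_empty(&p->unused))
        return false;

    spin_lock();
    /* Authoritative re-check with the lock held. */
    if (!list_empty(&p->unused)) {
        list_del_init(&p->unused);
        assert(list_empty(&p->unused)); /* node is self-linked again */
        removed = true;
    }
    spin_unlock();
    return removed;
}
```

Note that this sketch only re-checks a "not empty" answer. A stale "empty"
answer from the unlocked peek is taken at face value, so the peek is an
optimization and does not by itself close any race.
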
> list_del_init(&p->unused); (done under lock of course) is safe, you can
> call it twice, no problem.

Doing it twice is not a problem. But doing it when we shouldn't be doing
it could be the problem.

The list modification under unused_peers.lock looks generally safe. But
the control flow (based on refcnt) done outside the lock might have races.

E.g., inet_putpeer() might see the refcnt drop to zero, but before it adds
the node to the unused list, another thread may call inet_getpeer() and
set the refcnt back to 1. We then end up with a node that's potentially in
use but sits on the unused list.
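
The suspected interleaving can be replayed deterministically in a single
thread. The sketch below is a model of the scenario described above, not
the actual inetpeer code: `struct peer`, `put_drop_ref()`,
`put_add_to_unused()`, `get_ref()` and `replay_race()` are hypothetical
names, and a boolean stands in for membership of the unused list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct peer {
    atomic_int refcnt;
    bool on_unused_list; /* stands in for being linked on the unused list */
};

/* Put path, step 1: drop a reference and report whether it hit zero. */
bool put_drop_ref(struct peer *p)
{
    return atomic_fetch_sub(&p->refcnt, 1) == 1;
}

/* Put path, step 2: runs later, acting on the decision made in step 1. */
void put_add_to_unused(struct peer *p)
{
    p->on_unused_list = true;
}

/* Get path: a lookup finds the node and takes a new reference. */
void get_ref(struct peer *p)
{
    atomic_fetch_add(&p->refcnt, 1);
}

/*
 * Replays the interleaving by hand. Returns true if the node ends up
 * both referenced and on the unused list.
 */
bool replay_race(void)
{
    struct peer p = { .on_unused_list = false };
    bool hit_zero;

    atomic_init(&p.refcnt, 1);

    hit_zero = put_drop_ref(&p); /* thread A: refcnt 1 -> 0 */
    assert(hit_zero);
    get_ref(&p);                 /* thread B: refcnt 0 -> 1 */
    if (hit_zero)
        put_add_to_unused(&p);   /* thread A: acts on a stale decision */

    return atomic_load(&p.refcnt) > 0 && p.on_unused_list;
}
```

In this model the window exists because the zero test and the list
insertion are two separate steps. Making the decision and the insertion
under the same lock would be one way to close it here.
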
-Arun