Message-ID: <571D14F8.6070306@stressinduktion.org>
Date: Sun, 24 Apr 2016 20:48:24 +0200
From: Hannes Frederic Sowa <hannes@...essinduktion.org>
To: David Miller <davem@...emloft.net>
Cc: eric.dumazet@...il.com, Valdis.Kletnieks@...edu,
netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: linux-next: zillions of lockdep whinges in
include/net/sock.h:1408
On 24.04.2016 20:38, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@...essinduktion.org>
> Date: Thu, 21 Apr 2016 15:49:37 +0200
>
>> On 21.04.2016 15:31, Eric Dumazet wrote:
>>> On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@...edu wrote:
>>>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>>>> Hi,
>>>>>
>>>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>>>> to skip the CPU and PID so the list isn't quite so long):
>>>>>
>>>>> Thanks for the report. Can you give me some more details:
>>>>>
>>>>> Is this an nfs socket? Do you by accident know if this socket went
>>>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>>>> locks at that point but I fear that the lockdep reinitialization
>>>>> confused lockdep.
>>>>
>>>> It wasn't an NFS socket, as NFS wasn't even active at the time. I'm reasonably
>>>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>>>> udpv6_queue_rcv_skb were both implicated. I strongly suspect that pretty much
>>>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>>>> closed firefox, which is usually a heavy network hitter on my laptop.
>>>
>>>
>>> Looks like the following patch is needed, can you try it please ?
>>>
>>> Thanks !
>>>
>>> diff --git a/include/net/sock.h b/include/net/sock.h
>>> index d997ec13a643..db8301c76d50 100644
>>> --- a/include/net/sock.h
>>> +++ b/include/net/sock.h
>>> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
>>>  {
>>>      struct sock *sk = (struct sock *)csk;
>>>
>>> -    return lockdep_is_held(&sk->sk_lock) ||
>>> +    return !debug_locks ||
>>> +           lockdep_is_held(&sk->sk_lock) ||
>>>             lockdep_is_held(&sk->sk_lock.slock);
>>>  }
>>>  #endif
>>
>> I would prefer to add debug_locks at the WARN_ON level, like
>> WARN_ON(debug_locks && !lockdep_sock_is_held(sk)), but I am not sure if
>> this fixes the initial splat.
>
> Can we finish this conversation out and come up with a final patch
> for this soon?
Eric's patch is worth applying anyway, but I am not sure it solves the
(fundamental) problem. I couldn't reproduce it with the exact linux-next
tag given in the initial mail. All other reports also happened only
with linux-next, not with net-next.
I hope Valdis provides his config soon; I will continue my analysis
once I have it.
Thanks,
Hannes