[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <476f1719-6fab-88fe-11ab-8e1d738275e3@cumulusnetworks.com>
Date: Tue, 21 Jun 2016 07:22:38 -0600
From: David Ahern <dsa@...ulusnetworks.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: chain_key collision with 4.7.0-rec2+
On 6/21/16 7:13 AM, Peter Zijlstra wrote:
> Plz as to not wrap dmesg output when pasting
not wrapped on my end -- the copy in Send folder in or the ouptut in
your response.
>
>> IPv4 one:
>
>> [ 189.171084] ======================
>> [ 189.171555] [chain_key collision ]
>> [ 189.172020] 4.7.0-rc2+ #310 Tainted: G W
>> [ 189.172729] ----------------------
>> [ 189.173192] sshd/1463: Hash chain already cached but the contents don't > match!
>> [ 189.174236] Held locks:depth: 6
>> [ 189.174707] class_idx:1253 -> chain_key:00000000000004e5 > (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8144a84c>] lock_sock+0xb/0xd
>> [ 189.176376] class_idx:26 -> chain_key:00000000009ca01a > (rcu_read_lock){......}, at: [<ffffffff814421fb>] rcu_lock_acquire+0x0/0x20
>> [ 189.178014] class_idx:898 -> chain_key:0000001394034382 > (rcu_read_lock_bh){......}, at: [<ffffffff814421fb>] > rcu_lock_acquire+0x0/0x20
>> [ 189.179712] class_idx:898 -> chain_key:0002728068704382 > (rcu_read_lock_bh){......}, at: [<ffffffff81400393>] > rcu_lock_acquire+0x0/0x20
>> [ 189.181374] class_idx:1221 -> chain_key:4e500d0e087044c5 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: > [<ffffffff81408a7e>] __dev_queue_xmit+0x40c/0x72d
>> [ 189.183400] class_idx:1222 -> chain_key:01a1c10e0898ad0c > (_xmit_ETHER#2){+.-...}, at: [<ffffffff814287e6>] sch_direct_xmit+0x6f/0x167
>> [ 189.185178] Locks in cached chain:depth: 6
>> [ 189.185791] class_idx:1252 -> chain_key:00000000000004e4 > (((&ndev->rs_timer))){+.-...}
>> [ 189.186957] class_idx:26 -> chain_key:00000000009c801a > (rcu_read_lock){......}
>> [ 189.187991] class_idx:898 -> chain_key:0000001390034382 > (rcu_read_lock_bh){......}
>> [ 189.189023] class_idx:898 -> chain_key:0002720068704382 > (rcu_read_lock_bh){......}
>> [ 189.190050] class_idx:1221 -> chain_key:4e400d0e087044c5 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}
>> [ 189.191370] class_idx:1220 -> chain_key:01a1c10e0898ad0c > (&(&list->lock)->rlock#3){+.-...}
>
>> and IPv6:
>
>> [ 22.082615] ======================
>> [ 22.083126] [chain_key collision ]
>> [ 22.083646] 4.7.0-rc2+ #310 Tainted: G W
>> [ 22.084410] ----------------------
>> [ 22.084927] sshd/1401: Hash chain already cached but the contents don't > match!
>> [ 22.086010] Held locks:depth: 6
>> [ 22.086509] class_idx:1255 -> chain_key:00000000000004e7 > (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8144a84c>] lock_sock+0xb/0xd
>> [ 22.088253] class_idx:26 -> chain_key:00000000009ce01a > (rcu_read_lock){......}, at: [<ffffffff814cd8a2>] rcu_read_lock+0x0/0x68
>> [ 22.089999] class_idx:898 -> chain_key:000000139c034382 > (rcu_read_lock_bh){......}, at: [<ffffffff814a031b>] > rcu_lock_acquire+0x0/0x20
>> [ 22.091811] class_idx:898 -> chain_key:0002738068704382 > (rcu_read_lock_bh){......}, at: [<ffffffff81400393>] > rcu_lock_acquire+0x0/0x20
>> [ 22.093639] class_idx:1223 -> chain_key:4e700d0e087044c7 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: > [<ffffffff81408a7e>] __dev_queue_xmit+0x40c/0x72d
>> [ 22.095824] class_idx:1224 -> chain_key:01a1c10e0898ed06 > (_xmit_ETHER#2){+.-...}, at: [<ffffffff814287e6>] sch_direct_xmit+0x6f/0x167
>> [ 22.097616] Locks in cached chain:depth: 6
>> [ 22.098197] class_idx:1248 -> chain_key:00000000000004e0 > (((&ndev->rs_timer))){+.-...}
>> [ 22.099345] class_idx:26 -> chain_key:00000000009c001a > (rcu_read_lock){......}
>> [ 22.100419] class_idx:898 -> chain_key:0000001380034382 > (rcu_read_lock_bh){......}
>> [ 22.101535] class_idx:898 -> chain_key:0002700068704382 > (rcu_read_lock_bh){......}
>> [ 22.102656] class_idx:1223 -> chain_key:4e000d0e087044c7 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}
>> [ 22.104095] class_idx:1222 -> chain_key:01a1c10e0898ed06 > (&(&list->lock)->rlock#3){+.-...}
>
>
> Right, so those hashes do indeed match up pretty nicely.
>
> Now note that this isn't a new issue, just a new warning. Such
> collisions were silent before and could/can/do hide actual lock cycles.
>
> In any case, could you perhaps try with commit (from tip/locking/core)
> to see if that makes it any better:
>
> dfaaf3fa01d6 ("locking/lockdep: Use __jhash_mix() for iterate_chain_key()")
I'll add it and run for a while. I tend to see the trace once a day or
so depending on my workload (ie., time to work on code).
>
> I still have to consider a better solution to the problem, because
> fundamentally collisions are unavoidable, and we should deal better with
> them. But our mixing function did suck somewhat, so making collisions
> less likely might give us a little more time to ponder.
>
> But if you're really annoyed by it all, disable DEBUG_LOCKDEP for a
> while and rest assured that its not more broken than it was before --
> just louder :-)
>
Not really an option if I want to make sure I am not adding locking
issues and the habit of ignoring splats defeats the purpose of enabling
the lock debugging. So hopefully you come up with a better answer.
Thanks for the quick response
Powered by blists - more mailing lists