lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 21 Jun 2016 15:13:35 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	David Ahern <dsa@...ulusnetworks.com>
Cc:	Ingo Molnar <mingo@...nel.org>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: chain_key collision with 4.7.0-rec2+

On Tue, Jun 21, 2016 at 07:03:58AM -0600, David Ahern wrote:
> Peter/Ingo:
> 
> I have this hit many times over the past few weeks, but I do not have a
> reliable reproducer to attempt a git bisect. There 2 dumps below I hit last
> night within a few minutes of each other (reboot in between). First dump is
> for ipv4 (first login to VM after boot) and second dump is for ipv6 (second
> login to VM after boot). Both are using net-next tree, top commit is
> 697666eac66.

Plz as to not wrap dmesg output when pasting

> IPv4 one:

> [  189.171084] ======================
> [  189.171555] [chain_key collision ]
> [  189.172020] 4.7.0-rc2+ #310 Tainted: G        W
> [  189.172729] ----------------------
> [  189.173192] sshd/1463: Hash chain already cached but the contents don't > match!
> [  189.174236] Held locks:depth: 6
> [  189.174707]  class_idx:1253 -> chain_key:00000000000004e5 > (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff8144a84c>] lock_sock+0xb/0xd
> [  189.176376]  class_idx:26 -> chain_key:00000000009ca01a > (rcu_read_lock){......}, at: [<ffffffff814421fb>] rcu_lock_acquire+0x0/0x20
> [  189.178014]  class_idx:898 -> chain_key:0000001394034382 > (rcu_read_lock_bh){......}, at: [<ffffffff814421fb>] > rcu_lock_acquire+0x0/0x20
> [  189.179712]  class_idx:898 -> chain_key:0002728068704382 > (rcu_read_lock_bh){......}, at: [<ffffffff81400393>] > rcu_lock_acquire+0x0/0x20
> [  189.181374]  class_idx:1221 -> chain_key:4e500d0e087044c5 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: > [<ffffffff81408a7e>] __dev_queue_xmit+0x40c/0x72d
> [  189.183400]  class_idx:1222 -> chain_key:01a1c10e0898ad0c > (_xmit_ETHER#2){+.-...}, at: [<ffffffff814287e6>] sch_direct_xmit+0x6f/0x167
> [  189.185178] Locks in cached chain:depth: 6
> [  189.185791]  class_idx:1252 -> chain_key:00000000000004e4 > (((&ndev->rs_timer))){+.-...}
> [  189.186957]  class_idx:26 -> chain_key:00000000009c801a > (rcu_read_lock){......}
> [  189.187991]  class_idx:898 -> chain_key:0000001390034382 > (rcu_read_lock_bh){......}
> [  189.189023]  class_idx:898 -> chain_key:0002720068704382 > (rcu_read_lock_bh){......}
> [  189.190050]  class_idx:1221 -> chain_key:4e400d0e087044c5 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}
> [  189.191370]  class_idx:1220 -> chain_key:01a1c10e0898ad0c > (&(&list->lock)->rlock#3){+.-...}

> and IPv6:

> [   22.082615] ======================
> [   22.083126] [chain_key collision ]
> [   22.083646] 4.7.0-rc2+ #310 Tainted: G        W
> [   22.084410] ----------------------
> [   22.084927] sshd/1401: Hash chain already cached but the contents don't > match!
> [   22.086010] Held locks:depth: 6
> [   22.086509]  class_idx:1255 -> chain_key:00000000000004e7 > (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff8144a84c>] lock_sock+0xb/0xd
> [   22.088253]  class_idx:26 -> chain_key:00000000009ce01a > (rcu_read_lock){......}, at: [<ffffffff814cd8a2>] rcu_read_lock+0x0/0x68
> [   22.089999]  class_idx:898 -> chain_key:000000139c034382 > (rcu_read_lock_bh){......}, at: [<ffffffff814a031b>] > rcu_lock_acquire+0x0/0x20
> [   22.091811]  class_idx:898 -> chain_key:0002738068704382 > (rcu_read_lock_bh){......}, at: [<ffffffff81400393>] > rcu_lock_acquire+0x0/0x20
> [   22.093639]  class_idx:1223 -> chain_key:4e700d0e087044c7 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at: > [<ffffffff81408a7e>] __dev_queue_xmit+0x40c/0x72d
> [   22.095824]  class_idx:1224 -> chain_key:01a1c10e0898ed06 > (_xmit_ETHER#2){+.-...}, at: [<ffffffff814287e6>] sch_direct_xmit+0x6f/0x167
> [   22.097616] Locks in cached chain:depth: 6
> [   22.098197]  class_idx:1248 -> chain_key:00000000000004e0 > (((&ndev->rs_timer))){+.-...}
> [   22.099345]  class_idx:26 -> chain_key:00000000009c001a > (rcu_read_lock){......}
> [   22.100419]  class_idx:898 -> chain_key:0000001380034382 > (rcu_read_lock_bh){......}
> [   22.101535]  class_idx:898 -> chain_key:0002700068704382 > (rcu_read_lock_bh){......}
> [   22.102656]  class_idx:1223 -> chain_key:4e000d0e087044c7 > (dev->qdisc_running_key ?: &qdisc_running_key){+.....}
> [   22.104095]  class_idx:1222 -> chain_key:01a1c10e0898ed06 > (&(&list->lock)->rlock#3){+.-...}


Right, so those hashes do indeed match up pretty nicely.

Now note that this isn't a new issue, just a new warning. Such
collisions were silent before and could/can/do hide actual lock cycles.

In any case, could you perhaps try with commit (from tip/locking/core)
to see if that makes it any better:

  dfaaf3fa01d6 ("locking/lockdep: Use __jhash_mix() for iterate_chain_key()")

I still have to consider a better solution to the problem, because
fundamentally collisions are unavoidable, and we should deal better with
them. But our mixing function did suck somewhat, so making collisions
less likely might give us a little more time to ponder.

But if you're really annoyed by it all, disable DEBUG_LOCKDEP for a
while and rest assured that its not more broken than it was before --
just louder :-)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ