Message-ID: <20071005034058.20275.21559.stgit@ghaskins-t60p.haskins.net>
Date: Fri, 05 Oct 2007 00:03:01 -0400
From: Gregory Haskins <ghaskins@...ell.com>
To: mingo@...e.hu
Cc: linux-kernel@...r.kernel.org, linux-rt-users@...r.kernel.org,
ghaskins@...ell.com
Subject: [PATCH] LOCKDEP: fix mismatched lockdep_depth/curr_chain_hash
Hi Ingo,
I am seeing a problem on the latest -rt where lockdep completely overwhelms
the system to the point that it grinds to a halt on large (8-way+) systems.
The problem seems to be that the class->locks_before and locks_after grow
unbounded (I have observed over 1M entries in them), so a lock_acquire call
can take over 10 seconds to finish resolving. Related to this, lockdep
appears to see a chain-hash miss over and over for what I would assume
should be an established graph (for instance, in double_lock_balance() in an
rt_overload condition). Turning off PROVE_LOCKING (statically, or by setting
debug_locks=0 dynamically) restores the system to normal behavior.
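
For reference, here is how I currently understand the chain-key caching that
seems to be misbehaving, boiled down to a toy, self-contained program. This
is emphatically not the lockdep code: fold_chain_key(), chain_cache[] and
the mixing constants are made-up stand-ins, just to show the idea that each
held lock folds its class id into a running key, and a hit on that key lets
an already-seen chain skip re-validation. If the cached key ever stops
matching what gets recomputed at acquire time, every acquire misses and the
graph keeps growing, which is exactly the symptom above:

/*
 * Toy sketch of chain-key caching (not the lockdep implementation).
 * fold_chain_key(), chain_cache[] and the constants are illustrative only.
 */
#include <stdint.h>
#include <stdio.h>

#define CHAIN_CACHE_SIZE 64

static uint64_t chain_cache[CHAIN_CACHE_SIZE];

/* Fold one lock-class id into the running chain key (toy mixing function). */
static uint64_t fold_chain_key(uint64_t key, uint64_t class_id)
{
        key ^= class_id + 0x9e3779b97f4a7c15ULL;
        key *= 0xff51afd7ed558ccdULL;
        return key;
}

/* Return 1 on a cache hit; on a miss, "validate" and remember the key. */
static int chain_cache_lookup(uint64_t key)
{
        size_t idx = key % CHAIN_CACHE_SIZE;

        if (chain_cache[idx] == key)
                return 1;
        chain_cache[idx] = key;
        return 0;
}

int main(void)
{
        uint64_t ids[] = { 1, 7, 42 };  /* pretend lock-class ids */
        int pass;

        for (pass = 0; pass < 2; pass++) {
                uint64_t key = 0;
                size_t i;

                for (i = 0; i < sizeof(ids) / sizeof(ids[0]); i++)
                        key = fold_chain_key(key, ids[i]);

                printf("pass %d: key %016llx -> %s\n", pass,
                       (unsigned long long)key,
                       chain_cache_lookup(key) ? "hit" : "miss");
        }
        return 0;
}

The second pass hits the cache only because it recomputes exactly the same
key that was stored on the first pass.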
I took some time tonight to study lockdep (it is quite an impressive body of
code!), and came up with the following "fix". It does improve things
significantly by addressing what I believe is the issue with the chain-hash
misses (though it would appear there are still a few more issues there that
need addressing, as some boots are still very lethargic). I use the term
"fix" loosely since I am not confident that I fully understand the intention
of your logic here, so I can't say for sure whether it was really broken or
whether I have made it worse ;)
Could you comment on what I have done here, or offer any advice on what to
look for elsewhere? I based the patch on pure linux-2.6.git since I see the
same issue (by visual inspection, that is) there as well.
Thanks in advance!
-Greg
------
LOCKDEP: fix mismatched lockdep_depth/curr_chain_hash
It is possible for the current->curr_chain_key to become inconsistent with
the current lock depth (curr->lockdep_depth) if the chain fails to validate.
The end result is that future
lock_acquire() operations may inadvertently fail to find a hit in the cache
resulting in a new node being added to the graph for every acquire.
Signed-off-by: Gregory Haskins <ghaskins@...ell.com>
---
kernel/lockdep.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 734da57..efb0d7e 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2450,11 +2450,11 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
chain_head = 1;
}
chain_key = iterate_chain_key(chain_key, id);
- curr->curr_chain_key = chain_key;
if (!validate_chain(curr, lock, hlock, chain_head))
return 0;
+ curr->curr_chain_key = chain_key;
curr->lockdep_depth++;
check_chain_key(curr);
#ifdef CONFIG_DEBUG_LOCKDEP
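
For illustration only, here is the ordering problem boiled down to a toy,
self-contained program. This is not the kernel code: struct task_state,
validate(), acquire_buggy() and acquire_fixed() are made-up stand-ins. It
just shows why committing the key before validation can leave the cached key
out of step with the held-lock count, and why the one-line reordering above
keeps the two consistent:

/*
 * Toy illustration of the ordering problem (not the kernel code).
 * struct task_state, validate(), acquire_buggy() and acquire_fixed()
 * are made-up stand-ins.
 */
#include <stdio.h>

struct task_state {
        unsigned int depth;             /* held locks accounted for        */
        unsigned long long key;         /* chain key cached for that depth */
};

/* Pretend validation that can fail (here: odd keys fail). */
static int validate(unsigned long long key)
{
        return (key & 1) == 0;
}

/* Old ordering: the key is committed before validation succeeds. */
static int acquire_buggy(struct task_state *t, unsigned long long new_key)
{
        t->key = new_key;
        if (!validate(new_key))
                return 0;       /* depth not bumped: key and depth disagree */
        t->depth++;
        return 1;
}

/* Patched ordering: commit key and depth together, only after validation. */
static int acquire_fixed(struct task_state *t, unsigned long long new_key)
{
        if (!validate(new_key))
                return 0;       /* nothing changed, state stays consistent */
        t->key = new_key;
        t->depth++;
        return 1;
}

int main(void)
{
        struct task_state buggy = { 0, 0 };
        struct task_state fixed = { 0, 0 };

        acquire_buggy(&buggy, 3);       /* odd key: validation fails */
        acquire_fixed(&fixed, 3);

        printf("buggy: depth=%u key=%llu (key moved, depth did not)\n",
               buggy.depth, buggy.key);
        printf("fixed: depth=%u key=%llu (both left untouched)\n",
               fixed.depth, fixed.key);
        return 0;
}

Running it prints a stale key together with depth 0 for the buggy ordering,
while the fixed ordering leaves both fields untouched when validation fails.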