Message-ID: <1e4c0df6-cb4d-462c-9019-100044ea8016@redhat.com>
Date: Tue, 25 Mar 2025 19:20:44 -0400
From: Waiman Long <llong@...hat.com>
To: Boqun Feng <boqun.feng@...il.com>, Waiman Long <llong@...hat.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Peter Zijlstra
 <peterz@...radead.org>, Breno Leitao <leitao@...ian.org>,
 Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>, aeh@...a.com,
 linux-kernel@...r.kernel.org, netdev@...r.kernel.org, jhs@...atatu.com,
 kernel-team@...a.com, Erik Lundgren <elundgren@...a.com>,
 "Paul E. McKenney" <paulmck@...nel.org>
Subject: Re: [PATCH] lockdep: Speed up lockdep_unregister_key() with expedited
 RCU synchronization

On 3/25/25 3:42 PM, Boqun Feng wrote:
> On Tue, Mar 25, 2025 at 03:23:30PM -0400, Waiman Long wrote:
> [...]
>>>>> That commit seemed to fix a race between disabling lockdep and
>>>>> unregistering a key, and most importantly, to call zap_class() for the
>>>>> unregistered key even if lockdep is disabled (debug_locks = 0). It might
>>>>> be related, but I'm not sure that's the reason for putting
>>>>> synchronize_rcu() there. Say you want to synchronize between
>>>>> /proc/lockdep and lockdep_unregister_key(), and you have
>>>>> synchronize_rcu() in lockdep_unregister_key(); what's the RCU read-side
>>>>> critical section at /proc/lockdep?
>>>> I agree that the commit I mentioned is not relevant to the current
>>>> case. You are right that is_dynamic_key() is the only function that is
>>>> problematic; the other two are protected by the lockdep_lock, so they are
>>>> safe. Anyway, I believe that the actual race happens in the iteration of the
>>>> hash list in is_dynamic_key(). The key that you save in the
>>>> lockdep_key_hazptr in your proposed patch should never be the key (dead_key)
>>> The key stored in lockdep_key_hazptr is the one that the rest of the
>>> function will use after is_dynamic_key() returns true. That is,
>>>
>>> 	CPU 0				CPU 1
>>> 	=====				=====
>>> 	WRITE_ONCE(*lockdep_key_hazptr, key);
>>> 	smp_mb();
>>>
>>> 	is_dynamic_key():
>>> 	  hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
>>> 	    if (k == key) {
>>> 	      found = true;
>>> 	      break;
>>> 	    }
>>> 	  }
>>> 	  				lockdep_unregister_key():
>>> 					  hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
>>> 					    if (k == key) {
>>> 					      hlist_del_rcu(&k->hash_entry);
>>> 				              found = true;
>>> 				              break;
>>> 					    }
>>> 					  }
>>>
>>> 				        smp_mb();
>>>
>>> 					synchronize_lockdep_key_hazptr():
>>> 					  for_each_possible_cpu(cpu) {
>>> 					    <wait for the hazptr slot on
>>> 					    that CPU to be not equal to
>>> 					    the removed key>
>>> 					  }
>>>
>>>
>>> , so that if is_dynamic_key() finds the key in the hash, even though the
>>> key may later be removed by lockdep_unregister_key(), the hazard pointer
>>> guarantees that lockdep_unregister_key() waits for the hazard pointer to
>>> be released.
>>>
>>>> that is passed to lockdep_unregister_key(). In is_dynamic_key():
>>>>
>>>>       hlist_for_each_entry_rcu(k, hash_head, hash_entry) {
>>>>                   if (k == key) {
>>>>                           found = true;
>>>>                           break;
>>>>                   }
>>>>           }
>>>>
>>>> key != k (dead_key), but before accessing its content to get to hash_entry,
>>> It is the dead_key.
>>>
>>>> an interrupt/NMI can happen. In the meantime, the structure holding the key
>>>> is freed and its content can be overwritten with garbage. When the
>>>> interrupt/NMI returns, hash_entry can point to anything, leading to a crash
>>>> or an infinite loop.  Perhaps we can use some kind of synchronization mechanism
>>> No, hash_entry cannot be freed or overwritten, because the user has
>>> protected the key with a hazard pointer; only when the user resets the
>>> hazard pointer to NULL can lockdep_unregister_key() return and the
>>> key be freed.
>>>
>>>> between is_dynamic_key() and lockdep_unregister_key() to prevent this kind
>>>> of race. For example, we can have an atomic counter associated with each
>>> The hazard pointer I proposed provides the exact synchronization ;-)
>> What I am saying is that register_lock_class() is trying to find a newkey
>> while lockdep_unregister_key() is trying to take out an oldkey (newkey !=
>> oldkey). If they happen in the same hash list, register_lock_class() will
>> put newkey into the hazard pointer, but the synchronize_lockdep_key_hazptr()
>> call from lockdep_unregister_key() is checking for oldkey, which is not the
>> one saved in the hazard pointer. So lockdep_unregister_key() will return, and
>> the key will be freed and reused while is_dynamic_key() may still have a
>> reference to the oldkey and be trying to access its content, which is
>> invalid. I think this is a possible scenario.
>>
> Oh, I see. And yes, the hazard pointers I proposed unfortunately cannot
> prevent this race. (Well, technically we could still use an extra slot to
> hold the key during the hash list iteration, but that would generate a lot
> of stores, so it wouldn't be ideal.) But...
>
> [...]
>>>> head of the hash table. is_dynamic_key() can increment the counter,
>>>> provided it is not zero, in order to proceed, and lockdep_unregister_key()
>>>> has to make sure it can safely decrement the counter to 0 before going
>>>> ahead. Just a thought!
>>>>
> Your idea inspires another solution with hazard pointers: we can
> put a pointer to the hash_head into the hazard pointer slot ;-)
>
> in register_lock_class():
>
> 		/* hazptr: protect the key */
> 		WRITE_ONCE(*key_hazptr, keyhashentry(lock->key));
>
> 		/* Synchronizes with the smp_mb() in synchronize_lockdep_key_hazptr() */
> 		smp_mb();
>
> 		if (!static_obj(lock->key) && !is_dynamic_key(lock->key)) {
> 			return NULL;
> 		}
>
> in lockdep_unregister_key():
>
> 	/* Wait until register_lock_class() has finished accessing k->hash_entry. */
> 	synchronize_lockdep_key_hazptr(keyhashentry(key));
>
>
> which is more space efficient than a per-hash-list atomic or lock unless
> you have 1024 or more CPUs.
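
For reference, a minimal sketch of the wait side assumed above,
synchronize_lockdep_key_hazptr(). The per-CPU slot name and the helper itself
come from the proposed (not yet merged) hazard pointer patch, so everything
below is illustrative rather than existing kernel API:

	/* Illustrative only -- from the proposal, not mainline lockdep. */
	static DEFINE_PER_CPU(void *, lockdep_key_hazptr);

	/* Wait until no CPU still publishes @ptr in its hazard pointer slot. */
	static void synchronize_lockdep_key_hazptr(void *ptr)
	{
		int cpu;

		/* Pairs with the smp_mb() after WRITE_ONCE(*key_hazptr, ...). */
		smp_mb();

		for_each_possible_cpu(cpu) {
			while (READ_ONCE(per_cpu(lockdep_key_hazptr, cpu)) == ptr)
				cpu_relax();
		}
	}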

It looks like you are trying hard to find a use case for hazard pointers
in the kernel :-)

Anyway, that may work. The only problem that I see is the nesting of an
interrupt context on top of a task context. It is possible that the first
use of a raw_spinlock happens in an interrupt context. If the interrupt
arrives after the task has set the hazard pointer and while it is iterating
the hash list, the value of the hazard pointer may be overwritten.
Alternatively, we could have multiple slots for the hazard pointer, but that
would make the code more complicated. Or we could disable interrupts before
setting the hazard pointer, roughly as sketched below.
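
A minimal sketch of that last option, reusing the key_hazptr slot and the
keyhashentry() helper from the proposal quoted above (illustrative only, not
existing lockdep code):

	unsigned long flags;

	/*
	 * Keep a nested interrupt from clobbering this CPU's single
	 * hazard pointer slot while it is in use.
	 */
	local_irq_save(flags);

	/* hazptr: protect the hash list the key would be on */
	WRITE_ONCE(*key_hazptr, keyhashentry(lock->key));

	/* Pairs with the smp_mb() in synchronize_lockdep_key_hazptr() */
	smp_mb();

	if (!static_obj(lock->key) && !is_dynamic_key(lock->key)) {
		WRITE_ONCE(*key_hazptr, NULL);
		local_irq_restore(flags);
		return NULL;
	}

	/*
	 * Clear the slot (and restore interrupts) once the key no longer
	 * needs protecting -- exactly where depends on the final design.
	 */
	WRITE_ONCE(*key_hazptr, NULL);
	local_irq_restore(flags);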

The solution that I am thinking about is to have a simple unfair rwlock
to protect just the hash list iteration. lockdep_unregister_key() and
lockdep_register_key() take the write lock with interrupts disabled, while
is_dynamic_key() takes the read lock. Nesting in this case isn't a
problem, and we don't need RCU to protect the iteration process, so
the last synchronize_rcu() call isn't needed. The level of contention
should be low enough that livelock isn't an issue. Something like the
sketch below.
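
A minimal sketch of such an unfair rwlock (purely illustrative; the names
and the atomic encoding are made up here, not taken from any posted patch):

	/* > 0: number of readers, -1: writer, 0: free */
	static atomic_t lockdep_key_hash_lock = ATOMIC_INIT(0);

	static void key_hash_read_lock(void)
	{
		int cnt;

		for (;;) {
			cnt = atomic_read(&lockdep_key_hash_lock);
			if (cnt >= 0 &&
			    atomic_cmpxchg(&lockdep_key_hash_lock, cnt, cnt + 1) == cnt)
				return;
			cpu_relax();
		}
	}

	static void key_hash_read_unlock(void)
	{
		atomic_dec(&lockdep_key_hash_lock);
	}

	/* Caller must have interrupts disabled. */
	static void key_hash_write_lock(void)
	{
		/* Unfair: just wait for all readers (and any writer) to go away. */
		while (atomic_cmpxchg(&lockdep_key_hash_lock, 0, -1) != 0)
			cpu_relax();
	}

	static void key_hash_write_unlock(void)
	{
		atomic_set(&lockdep_key_hash_lock, 0);
	}

is_dynamic_key() would wrap its hlist walk in key_hash_read_lock()/
key_hash_read_unlock(). A reader nesting from an interrupt on top of a
task-level reader just bumps the count, and a reader can never interrupt a
writer on the same CPU because the writer runs with interrupts disabled,
which is why the nesting issue above goes away.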

Cheers,
Longman


