Message-ID: <b23d56c5-da54-4fbd-81ec-743cc53e0162@redhat.com>
Date: Mon, 31 Mar 2025 17:47:39 -0400
From: Waiman Long <llong@...hat.com>
To: Boqun Feng <boqun.feng@...il.com>, Waiman Long <llong@...hat.com>
Cc: paulmck@...nel.org, Eric Dumazet <edumazet@...gle.com>,
Peter Zijlstra <peterz@...radead.org>, Breno Leitao <leitao@...ian.org>,
Ingo Molnar <mingo@...hat.com>, Will Deacon <will@...nel.org>, aeh@...a.com,
linux-kernel@...r.kernel.org, netdev@...r.kernel.org, jhs@...atatu.com,
kernel-team@...a.com, Erik Lundgren <elundgren@...a.com>
Subject: Re: [PATCH] lockdep: Speed up lockdep_unregister_key() with expedited
RCU synchronization
On 3/31/25 5:21 PM, Boqun Feng wrote:
> On Mon, Mar 31, 2025 at 02:57:20PM -0400, Waiman Long wrote:
>> On 3/31/25 2:33 PM, Paul E. McKenney wrote:
>>> On Mon, Mar 31, 2025 at 01:33:22PM -0400, Waiman Long wrote:
>>>> On 3/31/25 1:26 PM, Boqun Feng wrote:
>>>>> On Wed, Mar 26, 2025 at 11:39:49AM -0400, Waiman Long wrote:
>>>>> [...]
>>>>>>>> Anyway, that may work. The only problem that I see is the issue of nesting
>>>>>>>> of an interrupt context on top of a task context. It is possible that the
>>>>>>>> first use of a raw_spinlock may happen in an interrupt context. If the
>>>>>>>> interrupt happens while the task has set the hazard pointer and is
>>>>>>>> iterating the hash list, the value of the hazard pointer may be
>>>>>>>> overwritten. Alternatively
>>>>>>>> we could have multiple slots for the hazard pointer, but that will make the
>>>>>>>> code more complicated. Or we could disable interrupt before setting the
>>>>>>>> hazard pointer.
>>>>>>> Or we can use lockdep_recursion:
>>>>>>>
>>>>>>> preempt_disable();
>>>>>>> lockdep_recursion_inc();
>>>>>>> barrier();
>>>>>>>
>>>>>>> WRITE_ONCE(*hazptr, ...);
>>>>>>>
>>>>>>> , it should prevent re-entry of lockdep in irq.
>>>>>> That will probably work. Or we can disable irq. I am fine with both.
>>>>> Disabling irq may not work in this case, because an NMI can also happen
>>>>> and call register_lock_class().
>>>> Right, disabling irq doesn't work with NMI. So incrementing the recursion
>>>> count is likely the way to go and I think it will work even in the NMI case.
>>>>
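Something like the following is what I have in mind for the read side then,
assuming a per-CPU slot (the shazptr_slot name is just made up here for
illustration, and the exact memory ordering needed against the wait-side scan
is glossed over):

	preempt_disable();
	lockdep_recursion_inc();	/* block lockdep re-entry from irq/NMI */
	barrier();

	/* Publish the hash list head we are about to walk. */
	WRITE_ONCE(*this_cpu_ptr(&shazptr_slot), hash_head);

	/* ... walk the hash list ... */

	/* Clear the slot before dropping the recursion protection. */
	WRITE_ONCE(*this_cpu_ptr(&shazptr_slot), NULL);
	barrier();
	lockdep_recursion_finish();
	preempt_enable();
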
>>>>> I'm experimenting with a new idea here, it might be better (for general
>>>>> cases), and it has a similar spirit in that we could move the
>>>>> protection scope of a hazard pointer from a key to a hash_list: we can
>>>>> introduce a wildcard address, and whenever we do a synchronize_hazptr(),
>>>>> if the hazptr slot is equal to the wildcard, we treat it as matching any
>>>>> ptr, hence synchronize_hazptr() will still wait until it's zero'd. Not only
>>>>> could this help in the nesting case, it can also be used if the users
>>>>> want to protect multiple things with this simple hazard pointer
>>>>> implementation.
>>>> I think it is a good idea to add a wildcard for the general use case.
>>>> Setting the hazptr to the list head will be enough for this particular case.
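In other words, on the lockdep_unregister_key() side the existing
synchronize_rcu() would roughly be replaced by something like the following
(reusing the synchronize_hazptr() name from above; details such as finding
the key under the graph lock are omitted):

	struct hlist_head *hash_head = keyhashentry(key);

	/* Unpublish the key first ... */
	hlist_del_rcu(&key->hash_entry);

	/*
	 * ... then wait for any reader that has published this hash bucket
	 * (or the wildcard) in its hazptr slot to finish walking the list.
	 */
	synchronize_hazptr(hash_head);
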
>>> Careful! If we enable use of wildcards outside of the special case
>>> of synchronize_hazptr(), we give up the small-memory-footprint advantages
>>> of hazard pointers. You end up having to wait on all hazard-pointer
>>> readers, which was exactly why RCU was troublesome here. ;-)
> Technically, only the hazard-pointer readers that have switched to
> wildcard mode because of multiple hazptr critical sections ;-)
>
>> If the plan is to have one global set of hazard pointers for all the
> A global set of hazard pointers for all the possible use cases is the
> current plan (at least it should be when we have the fully-featured hazptr
> [1]). Because the hazard pointer value already points to the data to
> protect, there is no need to group things into "domain"s.
>
>> possible use cases, supporting wildcard may be a problem. If we allow
> I had some off-list discussions with Paul, and I ended up with the idea
> of user-specific wildcard (i.e. different users can have different
> wildcards) + one global set of hazard pointers. However, it just occurred
> to me that it won't quite work in this simple hazard pointer
> implementation (one slot per-CPU) :( Because you can have a user A's
> hazptr critical section interrupted by a user B's interrupt handler, and if
> both A & B are using customized wildcards but they don't know each other, it's
> not going to work by setting either wildcard value into the slot.
>
> To make it clear for the discussion, we have two hazard pointer
> implementations:
>
> 1. The fully-featured one [1], which allows users to provide memory for
> hazptr slots, so there is no issue with nesting/re-entry etc. And wildcard
> doesn't make sense in this implementation.
>
> 2. The simple variant, which is what I've proposed in this thread, and
> since it only has one slot per CPU, either all the users need to
> prevent re-entry or we need a global wildcard. Also the readers
> of the simple variant need to disable preemption unconditionally because
> it only has one hazptr slot to use. That means its read-side critical
> section should usually be short.
>
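For the nesting fallback in 2. above, I picture the read-side helpers of the
simple variant looking something like this (all names invented here, just to
make sure I understand the idea):

	/* One hazptr slot per CPU for the simple variant. */
	static DEFINE_PER_CPU(void *, shazptr_slot);

	/* Wildcard that synchronize_shazptr() treats as matching any pointer. */
	#define SHAZPTR_WILDCARD	((void *)1UL)

	/* Publish ptr in this CPU's slot and return the previous slot value. */
	static void *shazptr_protect(void *ptr)
	{
		void **slot;
		void *old;

		preempt_disable();
		slot = this_cpu_ptr(&shazptr_slot);
		old = READ_ONCE(*slot);
		/*
		 * If an outer critical section on this CPU already owns the
		 * slot (we are in its irq/NMI handler), degrade to the
		 * wildcard so that synchronize_shazptr() waits for this
		 * nested reader as well.
		 */
		WRITE_ONCE(*slot, old ? SHAZPTR_WILDCARD : ptr);
		return old;
	}

	/* End the critical section, restoring whatever the outer level had. */
	static void shazptr_unprotect(void *old)
	{
		WRITE_ONCE(*this_cpu_ptr(&shazptr_slot), old);
		preempt_enable();
	}
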
> I could try to use the fully-featured one in lockdep; what I need to do
> is create enough hazard_context structures so we have enough slots for lockdep
> and may or may not need lockdep_recursion to prevent re-entries. However,
> I still believe (or I don't have data to show otherwise) that the simple
> variant with one slot per CPU + global wildcard will work fine in
> practice.
>
> So what I would like to do is introduce the simple variant as a
> general API with a global wildcard (because without it, it cannot be a
> general API, since one user would have to prevent entering another user's
> critical section), and lockdep can use it. And we can monitor the
> delay of synchronize_shazptr() and, if the wildcard becomes a problem, move
> to a fully-featured hazptr implementation. Sounds like a plan?
>
> [1]: https://lore.kernel.org/lkml/20240917143402.930114-2-boqun.feng@gmail.com/
Thanks for the detailed explanation. I am looking forward to your new
hazptr patch series.
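If it helps, this is how I currently picture the wait side with the global
wildcard, ignoring whether it should spin or sleep (again just a sketch using
the made-up names from above, not your actual implementation):

	void synchronize_shazptr(void *ptr)
	{
		int cpu;

		/* Order the caller's unpublishing of ptr against the scan. */
		smp_mb();

		for_each_possible_cpu(cpu) {
			void **slot = per_cpu_ptr(&shazptr_slot, cpu);
			void *val;

			/*
			 * Wait until this CPU neither protects ptr nor sits
			 * in wildcard mode, which has to be treated as
			 * matching any pointer.
			 */
			while ((val = READ_ONCE(*slot)) == ptr ||
			       val == SHAZPTR_WILDCARD)
				cpu_relax();
		}
	}
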
Cheers,
Longman
>
> Regards,
> Boqun
>
>> different sets of hazard pointers for different use cases, it will be less of
>> an issue. Anyway, maybe we should skip the wildcard for the current case so that
>> we have more time to think through it first.
>>
>> Cheers,
>> Longman
>>