[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <84b6be23-edba-4bf1-8d9e-1ecb59eceaca@amd.com>
Date: Tue, 12 Nov 2024 08:45:17 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: paulmck@...nel.org
Cc: frederic@...nel.org, rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
kernel-team@...a.com, rostedt@...dmis.org,
kernel test robot <oliver.sang@...el.com>,
Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Kent Overstreet <kent.overstreet@...ux.dev>, bpf@...r.kernel.org
Subject: Re: [PATCH rcu 06/12] srcu: Add srcu_read_lock_lite() and
srcu_read_unlock_lite()
>>>> I am trying to understand the (unlikely) case where synchronize_srcu() is done before any
>>>> srcu reader lock/unlock lite call is done. Can new SRCU readers fail to observe the
>>>> updates?
>>>
>>> If a SRCU reader fail to observe the index flip, then isn't it the case
>>> that the synchronize_rcu() invoked from srcu_readers_active_idx_check()
>>> must wait on it?
>>
>> Below is the sequence of operations I was thinking of, where at step 4 CPU2
>> reads old pointer
>>
>> ptr = old
>>
>>
>> CPU1 CPU2
>>
>> 1. Update ptr = new
>>
>> 2. synchronize_srcu()
>>
>> <Does not use synchronize_rcu()
>> as SRCU_READ_FLAVOR_LITE is not
>> set for any sdp as srcu_read_lock_lite()
>> hasn't been called by any CPU>
>>
>> 3. srcu_read_lock_lite()
>> <No smp_mb() ordering>
>>
>> 4. Can read ptr == old ?
>
> As long as the kernel was built with CONFIG_PROVE_RCU=y and given a fix
> to the wrong-CPU issue you quite rightly point out below, no it cannot.
> The CPU's first call to srcu_read_lock_lite() will use cmpxchg() to update
> ->srcu_reader_flavor, which will place full ordering between that update
> and the later read from "ptr".
>
> So if the synchronize_srcu() is too early to see the SRCU_READ_FLAVOR_LITE
> bit, then the reader must see the new value of "ptr". Similarly,
> if the reader can see the old value of "ptr", then synchronize_srcu()
> must see the reader's setting of the SRCU_READ_FLAVOR_LITE bit.
>
> But both the CONFIG_PROVE_RCU=n and the wrong-CPU issue must be fixed
> for this to work. Please see the upcoming patches to be posted as a
> reply to this email.
>
Got it. It will be good to document how full ordering provided by cmpxchg()
helps here.
>>>>> + if (mask & SRCU_READ_FLAVOR_LITE && !gp)
>>>>> + return false;
>>>>
>>>> So, srcu_readers_active_idx_check() can potentially return false for very long
>>>> time, until the CPU executing srcu_readers_active_idx_check() does
>>>> at least one read lock/unlock lite call?
>>>
>>> That is correct. The theory is that until after an srcu_read_lock_lite()
>>> has executed, there is no need to wait on it. Does the practice match the
>>> theory in this case, or is there some sequence of events that I missed?
>>
>> Below sequence
>>
>> CPU1 CPU2
>> 1. srcu_read_lock_lite()
>>
>>
>> 2. srcu_read_unlock_lite()
>>
>> 3. synchronize_srcu()
>>
>> 3.1 srcu_readers_lock_idx() is
>> called with gp = false as
>> srcu_read_lock_lite() was never
>> called on this CPU for this
>> srcu_struct. So
>> ssp->sda->srcu_reader_flavor is not
>> set for CPU1's sda.
>
> Good eyes! Yes, the scan that sums the ->srcu_unlock_count[] counters
> must also OR together the ->srcu_reader_flavor fields.
>
Agree
>> 3.2 Inside srcu_readers_lock_idx()
>> "mask" contains SRCU_READ_FLAVOR_LITE
>> as CPU2's sdp->srcu_reader_flavor has it.
>>
>> 3.3 CPU1 keeps returning false from
>> below check until CPU1 does at least
>> one srcu_read_lock_lite() call or
>> the thread migrates.
>>
>> if (mask & SRCU_READ_FLAVOR_LITE && !gp)
>> return false;
>
> This is also fixed by the OR of the ->srcu_reader_flavor fields, correct?
>
Yes, agree.
> I guess I could claim that this bug prevents the wrong-CPU bug above
> from resulting in a too-short SRCU grace period, but it is of course
> better to just fix the bugs. ;-)
>
>>>>> + return sum == unlocks;
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -473,6 +482,7 @@ static unsigned long srcu_readers_unlock_idx(struct srcu_struct *ssp, int idx)
>>>>> */
>>>>> static bool srcu_readers_active_idx_check(struct srcu_struct *ssp, int idx)
>>>>> {
>>>>> + bool did_gp = !!(raw_cpu_read(ssp->sda->srcu_reader_flavor) & SRCU_READ_FLAVOR_LITE);
>>>>
>>>> sda->srcu_reader_flavor is only set when CONFIG_PROVE_RCU is enabled. But we
>>>> need the reader flavor information for srcu lite variant to work. So, lite
>>>> variant does not work when CONFIG_PROVE_RCU is disabled. Am I missing something
>>>> obvious here?
>>>
>>> At first glance, it appears that I am the one who missed something obvious.
>>> Including in testing, which failed to uncover this issue.
>>>
>>> Thank you for the careful reviews!
>>
>> Sure thing, no problem!
>
> And again, thank you!
>
Again, no problem!
- Neeraj
Powered by blists - more mailing lists