lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <84b6be23-edba-4bf1-8d9e-1ecb59eceaca@amd.com>
Date: Tue, 12 Nov 2024 08:45:17 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: paulmck@...nel.org
Cc: frederic@...nel.org, rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
 kernel-team@...a.com, rostedt@...dmis.org,
 kernel test robot <oliver.sang@...el.com>,
 Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
 Peter Zijlstra <peterz@...radead.org>,
 Kent Overstreet <kent.overstreet@...ux.dev>, bpf@...r.kernel.org
Subject: Re: [PATCH rcu 06/12] srcu: Add srcu_read_lock_lite() and
 srcu_read_unlock_lite()




>>>> I am trying to understand the (unlikely) case where synchronize_srcu() is done before any
>>>> srcu reader lock/unlock lite call is done. Can new SRCU readers fail to observe the
>>>> updates?
>>>
>>> If a SRCU reader fail to observe the index flip, then isn't it the case
>>> that the synchronize_rcu() invoked from srcu_readers_active_idx_check()
>>> must wait on it?
>>
>> Below is the sequence of operations I was thinking of, where at step 4 CPU2
>> reads old pointer
>>
>> ptr = old
>>
>>
>> CPU1                                         CPU2
>>
>> 1. Update ptr = new
>>
>> 2. synchronize_srcu()
>>
>> <Does not use synchronize_rcu()
>>  as SRCU_READ_FLAVOR_LITE is not
>>  set for any sdp as srcu_read_lock_lite()
>>  hasn't been called by any CPU>
>>
>>                                       3. srcu_read_lock_lite()
>>                                         <No smp_mb() ordering>
>>
>>                                       4.  Can read ptr == old ?
> 
> As long as the kernel was built with CONFIG_PROVE_RCU=y and given a fix
> to the wrong-CPU issue you quite rightly point out below, no it cannot.
> The CPU's first call to srcu_read_lock_lite() will use cmpxchg() to update
> ->srcu_reader_flavor, which will place full ordering between that update
> and the later read from "ptr".
> 
> So if the synchronize_srcu() is too early to see the SRCU_READ_FLAVOR_LITE
> bit, then the reader must see the new value of "ptr".  Similarly,
> if the reader can see the old value of "ptr", then synchronize_srcu()
> must see the reader's setting of the SRCU_READ_FLAVOR_LITE bit.
> 
> But both the CONFIG_PROVE_RCU=n and the wrong-CPU issue must be fixed
> for this to work.  Please see the upcoming patches to be posted as a
> reply to this email.
> 

Got it. It will be good to document how full ordering provided by cmpxchg()
helps here.


>>>>> +	if (mask & SRCU_READ_FLAVOR_LITE && !gp)
>>>>> +		return false;
>>>>
>>>> So, srcu_readers_active_idx_check() can potentially return false for very long
>>>> time, until the CPU executing srcu_readers_active_idx_check() does
>>>> at least one read lock/unlock lite call?
>>>
>>> That is correct.  The theory is that until after an srcu_read_lock_lite()
>>> has executed, there is no need to wait on it.  Does the practice match the
>>> theory in this case, or is there some sequence of events that I missed?
>>
>> Below sequence
>>
>> CPU1                     CPU2     
>>                        1. srcu_read_lock_lite()
>>                        
>>                        
>>                        2. srcu_read_unlock_lite()
>>
>> 3. synchronize_srcu()
>>
>> 3.1 srcu_readers_lock_idx() is
>> called with gp = false as
>> srcu_read_lock_lite() was never
>> called on this CPU for this
>> srcu_struct. So
>> ssp->sda->srcu_reader_flavor is not
>> set for CPU1's sda.
> 
> Good eyes!  Yes, the scan that sums the ->srcu_unlock_count[] counters
> must also OR together the ->srcu_reader_flavor fields.
> 

Agree

>> 3.2 Inside srcu_readers_lock_idx()
>> "mask" contains SRCU_READ_FLAVOR_LITE
>> as CPU2's sdp->srcu_reader_flavor has it.
>>
>> 3.3 CPU1 keeps returning false from
>> below check until CPU1 does at least
>> one srcu_read_lock_lite() call or
>> the thread migrates.
>>
>> if (mask & SRCU_READ_FLAVOR_LITE && !gp)
>>   return false;
> 
> This is also fixed by the OR of the ->srcu_reader_flavor fields, correct?
> 

Yes, agree.

> I guess I could claim that this bug prevents the wrong-CPU bug above
> from resulting in a too-short SRCU grace period, but it is of course
> better to just fix the bugs.  ;-)
> 
>>>>> +	return sum == unlocks;
>>>>>  }
>>>>>  
>>>>>  /*
>>>>> @@ -473,6 +482,7 @@ static unsigned long srcu_readers_unlock_idx(struct srcu_struct *ssp, int idx)
>>>>>   */
>>>>>  static bool srcu_readers_active_idx_check(struct srcu_struct *ssp, int idx)
>>>>>  {
>>>>> +	bool did_gp = !!(raw_cpu_read(ssp->sda->srcu_reader_flavor) & SRCU_READ_FLAVOR_LITE);
>>>>
>>>> sda->srcu_reader_flavor is only set when CONFIG_PROVE_RCU is enabled. But we
>>>> need the reader flavor information for srcu lite variant to work. So, lite
>>>> variant does not work when CONFIG_PROVE_RCU is disabled. Am I missing something
>>>> obvious here?
>>>
>>> At first glance, it appears that I am the one who missed something obvious.
>>> Including in testing, which failed to uncover this issue.
>>>
>>> Thank you for the careful reviews!
>>
>> Sure thing, no problem!
> 
> And again, thank you!
> 

Again, no problem!


- Neeraj

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ