linux-kernel - Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <bb794e83-992e-8181-d9b9-acc68536ce5a@efficios.com>
Date:   Wed, 21 Dec 2022 12:11:42 -0500
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     Joel Fernandes <joel@...lfernandes.org>,
        linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

On 2022-12-21 06:59, Frederic Weisbecker wrote:
> On Tue, Dec 20, 2022 at 10:34:19PM -0500, Mathieu Desnoyers wrote:
[...]
>>
>> The memory ordering constraint I am concerned about here is:
>>
>>   * [...] In addition,
>>   * each CPU having an SRCU read-side critical section that extends beyond
>>   * the return from synchronize_srcu() is guaranteed to have executed a
>>   * full memory barrier after the beginning of synchronize_srcu() and before
>>   * the beginning of that SRCU read-side critical section. [...]
>>
>> So if we have a SRCU read-side critical section that begins after the beginning
>> of synchronize_srcu, but before its first memory barrier, it would miss the
>> guarantee that the full memory barrier is issued before the beginning of that
>> SRCU read-side critical section. IOW, that memory barrier needs to be at the
>> very beginning of the grace period.
> 
> I'm confused, what's wrong with this ?
> 
> UPDATER                  READER
> -------                  ------
> STORE X = 1              STORE srcu_read_lock++
> // rcu_seq_snap()        smp_mb()
> smp_mb()                 READ X
> // scans
> READ srcu_read_lock

What you refer to here is only memory ordering of the store to X and 
load from X wrt loading/increment of srcu_read_lock, which is internal 
to the srcu implementation. If we really want to model the provided 
high-level memory ordering guarantees, we should consider a scenario 
where SRCU is used for its memory ordering properties to synchronize 
other variables.

I'm concerned about the following Dekker scenario, where 
synchronize_srcu() and srcu_read_lock/unlock would be used instead of 
memory barriers:

Initial state: X = 0, Y = 0

Thread A                   Thread B
---------------------------------------------
STORE X = 1                STORE Y = 1
synchronize_srcu()
                            srcu_read_lock()
                            r1 = LOAD X
                            srcu_read_unlock()
r0 = LOAD Y

BUG_ON(!r0 && !r1)

So in the synchronize_srcu implementation, there appears to be two
major scenarios: either srcu_gp_start_if_needed starts a gp or expedited 
gp, or it uses an already started gp/expedited gp. When snapshotting 
with rcu_seq_snap, the fact that the memory barrier is after the 
ssp->srcu_gp_seq load means that it does not order prior memory accesses 
before that load. This sequence value is then used to identify which 
gp_seq to wait for when piggy-backing on another already-started gp. I 
worry about reordering between STORE X = 1 and load of ssp->srcu_gp_seq, 
which is then used to piggy-back on an already-started gp.

I suspect that the implicit barrier in srcu_read_lock() invoked at the
beginning of srcu_gp_start_if_needed() is really the barrier that makes
all this behave as expected. But without documentation it's rather hard 
to follow.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com