linux-kernel - Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

Message-ID: <bcb9eb90-9261-ce96-859d-af4cc1d03baa@efficios.com>
Date:   Tue, 20 Dec 2022 22:47:04 -0500
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     Joel Fernandes <joel@...lfernandes.org>,
        linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

On 2022-12-20 19:07, Frederic Weisbecker wrote:
> On Tue, Dec 20, 2022 at 12:00:58PM -0500, Mathieu Desnoyers wrote:
>> On 2022-12-19 20:04, Joel Fernandes wrote:
>> The main benefit I expect is improved performance of the grace period
>> implementation in common cases where there are few or no readers present,
>> especially on machines with many cpus.
>>
>> It allows scanning both periods (0/1) for each cpu within the same pass,
>> therefore loading both periods' unlock counters sitting in the same cache
>> line at once (improved locality), and then loading both periods' lock
>> counters, also sitting in the same cache line.
>>
>> It also allows skipping the period flip entirely if there are no readers
>> present, which is an -arguably- tiny performance improvement as well.
> 
> I would indeed expect performance improvement if there are no readers in the
> active period/idx but if there are, it's a performance penalty due to the extra
> scans.
> 
> So my main questions are:
> 
> * Is the no-present-readers the most likely case? I guess it depends on the ssp.
> 
> * Does the SRCU update side deserve to be optimized with added code? (We
>    are not debating removing the flip, but rather adding a fast-path
>    and keeping the flip as a slow-path.)
>
> * The SRCU machinery is already quite complicated. Look how we lock
>    ourselves in for days doing our exegesis of the SRCU state machine, and
>    halfway through it we are still debating some ordering. Is it worth adding
>    a new path there?

I'm not arguing for making things more complex unless there are good
reasons to do so. However, I think we badly need to improve the
documentation of the memory barriers in SRCU, because the claimed
barrier pairing is odd.
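
To make the scan discussed in the quoted text concrete, here is a minimal
single-threaded sketch. It is not the kernel's SRCU code: the struct layout
and the names percpu_counts, scan_result, scan_both_periods and can_skip_flip
are hypothetical, and it deliberately omits every memory barrier, which is
exactly the part still under debate in this thread. It only illustrates
loading both periods' unlock counters together, then both periods' lock
counters, and using the result to decide whether a flip could be skipped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu layout: both periods' unlock counters sit next to
 * each other (ideally in one cache line), and so do the lock counters. */
struct percpu_counts {
	unsigned long unlock[2];
	unsigned long lock[2];
};

/* Result of one scan: whether each period (0/1) appears to have no readers. */
struct scan_result {
	bool period_idle[2];
};

/* Sum the unlock counters of both periods over all cpus, then the lock
 * counters of both periods. A real implementation needs ordering between
 * the two loops and against readers; none is provided here. */
struct scan_result scan_both_periods(const struct percpu_counts *cpus,
				     size_t ncpus)
{
	unsigned long unlocks[2] = { 0, 0 };
	unsigned long locks[2] = { 0, 0 };
	struct scan_result res;
	size_t cpu;

	for (cpu = 0; cpu < ncpus; cpu++) {
		unlocks[0] += cpus[cpu].unlock[0];
		unlocks[1] += cpus[cpu].unlock[1];
	}

	/* Assumption: a memory barrier would be required here. */

	for (cpu = 0; cpu < ncpus; cpu++) {
		locks[0] += cpus[cpu].lock[0];
		locks[1] += cpus[cpu].lock[1];
	}

	/* A period looks idle when every lock has a matching unlock. */
	res.period_idle[0] = (locks[0] == unlocks[0]);
	res.period_idle[1] = (locks[1] == unlocks[1]);
	return res;
}

/* Assumption for this sketch: "no readers present" means both periods
 * are idle, in which case the flip is skipped. */
bool can_skip_flip(struct scan_result res)
{
	return res.period_idle[0] && res.period_idle[1];
}
```

With two cpus where period 0 has one lock and one unlock, and period 1 has
three locks but only two unlocks, the scan reports period 0 as idle and
period 1 as busy, so the flip would not be skipped.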

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com