linux-kernel - Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

Date:   Wed, 21 Dec 2022 01:07:15 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Joel Fernandes <joel@...lfernandes.org>,
        linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

On Tue, Dec 20, 2022 at 12:00:58PM -0500, Mathieu Desnoyers wrote:
> On 2022-12-19 20:04, Joel Fernandes wrote:
> The main benefit I expect is improved performance of the grace period
> implementation in common cases where there are few or no readers present,
> especially on machines with many cpus.
> 
> It allows scanning both periods (0/1) for each cpu within the same pass,
> therefore loading both periods' unlock counters sitting in the same cache
> line at once (improved locality), and then loading both periods' lock
> counters, also sitting in the same cache line.
> 
> It also allows skipping the period flip entirely if there are no readers
> present, which is an -arguably- tiny performance improvement as well.
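
The dual-period scan described above can be sketched in userspace C. This is a simplified, single-threaded model under stated assumptions: `struct cpu_counters`, `readers_present_both()`, `sketch_read_lock()` and `sketch_read_unlock()` are hypothetical names, not the kernel's SRCU API, and the memory ordering that real SRCU needs between the unlock and lock scans is deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU reader counters (not the kernel's struct).
 * Both periods' unlock counters sit side by side, as do both
 * periods' lock counters, so each pair can share a cache line. */
struct cpu_counters {
	unsigned long unlock_count[2];
	unsigned long lock_count[2];
};

/* Hypothetical reader entry into period idx on this CPU. */
static void sketch_read_lock(struct cpu_counters *c, int idx)
{
	assert(idx == 0 || idx == 1);
	c->lock_count[idx]++;
}

/* Hypothetical reader exit from period idx, possibly on another CPU. */
static void sketch_read_unlock(struct cpu_counters *c, int idx)
{
	assert(idx == 0 || idx == 1);
	c->unlock_count[idx]++;
}

/*
 * Check BOTH periods in a single pass over all CPUs.
 * Unlocks are summed before locks; real SRCU orders these two loops
 * with a full memory barrier, which this single-threaded sketch omits.
 * Returns true if any reader may still be active in either period.
 */
static bool readers_present_both(const struct cpu_counters *cpus, size_t nr_cpus)
{
	unsigned long unlocks[2] = { 0, 0 };
	unsigned long locks[2] = { 0, 0 };
	size_t cpu;

	/* First loop: both periods' unlock counters per CPU. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		unlocks[0] += cpus[cpu].unlock_count[0];
		unlocks[1] += cpus[cpu].unlock_count[1];
	}

	/* Second loop: both periods' lock counters per CPU. */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		locks[0] += cpus[cpu].lock_count[0];
		locks[1] += cpus[cpu].lock_count[1];
	}

	/* Totals are compared across all CPUs, so a reader that locked on
	 * one CPU and unlocked on another still balances out. */
	return locks[0] != unlocks[0] || locks[1] != unlocks[1];
}
```

If `readers_present_both()` returns false, neither period has readers and, under the proposal being discussed, the grace period could skip the flip; otherwise the existing flip-based path would still run, and this extra scan becomes pure cost.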

I would indeed expect a performance improvement if there are no readers in the
active period/idx, but if there are, the extra scans make it a performance
penalty.
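
This trade-off can be put as a rough expected-cost model. The symbols are assumptions introduced for illustration, not measurements from the thread: $C_f$ is the cost of the extra dual-period scan, $C_s$ the cost of the existing flip-based path, and $p$ the fraction of grace periods that find no readers present. Assuming the fast-path scan is pure overhead whenever readers are found:

$$
E[C_{\text{fast+slow}}] = C_f + (1 - p)\,C_s,
\qquad
E[C_{\text{flip only}}] = C_s
$$

$$
C_f + (1 - p)\,C_s < C_s
\iff C_f < p\,C_s
\iff p > \frac{C_f}{C_s}
$$

Under this model the fast path only pays off when no-reader grace periods are frequent enough for a given ssp.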

So my main questions are:

* Is the no-readers-present case the most likely one? I guess it depends on the ssp.

* Does the SRCU update side deserve to be optimized with added code? (We are
  not debating removing the flip, but rather adding a fast path and keeping
  the flip as a slow path.)
  
* The SRCU machinery is already quite complicated. Look how we, little things, lock
  ourselves in for days doing our exegesis of the SRCU state machine. And halfway
  through it, we are still debating some ordering. Is it worth adding a new path there?

Thanks.