Date:   Wed, 21 Dec 2022 01:07:15 +0100
From:   Frederic Weisbecker <frederic@...nel.org>
To:     Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc:     Joel Fernandes <joel@...lfernandes.org>,
        linux-kernel@...r.kernel.org,
        Josh Triplett <josh@...htriplett.org>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        "Paul E. McKenney" <paulmck@...nel.org>, rcu@...r.kernel.org,
        Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [RFC 0/2] srcu: Remove pre-flip memory barrier

On Tue, Dec 20, 2022 at 12:00:58PM -0500, Mathieu Desnoyers wrote:
> On 2022-12-19 20:04, Joel Fernandes wrote:
> The main benefit I expect is improved performance of the grace period
> implementation in common cases where there are few or no readers present,
> especially on machines with many cpus.
> 
> It allows scanning both periods (0/1) for each cpu within the same pass,
> therefore loading both periods' unlock counters sitting in the same cache
> line at once (improved locality), and then loading both periods' lock
> counters, also sitting in the same cache line.
> 
> It also allows skipping the period flip entirely if there are no readers
> present, which is an -arguably- tiny performance improvement as well.

I would indeed expect a performance improvement if there are no readers in the
active period/idx, but if there are, it's a performance penalty due to the extra
scans.

So my main questions are:

* Is the no-readers-present case the most likely one? I guess it depends on the ssp.

* Does the SRCU update side deserve to be optimized with added code? (We are
  not debating removing the flip, but rather adding a fast path and keeping
  the flip as a slow path.)

* The SRCU machinery is already quite complicated. Look at how we little things
  lock ourselves in for days doing our exegesis of the SRCU state machine, and
  halfway through it we are still debating some ordering. Is it worth adding a
  new path there?

Thanks.
