Date: Mon, 6 May 2024 00:45:58 +0200
From: Andrea Parri <parri.andrea@...il.com>
To: Puranjay Mohan <puranjay@...nel.org>
Cc: Will Deacon <will@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
	Boqun Feng <boqun.feng@...il.com>,
	Mark Rutland <mark.rutland@....com>,
	Paul Walmsley <paul.walmsley@...ive.com>,
	Palmer Dabbelt <palmer@...belt.com>,
	Albert Ou <aou@...s.berkeley.edu>, linux-kernel@...r.kernel.org,
	linux-riscv@...ts.infradead.org, puranjay12@...il.com
Subject: Re: [PATCH] riscv/atomic.h: optimize ops with acquire/release
 ordering

Hi Puranjay,

On Sun, May 05, 2024 at 12:33:40PM +0000, Puranjay Mohan wrote:
> Currently, atomic ops with acquire or release ordering are implemented
> as atomic ops with relaxed ordering followed by or preceded by an
> acquire fence or a release fence.
> 
> Section 8.1 of "The RISC-V Instruction Set Manual Volume I:
> Unprivileged ISA", titled "Specifying Ordering of Atomic Instructions",
> says:
> 
> | To provide more efficient support for release consistency [5], each
> | atomic instruction has two bits, aq and rl, used to specify additional
> | memory ordering constraints as viewed by other RISC-V harts.
> 
> and
> 
> | If only the aq bit is set, the atomic memory operation is treated as
> | an acquire access.
> | If only the rl bit is set, the atomic memory operation is treated as a
> | release access.
> 
> So, rather than using two instructions (relaxed atomic op + fence), use
> a single atomic op instruction with acquire/release ordering.
> 
> Example program:
> 
>   atomic_t cnt = ATOMIC_INIT(0);
>   atomic_fetch_add_acquire(1, &cnt);
>   atomic_fetch_add_release(1, &cnt);
> 
> Before:
> 
>   amoadd.w        a4,a5,(a4)  // Atomic add with relaxed ordering
>   fence   r,rw                // Fence to force Acquire ordering
> 
>   fence   rw,w                // Fence to force Release ordering
>   amoadd.w        a4,a5,(a4)  // Atomic add with relaxed ordering
> 
> After:
> 
>   amoadd.w.aq     a4,a5,(a4)  // Atomic add with Acquire ordering
> 
>   amoadd.w.rl     a4,a5,(a4)  // Atomic add with Release ordering
> 
> Signed-off-by: Puranjay Mohan <puranjay@...nel.org>
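
For readers following along, the two shapes quoted above can be sketched in
portable C11. This is only a rough analogue, not the kernel code: the function
names and demo() are hypothetical, and the C11 memory model is not the Linux
kernel memory model, so the sketch says nothing about whether the two shapes
are interchangeable in the kernel. It only shows the "relaxed op plus fence"
form next to the "annotated op" form.

```c
#include <assert.h>
#include <stdatomic.h>

/* "Before" shape: relaxed read-modify-write, then an acquire fence. */
static inline int fetch_add_acquire_fenced(atomic_int *v, int i)
{
    int old = atomic_fetch_add_explicit(v, i, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return old;
}

/* "Before" shape: release fence, then a relaxed read-modify-write. */
static inline int fetch_add_release_fenced(atomic_int *v, int i)
{
    atomic_thread_fence(memory_order_release);
    return atomic_fetch_add_explicit(v, i, memory_order_relaxed);
}

/* "After" shape: the ordering is attached to the operation itself. */
static inline int fetch_add_acquire_annotated(atomic_int *v, int i)
{
    return atomic_fetch_add_explicit(v, i, memory_order_acquire);
}

static inline int fetch_add_release_annotated(atomic_int *v, int i)
{
    return atomic_fetch_add_explicit(v, i, memory_order_release);
}

/* Single-threaded sanity check: every variant returns the old value. */
static int demo(void)
{
    atomic_int cnt = 0;

    assert(fetch_add_acquire_fenced(&cnt, 1) == 0);
    assert(fetch_add_release_fenced(&cnt, 1) == 1);
    assert(fetch_add_acquire_annotated(&cnt, 1) == 2);
    assert(fetch_add_release_annotated(&cnt, 1) == 3);
    return atomic_load(&cnt);
}
```

A single-threaded run can only confirm that all four variants compute the same
values. Any difference in ordering guarantees would show up only in concurrent
litmus tests, which this sketch does not attempt.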

Your changes effectively amount to a partial revert of:

  5ce6c1f3535fa ("riscv/atomic: Strengthen implementations with fences")

Can you please provide (and possibly include in the changelog of v2) a more
thorough explanation of why such a revert is correct?

(Anticipating a somewhat non-trivial analysis...)

Have you tried your changes on some actual hardware?  How did they perform?
Anything worth mentioning (besides the mere instruction count)?

  Andrea
