lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 30 Jan 2024 11:53:46 -0800
From: Boqun Feng <boqun.feng@...il.com>
To: Mark Rutland <mark.rutland@....com>
Cc: Fenghua Yu <fenghua.yu@...el.com>, Vinod Koul <vkoul@...nel.org>,
	Dave Jiang <dave.jiang@...el.com>, dmaengine@...r.kernel.org,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Nikhil Rao <nikhil.rao@...el.com>, Tony Zhu <tony.zhu@...el.com>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org
Subject: Re: [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when copying
 completion record to user space

On Tue, Jan 30, 2024 at 05:58:24PM +0000, Mark Rutland wrote:
> This patch might be ok (it looks reasonable as an optimization), but I think
> the description of wmb() and smp_wmb() is incorrect. I also think that you're

Agreed. A wmb() -> smp_wmb() change can only be an optimization rather
than a fix.

> missing an rmb()/smp_rmb()eor equivalent on the reader side.
> 
> On Mon, Jan 29, 2024 at 06:58:06PM -0800, Fenghua Yu wrote:
> > wmb() is used to ensure status in the completion record is written
> > after the rest of the completion record, making it visible to the user.
> > However, on SMP systems, this may not guarantee visibility across
> > different CPUs.
> > 
> > Considering this scenario that event log handler is running on CPU1 while
> > user app is polling completion record (cr) status on CPU2:
> > 
> > 	CPU1				CPU2
> > event log handler			user app
> > 
> > 					1. cr = 0 (status = 0)
> > 2. copy X to user cr except "status"
> > 3. wmb()
> > 4. copy Y to user cr "status"
> > 					5. poll status value Y
> > 				 	6. read rest cr which is still 0.
> > 					   cr handling fails
> > 					7. cr value X visible now
> > 
> > Although wmb() ensure value Y is written and visible after X is written
> > on CPU1, the order is not guaranteed on CPU2. So user app may see status
> > value Y while cr value X is still not visible yet on CPU2. This will
> > cause reading 0 from the rest of cr and cr handling fails.
> 
> The wmb() on CPU1 ensures the order of the reads, but you need an rmb() on CPU2
> between reading the 'status' and 'rest' parts; otherwise CPU2 (or the
> compiler!) is permitted to hoist the read of 'rest' early, before reading from
> 'status', and hence you can end up with a sequence that is effectively:
> 
> 	CPU1				CPU2
>   event log handler			user app
> 					
>   					1. cr = 0 (status = 0)
>   				 	6a. read rest cr which is still 0.
>   2. copy X to user cr except "status"
>   3. wmb()
>   4. copy Y to user cr "status"
>   					5. poll status value Y
>   					6b. cr handling fails
>   					7. cr value X visible now
> 
> Since this is all to regular cacheable memory, it's *sufficient* to use
> smp_wmb() and smp_rmb(), but that's an optimization rather than an ordering
> fix.
> 
> Note that on x86_64, TSO means that the stores are in-order (and so smp_wmb()
> is just a compiler barrier), and IIUC loads are not reordered w.r.t. other
> loads (and so smp_rmb() is also just a compiler barrier).
> 
> > Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> > and CPU2. This guarantees that user app can consume cr in right order.

A barrier can only provide ordering for memory accesses on the same CPU,
so this doesn't make any sense.

> 
> This implies that smp_wmb() is *stronger* than wmb(), whereas smp_wmb() is
> actually *weaker* (e.g. on x86_64 wmb() is an sfence, whereas smp_wmb() is a
> barrier()).
> 
> Thanks,
> Mark.
> 
> > 
> > Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> > Suggested-by: Nikhil Rao <nikhil.rao@...el.com>
> > Tested-by: Tony Zhu <tony.zhu@...el.com>

Since it has a "Fixes" tag and a "Tested-by" tag, I'd assume there has
been a test w/ and w/o this patch showing it can resolve a real issue
*constantly*? If so, I think x86 might be broken somewhere.

[Cc x86 maintainers]

Regards,
Boqun

> > Signed-off-by: Fenghua Yu <fenghua.yu@...el.com>
> > ---
> >  drivers/dma/idxd/cdev.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> > index 77f8885cf407..9b7388a23cbe 100644
> > --- a/drivers/dma/idxd/cdev.c
> > +++ b/drivers/dma/idxd/cdev.c
> > @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
> >  		 * Ensure that the completion record's status field is written
> >  		 * after the rest of the completion record has been written.
> >  		 * This ensures that the user receives the correct completion
> > -		 * record information once polling for a non-zero status.
> > +		 * record information on any CPU once polling for a non-zero
> > +		 * status.
> >  		 */
> > -		wmb();
> > +		smp_wmb();
> >  		status = *(u8 *)cr;
> >  		if (put_user(status, (u8 __user *)addr))
> >  			left += status_size;
> > -- 
> > 2.37.1
> > 
> > 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ