Message-ID: <1547553603.24248.49.camel@synopsys.com>
Date:   Tue, 15 Jan 2019 12:00:04 +0000
From:   Eugeniy Paltsev <eugeniy.paltsev@...opsys.com>
To:     Vineet Gupta <vineet.gupta1@...opsys.com>,
        "linux-snps-arc@...ts.infradead.org" 
        <linux-snps-arc@...ts.infradead.org>
CC:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Alexey Brodkin" <alexey.brodkin@...opsys.com>
Subject: Re: [PATCH 1/2] ARCv2: LIB: memset: fix doing prefetchw outside of
 buffer

Hi Vineet,

On Tue, 2019-01-15 at 01:09 +0000, Vineet Gupta wrote:
> On 1/14/19 7:17 AM, Eugeniy Paltsev wrote:
> > The current ARCv2 memset implementation may execute the 'prefetchw'
> > instruction for an address which lies outside of the memset area.
> > This leaves one modified (dirty) cache line outside of the memset
> > area, which may lead to data corruption if that area is used
> > for DMA IO.
> > 
> > Another issue is that the current ARCv2 memset implementation
> > may execute the 'prealloc' instruction for an L1 cache line which
> > doesn't fully belong to the memset area in the case of a 128B L1 D$
> > line length. That leads to data corruption.
> > 
> > 
> >  * Fix the prefetchw/prealloc instruction usage in the case of a 64B L1
> > data cache line (the default case) and don't use prefetch* instructions
> > for other possible L1 data cache line lengths (32B and 128B).
> > 
> > Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@...opsys.com>
> > ---
> >  arch/arc/lib/memset-archs.S | 30 +++++++++++++++++++++++-------
> >  1 file changed, 23 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/arc/lib/memset-archs.S b/arch/arc/lib/memset-archs.S
> > index 62ad4bcb841a..c7717832336f 100644
> > --- a/arch/arc/lib/memset-archs.S
> > +++ b/arch/arc/lib/memset-archs.S
> > @@ -7,11 +7,32 @@
> >   */
> >  
> >  #include <linux/linkage.h>
> > +#include <asm/cache.h>
> >  
> >  #undef PREALLOC_NOT_AVAIL
> >  
> > +/*
> > + * The memset implementation below is optimized to use the prefetchw and prealloc
> > + * instructions in case of a CPU with a 64B L1 data cache line (L1_CACHE_SHIFT == 6).
> > + * If you want to implement an optimized memset for other possible L1 data cache
> > + * line lengths (32B and 128B) you should rewrite the code, carefully checking
> > + * that we don't issue any prefetchw/prealloc instruction for L1 cache lines
> > + * which don't belong to the memset area.
> 
> Good point. FWIW, it is possible to support those non-common line lengths by using
> L1_CACHE_SHIFT etc. in the asm code below, but I agree it's not worth the trouble.
> 
> > + */
> > +#if L1_CACHE_SHIFT!=6
> > +# define PREALLOC_INSTR(...)
> > +# define PREFETCHW_INSTR(...)
> > +#else  /* L1_CACHE_SHIFT!=6 */
> > +# define PREFETCHW_INSTR(...)	prefetchw __VA_ARGS__
> > +# ifdef PREALLOC_NOT_AVAIL
> > +#  define PREALLOC_INSTR(...)	prefetchw __VA_ARGS__
> > +# else
> > +#  define PREALLOC_INSTR(...)	prealloc __VA_ARGS__
> > +# endif
> > +#endif /* L1_CACHE_SHIFT!=6 */
> > +
> >  ENTRY_CFI(memset)
> > -	prefetchw [r0]		; Prefetch the write location
> > +	PREFETCHW_INSTR([r0])	; Prefetch the first write location
> >  	mov.f	0, r2
> >  ;;; if size is zero
> >  	jz.d	[blink]
> > @@ -48,11 +69,7 @@ ENTRY_CFI(memset)
> >  
> >  	lpnz	@.Lset64bytes
> >  	;; LOOP START
> > -#ifdef PREALLOC_NOT_AVAIL
> > -	prefetchw [r3, 64]	;Prefetch the next write location
> > -#else
> > -	prealloc  [r3, 64]
> > -#endif
> > +	PREALLOC_INSTR([r3, 64]) ;Prefetch the next write location
> 
> These are not solving the issue - I'd break this up and move these bits to your
> next patch.

Actually these are solving another issue - the current implementation may execute
the 'prealloc' instruction for an L1 cache line which doesn't fully belong to
the memset area in the case of a 128B L1 D$ line length. As 'prealloc' fills the
cache line with zeros, this leads to data corruption.
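
To illustrate the containment condition (a rough sketch with a made-up helper,
not code from the patch): a prealloc/prefetchw target is only safe when the
whole cache line it touches lies inside the memset area, roughly:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stddef.h>

	/* Illustration only: 'line_inside_region' is a made-up helper, not
	 * part of the kernel or of this patch. It expresses the condition the
	 * asm must guarantee for every prealloc/prefetchw it issues. */
	static bool line_inside_region(uintptr_t addr, size_t line_len,
				       uintptr_t buf, size_t len)
	{
		uintptr_t line_start = addr & ~(uintptr_t)(line_len - 1);

		/* the whole cache line must lie within [buf, buf + len) */
		return line_start >= buf && line_start + line_len <= buf + len;
	}

E.g. (if I read the 64-byte loop right) for a 192-byte memset starting at a
128B-aligned address, the second iteration preallocs buf + 128; with a 128B
line that line spans [buf + 128, buf + 256), i.e. 64 bytes past the end of the
memset area, and those bytes get zero-filled but never rewritten.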

So I would rather keep these changes in this 'fix' patch.


BTW, I've forgotten again to add Cc: stable@...r.kernel.org, could you add it for me
when applying the patch?
Thanks.


> >  #ifdef CONFIG_ARC_HAS_LL64
> >  	std.ab	r4, [r3, 8]
> >  	std.ab	r4, [r3, 8]
> > @@ -85,7 +102,6 @@ ENTRY_CFI(memset)
> >  	lsr.f	lp_count, r2, 5 ;Last remaining  max 124 bytes
> >  	lpnz	.Lset32bytes
> >  	;; LOOP START
> > -	prefetchw   [r3, 32]	;Prefetch the next write location
> 
> So the real fix for the issue at hand is this line. I'll keep this for the fix and
> beef up the changelog. The thing is, the existing code was already skipping the last
> 64B in the main loop (thus avoiding prefetching the next line), but then the prefetchw
> reintroduced in the last 32B loop spoils the party. That prefetchw was pointless anyway.
> 
> -Vineet
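
(For reference, a rough C-like sketch of that loop shape - made-up names, not
the exact arithmetic of the asm - showing where the 32B loop's prefetchw steps
past the end of the buffer:)

	/* Rough sketch only: the 64-byte loop stops before the final chunk, so
	 * its prefetch of p + 64 stays inside the buffer; the trailing 32-byte
	 * loop then prefetches p + 32, which on its last pass points past 'end'. */
	static void memset_shape(unsigned char *buf, unsigned char val,
				 unsigned long len)
	{
		unsigned char *p = buf;
		unsigned char *end = buf + len;
		unsigned long i;

		while ((unsigned long)(end - p) > 64) {		/* 64B chunks */
			/* prealloc/prefetchw of p + 64: still inside [buf, end) */
			for (i = 0; i < 64; i++)		/* stand-in for the 8x std.ab */
				p[i] = val;
			p += 64;
		}
		while ((unsigned long)(end - p) >= 32) {	/* 32B chunks */
			/* prefetchw of p + 32: on the last pass this is >= end,
			 * i.e. it dirties a cache line outside the memset area */
			for (i = 0; i < 32; i++)
				p[i] = val;
			p += 32;
		}
		/* byte tail omitted */
	}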
-- 
 Eugeniy Paltsev
