Message-ID: <7c95cb9c9255448bb74d1f1f694abffb@AcuMS.aculab.com>
Date:   Thu, 24 Mar 2022 22:54:04 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Nick Desaulniers' <ndesaulniers@...gle.com>,
        Borislav Petkov <bp@...en8.de>
CC:     Nathan Chancellor <nathan@...nel.org>, x86-ml <x86@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        "llvm@...ts.linux.dev" <llvm@...ts.linux.dev>,
        Mark Rutland <mark.rutland@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>
Subject: RE: clang memcpy calls

From: Nick Desaulniers
> Sent: 24 March 2022 18:44
> 
> On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <bp@...en8.de> wrote:
> >
> > Hi folks,
> >
> > so I've been looking at a recent objtool noinstr warning from clang
> > builds:
> >
> > vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
> >
> > The issue is that clang generates a memcpy() call when a struct copy
> > happens:
> >
> >         if (regs != eregs)
> >                 *regs = *eregs;
> 
> Specifically, this is copying one struct pt_regs to another. It looks
> like the sizeof struct pt_regs is just large enough to have clang emit
> the libcall.
> https://godbolt.org/z/scx6aa8jq
> Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
> the structs are below ARBITRARY_THRESHOLD.  Should ARBITRARY_THRESHOLD
> be raised so that we continue to inline the memcpy? *shrug*
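
For reference, the pattern in question boils down to something like
this standalone sketch (the struct below is made up and only roughly
pt_regs-sized; it is not the real layout):

	/* Illustrative only: a struct copy large enough that clang
	 * may lower it to a memcpy() libcall rather than an inline
	 * rep movsq / load-store sequence. */
	struct fake_regs {
		unsigned long r[21];	/* ~168 bytes, about the size of
					 * struct pt_regs on x86-64 */
	};

	void copy_regs(struct fake_regs *dst, const struct fake_regs *src)
	{
		if (dst != src)
			*dst = *src;	/* may become 'call memcpy' at -O2 -mno-sse */
	}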

I've just looked at some instruction timings.
For 32-byte-aligned copies it looks like 'rep movs'
(probably movsq) is actually reasonable for large buffers
on all mainstream Intel CPUs since Sandy Bridge.
On the more recent ones it runs at 32 bytes/clock.
It may not be that bad for shorter and non-32-byte-aligned
buffers either.

Certainly I can't see a reason for calling memcpy() for
large copies!

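If one did want to force the inline form, something along these lines
would do a bare 'rep movsq' for qword-multiple sizes (a rough, untested
sketch with made-up names, not the kernel's actual memcpy):

	/* Rough sketch: copy 'len' bytes (assumed to be a multiple of 8)
	 * with an inline 'rep movsq'.  rdi/rsi/rcx are updated by the
	 * instruction, so the operands are read-write ("+"). */
	static inline void copy_qwords(void *dst, const void *src,
				       unsigned long len)
	{
		unsigned long qwords = len / 8;

		asm volatile("rep movsq"
			     : "+D" (dst), "+S" (src), "+c" (qwords)
			     :
			     : "memory");
	}
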
At least no one uses the P4 any more; its setup latency was
something like 186 clocks!

I had thought that 'rep movsq' only made sense with -Os,
but it seems to be better than I expected.
(I might even have measured it running 'fast' on Ivy Bridge.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
