Message-ID: <7c95cb9c9255448bb74d1f1f694abffb@AcuMS.aculab.com>
Date:   Thu, 24 Mar 2022 22:54:04 +0000
From:   David Laight <David.Laight@...LAB.COM>
To:     'Nick Desaulniers' <ndesaulniers@...gle.com>,
        Borislav Petkov <bp@...en8.de>
CC:     Nathan Chancellor <nathan@...nel.org>, x86-ml <x86@...nel.org>,
        lkml <linux-kernel@...r.kernel.org>,
        "llvm@...ts.linux.dev" <llvm@...ts.linux.dev>,
        Mark Rutland <mark.rutland@....com>,
        Peter Zijlstra <peterz@...radead.org>,
        Josh Poimboeuf <jpoimboe@...hat.com>
Subject: RE: clang memcpy calls

From: Nick Desaulniers
> Sent: 24 March 2022 18:44
> 
> On Thu, Mar 24, 2022 at 4:19 AM Borislav Petkov <bp@...en8.de> wrote:
> >
> > Hi folks,
> >
> > so I've been looking at a recent objtool noinstr warning from clang
> > builds:
> >
> > vmlinux.o: warning: objtool: sync_regs()+0x20: call to memcpy() leaves .noinstr.text section
> >
> > The issue is that clang generates a memcpy() call when a struct copy
> > happens:
> >
> >         if (regs != eregs)
> >                 *regs = *eregs;
> 
> Specifically, this is copying one struct pt_regs to another. It looks
> like the sizeof struct pt_regs is just large enough to have clang emit
> the libcall.
> https://godbolt.org/z/scx6aa8jq
> Otherwise clang will also use rep; movsq; when -mno-sse -O2 is set and
> the structs are below ARBITRARY_THRESHOLD.  Should ARBITRARY_THRESHOLD
> be raised so that we continue to inline the memcpy? *shrug*
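
For reference, the pattern in question boils down to something like
this standalone sketch (the struct below is made up and only roughly
pt_regs-sized; it is not the real layout):

	/* Illustrative only: a struct copy large enough that clang
	 * may lower it to a memcpy() libcall rather than an inline
	 * rep movsq / load-store sequence. */
	struct fake_regs {
		unsigned long r[21];	/* ~168 bytes, about the size of
					 * struct pt_regs on x86-64 */
	};

	void copy_regs(struct fake_regs *dst, const struct fake_regs *src)
	{
		if (dst != src)
			*dst = *src;	/* may become 'call memcpy' at -O2 -mno-sse */
	}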

I've just looked at some instruction timings.
For 32-byte-aligned copies it looks like 'rep movs'
(probably movsq) is actually reasonable for large buffers
on all mainstream Intel CPUs since Sandy Bridge.
On the more recent ones it runs at 32 bytes/clock.
It may not be that bad for shorter and non-32-byte-aligned
buffers either.

Certainly I can't see a reason for calling memcpy() for
large copies!

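If one did want to force the inline form, something along these lines
would do a bare 'rep movsq' for qword-multiple sizes (a rough, untested
sketch with made-up names, not the kernel's actual memcpy):

	/* Rough sketch: copy 'len' bytes (assumed to be a multiple of 8)
	 * with an inline 'rep movsq'.  rdi/rsi/rcx are updated by the
	 * instruction, so the operands are read-write ("+"). */
	static inline void copy_qwords(void *dst, const void *src,
				       unsigned long len)
	{
		unsigned long qwords = len / 8;

		asm volatile("rep movsq"
			     : "+D" (dst), "+S" (src), "+c" (qwords)
			     :
			     : "memory");
	}
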
At least no one uses the P4 any more; its setup latency was
something like 186 clocks!

I had thought that 'rep movsq' only made sense with -Os,
but it seems to be better than I expected.
(I might even have measured it running 'fast' on Ivy Bridge.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
