Message-ID: <3f739385-5bc5-6061-cf49-00c92ded3166@kernel.dk>
Date: Fri, 23 Nov 2018 23:09:24 -0700
From: Jens Axboe <axboe@...nel.dk>
To: Linus Torvalds <torvalds@...ux-foundation.org>, pabeni@...hat.com
Cc: Ingo Molnar <mingo@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, bp@...en8.de,
Peter Anvin <hpa@...or.com>,
the arch/x86 maintainers <x86@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrew Lutomirski <luto@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>, dvlasenk@...hat.com,
brgerst@...il.com,
Linux List Kernel Mailing <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] x86: only use ERMS for user copies for larger sizes
On 11/21/18 11:16 AM, Linus Torvalds wrote:
> On Wed, Nov 21, 2018 at 9:27 AM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
>>
>> It would be interesting to know exactly which copy it is that matters
>> so much... *inlining* the erms case might show that nicely in
>> profiles.
>
> Side note: the fact that Jens' patch (which I don't like in that form)
> allegedly shrunk the resulting kernel binary would seem to indicate
> that there's a *lot* of compile-time constant-sized memcpy calls that
> we are missing, and that fall back to copy_user_generic().
Another kind of side note... This also affects memset(), which does
rep stosb whenever the CPU has ERMS, for any size of memset. I noticed
this from sg_init_table(), which does a memset of the table. For my kind
of testing, the entry size is small. The patch below likewise reduces
memset() overhead by 50% here for me.
diff --git a/arch/x86/lib/memset_64.S b/arch/x86/lib/memset_64.S
index 9bc861c71e75..bad0fdb9ddcd 100644
--- a/arch/x86/lib/memset_64.S
+++ b/arch/x86/lib/memset_64.S
@@ -60,6 +60,8 @@ EXPORT_SYMBOL(__memset)
  * rax original destination
  */
 ENTRY(memset_erms)
+	cmpl $128,%edx
+	jb memset_orig
 	movq %rdi,%r9
 	movb %sil,%al
 	movq %rdx,%rcx
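For anyone who wants to play with the idea outside the kernel, here is
a rough userspace C sketch of the same size check. It is not the kernel
code: small_fill(), bulk_fill(), fill_dispatch(), uses_bulk_path() and
ERMS_MIN_BYTES are made-up names, small_fill() only stands in for
memset_orig, and bulk_fill() only stands in for the rep stosb path. The
128 cutoff is just the constant from the patch above, not a tuned value.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: same cutoff as the $128 in the patch, not a tuned value. */
#define ERMS_MIN_BYTES 128

/* Hypothetical stand-in for memset_orig: a plain byte loop. */
static void *small_fill(void *dst, int c, size_t n)
{
	unsigned char *p = dst;

	while (n--)
		*p++ = (unsigned char)c;
	return dst;
}

/* Hypothetical stand-in for the rep stosb path; libc memset here. */
static void *bulk_fill(void *dst, int c, size_t n)
{
	return memset(dst, c, n);
}

/* Same comparison as cmpl/jb: sizes below the cutoff skip the bulk path. */
static int uses_bulk_path(size_t n)
{
	return n >= ERMS_MIN_BYTES;
}

/* Pick the fill routine by size and return dst, like memset() does. */
static void *fill_dispatch(void *dst, int c, size_t n)
{
	if (!uses_bulk_path(n))
		return small_fill(dst, c, n);
	return bulk_fill(dst, c, n);
}
```

With this, fill_dispatch(buf, 0, 127) takes the byte loop and
fill_dispatch(buf, 0, 128) takes the bulk path, matching the unsigned
"below" semantics of jb.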
--
Jens Axboe