Message-ID: <20190211124716.GA13062@gmail.com>
Date: Mon, 11 Feb 2019 13:47:16 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Alexey Dobriyan <adobriyan@...il.com>
Cc: tglx@...utronix.de, mingo@...hat.com, bp@...en8.de, hpa@...or.com,
x86@...nel.org, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org
Subject: Re: [PATCH v-1] x86_64: new and improved memset() + question
* Alexey Dobriyan <adobriyan@...il.com> wrote:
> Current memset() implementation does silly things:
> * multiplication to get wide constant:
> waste of cycles if filler is known at compile time,
>
> * REP STOSQ followed by REP STOSB:
> this code is used when REP STOSB is slow but still it is used
> for small length (< 8) when setup overhead is relatively big,
>
> * suboptimal calling convention:
> REP STOSB/STOSQ favours (rdi, rcx)
>
> * memset_orig():
> it is hard to even look at it :^)
>
> New implementation is based on the following observations:
> * c == 0 is the most common form,
> filler can be done with "xor eax, eax" and pushed into memset()
> saving 2 bytes per call and multiplication
>
> * len divisible by 8 is the most common form:
> all it takes is one pointer or unsigned long inside structure,
> dispatch at compile time to code without those ugly "let's fill
> at most 7 bytes" tails,
>
> * multiplication to get wider filler value can be done at compile time
> for "c != 0" with 1 insn/10 bytes at most saving multiplication.
>
> * those leaner forms of memset can be done within 3/4 registers (RDI,
> RCX, RAX, [RSI]) saving the rest from clobbering.
Ok, sorry about the belated reply - all that sounds like very nice
improvements!
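For readers following along, the "multiplication to get wide constant" step
and the "len divisible by 8" fast path from the quoted text can be sketched
in plain C as below. This is only an illustration under stated assumptions,
not the code from the patch: the names broadcast_byte(), fill_words() and
fill_sketch() are hypothetical, and the real implementation is in assembly
built around REP STOSQ/STOSB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Replicate one filler byte into all 8 bytes of a 64-bit word.
 * c is at most 0xff, so multiplying by 0x0101010101010101 never carries
 * between bytes. If c is a compile-time constant and the call is inlined,
 * a compiler can fold this into an immediate (plain zero for c == 0).
 */
static inline uint64_t broadcast_byte(unsigned char c)
{
    return (uint64_t)c * UINT64_C(0x0101010101010101);
}

/*
 * Hypothetical helper: fill len bytes with a pre-widened filler.
 * len must be a multiple of 8, so there is no "at most 7 bytes" tail.
 */
static inline void fill_words(void *dst, uint64_t wide, size_t len)
{
    unsigned char *p = dst;

    assert(len % 8 == 0);
    for (size_t i = 0; i < len; i += 8)
        memcpy(p + i, &wide, 8); /* memcpy avoids alignment assumptions */
}

/*
 * Hypothetical general form: 8-byte body first, then a byte-wise tail.
 * When len is known to be divisible by 8, the tail length is zero and
 * only the lean path above does any work.
 */
static inline void fill_sketch(void *dst, unsigned char c, size_t len)
{
    size_t body = len & ~(size_t)7;

    fill_words(dst, broadcast_byte(c), body);
    memset((unsigned char *)dst + body, c, len - body);
}
```

With c == 0 the widened filler is simply zero, which corresponds to the
"xor eax, eax" case described in the quoted text.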
> Note: "memset0" name is chosen because "bzero" is officially deprecated.
> Note: memset(,0,) form is interleaved into memset(,c,) form to save
> space.
>
> QUESTION: is it possible to tell gcc "this function is semantically
> equivalent to memset(3) so make high level optimizations but call it
> when it is necessary"? I suspect the answer is "no" :-\
No idea ...
> TODO:
> CONFIG_FORTIFY_SOURCE is enabled by distros
> benchmarks
> testing
> more comments
> check with memset_io() so that no surprises pop up
I'd only like to make happy noises here to make sure you continue with
this work - it does look promising. :-)
Thanks,
Ingo