[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGudoHEDAzOrndyJeb7L95cMPmHHk0O5wk=tRCe35FE6GVYs1w@mail.gmail.com>
Date: Thu, 20 Mar 2025 19:02:21 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Herton Krzesinski <hkrzesin@...hat.com>
Cc: x86@...nel.org, tglx@...utronix.de, mingo@...hat.com, bp@...en8.de,
dave.hansen@...ux.intel.com, hpa@...or.com, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org, olichtne@...hat.com, atomasov@...hat.com,
aokuliar@...hat.com
Subject: Re: [PATCH] x86: write aligned to 8 bytes in copy_user_generic (when
without FSRM/ERMS)
On Thu, Mar 20, 2025 at 6:51 PM Herton Krzesinski <hkrzesin@...hat.com> wrote:
>
> On Thu, Mar 20, 2025 at 11:36 AM Mateusz Guzik <mjguzik@...il.com> wrote:
> >
> > On Thu, Mar 20, 2025 at 3:22 PM Herton R. Krzesinski <herton@...hat.com> wrote:
> > > diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
> > > index fc9fb5d06174..b8f74d80f35c 100644
> > > --- a/arch/x86/lib/copy_user_64.S
> > > +++ b/arch/x86/lib/copy_user_64.S
> > > @@ -74,6 +74,24 @@ SYM_FUNC_START(rep_movs_alternative)
> > > _ASM_EXTABLE_UA( 0b, 1b)
> > >
> > > .Llarge_movsq:
> > > + /* Do the first possibly unaligned word */
> > > +0: movq (%rsi),%rax
> > > +1: movq %rax,(%rdi)
> > > +
> > > + _ASM_EXTABLE_UA( 0b, .Lcopy_user_tail)
> > > + _ASM_EXTABLE_UA( 1b, .Lcopy_user_tail)
> > > +
> > > + /* What would be the offset to the aligned destination? */
> > > + leaq 8(%rdi),%rax
> > > + andq $-8,%rax
> > > + subq %rdi,%rax
> > > +
> > > + /* .. and update pointers and count to match */
> > > + addq %rax,%rdi
> > > + addq %rax,%rsi
> > > + subq %rax,%rcx
> > > +
> > > + /* make %rcx contain the number of words, %rax the remainder */
> > > movq %rcx,%rax
> > > shrq $3,%rcx
> > > andl $7,%eax
> >
> > The patch looks fine to me, but there is more to do if you are up for it.
> >
> > It was quite some time since I last seriously played with the area and
> > I don't remember all the details, on top of that realities of uarchs
> > probably improved.
> >
> > That said, have you experimented with aligning the target to 16 bytes
> > or more bytes?
>
> Yes I tried to do 32-byte write aligned on an old Xeon (Sandy Bridge based)
> and got no improvement at least in the specific benchmark I'm doing here.
> Also after your question here I tried 16-byte/32-byte on the AMD cpu as
> well and got no difference from the 8-byte alignment, same bench as well.
> I tried to do 8-byte alignment for the ERMS case on Intel and got no
> difference on the systems I tested. I'm not saying it may not improve in
> some other case, just that in my specific testing I couldn't tell/measure
> any improvement.
>
oof, I would not got as far back as Sandy Bridge. ;)
I think Skylake is the oldest yeller to worry about, if one insists on it.
That said, if memory serves right these bufs like to be misaligned to
weird extents, it very well may be in your tests aligning to 8 had a
side effect of aligning it to 16 even.
> >
> > Moreover, I have some recollection that there were uarchs with ERMS
> > which also liked the target to be aligned -- as in perhaps this should
> > be done regardless of FSRM?
>
> Where I tested I didn't see improvements but may be there is some case,
> but I didn't have any.
>
> >
> > And most importantly memset, memcpy and clear_user would all use a
> > revamp and they are missing rep handling for bigger sizes (I verified
> > they *do* show up). Not only that, but memcpy uses overlapping stores
> > while memset just loops over stuff.
> >
> > I intended to sort it out long time ago and maybe will find some time
> > now that I got reminded of it, but I would be deligthed if it got
> > picked up.
> >
> > Hacking this up is just some screwing around, the real time consuming
> > part is the benchmarking so I completely understand if you are not
> > interested.
>
> Yes, the most time you spend is on benchmarking. May be later I could
> try to take a look but will not put any promises on it.
>
Now I'm curious enough what's up here. If I don't run out of steam,
I'm gonna cover memset and memcpy myself.
--
Mateusz Guzik <mjguzik gmail.com>
Powered by blists - more mailing lists