Message-ID: <CAFULd4ZSorEEkUZOobAyDzkyG+DujEoUOGiMPuiqd9V3C-a39w@mail.gmail.com>
Date: Wed, 11 Oct 2023 22:00:42 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: x86@...nel.org, linux-kernel@...r.kernel.org,
Nadav Amit <namit@...are.com>,
Andy Lutomirski <luto@...nel.org>,
Brian Gerst <brgerst@...il.com>,
Denys Vlasenko <dvlasenk@...hat.com>,
"H . Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH v2 -tip] x86/percpu: Use C for arch_raw_cpu_ptr()
On Wed, Oct 11, 2023 at 9:52 PM Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> On Wed, 11 Oct 2023 at 11:42, Uros Bizjak <ubizjak@...il.com> wrote:
> >
> > The attached patch was tested on a target with fsgsbase CPUID and
> > without it. It works!
>
> .. I should clearly read all my emails before answering some of them.
>
> Yes, that patch looks good to me, and I'm happy to hear that you
> actually tested it unlike my "maybe something like this".
>
> > The patch improves amd_pmu_enable_virt() in the same way as reported
> > in the original patch submission and also reduces the number of percpu
> > offset reads (either from this_cpu_off or with rdgsbase) from 1663 to
> > 1571.
>
> Do you have any actual performance numbers? The patch looks good to
> me, and I *think* rdgsbase ends up being faster in practice due to
> avoiding a memory access, but that's very much a gut feel.
Unfortunately, I don't have any perf numbers, only those from Agner's
instruction tables. Memory access performance depends on so many
parameters that gut feeling is the only guide besides real
case-by-case measurements. The rule of thumb in the compiler world is
also that memory accesses should be avoided.

Uros.
>
> > The only drawback is a larger binary size:
> >
> >     text    data     bss      dec     hex filename
> > 25546594 4387686  808452 30742732 1d518cc vmlinux-new.o
> > 25515256 4387814  808452 30711522 1d49ee2 vmlinux-old.o
> >
> > that increases by 31k (0.123%), probably due to 1578 rdgsbase alternatives.
>
> I'm actually surprised that it increases the text size. The 'rdgsbase'
> instruction should be smaller than a 'mov %gs', so I would have
> expected the *data* size to increase due to the alternatives tables,
> but not the text size.
>
> [ Looks around ]
>
> Oh. It's because we put the altinstructions into the text section.
> That's kind of silly, but whatever.
>
> So I think that increase in text-size is not "real" - yes, it
> increases our binary size because we obviously have two instructions,
> but the actual *executable* part likely stays the same, and it's just
> that we grow the altinstruction metadata.
>
> Linus
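
For anyone who wants to re-check the size figures quoted above, here is a
minimal sketch in Python. The section sizes are copied from the quoted
table and the count of 1578 alternatives from the same message. The
variable names and the per-site average are only illustrative: dividing
the text growth by the number of sites assumes all of the growth comes
from the rdgsbase alternatives, which the thread suggests but does not
confirm.

```python
# Section sizes copied from the size(1) table quoted in the thread.
new = {"text": 25546594, "data": 4387686, "bss": 808452}  # vmlinux-new.o
old = {"text": 25515256, "data": 4387814, "bss": 808452}  # vmlinux-old.o

# Growth of the text section, absolute and relative to the old size.
text_delta = new["text"] - old["text"]
text_pct = 100.0 * text_delta / old["text"]

# Change in data and in the overall ("dec") size.
data_delta = new["data"] - old["data"]
total_delta = sum(new.values()) - sum(old.values())

# Rough average growth per alternative site. ASSUMPTION: all text growth
# is attributed to the 1578 rdgsbase alternatives mentioned in the thread.
ALT_SITES = 1578
bytes_per_site = text_delta / ALT_SITES

print(f"text:  {text_delta:+d} bytes ({text_pct:.3f}%)")
print(f"data:  {data_delta:+d} bytes")
print(f"total: {total_delta:+d} bytes")
print(f"average growth per alternative site: {bytes_per_site:.1f} bytes")
```

The text delta matches the "31k (0.123%)" quoted above, and the data
section actually shrinks slightly, which fits the observation that the
alternatives metadata is accounted for in text rather than data.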