Message-ID: <CAMj1kXHpFK+=1gdo11Msw9w6gh2f-4gnSCkyA5kaB_x4mafS5A@mail.gmail.com>
Date: Tue, 20 May 2025 19:46:33 +0200
From: Ard Biesheuvel <ardb@...nel.org>
To: Borislav Petkov <bp@...en8.de>
Cc: "Kirill A. Shutemov" <kirill@...temov.name>, Ard Biesheuvel <ardb+git@...gle.com>, 
	linux-kernel@...r.kernel.org, x86@...nel.org, Ingo Molnar <mingo@...nel.org>, 
	Linus Torvalds <torvalds@...ux-foundation.org>, Brian Gerst <brgerst@...il.com>
Subject: Re: [PATCH v5 2/7] x86/mm: Use a single cache hot per-CPU variable to
 record pgdir_shift

On Tue, 20 May 2025 at 19:38, Borislav Petkov <bp@...en8.de> wrote:
>
> On Tue, May 20, 2025 at 07:03:37PM +0200, Ard Biesheuvel wrote:
> > No. But if you had read the next couple of patches, you would have
> > noticed that PGDIR_SHIFT, PTRS_PER_P4D and pgtable_l5_enabled() will
> > all be derived from this variable, and the latter currently uses code
> > patching (in cpu_feature_enabled())
> >
> > This is also explained in the cover letter btw
>
> Yes, I saw that.
>
> The question remains: are the *users* - PGDIR_SHIFT, etc - on some hot path
> which I'm not seeing?
>
> For example pgd_index() is called in a bunch of places and I guess that adds
> up. But without measuring that, we won't know for sure.
>

Please look at pgtable_l5_enabled(); that is the important one.

> Looking at an example:
>
> # ./arch/x86/include/asm/pgtable_64_types.h:32:         return this_cpu_read_stable(__pgdir_shift);
> #APP
> # 32 "./arch/x86/include/asm/pgtable_64_types.h" 1
>         movb %gs:__pgdir_shift(%rip), %cl       #, pfo_val__
> # 0 "" 2
>
> ...
>
>         movq    40(%rdi), %rax  # ppd_20(D)->vaddr, tmp128
>         shrq    %cl, %rax       # pfo_val__, tmp128
> # arch/x86/boot/startup/sme.c:105:      pgd_p = ppd->pgd + pgd_index(ppd->vaddr);
>         movq    8(%rdi), %rcx   # ppd_20(D)->pgd, ppd_20(D)->pgd
> # arch/x86/boot/startup/sme.c:105:      pgd_p = ppd->pgd + pgd_index(ppd->vaddr);
>         andl    $511, %eax      #, tmp130
> # arch/x86/boot/startup/sme.c:105:      pgd_p = ppd->pgd + pgd_index(ppd->vaddr);
>         leaq    (%rcx,%rax,8), %rsi     #, pgd_p
>
> that looks like two insns to me: the RIP-relative mov to %cl and then the
> shift.
>
> If you use a "normal" variable, that would be also two insns, no?
>
> Or am I way off?
>
> Because if not, the percpu thing doesn't buy you anything...
>

The variable access is identical in terms of instructions. The only
differences are the %gs offset being applied and the fact that using
cache hot data is guaranteed not to increase the number of cachelines
covering the working set of any existing workload (the region is
bounded to a fixed number of cachelines).

Happy to keep this as a simple __ro_after_init variable if there is
consensus between the tip maintainers that we don't need this perf
advantage, or we can use some kind of code patching that is not CPU
feature based or ternary alternative based to short circuit these
things (and pgtable_l5_enabled() in particular) after boot.
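
For readers following the thread, the derivation under discussion (PGDIR_SHIFT,
PTRS_PER_P4D and pgtable_l5_enabled() all coming from one variable) can be
sketched in plain userspace C. This is only an illustration, not the code from
the patch series: the names pgdir_shift, set_pgdir_shift, l5_enabled and
ptrs_per_p4d are hypothetical, an ordinary global stands in for the per-CPU
variable, and the values 39/48 and 1/512 are the usual x86-64 4-level and
5-level constants, assumed here rather than taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for the per-CPU __pgdir_shift variable.
 * Assumed values: 39 with 4-level paging, 48 with 5-level paging.
 */
static uint8_t pgdir_shift = 39;

/* Hypothetical setter; in the kernel the value would be fixed during boot. */
static void set_pgdir_shift(uint8_t shift)
{
	assert(shift == 39 || shift == 48);
	pgdir_shift = shift;
}

/* 5-level paging is in use exactly when the top-level shift is 48. */
static bool l5_enabled(void)
{
	return pgdir_shift == 48;
}

/* The P4D level is folded (a single entry) unless 5-level paging is on. */
static unsigned int ptrs_per_p4d(void)
{
	return l5_enabled() ? 512 : 1;
}

/* One load of the shift, one shift, one mask (the $511 in the listing). */
static uint64_t pgd_index(uint64_t vaddr)
{
	return (vaddr >> pgdir_shift) & 511;
}
```

With the default shift of 39, pgd_index(0xffff888000000000) evaluates to 273;
after set_pgdir_shift(48), the same address yields 511. The sketch says nothing
about cost: whether the load comes from a per-CPU cache hot region or from an
ordinary read-mostly variable is exactly the question the thread leaves open.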
