Message-ID: <4f5f29d4-9c50-453c-8ad3-03a92fed192e@p183>
Date: Thu, 7 Dec 2023 17:57:05 +0300
From: Alexey Dobriyan <adobriyan@...il.com>
To: Kees Cook <keescook@...omium.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Florian Weimer <fweimer@...hat.com>,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
linux-api@...r.kernel.org, x86@...nel.org
Subject: Re: [PATCH v2] ELF: supply userspace with available page shifts
(AT_PAGE_SHIFT_MASK)
On Wed, Dec 06, 2023 at 12:47:27PM -0800, Kees Cook wrote:
> On Tue, Dec 05, 2023 at 07:01:34PM +0300, Alexey Dobriyan wrote:
> > Report available page shifts in arch independent manner, so that
> > userspace developers won't have to parse /proc/cpuinfo hunting
> > for arch specific strings:
> >
> > Note!
> >
> > This is strictly for userspace: if some page size is shut down due
> > to a kernel command line option or a CPU bug workaround, then it must
> > not be reported in the aux vector!
>
> Given Florian in CC, I assume this is something glibc would like to be
> using? Please mention this in the commit log.
glibc can use it. The main user would be libhugetlbfs, I guess:
https://github.com/libhugetlbfs/libhugetlbfs/blob/master/hugeutils.c#L915
The loop inside getauxval() can run faster than opendir().
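For illustration, a userspace consumer could look roughly like the sketch
below. It is not part of the patch. AT_PAGE_SHIFT_MASK uses the value 29
proposed here, which is an assumption: a kernel without this patch may not
supply the entry at all, or may give that slot a different meaning. The
helper names are hypothetical.

```c
#include <stdio.h>
#include <sys/auxv.h>

/* Assumption: value proposed by this patch, not taken from released headers. */
#ifndef AT_PAGE_SHIFT_MASK
#define AT_PAGE_SHIFT_MASK 29
#endif

/*
 * Collect the page shifts whose bits are set in mask, lowest first.
 * Returns the number of shifts written to out (at most max).
 */
int page_shifts_from_mask(unsigned long mask, int *out, int max)
{
	int n = 0;

	for (int shift = 0; shift < (int)(8 * sizeof(mask)) && n < max; shift++) {
		if (mask & (1UL << shift))
			out[n++] = shift;
	}
	return n;
}

/* Hypothetical helper: print every page size reported in the aux vector. */
void print_page_sizes(void)
{
	/* getauxval() returns 0 when the entry is absent. */
	unsigned long mask = getauxval(AT_PAGE_SHIFT_MASK);
	int shifts[64];
	int n = page_shifts_from_mask(mask, shifts, 64);

	if (n == 0) {
		puts("AT_PAGE_SHIFT_MASK not provided");
		return;
	}
	for (int i = 0; i < n; i++)
		printf("page shift %d (%lu bytes)\n", shifts[i], 1UL << shifts[i]);
}
```

With the mask from the 1 GiB example in the commit message (0x40201000),
page_shifts_from_mask() yields shifts 12, 21 and 30.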
> > x86_64 machine with 1 GiB pages:
> >
> > 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
> >
> > x86_64 machine with 2 MiB pages only:
> >
> > 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > 00000040 1d 00 00 00 00 00 00 00 00 10 20 00 00 00 00 00
> >
> > AT_PAGESZ is always 4096 which is not that interesting.
>
> That's not always true. For example, see arm64:
> arch/arm64/include/asm/elf.h:#define ELF_EXEC_PAGESIZE PAGE_SIZE
Yes, I'm an x86_64 guy; the AT_PAGESZ remark is about x86_64.
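To read the hexdumps above: on x86_64 each aux vector entry is a pair of
little-endian 64-bit words, type first and value second. A minimal decoding
sketch follows; the struct and function names are made up for illustration.

```c
#include <stdint.h>

/* One 64-bit aux vector entry as it appears in /proc/self/auxv. */
struct auxv_entry {
	uint64_t type;
	uint64_t value;
};

/* Assemble a little-endian 64-bit integer from 8 bytes. */
uint64_t load_le64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Decode the 16-byte entry starting at p. */
struct auxv_entry decode_auxv_entry(const unsigned char *p)
{
	struct auxv_entry e = { load_le64(p), load_le64(p + 8) };

	return e;
}
```

Applied to the row at offset 0x30 this gives type 6 (AT_PAGESZ) with value
0x1000. The row at offset 0x40 of the 1 GiB machine gives type 0x1d (29)
with value 0x40201000, that is, bits 12, 21 and 30.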
> I'm not actually sure why x86 forces it to 4096. I'd need to go look
> through the history there.
> > --- a/arch/x86/include/asm/elf.h
> > +++ b/arch/x86/include/asm/elf.h
> > @@ -358,6 +358,18 @@ else if (IS_ENABLED(CONFIG_IA32_EMULATION)) \
> >
> > #define COMPAT_ELF_ET_DYN_BASE (TASK_UNMAPPED_BASE + 0x1000000)
> >
> > +#define ARCH_AT_PAGE_SHIFT_MASK \
> > + do { \
> > + u32 val = 1 << 12; \
> > + if (boot_cpu_has(X86_FEATURE_PSE)) { \
> > + val |= 1 << 21; \
> > + } \
> > + if (boot_cpu_has(X86_FEATURE_GBPAGES)) { \
> > + val |= 1 << 30; \
> > + } \
> > + NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, val); \
> > + } while (0)
> > +
> > #endif /* !CONFIG_X86_32 */
>
> Can't we have a generic ARCH_AT_PAGE_SHIFT_MASK too? Something like:
>
> #ifndef ARCH_AT_PAGE_SHIFT_MASK
> #define ARCH_AT_PAGE_SHIFT_MASK \
> 	NEW_AUX_ENT(AT_PAGE_SHIFT_MASK, 1 << PAGE_SHIFT)
> #endif
>
> Or am I misunderstanding something here?
1) Arch maintainers can opt into this new way of reporting information at
their own pace.
2) AT_PAGE_SHIFT_MASK is about _all_ page sizes supported by the CPU.
Reporting just one misses the point.
I'll clarify the comment: mmap() support requires many things, including
tests for hugetlbfs being mounted; this is about CPU support.
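To make the distinction concrete: a set bit only says the CPU can do that
page size, so userspace still has to try the mapping. The sketch below
probes one anonymous huge page with MAP_HUGETLB. The function names are
hypothetical, and the fallback value of MAP_HUGE_SHIFT is an assumption for
older libc headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT 26	/* assumed fallback, matches Linux uapi */
#endif

/* Encode a page shift into mmap() flags, the way MAP_HUGE_2MB etc. do. */
int huge_mmap_flags(int page_shift)
{
	return MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB |
	       (int)((unsigned int)page_shift << MAP_HUGE_SHIFT);
}

/*
 * Try to map a single huge page of the given shift.
 * Returns 1 if mmap() succeeded, 0 otherwise. Failure is expected when
 * the kernel has no usable huge pages of that size, whatever the CPU
 * supports.
 */
int can_map_huge_page(int page_shift)
{
	size_t len = (size_t)1 << page_shift;
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       huge_mmap_flags(page_shift), -1, 0);

	if (p == MAP_FAILED)
		return 0;
	munmap(p, len);
	return 1;
}
```

A caller would walk the bits of the mask and call can_map_huge_page() only
for the shifts it actually intends to use.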
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -240,6 +240,9 @@ create_elf_tables(struct linux_binprm *bprm, const struct elfhdr *exec,
> > #endif
> > NEW_AUX_ENT(AT_HWCAP, ELF_HWCAP);
> > NEW_AUX_ENT(AT_PAGESZ, ELF_EXEC_PAGESIZE);
> > +#ifdef ARCH_AT_PAGE_SHIFT_MASK
> > + ARCH_AT_PAGE_SHIFT_MASK;
> > +#endif
>
> That way we can avoid an #ifdef in the .c file.
That's a false economy. ifdefs aren't inherently bad.
Once all archs implement AT_PAGE_SHIFT_MASK, the ifdef will be removed.
> > --- a/include/uapi/linux/auxvec.h
> > +++ b/include/uapi/linux/auxvec.h
> > @@ -33,6 +33,20 @@
> > #define AT_RSEQ_FEATURE_SIZE 27 /* rseq supported feature size */
> > #define AT_RSEQ_ALIGN 28 /* rseq allocation alignment */
> >
> > +/*
> > + * Page sizes available for mmap(2) encoded as bitmask.
> > + *
> > + * Example: x86_64 system with pse, pdpe1gb /proc/cpuinfo flags reports
> > + * 4 KiB, 2 MiB and 1 GiB page support.
> > + *
> > + * $ hexdump -C /proc/self/auxv
>
> FWIW, a more readable form is: $ LD_SHOW_AUXV=1 /bin/true
OK. It doesn't show new values as text, but OK.
> > + * 00000030 06 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00
> > + * 00000040 1d 00 00 00 00 00 00 00 00 10 20 40 00 00 00 00
> > + *
> > + * For 2^64 hugepage support please contact your Universe sales representative.
> > + */
> > +#define AT_PAGE_SHIFT_MASK 29
>
> ... hmm, why is 29 unused?
>
> > +
> > #define AT_EXECFN 31 /* filename of program */
> >
> > #ifndef AT_MINSIGSTKSZ
>
> This will need a man page update for "getauxval" as well...
Hear, hear!