Message-ID: <20250206172243-bec6d9b7-c0b5-44af-908d-c7190b63c0e4@linutronix.de>
Date: Thu, 6 Feb 2025 17:37:35 +0100
From: Thomas Weißschuh <thomas.weissschuh@...utronix.de>
To: enh <enh@...gle.com>
Cc: Jeff Xu <jeffxu@...omium.org>, Pedro Falcato <pedro.falcato@...il.com>,
Benjamin Berg <benjamin@...solutions.net>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Kees Cook <kees@...nel.org>, akpm@...ux-foundation.org, jannh@...gle.com,
torvalds@...ux-foundation.org, adhemerval.zanella@...aro.org, oleg@...hat.com,
linux-kernel@...r.kernel.org, linux-hardening@...r.kernel.org, linux-mm@...ck.org,
jorgelo@...omium.org, sroettger@...gle.com, ojeda@...nel.org, adobriyan@...il.com,
anna-maria@...utronix.de, mark.rutland@....com, linus.walleij@...aro.org, Jason@...c4.com,
deller@....de, rdunlap@...radead.org, davem@...emloft.net, hch@....de,
peterx@...hat.com, hca@...ux.ibm.com, f.fainelli@...il.com, gerg@...nel.org,
dave.hansen@...ux.intel.com, mingo@...nel.org, ardb@...nel.org, Liam.Howlett@...cle.com,
mhocko@...e.com, 42.hyeyoo@...il.com, peterz@...radead.org, ardb@...gle.com,
rientjes@...gle.com, groeck@...omium.org, mpe@...erman.id.au,
Vlastimil Babka <vbabka@...e.cz>, Andrei Vagin <avagin@...il.com>,
Dmitry Safonov <0x7f454c46@...il.com>, Mike Rapoport <mike.rapoport@...il.com>,
Alexander Mikhalitsyn <aleksandr.mikhalitsyn@...onical.com>
Subject: Re: [PATCH v4 1/1] exec: seal system mappings

On Thu, Feb 06, 2025 at 10:51:54AM -0500, enh wrote:
> On Thu, Feb 6, 2025 at 10:28 AM Thomas Weißschuh
> <thomas.weissschuh@...utronix.de> wrote:
> >
> > On Thu, Feb 06, 2025 at 09:38:59AM -0500, enh wrote:
> > > On Thu, Feb 6, 2025 at 8:20 AM Thomas Weißschuh
> > > <thomas.weissschuh@...utronix.de> wrote:
> > > >
> > > > On Fri, Jan 17, 2025 at 02:35:18PM -0500, enh wrote:
> > > > > On Fri, Jan 17, 2025 at 1:20 PM Jeff Xu <jeffxu@...omium.org> wrote:
<snip>
> > > > x86 has two additional vvar pages for virtual clocks.
> > > > (Since v6.13 even split into their own mapping)
> > > > Loongarch has per-cpu vvar data which is larger than one page.
> > > > The vdso mapping is however many pages the code ends up being compiled as,
> > > > for example on my current x86_64 distro kernel it's two pages.
> > > > In the near future, probably v6.14, vvars will be split over multiple
> > > > pages in general [0].
> > >
> > > /me checks the nearest arm64 phone ... yeah, vdso is still only one
> > > page there but vvars is already more than one.
> >
> > Probably due to CONFIG_TIME_NS, see below.
> >
> > > is there a TL;DR (or RTFM link) for why this is so big? a quick look
> > > at the x86 suggests there should only be 640 bytes of various things
> > > plus a handful of bytes for the rng, and while arm64 looks very
> > > different, that looks like it's explicitly asking for a page (with the
> > > vdso_data_store stuff)? (i've never had any reason to look at vvars
> > > before, only vdso.)
> >
> > I don't think there is any real manual.
> >
> > The vvar data is *shared* between the kernel and userspace.
> > This is done by mapping the *same* physical memory into the kernel
> > ("vdso_data_store") and (read-only) into all userspace processes.
> > As PTEs always cover a full page and the kernel can not expose random
> > other internal kernel data into userspace, the vvars need to be in their
> > own dedicated page.
> > (The same is true for the vDSO code, uprobe trampoline, etc... mappings)
> >
> > The vDSO functions also need to be aware of time namespaces. This is
> > implemented by allocating one page per namespace and mapping this
> > in place of the regular vvar page. But the vDSO still needs to access
> > the regular vvar page for some information, so both are mapped.
>
> ah, i see. yeah, that makes sense. (amusingly, i almost quipped "it's
> not like there are _that_ many clocks to go in there" in my previous
> mail, forgetting that there are effectively an unbounded number of
> clocks thanks to this feature!)

Tiny clarification:

The additional, namespaced clocks do not take up extra space in the
global time vvar page. They live in a dedicated, dynamically allocated,
per-namespace page. So the space used within a vvar page does not change
at runtime and can never run out. The number of vvar mappings per
process is also constant.
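
For anyone who wants to check the mapping sizes on their own machine,
here is a minimal sketch that reads /proc/self/maps and reports how many
pages each vdso/vvar mapping covers. It assumes a Linux /proc filesystem
and matches any mapping whose name starts with "[vvar", since the exact
set of names differs between architectures and kernel versions. The
helper name special_mappings() is made up for this example.

```python
import os
import re

# One line of /proc/<pid>/maps: "start-end perms offset dev inode [name]"
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S+)\s+\S+\s+\S+\s+\S+\s*(.*)$"
)


def special_mappings(maps_text, page_size):
    """Return [(name, perms, pages)] for the vdso and vvar mappings.

    Hypothetical helper for illustration; not part of any kernel API.
    """
    found = []
    for line in maps_text.splitlines():
        m = MAPS_LINE.match(line)
        if not m:
            continue
        start, end, perms, name = m.groups()
        name = name.strip()
        # Assumption: vvar-style mappings all carry a "[vvar..." name.
        if name == "[vdso]" or name.startswith("[vvar"):
            pages = (int(end, 16) - int(start, 16)) // page_size
            found.append((name, perms, pages))
    return found


if __name__ == "__main__":
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open("/proc/self/maps") as f:
        for name, perms, pages in special_mappings(f.read(), page_size):
            print(f"{name:14} {perms} {pages} page(s)")
```

Running it inside and outside a time namespace should show the same
number of mappings, in line with the point above.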

The namespaced time vvar pages have the same structure layout as the
global one, but not all fields are used and some are used differently.
Specifically, the namespace pages only contain the offsets to the base
clock; the dynamic clock data is read from the global page.
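
The split described above can be modelled roughly as follows. This is a
toy model only: the class names, field names and clock keys are invented
for illustration and do not mirror the kernel's actual data structures.
It shows just the idea that a namespaced read combines dynamic data from
the global page with a fixed offset from the per-namespace page.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalPage:
    # Dynamic clock data, kept up to date by the kernel.
    # Modelled here as plain nanosecond values per clock (hypothetical).
    now_ns: dict


@dataclass
class NamespacePage:
    # Only the per-clock offsets are modelled; in this toy model they
    # are the only part of the namespace page that is consulted.
    offset_ns: dict = field(default_factory=dict)


def read_clock(clock, global_page, ns_page=None):
    """Read a clock, applying the namespace offset if a namespace is set."""
    base = global_page.now_ns[clock]  # always comes from the global page
    if ns_page is None:
        return base
    return base + ns_page.offset_ns.get(clock, 0)


if __name__ == "__main__":
    g = GlobalPage(now_ns={"monotonic": 1_000, "boottime": 5_000})
    ns = NamespacePage(offset_ns={"monotonic": 250})
    print(read_clock("monotonic", g))      # no namespace: global value
    print(read_clock("monotonic", g, ns))  # namespaced: global + offset
```

Because the namespace page holds only offsets, creating more namespaces
adds more such pages but never grows the global page.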
<snip>