Message-ID: <CAHk-=whRpLyY+U9mkKo8O=2_BXNk=7sjYeObzFr3fGi0KLjLJw@mail.gmail.com>
Date: Fri, 5 Jul 2024 10:39:48 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: jolsa@...nel.org, mhiramat@...nel.org, cgzones@...glemail.com,
brauner@...nel.org, linux-kernel@...r.kernel.org, arnd@...db.de
Subject: Re: deconflicting new syscall numbers for 6.11
On Fri, 5 Jul 2024 at 09:18, Jason A. Donenfeld <Jason@...c4.com> wrote:
>
> VM_DROPPABLE *is* actually a very useful feature. Or it at least seems
> like it could be one.
Yes. It's been discussed exactly in that "this _could_ be very useful"
sense, although we've never actually pulled the trigger.
I tried to find previous discussions on lore, but failed miserably, so
I can't point to previous discussions from long ago, but one question
was also always about whether you wanted some explicit "populate this
page range" interface together with getting a SIGBUS when it's
unpopulated (so that you can basically do demand-paging in user
space).
With just a "this could be useful" but no hard users, it never really
got anywhere.
Anyway, I really don't mind VM_DROPPABLE with "it just gets
re-populated as a new anonymous page" model, particularly since we
could easily then later decide that we could expand on it as a
MAP_SHARED thing with SIGBUS semantics and explicit initialization if
we ever really want it.
End result: I don't think there are necessarily *lots* of users, but I
do think that this is something where some enterprising person goes "I
can use this", and makes some cool library that uses it for caching,
and then we'd be stuck with it.
> And then, indeed, it'd make sense to eventually expose this properly to
> mmap() and let people use it. (Or if you want to do that in reverse,
> adding it to mmap() first, so that people don't misuse
> vgetrandom_alloc(), that's fine.)
Yes. And it should be pretty trivial.
We just at least initially have to be very careful to limit it to
MAP_ANONYMOUS and MAP_PRIVATE. Because dropping dirty bits on shared
mappings sounds insane and like a possible source of confusion (and
thus bugs and maybe even security issues).
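As a rough sketch of the sanity check described above: the helper below accepts only private anonymous mappings and rejects everything else. The name droppable_flags_ok() is hypothetical and this is ordinary userspace C, not kernel code. It only illustrates the "MAP_ANONYMOUS and MAP_PRIVATE only" rule, under the assumption that the mapping type lives in the low MAP_TYPE bits as it does on Linux.

```c
#define _DEFAULT_SOURCE
#include <assert.h>   /* only needed for the checks in the usage note */
#include <stdbool.h>
#include <sys/mman.h>

/* Linux defines MAP_TYPE as 0x0f; fall back to that value if the libc
 * headers do not expose it. */
#ifndef MAP_TYPE
#define MAP_TYPE 0x0f
#endif

/* Hypothetical helper, not a kernel function: returns true only for the
 * combination the text calls safe, i.e. private anonymous memory. */
static bool droppable_flags_ok(int flags)
{
    if ((flags & MAP_TYPE) != MAP_PRIVATE)
        return false;   /* rejects MAP_SHARED and MAP_SHARED_VALIDATE */
    if (!(flags & MAP_ANONYMOUS))
        return false;   /* rejects file-backed mappings */
    return true;
}
```

With this, assert(droppable_flags_ok(MAP_PRIVATE | MAP_ANONYMOUS)) passes, while MAP_SHARED | MAP_ANONYMOUS and a bare MAP_PRIVATE (file-backed) are both refused.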
It's possible that we might even use a MAP_TYPE flag for this. Or make
it a PROT_xyz bit rather than a MAP_xyz.
So there are some trivial sanity checks and some UI issues to just pick,
but apart from "just pick something sane", exposing this for mmap() is
_not_ hard, and I do think it needs to be done first.
And once it's done, I think the argument for having a special system
call is basically gone too.
> - The "mechanism" needs to return allocated memory to userspace that can
> be chunked up on a per-thread basis, with no state straddling pages,
> which means it also needs to return the size of each state, and the
> number of states that were allocated.
>
> - The size of each state might change kernel version to kernel version.
Just pick a size large enough.
And why would that size not be one page?
Considering that you really don't want to rely on page-crossing state
*ANYWAY* because of the whole "one page can go away while another one
sticks around" issue, I would expect that states over one page per
thread would be a *very* questionable idea to begin with.
I don't think we'll ever see systems with page sizes smaller than 4k.
They have existed in the past, but they're not making a comeback.
People want larger pages, not smaller ones.
And the state size right now is what - 200 bytes? So a single page
seems (a) sufficient and (b) kind of the sane maximum anyway due to
the dropping.
No?
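To make the "one page per thread" layout concrete, here is a minimal sketch under stated assumptions. It uses plain private anonymous memory (not a droppable mapping), the helper names alloc_state_pages() and state_for_thread() are made up for illustration, and STATE_SIZE_GUESS just reuses the roughly-200-byte figure mentioned above. Each thread gets its own page, so no state can straddle a page boundary.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: the ~200-byte figure from the text; the real size is a
 * kernel implementation detail. */
#define STATE_SIZE_GUESS 200

/* Hypothetical helper: map one private anonymous page per thread.
 * Returns NULL on failure, otherwise the base of nthreads pages and
 * stores the page size in *page_size_out. */
static void *alloc_state_pages(size_t nthreads, size_t *page_size_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || nthreads == 0)
        return NULL;

    size_t page_size = (size_t)ps;
    /* A single page must be able to hold one state. */
    assert(STATE_SIZE_GUESS <= page_size);
    if (nthreads > SIZE_MAX / page_size)
        return NULL;    /* total length would overflow */

    void *base = mmap(NULL, nthreads * page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    *page_size_out = page_size;
    return base;
}

/* Hypothetical helper: thread idx owns exactly one page, so its state
 * starts on a page boundary and never crosses into a neighbour's page. */
static void *state_for_thread(void *base, size_t page_size, size_t idx)
{
    return (char *)base + idx * page_size;
}
```

A caller would map the pages once, then hand state_for_thread(base, page_size, i) to thread i. With 4k pages and a state of about 200 bytes, most of each page is unused, which is the price of keeping every state confined to a single page.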
Linus