lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZogcxjLv3NAeQYvA@zx2c4.com>
Date: Fri, 5 Jul 2024 18:18:14 +0200
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: jolsa@...nel.org, mhiramat@...nel.org, cgzones@...glemail.com,
	brauner@...nel.org, linux-kernel@...r.kernel.org, arnd@...db.de
Subject: Re: deconflicting new syscall numbers for 6.11

Hi Linus,

On Thu, Jul 04, 2024 at 02:07:41PM -0700, Linus Torvalds wrote:
> On Thu, 4 Jul 2024 at 12:19, Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > Bah. I guess I'll have to walk through the patch series once again.

Thanks for having a look. I really appreciate it.

> Ok, I went through it once. First comments:
> 
> The system call additions look really random. You don't add them to
> all architectures, but the ones you *do* add them to seem positively
> pointless:
> 
>  - I don't think you should introduce the system all on 32-bit
> architectures, and that includes as a compat call on 64-bit.
> 
>     The VM_DROPPABLE infrastructure doesn't even exist on 32-bit, and
> while that might not be technically a requirement, it does seem to
> argue against doing this on 32-bit architectures. Plus nobody sane
> cares.
> 
>     You didn't even enable it on 32-bit x86 in the vdso, so why did
> you enable it as a syscall?
> 
>  - even 64-bit architectures don't necessarily have anything like a
> vdso, eg alpha.
> 
> It looks like you randomly just picked the architectures that have a
> syscall.tbl file, rather than architectures where this made sense. I
> thin kyou should drop all of them except possibly arm64, s390 and
> powerpc.

The first versions of my series actually only enabled it on x86.
(Somebody also wrote an arm64 implementation of all this already, but
that's for later.) But after I posted that, people (Arnd, I think?) told
me I should add it to all architectures to "reserve" the number. That
was a lot of annoying busy work to do, but I did it, and not just random
archs, but *all* of them.

I'd be happy to revert all this and just enable it on x86. I'll do that
for the v+1 patch. It's less work for me and would make this series one
patch less.

But there might be a conversation to have (that I think you've begun
with Arnd) about what the expectations are for this, because the "enable
it on all of them" seems to be something I've heard on more than one
occasion.


> I'm very ambivalent about the VM_DROPPABLE code.
> 
> On one hand, it's something we've discussed many times, and I don't
> hate it. On the other hand, the discussions have always been about
> actually exposing it to user space as a MAP_DROPPABLE so that user
> space can do caching.
> 
> In fact, I'm almost certain that *because* you didn't expose it to
> mmap(), people will now then instead mis-use vgetrandom_alloc()
> instead to allocate random MAP_DROPPABLE pages. That is going to be a
> nightmare.

VM_DROPPABLE *is* actually a very useful feature. Or it at least seems
like it could be one. One can imagine various database caches that do a
memory vs cpu trade off using it. (But, to be clear, I've never actually
spoken with database developers about it.)

There are some other improvements for it I have in mind that I was
considering posting in some time when this work here has settled.

And then, indeed, it'd make sense to eventually expose this properly to
mmap() and let people use it. (Or if you want to do that in reverse,
adding it to mmap() first, so that people don't misuse
vgetrandom_alloc(), that's fine.)


> And that nightmare has to be avoided. Which in turn means that I think
> vgetrandom_alloc() has to go, and you just need to expose
> MAP_DROPPABLE instead that obly works for private anonymous mappings,
> and make sure glibc uses that.
> 
> Because as your patch series stands now, the semantics are unacceptable.
> 
> This is a non-starter. When I see a new system call where my reaction
> is not just "this should have been just a mmap()", but then
> immediately followed by "Oh, and people will mis-use this as a cool
> mmap", I'm not merging that system call.
> 
> So I don't hate VM_DROPPABLE per se, but the interface is simply not
> ok. vgetrandom_alloc() absolutely *has* to go, and needs to just be a
> user-space wrapper around regular mmap.

So I'm not wedded to adding a syscall for this and am pretty open to
other ways of doing it, but I actually think given the requirements,
this kind of makes sense. I was talking about this problem with tglx or
with Greg a while back, kind of frustrated, and one of them suggested,
"well just make it a syscall; that's what those are for," and it
immediately made sense, and so that's what I've done. Here are the
requirements:

- The "mechanism" needs to return allocated memory to userspace that can
  be chunked up on a per-thread basis, with no state straddling pages,
  which means it also needs to return the size of each state, and the
  number of states that were allocated.

- The size of each state might change kernel version to kernel version.

- In an effort to match the behaviors of syscall getrandom() as much as
  possible, it needs to be mapped with various flags (the ones in the
  current vgetrandom_alloc() implementation).

- Which flags are needed might change kernel version to kernel version.

- Future memory tagging CPU extensions might allow us to prevent the
  memory from being accessed unless the accesses are coming from vDSO
  code, which would avoid heartbleed-like bugs. This is very appealing.

So, the memory that's returned, and the parameters about it are sort of
tied to the actual [v]getrandom() implementation. That sounds to me like
this should be done by a function that the kernel is in charge of. Hence
the syscall. (Or a vDSO function, but then it wouldn't correspond with
an equivalent syscall, which might not be appealing to tglx, and it
starts to smell like "library code" which we really don't want.)

Given this, it seemed like a syscall was the cleanest most cromulent
solution. But if you have other suggestions, I'm open to it.

Maybe, though, the best way of assuaging your concerns would be to
expose MAP_DROPPABLE in mmap() in the same series as the rest, so that
there *isn't* a chance that vgetrandom_alloc() will be abused when
people realize it's a handy feature to have.

Thoughts?

Jason

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ