linux-kernel - Re: [PATCH v13 00/13] nommu UML

Message-ID: <defcec3945fbc37e90070b030bf1596b11b6d926.camel@sipsolutions.net>
Date: Tue, 25 Nov 2025 10:58:53 +0100
From: Johannes Berg <johannes@...solutions.net>
To: Hajime Tazaki <thehajime@...il.com>
Cc: hch@...radead.org, linux-um@...ts.infradead.org, ricarkol@...gle.com, 
	Liam.Howlett@...cle.com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v13 00/13] nommu UML

On Wed, 2025-11-12 at 17:52 +0900, Hajime Tazaki wrote:
> > >   What is it for ?
> > >   ================
> > >   
> > >   - Alleviate syscall hook overhead implemented with ptrace(2)
> > >   - To exercise nommu code over UML (and over KUnit)
> > >   - Less dependency on host facilities
> > 
> > FWIW, in some way, this order of priorities is exactly why this hasn't
> > been going anywhere, and every time I looked at it I got somewhat
> > annoyed by what seems to me like choices made to support especially the
> > first bullet.
> 
> over the past versions, I've emphasized that the 2nd bullet (testing)
> is the primary usecase, as I saw several actual cases from mm folks,
> 
> https://lists.infradead.org/pipermail/maple-tree/2024-November/003775.html
> https://lore.kernel.org/all/cb1cf0be-871d-4982-9a1b-5fdd54deec8d@lucifer.local/
> 
> and I think this is not limited to mm code.

Not sure there's much value in testing much else in no-MMU, but sure,
I'll give you that it's useful for testing.

> the other 2 bullets are additional benefits which we observed in a
> comment and in our experience.

But are they really _worthwhile_ benefits? A lot of this design adds
additional complexity, and it doesn't really seem necessary for the
testing use case. Making it faster is nice, but it's not like the
speedup really is 20x for arbitrary tests, that's just for corner cases
like "sit in a loop of gettimeofday()". And for kunit there's no syscall
boundary at all, so there's no speedup.

> > I suspect that the first and third bullet are not even really true any
> > more, since you moved to seccomp (per our request), yet I think design
> > choices influenced by them persist.
> 
> this observation is not true; the first bullet is still true even
> using seccomp.  please look at the benchmark result in the patch
> [12/13], quoted below.

> [snip]

So thanks for the correction. If that's the case, however, it means the
speedup can't be due to the syscall boundary itself (seccomp) but must
rather be due to some pagefault/mapping handling issue? Which would be
inherent in no-MMU, even taking an approach of using two host processes
rather than embedding everything into one.

> > However, I'm not yet convinced that all of the complexities presented in
> > this patchset (such as a completely separate seccomp implementation) are
> > actually necessary in support of _just_ the second bullet. These seem to
> > me like design choices necessary to support the _first_ bullet [1].
> 
> a separate seccomp implementation is indeed needed due to the design
> choice we made: using a single process to host a (um) userspace.

That sounds misleading or even wrong to me; I'd say it's due to putting
the (um) userspace in the same host process as the kernel space?
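As background for the single-process design being discussed: a minimal sketch, assuming the common seccomp pattern in which a BPF filter returns SECCOMP_RET_TRAP so that a syscall raises SIGSYS inside the same host process instead of being executed. This is not code from the patchset; `install_trap_filter()`, `sigsys_handler()` and `trapped_nr` are hypothetical names, getppid is an arbitrary syscall picked for illustration, and only x86_64 and aarch64 hosts are covered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
#define HOST_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define HOST_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86_64 and aarch64 hosts"
#endif

/* Hypothetical bookkeeping: syscall number seen by the SIGSYS handler. */
static volatile sig_atomic_t trapped_nr = -1;

static void sigsys_handler(int sig, siginfo_t *info, void *uctx)
{
    (void)sig;
    (void)uctx;
    /* An in-process design would typically emulate the syscall here and
       write a return value into the saved registers reachable via uctx. */
    trapped_nr = info->si_syscall;
}

/* Trap getppid() with SIGSYS, allow every other syscall; 0 on success. */
static int install_trap_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if the syscall ABI is not the expected one. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, HOST_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* getppid -> SIGSYS, everything else -> run normally. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = sigsys_handler;
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGSYS, &sa, NULL) != 0)
        return -1;
    /* Required for unprivileged filter installation. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After `install_trap_filter()` succeeds, `syscall(SYS_getppid)` should leave `trapped_nr` equal to `SYS_getppid`. Seccomp filters are inherited and cannot be removed once installed, so this is best tried in a throwaway process.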

> I don't see why you see this as a _complexity_, as functionally the two
> seccomp handlers don't interfere with each other.

The complexity isn't so much in the separate code, which is a small
factor, but in the "put everything into the same process" aspect of it.
That has consequences around the host context state handling, things we
didn't really need to consider before suddenly become crucially
important. In the current (with-MMU) design, we only need to worry about
being able to correctly switch between userspace tasks/threads within a
userspace mm (host) process. With the no-MMU design you propose, we also
need to be able to correctly switch between kernel and userspace tasks
within the same single (host) process.

I think this is a pretty significant difference, and saying "there's no
complexity here" is simply pretending it isn't a relevant difference. I
believe you're not even handling this correctly right now in this patch
set, specifically wrt. the GS register, which has been pointed out
before, but I wouldn't say that I even have a complete picture in my
head of what state handling would be necessary and sufficient.
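To make the state-handling point concrete: a minimal sketch, assuming glibc's `<ucontext.h>` API, of switching between a "kernel" and a "user" context inside one host thread. `swapcontext()` saves the current machine context and signal mask and activates another one, but it does not switch thread-pointer state such as the x86-64 FS/GS base, so any such state must be saved and restored by the switch routine itself. Here the hypothetical `fake_gs_base` variable stands in for that state; `struct task`, `switch_task()` and `run_round_trip()` are likewise illustrative names, not UML code, and nothing here claims how the patchset handles GS.

```c
#include <assert.h>
#include <ucontext.h>

#define STACK_SIZE (64 * 1024)

/* Hypothetical stand-in for host state that swapcontext() does not
   switch (e.g. a segment base register): one live copy per host thread. */
static unsigned long fake_gs_base;

struct task {
    ucontext_t ctx;
    unsigned long gs_base;   /* saved copy of fake_gs_base while switched out */
};

static struct task kernel_task, user_task;
static _Alignas(16) char user_stack[STACK_SIZE];
static unsigned long gs_seen_by_user;

/* Switch from prev to next, saving/restoring the extra state by hand. */
static void switch_task(struct task *prev, struct task *next)
{
    assert(prev != next);
    prev->gs_base = fake_gs_base;
    fake_gs_base = next->gs_base;
    swapcontext(&prev->ctx, &next->ctx);
}

static void user_entry(void)
{
    gs_seen_by_user = fake_gs_base;
    fake_gs_base = 0xbad;            /* "userspace" clobbers the state */
    switch_task(&user_task, &kernel_task);
}

/* One round trip kernel -> user -> kernel; returns the kernel's value
   of fake_gs_base afterwards (0 if the user context cannot be set up). */
static unsigned long run_round_trip(void)
{
    fake_gs_base = 0x1000;           /* kernel's value */
    user_task.gs_base = 0x2000;      /* user's initial value */

    if (getcontext(&user_task.ctx) != 0)
        return 0;
    user_task.ctx.uc_stack.ss_sp = user_stack;
    user_task.ctx.uc_stack.ss_size = sizeof(user_stack);
    user_task.ctx.uc_link = &kernel_task.ctx;
    makecontext(&user_task.ctx, user_entry, 0);

    switch_task(&kernel_task, &user_task);
    return fake_gs_base;
}
```

`run_round_trip()` should return 0x1000, the kernel's value, even though the user context overwrote `fake_gs_base`; deleting the two lines in `switch_task()` that save and restore it makes it return 0xbad instead.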

So yeah, I think this warrants taking another look as to whether or not
the approach of putting everything into the same host process is even
worth it. I tend to believe that it isn't, given the use cases. And if,
as you say, the speedup persists even with seccomp, that kills the speed
argument too.

> > I've thought about what would happen if we stuck to creating a (single)
> > separate process on the host to execute userspace, and just used
> > CLONE_VM for it. That way, it's still no-MMU with full memory access,
> > but there's some implicit isolation between the kernel and userspace
> > processes which will likely remove complexities around FP/SSE/AVX
> > handling, may completely remove the need for a separate seccomp
> > implementation, etc.
> 
> this would be doable I think, but we went a different way, as
> using separate host processes (with ptrace/seccomp) is slow and adds
> complexity through the synchronization between processes, which we
> think will not be easy to maintain in the future.

Which one is it then, slow or not? Not sure I follow. You just said you
do have seccomp when comparing speeds, so that in itself doesn't make it
slow. What synchronization? It'd (have to) be CLONE_VM, but that
actually _simplifies_ state transfer/synchronization, and we already
have (to have) state transfer between different userspace threads in the
same host process for the with-MMU case.
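A minimal sketch of the CLONE_VM alternative, assuming glibc's `clone()` wrapper: the child is a separate host process with its own PID and register state, yet it shares the parent's address space, so a write made on the "userspace" side is immediately visible to the "kernel" side without any copying. `child_fn()`, `shared_word` and `run_clone_vm_demo()` are hypothetical names; this is not UML code, and a real design would still need a mechanism (for example ptrace or seccomp) to pass control between the two processes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Lives in memory that both host processes see, thanks to CLONE_VM. */
static volatile int shared_word;

static int child_fn(void *arg)
{
    shared_word = *(int *)arg;   /* the "userspace" side writes directly */
    return 0;
}

/* Returns the value observed by the parent after the child exits, or -1. */
static int run_clone_vm_demo(int value)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    pid_t pid;
    int status;

    if (stack == NULL)
        return -1;
    shared_word = 0;
    /* The stack grows down on x86 and arm64, so pass the top of the block. */
    pid = clone(child_fn, stack + CHILD_STACK_SIZE, CLONE_VM | SIGCHLD, &value);
    if (pid == -1) {
        free(stack);
        return -1;
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status)) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared_word;
}
```

`run_clone_vm_demo(42)` should return 42 once the child has exited, since parent and child read and write the same `shared_word`.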

johannes