Message-ID: <17702e7f-479a-22b8-70d9-56e418c8120b@huawei.com>
Date: Tue, 4 Jul 2023 17:18:43 +0200
From: Petr Tesarik <petr.tesarik.ext@...wei.com>
To: Roberto Sassu <roberto.sassu@...weicloud.com>,
Jann Horn <jannh@...gle.com>
CC: Oleg Nesterov <oleg@...hat.com>, Paul Moore <paul@...l-moore.com>,
James Morris <jmorris@...ei.org>,
"Serge E. Hallyn" <serge@...lyn.com>,
Stephen Smalley <stephen.smalley.work@...il.com>,
Eric Paris <eparis@...isplace.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Mimi Zohar <zohar@...ux.ibm.com>,
Kees Cook <keescook@...omium.org>,
Casey Schaufler <casey@...aufler-ca.com>,
David Howells <dhowells@...hat.com>,
Luis Chamberlain <mcgrof@...nel.org>,
Eric Biederman <ebiederm@...ssion.com>,
Christoph Hellwig <hch@...radead.org>,
Petr Mladek <pmladek@...e.com>,
Peter Zijlstra <peterz@...radead.org>,
Thomas Gleixner <tglx@...utronix.de>,
Tejun Heo <tj@...nel.org>, <linux-mm@...ck.org>,
<linux-security-module@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <keyrings@...r.kernel.org>,
<linux-integrity@...r.kernel.org>,
<linux-hardening@...r.kernel.org>
Subject: Re: [QUESTION] Full user space process isolation?
On 7/3/2023 5:28 PM, Roberto Sassu wrote:
> On Mon, 2023-07-03 at 17:06 +0200, Jann Horn wrote:
>> On Thu, Jun 22, 2023 at 4:45 PM Roberto Sassu
>> <roberto.sassu@...weicloud.com> wrote:
>>> I wanted to execute some kernel workloads in a fully isolated user
>>> space process, started from a binary statically linked with klibc,
>>> connected to the kernel only through a pipe.
>>
>> FWIW, the kernel has some infrastructure for this already, see
>> CONFIG_USERMODE_DRIVER and kernel/usermode_driver.c, with a usage
>> example in net/bpfilter/.
>
> Thanks, I actually took that code to make a generic UMD management
> library, that can be used by all use cases:
>
> https://lore.kernel.org/linux-kernel/20230317145240.363908-1-roberto.sassu@huaweicloud.com/
>
>>> I also wanted that, for the root user, tampering with that process is
>>> as hard as if the same code runs in kernel space.
>>
>> I believe that actually making it that hard would probably mean that
>> you'd have to ensure that the process doesn't use swap (in other
>> words, it would have to run with all memory locked), because root can
>> choose where swapped pages are stored. Other than that, if you mark it
>> as a kthread so that no ptrace access is allowed, you can probably get
>> pretty close. But if you do anything like that, please leave some way
>> (like a kernel build config option or such) to enable debugging for
>> these processes.
>
> I didn't think about the swapping part... thanks!
>
> Ok to enable debugging with a config option.
>
>> But I'm not convinced that it makes sense to try to draw a security
>> boundary between fully-privileged root (with the ability to mount
>> things and configure swap and so on) and the kernel - my understanding
>> is that some kernel subsystems don't treat root-to-kernel privilege
>> escalation issues as security bugs that have to be fixed.
>
> Yes, that is unfortunately true, and in that case the trustworthy UMD
> would not make things worse. On the other hand, on systems where that
> separation is defined, the advantage would be to run more exploitable
> code in user space, leaving the kernel safe.
>
> I'm thinking about all the cases where the code had to be included in
> the kernel to run at the same privilege level, but would not use any of
> the kernel facilities (e.g. parsers).
Thanks for reminding me of kexec-tools. The complete image for booting a
new kernel was originally prepared in user space. With kernel lockdown,
all this code had to move into the kernel, adding a new syscall and lots
of complexity to build purgatory code, etc. Yet, this new implementation
in the kernel does not offer all features of kexec-tools, so both code
bases continue to exist and are happily diverging...
> If the boundary is extended to user space, some of these components
> could be moved away from the kernel, and the functionality would be the
> same without decreasing the security.
All right, AFAICS your idea is limited to relatively simple cases for
now. I mean, allowing kexec-tools to run in user space is not easily
possible when UID 0 is not trusted, because kexec needs to open various
files and make various other syscalls, which would require a complex LSM
policy. It looks technically possible to write one, but then the big
question is whether such a policy would be simpler to review and maintain
than adding more kexec-tools features to the kernel.
Anyway, I can sense a general desire to run less code in the most
privileged system environment. Roberto's proposal is one of the few that go
in this direction. What are the alternatives?
Petr T