Message-ID: <87pm1kbiou.fsf@email.froward.int.ebiederm.org>
Date: Wed, 11 Oct 2023 22:53:05 -0500
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Brian Geffon <bgeffon@...gle.com>
Cc: Kees Cook <keescook@...omium.org>,
Christian Brauner <brauner@...nel.org>,
"Rafael J . Wysocki" <rafael@...nel.org>,
Matthias Kaehlcke <mka@...omium.org>,
Luis Chamberlain <mcgrof@...nel.org>,
Frederic Weisbecker <frederic@...nel.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] pid: Allow frozen userspace to reboot from non-init pid ns
Brian Geffon <bgeffon@...gle.com> writes:
> On Fri, Sep 29, 2023 at 4:09 PM Kees Cook <keescook@...omium.org> wrote:
>>
>> On Fri, Sep 29, 2023 at 01:44:42PM -0400, Brian Geffon wrote:
>> > When the system has a frozen userspace, for example during hibernation,
>> > the child reaper task will also be frozen. Attempting to deliver a
>> > signal to it to handle the reboot(2) will ultimately lead to the system
>> > hanging unless userspace is thawed.
>> >
>> > This change checks whether the current task is the suspending task and,
>> > if so, allows it to proceed with a reboot from the non-init pid ns.
>>
>> I don't know the code flow too well here, but shouldn't init_pid_ns
>> always be doing the reboot regardless of anything else?
>
> I think the point of this is that normally the reaper is runnable, so
> an appropriate signal will be delivered, allowing them to clean up as
> well [2]. In our case, they won't be runnable, so doing this wouldn't
> make sense.
The entire reboot_pid_ns thing is just a polite way of keeping
applications like /sbin/reboot working inside a pid namespace.
Ordinarily the process calling reboot (inside the container) won't
have the privileges to request an entire system reboot. So I don't
see why it would make sense to promote that reboot into a system-wide
reboot.
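For readers unfamiliar with reboot_pid_ns(): as far as I can tell from kernel/pid_namespace.c, a reboot(2) issued inside a child pid namespace never reboots the machine. The command is translated into a signal number recorded for the namespace (SIGHUP for a restart, SIGINT for halt or power-off), the namespace's child reaper is sent SIGKILL, and the caller exits. Whoever waits on the container's init then sees it terminated by that signal. The userspace model below reproduces only the command-to-signal mapping. The MODEL_CMD_* names and the function name are hypothetical; the constant values are copied from LINUX_REBOOT_CMD_* in <linux/reboot.h>.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>

/* Hypothetical mirrors of LINUX_REBOOT_CMD_* from <linux/reboot.h>. */
#define MODEL_CMD_RESTART   0x01234567u
#define MODEL_CMD_HALT      0xCDEF0123u
#define MODEL_CMD_POWER_OFF 0x4321FEDCu
#define MODEL_CMD_RESTART2  0xA1B2C3D4u

/*
 * Model of the switch in reboot_pid_ns(): return the signal that the
 * container's init will appear to have been killed by, or -EINVAL for
 * commands that are not supported inside a child pid namespace.
 */
static int model_pid_ns_reboot_signal(unsigned int cmd)
{
	switch (cmd) {
	case MODEL_CMD_RESTART:
	case MODEL_CMD_RESTART2:
		return SIGHUP;   /* restart requested */
	case MODEL_CMD_HALT:
	case MODEL_CMD_POWER_OFF:
		return SIGINT;   /* halt or power-off requested */
	default:
		return -EINVAL;  /* any other command is rejected */
	}
}
```

For example, model_pid_ns_reboot_signal(MODEL_CMD_POWER_OFF) yields SIGINT, which is what a container manager would observe as the exit reason of the container's init.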
Which leads me to the question: what is actually happening with
hibernation that we want something inside a pid namespace to somehow
have the permissions to reboot the entire machine?
>> Also how is this syscall running if current is frozen? This feels weird
>> to me... shouldn't the frozen test be against pid_ns->child_reaper
>> instead of current?
>
> The task which froze the system won't be frozen. To make sure this
> happens, it has the flag PF_SUSPEND_TASK set, so if we have this flag
> we know we're the only running userspace task [1].
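Brian's point can be modelled in userspace. In the kernel, freeze_processes() sets PF_SUSPEND_TASK on the calling task (see [1]), and the freezer declines to freeze tasks carrying that flag (or PF_NOFREEZE). After freezing, the caller is therefore the only userspace task still running, while the container's child reaper is frozen. The sketch below is a simplified model, not kernel code: the struct, the flag values and the function names are hypothetical, and it ignores kernel threads, OOM victims and cgroup freezing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the real PF_* values live in <linux/sched.h>. */
#define MODEL_PF_NOFREEZE     0x1u
#define MODEL_PF_SUSPEND_TASK 0x2u

struct model_task {
	const char *name;
	unsigned int flags;
	bool frozen;
};

/* Would the freezer put this task to sleep? */
static bool model_should_freeze(const struct model_task *t)
{
	assert(t != NULL);
	return !(t->flags & (MODEL_PF_NOFREEZE | MODEL_PF_SUSPEND_TASK));
}

/*
 * Model of freeze_processes(): mark the caller (which must point into
 * tasks[]) as the suspend task, then freeze every eligible task.
 * Returns how many tasks are still running afterwards.
 */
static size_t model_freeze_processes(struct model_task *tasks, size_t n,
				     struct model_task *caller)
{
	size_t running = 0;

	caller->flags |= MODEL_PF_SUSPEND_TASK;
	for (size_t i = 0; i < n; i++) {
		tasks[i].frozen = model_should_freeze(&tasks[i]);
		if (!tasks[i].frozen)
			running++;
	}
	return running;
}
```

With a container init and a hibernation tool as the only two tasks, freezing from the tool leaves exactly one task running (the tool) and the reaper frozen, which is the state in which signalling the reaper cannot make progress.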
Someone has a task inside a container that is successfully suspending
the entire system?
I don't see how that makes sense.
But to the extent that it somehow does, I would put a test in
kernel/reboot.c along the lines of:
	/*
	 * If the caller can't perform a normal reboot, call
	 * reboot_pid_ns().
	 */
	if ((pid_ns != &init_pid_ns) &&
	    !((current->flags & PF_SUSPEND_TASK) && capable(CAP_SYS_BOOT))) {
		return reboot_pid_ns(pid_ns, cmd);
	}
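The condition above can be read as a truth table. The sketch below restates it as a pure function so the cases are easy to check; the function and parameter names are hypothetical stand-ins for pid_ns != &init_pid_ns, current->flags & PF_SUSPEND_TASK and capable(CAP_SYS_BOOT). Note that capable() checks the capability against the initial user namespace rather than the container's, which is presumably why it is used here instead of the ns_capable(pid_ns->user_ns, CAP_SYS_BOOT) check that sys_reboot() already performs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Returns true when the proposed test in kernel/reboot.c would hand the
 * request to reboot_pid_ns() (i.e. signal the container's init), and
 * false when the caller falls through to a real, system-wide reboot.
 */
static bool model_use_reboot_pid_ns(bool in_child_pid_ns,
				    bool is_suspend_task,
				    bool has_global_cap_sys_boot)
{
	return in_child_pid_ns &&
	       !(is_suspend_task && has_global_cap_sys_boot);
}
```

Compared with the current behaviour, where every caller in a child pid namespace goes through reboot_pid_ns(), only one row changes: a caller that is both the suspend task and globally privileged bypasses it.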
Making reboot_pid_ns responsible for the logic that should be bypassing
it is quite confusing.
> I hope my understanding is correct and it makes sense. Thanks for
> taking the time to review.
>
> Brian
>
> 1. https://elixir.bootlin.com/linux/latest/source/kernel/power/process.c#L130
> 2. https://elixir.bootlin.com/linux/latest/source/kernel/pid_namespace.c#L327
I really don't know if allowing PF_SUSPEND_TASK so that hibernation and
the like can work from inside a container makes any sense at all.
But the above is roughly how I would make it work.
Eric