linux-kernel - Re: [PATCH 0/7] fork: Make init and umh ordinary tasks


Message-ID: <87r152xtko.fsf@email.froward.int.ebiederm.org>
Date:   Mon, 09 May 2022 16:52:07 -0500
From:   "Eric W. Biederman" <ebiederm@...ssion.com>
To:     Qian Cai <quic_qiancai@...cinc.com>
Cc:     <linux-arch@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Al Viro <viro@...IV.linux.org.uk>,
        Jens Axboe <axboe@...nel.dk>,
        Thomas Gleixner <tglx@...utronix.de>,
        Linus Torvalds <torvalds@...uxfoundation.org>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/7] fork: Make init and umh ordinary tasks

Qian Cai <quic_qiancai@...cinc.com> writes:

> On Fri, May 06, 2022 at 09:11:36AM -0500, Eric W. Biederman wrote:
>> 
>> In commit 40966e316f86 ("kthread: Ensure struct kthread is present for
>> all kthreads") caused init and the user mode helper threads that call
>> kernel_execve to have struct kthread allocated for them.
>> 
>> I believe my first patch in this series is enough to fix the bug
>> and is simple enough and obvious enough to be backportable.
>> 
>> The rest of the changes pass struct kernel_clone_args to clean things
>> up and cause the code to make sense.
>> 
>> There is one rough spot in this change.  In the init process before the
>> user space init process is exec'd there is a lot going on.  I have found
>> when async_schedule_domain is low on memory or has more than 32K callers
>> executing do_populate_rootfs will now run in a user space thread making
>> flush_delayed_fput meaningless, and __fput_sync is unusable.  I solved
>> this as I did in usermode_driver.c with an added explicit task_work_run.
>> I point this out as I have seen some talk about making flushing file
>> handles more explicit.
>
> Reverting the last 3 commits of the series fixed a boot crash.
>
> 1b2552cbdbe0 fork: Stop allowing kthreads to call execve
> 753550eb0ce1 fork: Explicitly set PF_KTHREAD
> 68d85f0a33b0 init: Deal with the init process being a user mode process

Hmm.  It looks like I missed a little detail.

task_tick_fair
  task_tick_numa
    task_scan_start
      task_scan_min
        task_nr_scan_windows
          p->mm

If I read this code right, task_tick_numa assumes that only tasks with
PF_KTHREAD set don't have an mm.

This should fix the failure.  For init we could possibly populate .mm
and not just .active_mm.  For user mode helpers cloned from kernel
threads I don't think that is a realistic option.  So I think this
is going to be the proper fix.

I believe this only happens when NUMA balancing runs at an
unfortunate moment.
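
To make the assumption concrete, here is a small userspace sketch of the
guard with and without a NULL-mm check.  It is only a model, not kernel
code: struct task_model, struct mm_model and the helper names are
hypothetical, the flag values are illustrative, and the work->next test
from the real guard is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values; only their distinctness matters here. */
#define PF_EXITING 0x00000004u
#define PF_KTHREAD 0x00200000u

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm_model { int unused; };

struct task_model {
	unsigned int flags;
	struct mm_model *mm;	/* NULL when the task has no user address space */
};

/* Flags-only guard: trusts that a task without PF_KTHREAD has an mm. */
static bool old_guard_skips(const struct task_model *t)
{
	return (t->flags & (PF_EXITING | PF_KTHREAD)) != 0;
}

/* Guard with an explicit mm check in front of the flags test. */
static bool new_guard_skips(const struct task_model *t)
{
	return !t->mm || (t->flags & (PF_EXITING | PF_KTHREAD)) != 0;
}

/*
 * True when the guard lets the task through even though its mm is NULL,
 * i.e. the case where the later p->mm access would fault.
 */
static bool would_deref_null_mm(const struct task_model *t,
				bool (*guard_skips)(const struct task_model *))
{
	return !guard_skips(t) && t->mm == NULL;
}
```

In this model a task with flags == 0 and mm == NULL (standing in for
init or a user mode helper before exec) gets past the flags-only guard
and is stopped by the one that checks mm first.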

Qian Cai, can you test this?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..db6f0df9d43e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2915,7 +2915,7 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
        /*
         * We don't care about NUMA placement if we don't have memory.
         */
-       if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
+       if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
                return;
 
        /*


Eric