Message-ID: <87r152xtko.fsf@email.froward.int.ebiederm.org>
Date: Mon, 09 May 2022 16:52:07 -0500
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Qian Cai <quic_qiancai@...cinc.com>
Cc: <linux-arch@...r.kernel.org>, Tejun Heo <tj@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Al Viro <viro@...IV.linux.org.uk>,
Jens Axboe <axboe@...nel.dk>,
Thomas Gleixner <tglx@...utronix.de>,
Linus Torvalds <torvalds@...uxfoundation.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 0/7] fork: Make init and umh ordinary tasks
Qian Cai <quic_qiancai@...cinc.com> writes:
> On Fri, May 06, 2022 at 09:11:36AM -0500, Eric W. Biederman wrote:
>>
>> In commit 40966e316f86 ("kthread: Ensure struct kthread is present for
>> all kthreads") caused init and the user mode helper threads that call
>> kernel_execve to have struct kthread allocated for them.
>>
>> I believe my first patch in this series is enough to fix the bug
>> and is simple enough and obvious enough to be backportable.
>>
>> The rest of the changes pass struct kernel_clone_args to clean things
>> up and cause the code to make sense.
>>
>> There is one rough spot in this change. In the init process before the
>> user space init process is exec'd there is a lot going on. I have found
>> when async_schedule_domain is low on memory or has more than 32K callers
>> executing do_populate_rootfs will now run in a user space thread making
>> flush_delayed_fput meaningless, and __fput_sync is unusable. I solved
>> this as I did in usermode_driver.c with an added explicit task_work_run.
>> I point this out as I have seen some talk about making flushing file
>> handles more explicit.
>
> Reverting the last 3 commits of the series fixed a boot crash.
>
> 1b2552cbdbe0 fork: Stop allowing kthreads to call execve
> 753550eb0ce1 fork: Explicitly set PF_KTHREAD
> 68d85f0a33b0 init: Deal with the init process being a user mode process
Hmm. It looks like I missed a little detail.
task_tick_fair
  task_tick_numa
    task_scan_start
      task_scan_min
        task_nr_scan_windows
          p->mm
If I read this code right, task_tick_numa assumes that only tasks with
PF_KTHREAD set don't have an mm.
This should fix the failure. For init we could possibly populate .mm
and not just .active_mm. For user mode helpers cloned from kernel
threads I don't think that is a realistic option. So I think this
is going to be the proper fix.
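As a rough illustration of the assumption described above, here is a
small user-space model of the two guards. It is only a sketch: struct
task_model, struct mm_model, the flag values, and all function names are
stand-ins invented for this example, not kernel definitions, and the
work->next != work part of the real condition is left out.

```c
/* User-space model only; nothing here is the kernel's own definition. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag values chosen for the sketch. */
#define PF_EXITING 0x1u
#define PF_KTHREAD 0x2u

struct mm_model { int dummy; };

struct task_model {
	unsigned int flags;
	struct mm_model *mm;	/* NULL when the task has no user address space */
};

/* Old guard: skips only exiting tasks and tasks marked PF_KTHREAD. */
bool old_guard_skips(const struct task_model *t)
{
	return (t->flags & (PF_EXITING | PF_KTHREAD)) != 0;
}

/* Guard with the extra check: also skips any task without an mm. */
bool new_guard_skips(const struct task_model *t)
{
	return !t->mm || (t->flags & (PF_EXITING | PF_KTHREAD)) != 0;
}

/* True when the guard lets through a task whose mm is NULL. */
bool would_deref_null_mm(const struct task_model *t,
			 bool (*skips)(const struct task_model *))
{
	return !skips(t) && t->mm == NULL;
}

/* Walks the three cases: pre-exec task, kthread, ordinary user task. */
void numa_guard_selfcheck(void)
{
	struct mm_model mm = { 0 };
	struct task_model pre_exec = { .flags = 0, .mm = NULL };
	struct task_model kthread = { .flags = PF_KTHREAD, .mm = NULL };
	struct task_model user = { .flags = 0, .mm = &mm };

	/* No PF_KTHREAD and no mm: the old guard lets it through. */
	assert(would_deref_null_mm(&pre_exec, old_guard_skips));
	assert(!would_deref_null_mm(&pre_exec, new_guard_skips));

	/* Both guards skip a kthread and both let a user task through. */
	assert(old_guard_skips(&kthread) && new_guard_skips(&kthread));
	assert(!old_guard_skips(&user) && !new_guard_skips(&user));
}
```

Calling numa_guard_selfcheck() from a test program exercises all three
cases. Only the task with neither PF_KTHREAD nor an mm behaves
differently under the two guards.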
I believe this only happens when NUMA balancing runs at an
unfortunate moment.
Qian Cai, can you test this?
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index d4bd299d67ab..db6f0df9d43e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2915,7 +2915,7 @@ static void task_tick_numa(struct rq *rq, struct task_struct *curr)
 	/*
 	 * We don't care about NUMA placement if we don't have memory.
 	 */
-	if ((curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
+	if (!curr->mm || (curr->flags & (PF_EXITING | PF_KTHREAD)) || work->next != work)
 		return;
 
 	/*
Eric