Message-id: <170172483155.7109.15983228851050210918@noble.neil.brown.name>
Date: Tue, 05 Dec 2023 08:20:31 +1100
From: "NeilBrown" <neilb@...e.de>
To: "Al Viro" <viro@...iv.linux.org.uk>
Cc: "Christian Brauner" <brauner@...nel.org>,
"Jens Axboe" <axboe@...nel.dk>, "Oleg Nesterov" <oleg@...hat.com>,
"Chuck Lever" <chuck.lever@...cle.com>,
"Jeff Layton" <jlayton@...nel.org>,
"Ingo Molnar" <mingo@...hat.com>,
"Peter Zijlstra" <peterz@...radead.org>,
"Juri Lelli" <juri.lelli@...hat.com>,
"Vincent Guittot" <vincent.guittot@...aro.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-nfs@...r.kernel.org
Subject: Re: [PATCH 1/2] Allow a kthread to declare that it calls task_work_run()
On Mon, 04 Dec 2023, Al Viro wrote:
> On Mon, Dec 04, 2023 at 12:36:41PM +1100, NeilBrown wrote:
>
> > This means that any cost for doing the work is not imposed on the kernel
> > thread, and importantly excessive amounts of work cannot apply
> > back-pressure to reduce the amount of new work queued.
>
> It also means that a stuck ->release() won't end up with stuck
> kernel thread...
Is a stuck kernel thread any worse than a stuck user-space thread?
>
> > earlier than would be ideal. When __dput (from the workqueue) calls
>
> WTF is that __dput thing? __fput, perhaps?
Either __fput or dput :-)
->release isn't the problem that I am seeing.
The call trace that I see causing problems is
__fput -> dput -> dentry_kill -> destroy_inode -> xfs_fs_destroy_inode
so both __fput and dput are there, but most of the code is dput related.
So both "put"s were swimming in my brain and the wrong combination came
out.
I changed it to __fput - thanks.
>
> > This patch adds a new process flag PF_RUNS_TASK_WORK which is now used
> > instead of PF_KTHREAD to determine whether it is sensible to queue
> > something to task_works. This flag is always set for non-kernel threads.
>
> *ugh*
>
> What's that flag for? task_work_add() always can fail; any caller must
> have a fallback to cope with that possibility; fput() certainly does.
As Oleg pointed out, all threads including kernel threads call
task_work_run() at exit, and some kernel threads depend on this. So
disabling task_work_add() early for all kernel threads would break
things.
Currently task_work_add() fails only once the process has started
exiting. Only code that might run during exit handling needs to check.
>
> Just have the kernel threads born with ->task_works set to &work_exited
> and provide a primitive that would flip it from that to NULL.
>
> > @@ -1328,7 +1328,7 @@ static void mntput_no_expire(struct mount *mnt)
> >
> > if (likely(!(mnt->mnt.mnt_flags & MNT_INTERNAL))) {
> > struct task_struct *task = current;
> > - if (likely(!(task->flags & PF_KTHREAD))) {
> > + if (likely((task->flags & PF_RUNS_TASK_WORK))) {
> > init_task_work(&mnt->mnt_rcu, __cleanup_mnt);
> > if (!task_work_add(task, &mnt->mnt_rcu, TWA_RESUME))
> > return;
>
> Now, *that* is something I have much stronger objections to.
> Stuck filesystem shutdown is far more likely than stuck
> ->release(). You are seriously asking for trouble here.
>
> Why would you want to have nfsd block on that?
>
I don't *want* nfsd to block on that, but I don't care if it does. nfsd
will only call task_work_run() at a safe time. This is no different to
user-space processes only calling task_work_run() at a safe time.
The new flag isn't "I_AM_NFSD" or "QUEUE_FPUT_WORK_TO_TASK". It is
"RUNS_TASK_WORK". So any code that would prefer to call task_work_add()
but has a fall-back for tasks that don't call task_work_run() should
test the new flag. Doing otherwise would be inconsistent and
potentially confusing.
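The pattern being argued for can be sketched in a few lines of userspace C (illustrative only; the flag value and helper name are made up here, and the kernel calls appear only as comments): any caller that prefers task_work_add() but has a deferred fallback tests the one flag, exactly as the mntput_no_expire() hunk above does.

```c
/* Illustrative bit, not the kernel's actual PF_ value. */
#define PF_RUNS_TASK_WORK 0x1

struct task {
	unsigned int flags;
};

/* queue_cleanup() is a hypothetical helper mirroring the shape of the
 * mntput_no_expire() hunk: queue to task_work when the task will run
 * it, otherwise fall back to a deferred (workqueue-style) path.
 * Returns 1 for the task_work path and 0 for the fallback so the
 * choice is observable. */
static int queue_cleanup(struct task *current_task)
{
	if (current_task->flags & PF_RUNS_TASK_WORK) {
		/* init_task_work(...);
		 * task_work_add(current, ..., TWA_RESUME); */
		return 1;
	}
	/* schedule_delayed_work(...) or similar deferred path */
	return 0;
}
```

With this shape, an nfsd thread that sets PF_RUNS_TASK_WORK takes the same path as a user-space task, and every other kernel thread keeps the deferred fallback; no caller needs to know which specific thread it is running in.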
I don't think that nfsd getting stuck would be any worse than systemd
getting stuck, or automount getting stuck, or udiskd getting stuck.
Thanks,
NeilBrown