Message-ID: <ZWVEcasahyVQ4QqV@tissot.1015granger.net>
Date: Mon, 27 Nov 2023 20:37:53 -0500
From: Chuck Lever <chuck.lever@...cle.com>
To: NeilBrown <neilb@...e.de>
Cc: Al Viro <viro@...iv.linux.org.uk>,
Christian Brauner <brauner@...nel.org>,
Jens Axboe <axboe@...nel.dk>, Oleg Nesterov <oleg@...hat.com>,
Jeff Layton <jlayton@...nel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-nfs@...r.kernel.org
Subject: Re: [PATCH/RFC] core/nfsd: allow kernel threads to use task_work.
On Tue, Nov 28, 2023 at 11:16:06AM +1100, NeilBrown wrote:
> On Tue, 28 Nov 2023, Chuck Lever wrote:
> > On Tue, Nov 28, 2023 at 09:05:21AM +1100, NeilBrown wrote:
> > >
> > > I have evidence from a customer site of 256 nfsd threads adding files to
> > > the delayed_fput_list nearly twice as fast as they are retired by a single
> > > work-queue thread running delayed_fput(). As you might imagine this
> > > does not end well (20 million files in the queue at the time a snapshot
> > > was taken for analysis).
> > >
> > > While this might point to a problem with the filesystem not handling the
> > > final close efficiently, such problems should only hurt throughput, not
> > > lead to memory exhaustion.
> >
> > I have this patch queued for v6.8:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=nfsd-next&id=c42661ffa58acfeaf73b932dec1e6f04ce8a98c0
> >
>
> Thanks....
> I think that change is good, but I don't think it addresses the problem
> mentioned in the description, and it is not directly relevant to the
> problem I saw ... though it is complicated.
>
> The problem "workqueue ... hogged cpu..." probably means that
> nfsd_file_dispose_list() needs a cond_resched() call in the loop.
> That will stop it from hogging the CPU whether it is tied to one CPU or
> free to roam.
>
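For reference, a minimal sketch of that change, paraphrasing the
dispose-list loop in fs/nfsd/filecache.c (the exact loop body varies
across kernel versions):

    static void
    nfsd_file_dispose_list(struct list_head *dispose)
    {
            struct nfsd_file *nf;

            while (!list_empty(dispose)) {
                    nf = list_first_entry(dispose, struct nfsd_file,
                                          nf_lru);
                    list_del_init(&nf->nf_lru);
                    nfsd_file_free(nf);
                    /* Yield between files so a long dispose list
                     * cannot monopolize the CPU, whether the work
                     * item is bound to one CPU or free to roam. */
                    cond_resched();
            }
    }
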
> Also that work is calling filp_close(), which primarily calls
> filp_flush().
> It also calls fput(), but that does minimal work. If there is much work
> to do, it is offloaded to another work item. *That* is the
> work item I had problems with.
>
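For reference, the path being described is roughly this (condensed from
fs/open.c and fs/file_table.c circa v6.6; details differ between kernel
versions):

    int filp_close(struct file *filp, fl_owner_t id)
    {
            int retval = filp_flush(filp, id);

            fput(filp);
            return retval;
    }

    void fput(struct file *file)
    {
            if (atomic_long_dec_and_test(&file->f_count)) {
                    struct task_struct *task = current;

                    /* Normal threads queue the final __fput() as task
                     * work that runs before the next return to
                     * userspace, which naturally rate-limits how fast
                     * they can close files. */
                    if (likely(!in_interrupt() &&
                               !(task->flags & PF_KTHREAD))) {
                            init_task_work(&file->f_rcuhead, ____fput);
                            if (!task_work_add(task, &file->f_rcuhead,
                                               TWA_RESUME))
                                    return;
                    }
                    /* Kernel threads such as nfsd fall through to
                     * here: the file is pushed onto a global llist
                     * drained by a single delayed work item. */
                    if (llist_add(&file->f_llist, &delayed_fput_list))
                            schedule_delayed_work(&delayed_fput_work, 1);
            }
    }
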
> The problem I saw was with an older kernel which didn't have the nfsd
> file cache and so was probably calling filp_close() more often.
Without the file cache, the filp_close() should be handled directly
by the nfsd thread handling the RPC, IIRC.
> So maybe my patch isn't so important now, particularly as nfsd now
> isn't closing most files in-task but instead offloads that to another
> task. So the final fput will not be handled by the nfsd task either.
>
> But I think there is room for improvement. Gathering lots of files
> together into a list and closing them sequentially is not going to be as
> efficient as closing them in parallel.
I believe the file cache passes the filps to the work queue one at
a time, but I don't think there's anything that forces the work
queue to handle each flush/close completely before proceeding to the
next.
IOW there is some parallelism there already, especially now that
nfsd_filecache_wq is UNBOUND.
> > > For normal threads, the thread that closes the file also calls the
> > > final fput so there is natural rate limiting preventing excessive growth
> > > in the list of delayed fputs. For kernel threads, and particularly for
> nfsd, delays in the final fput do not impose any throttling to prevent
> > > the thread from closing more files.
> >
> > I don't think we want to block nfsd threads waiting for files to
> > close. Won't that be a potential denial of service?
>
> Not as much as the denial of service caused by memory exhaustion due to
> an indefinitely growing list of files waiting to be closed by a single
> workqueue thread.
The cache garbage collector is single-threaded, but nfsd_filecache_wq
has a max_active setting of zero.
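To be precise about what that means: a max_active of zero selects the
workqueue default (WQ_DFL_ACTIVE, currently 256), so the queue itself
does not serialize the disposal work items. Assuming the creation site
in fs/nfsd/filecache.c now reads something like:

    /* max_active of 0 means WQ_DFL_ACTIVE; combined with WQ_UNBOUND,
     * queued work items can run concurrently on any CPU. */
    nfsd_filecache_wq = alloc_workqueue("nfsd_filecache", WQ_UNBOUND, 0);
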
> I think it is perfectly reasonable that when handling an NFSv4 CLOSE,
> the nfsd thread should completely handle that request including all the
> flush and ->release etc. If that causes any denial of service, then
> simply increase the number of nfsd threads.
>
> For NFSv3 it is more complex. On the kernel where I saw a problem, the
> filp_close() happened after each READ or WRITE (though I think the customer
> was using NFSv4...). With the file cache there is no thread that is
> obviously responsible for the close.
> To get the sort of throttling that I think is needed, we could possibly
> have each "nfsd_open" check if there are pending closes, and wait for
> some small amount of progress.
Well nfsd_open() in particular appears to be used only for readdir.
But maybe nfsd_file_acquire() could wait briefly, in the garbage-
collected case, if the nfsd_net's disposal queue is long.
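A purely hypothetical sketch of that kind of backpressure (the
disposal_len counter, the watermarks, and the waitqueue below are
invented for illustration; they do not exist in nfsd today):

    /* In nfsd_file_acquire(), garbage-collected case only: if the
     * per-net disposal queue is long, wait briefly for the disposal
     * workers to make progress instead of stalling indefinitely. */
    if (atomic_read(&nn->disposal_len) > DISPOSAL_HIGH_WM)
            wait_event_timeout(nn->disposal_wait,
                               atomic_read(&nn->disposal_len) <
                                                DISPOSAL_LOW_WM,
                               msecs_to_jiffies(100));
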
> But I don't think it is reasonable for the nfsd threads to take none of
> the burden of closing files as that can result in imbalance.
>
> I'll need to give this more thought.
--
Chuck Lever