lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <dac077f01aab5afe283b3b683176013605e2fe8f.camel@kernel.org>
Date: Tue, 18 Feb 2025 19:01:10 -0500
From: Jeff Layton <jlayton@...nel.org>
To: NeilBrown <neilb@...e.de>
Cc: Chuck Lever <chuck.lever@...cle.com>, Olga Kornievskaia	
 <okorniev@...hat.com>, Dai Ngo <Dai.Ngo@...cle.com>, Tom Talpey
 <tom@...pey.com>,  Li Lingfeng <lilingfeng3@...wei.com>,
 linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] nfsd: don't allow concurrent queueing of workqueue
 jobs

On Wed, 2025-02-19 at 09:26 +1100, NeilBrown wrote:
> On Wed, 19 Feb 2025, Jeff Layton wrote:
> > While looking at the problem that Li Lingfeng reported [1] around
> > callback queueing failures, I noticed that there were potential
> > scenarios where the callback workqueue jobs could run concurrently with
> > an rpc_task. Since they touch some of the same fields, this is incorrect
> > at best and potentially dangerous.
> 
> If the problem is that workqueue jobs might run concurrently with
> rpc_tasks and that we don't want that, could we simply run all the cb
> tasks as "sync" rpc tasks in the workqueue??
> 
> i.e. change rpc_call_async() in nfsd4_run_cb_work() to rpc_call_sync ...
> and fix any breakage because I doubt it is really as simple as that.
> 
> This would fully serialise all the callbacks.  Is that what to goal is
> here, or is the goal more subtle?
> 

I think we need to serialize callbacks that use the same struct
nfsd4_callback. Today we can end up with the same callback running
concurrently with some of them:

The callback workqueue jobs only run until the rpc_task has been
submitted. After that point the workqueue job can run again and submit
a second rpc_task. That can end up with multiple RPCs racing to set and
fetch the cb_status, cb_slot, etc. in the nfsd4_callback.

Note that a few of the callbacks have their own mechanism for
serialization (CB_RECALL_ANY, CB_GETATTR, and the callback channel
probes), but the others don't.

Making all of the RPCs synchronous would mean that all callbacks to the
client would be serialized, which would be overkill. We'd be going back
to a single-slot callback channel.

I think serializing on the nfsd4_callback is right. The part I'm not
yet certain about is whether to just ignore attempts to queue up the
callback while one is running, or if we need to ensure that it runs
again after the current callback has finished if that happens.

The latter seems more correct, but I can't quite come up with a
situation where it matters. Maybe a multiple CB_LAYOUTRECALL callback
attempts can race with a layout stateid morphing?


> 
> 
> > 
> > This patchset adds a new mechanism for ensuring that the same
> > nfsd4_callback can't run concurrently with itself, regardless of where
> > it is in its execution. This also gives us a more sure mechanism for
> > handling the places where we need to take and hold a reference on an
> > object while the callback is running.
> > 
> > This should also fix the problem that Li Lingfeng reported, since
> > queueing the work from nfsd4_cb_release() should never fail. Note that
> > the patch they sent earlier (fdf5c9413ea) should be dropped from
> > nfsd-testing before this will apply cleanly.
> > 
> > [1]: https://lore.kernel.org/linux-nfs/20250218135423.1487309-1-lilingfeng3@huawei.com/
> > 
> > Signed-off-by: Jeff Layton <jlayton@...nel.org>
> > ---
> > Jeff Layton (3):
> >       nfsd: prevent callback tasks running concurrently
> >       nfsd: eliminate cl_ra_cblist and NFSD4_CLIENT_CB_RECALL_ANY
> >       nfsd: move cb_need_restart flag into cb_flags
> > 
> >  fs/nfsd/nfs4callback.c | 12 ++++++------
> >  fs/nfsd/nfs4layouts.c  |  7 ++++---
> >  fs/nfsd/nfs4proc.c     |  2 +-
> >  fs/nfsd/nfs4state.c    | 26 +++++++++++---------------
> >  fs/nfsd/state.h        | 13 ++++++++++---
> >  fs/nfsd/trace.h        |  2 +-
> >  6 files changed, 33 insertions(+), 29 deletions(-)
> > ---
> > base-commit: 4a52e5e49d1b50fcb584e434cced6d0547ddea42
> > change-id: 20250218-nfsd-callback-f723b8498c78
> > 
> > Best regards,
> > -- 
> > Jeff Layton <jlayton@...nel.org>
> > 
> > 
> 

-- 
Jeff Layton <jlayton@...nel.org>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ