[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4931ecd2e702c95a4da0e1f0c4d9e439f941a451.camel@kernel.org>
Date: Tue, 23 Apr 2024 08:42:58 -0400
From: Jeff Layton <jlayton@...nel.org>
To: NeilBrown <neilb@...e.de>
Cc: Lorenzo Bianconi <lorenzo@...nel.org>, linux-nfs@...r.kernel.org,
lorenzo.bianconi@...hat.com, chuck.lever@...cle.com,
netdev@...r.kernel.org, kuba@...nel.org
Subject: Re: [PATCH v8 3/6] NFSD: add write_version to netlink command
On Mon, 2024-04-22 at 13:36 +1000, NeilBrown wrote:
> On Wed, 17 Apr 2024, Jeff Layton wrote:
> > On Wed, 2024-04-17 at 10:43 +1000, NeilBrown wrote:
> > > On Wed, 17 Apr 2024, Jeff Layton wrote:
> > > > On Wed, 2024-04-17 at 09:05 +1000, NeilBrown wrote:
> > > > > On Wed, 17 Apr 2024, Jeff Layton wrote:
> > > > > > On Wed, 2024-04-17 at 07:48 +1000, NeilBrown wrote:
> > > > > > > On Tue, 16 Apr 2024, Jeff Layton wrote:
> > > > > > > > On Tue, 2024-04-16 at 13:16 +1000, NeilBrown wrote:
> > > > > > > > > On Tue, 16 Apr 2024, Lorenzo Bianconi wrote:
> > > > > > > > > > Introduce write_version netlink command through a "declarative" interface.
> > > > > > > > > > This patch introduces a change in behavior since for version-set userspace
> > > > > > > > > > is expected to provide a NFS major/minor version list it wants to enable
> > > > > > > > > > while all the other ones will be disabled. (procfs write_version
> > > > > > > > > > command implements imperative interface where the admin writes +3/-3 to
> > > > > > > > > > enable/disable a single version.
> > > > > > > > >
> > > > > > > > > It seems a little weird to me that the interface always disables all
> > > > > > > > > version, but then also allows individual versions to be disabled.
> > > > > > > > >
> > > > > > > > > Would it be reasonable to simply ignore the "enabled" flag when setting
> > > > > > > > > version, and just enable all versions listed??
> > > > > > > > >
> > > > > > > > > Or maybe only enable those with the flag, and don't disable those
> > > > > > > > > without the flag?
> > > > > > > > >
> > > > > > > > > Those don't necessarily seem much better - but the current behaviour
> > > > > > > > > still seems odd.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I think it makes sense.
> > > > > > > >
> > > > > > > > We disable all versions, and enable any that have the "enabled" flag set
> > > > > > > > in the call from userland. Userland technically needn't send down the
> > > > > > > > versions that are disabled in the call, but the current userland program
> > > > > > > > does.
> > > > > > > >
> > > > > > > > I worry about imperative interfaces that might only say -- "enable v4.1,
> > > > > > > > but disable v3" and leave the others in their current state. That
> > > > > > > > requires that both the kernel and userland keep state about what
> > > > > > > > versions are currently enabled and disabled, and it's possible to get
> > > > > > > > that wrong.
> > > > > > >
> > > > > > > I understand and support your aversion for imperative interfaces.
> > > > > > > But this interface, as currently implemented, looks somewhat imperative.
> > > > > > > The message sent to the kernel could enable some interfaces and then
> > > > > > > disable them. I know that isn't the intent, but it is what the code
> > > > > > > supports. Hence "weird".
> > > > > > >
> > > > > > > We could add code to make that sort of thing impossible, but there isn't
> > > > > > > much point. Better to make it syntactically impossible.
> > > > > > >
> > > > > > > Realistically there will never be NFSv4.3 as there is no need - new
> > > > > > > features can be added incrementally.
> > > > > > >
> > > > > >
> > > > > > There is no need _so_far_. Who knows what the future will bring? Maybe
> > > > > > we'll need v4.3 in order to add some needed feature?
> > > > > >
> > > > > > > So we could just pass an array of
> > > > > > > 5 active flags: 2,3,4.0,4.1,4.2. I suspect you wouldn't like that and
> > > > > > > I'm not sure that I do either. A "read" would return the same array
> > > > > > > with 3 possible states: unavailable, disabled, enabled. (Maybe the
> > > > > > > array could be variable length so 5.0 could be added one day...).
> > > > > > >
> > > > > >
> > > > > > A set of flags is basically what this interface is. They just happen to
> > > > > > also be labeled with the major and minorversion. I think that's a good
> > > > > > thing.
> > > > >
> > > > > Good for whom? Labelling allows for labelling inconsistencies.
> > > > >
> > > >
> > > > Now you're just being silly. You wanted a variable length array, but
> > > > what if the next slot is not v4.3 but 5.0? A positional interpretation
> > > > of a slot in an array is just as just as subject to problems.
> > >
> > > I don't think it could look like a imperative interface, which the
> > > current one does a bit.
> > >
> > > >
> > > > > Maybe the kernel should be able to provide an ordered list of labels,
> > > > > and separately an array of which labels are enabled/disabled.
> > > > > Userspace could send down a new set of enabled/disabled flags based on
> > > > > the agreed set of labels.
> > > > >
> > > >
> > > > How is this better than what's been proposed? One strength of netlink is
> > > > that the data is structured. The already proposed interface takes
> > > > advantage of that.
> > > >
> > > > > Here is a question that is a bit of a diversion, but might help us
> > > > > understand the context more fully:
> > > > >
> > > > > Why would anyone disable v4.2 separately from v4.1 ??
> > > > >
> > > >
> > > > Furthermore, what does it mean to disable v4.1 but leave v4.2 enabled?
> > >
> > > Indeed!
> > >
> > > >
> > > > > I understand that v2, v3, v4.0, v4.1 are effectively different protocols
> > > > > and you might want to ensure that only the approved/tested protocol is
> > > > > used. But v4.2 is just a few enhancements on v4.1. Why would you want
> > > > > to disable it?
> > > > >
> > > > > The answer I can think of that there might be bugs (perish the
> > > > > thought!!) in some of those features so you might want to avoid using
> > > > > them.
> > > > > But in that case, it is really the features that you want to suppress,
> > > > > not the protocol version.
> > > > >
> > > > > Maybe I might want to disable delegation - to just write delegation.
> > > > > Can I do that? What if I just want to disable server-side copy, but
> > > > > keep fallocate and umask support?
> > > > >
> > > > > i.e. is a list of versions really the granularity that we want to use
> > > > > for this interface?
> > > > >
> > > >
> > > > Our current goal is to replace rpc.nfsd with a new program that works
> > > > via netlink. An important bit of what rpc.nfsd does is start the NFS
> > > > server with the settings in /etc/nfs.conf. Some of those settings are
> > > > vers3=, vers4.0=, etc. that govern how /proc/fs/nfsd/versions is set.
> > > > We have an immediate need to deal with those settings today, and
> > > > probably will for quite some time.
> > > >
> > > > I'm not opposed to augmenting that with something more granular, but I
> > > > don't think we should block this interface and wait on that. We can
> > > > extend the interface at some point in the future to take a new feature
> > > > bitmask or something, and just declare that (e.g.) declaring vers4.2=n
> > > > disables some subset of those features.
> > >
> > > I agree that we don't want to block "good" while we wait for "perfect". I
> > > just want to be sure that what we have is "good".
> > >
> > > The current /proc interface uses strings like "v3" and "v4.1".
> > > The proposed netlink interface uses pairs on u32s - "major" and "minor".
> > > So we lose some easy paths for extensibility. Are we comfortable with
> > > that?
> > >
> > > This isn't a big deal - certainly not a blocked. I just don't feel
> > > entirely comfortable about the current interface and I'm exploring to
> > > see if there might be something better.
> > >
> > >
> >
> > Ok, I'm not convinced that anything you've proposed so far is better
> > than what we have, but I'm open-minded.
>
> Thanks. I admit that I don't think anything I've come up with is better
> either.
>
> >
> > At this point we have these to-dos:
> >
> > 1) make the threads interface accept an array of integers rather than a
> > singleton. This is needed when sunrpc.pool_mode is not global.
> >
>
> While this I think probably still makes sense, I'm feeling less
> enthusiastic about it and probably wouldn't complain if it didn't
> happen.
>
Ok. I think making it an array is fine. For now, it will just error out
when you send down more than one value, but I think we can make it work
the way you'd expect. Regardlness, cleaning up and unifying the pooled
vs. non-pooled server handling seems like a good thing to do. The
existing one is a long-standing kludge.
> I don't like that fact that we need to configure a number of threads. I
> would much rather it grow/shrink dynamically. I've got more patches
> that are heading in this direction but they aren't ready yet.
>
> If we do land these and they prove to be effective, then configuring
> per-pool threads would become extremely uninteresting. The demand on
> each pool would adjust each pool separately.
>
> So if use an array here turns out to be problematic for some reason, I
> won't complain.
>
+1 on making the thread pool sizing dynamic. I had started looking at
what heuristics we should use, but I haven't gotten very far yet.
Some general thoughts:
nfsd threads spend most of their time waiting. They are only very rarely
cpu-bound, so the max size of the thread pools should mostly scale with
the amount of memory in the host.
I have a patch that adds a new percpu counter to count how often we go
to wake a thread and don't find one, and have to queue it. It clearly
hits a lot more when there are fewer running nfsd threads.
I don't think that's sufficient to know when to spawn a new thread
though. For that, we probably ought to be looking at how long RPCs are
sitting and waiting for a thread to pick them up. When that starts
growing then we probably need to bring in more threads.
Conversely, when a thread is sitting idle for a long time (10s? 60s?),
we can probably spin it down. Maybe we can have a new LRU for the
threads and have a periodic thread reaper job that checks for those.
> > 2) have the interface pass down the scope string in the THREADS_SET
> > instead of making userland change the UTS namespace before starting
> > threads. This should just mean adding a new optional string attribute to
> > the threads interfaces.
> >
> > ...anything else we're missing that needs doing here? We'd really like
> > to see this in -next soon, so we can possibly make v6.10 with this.
>
> I haven't yet found any obvious gaps. I agree that we can and should
> move forward promptly.
>
Great. I think Lorenzo is planning to send another set soon that should
hopefully be reasonable for merge.
>
> >
> > Thanks,
> > --
> > Jeff Layton <jlayton@...nel.org>
> >
>
--
Jeff Layton <jlayton@...nel.org>
Powered by blists - more mailing lists