[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <15f028ac3191b54ed26cf5c90cacec9c5b43ad69.camel@themaw.net>
Date: Tue, 23 Jun 2020 16:29:29 +0800
From: Ian Kent <raven@...maw.net>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: Tejun Heo <tj@...nel.org>,
Rick Lindsley <ricklind@...ux.vnet.ibm.com>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>,
David Howells <dhowells@...hat.com>,
Miklos Szeredi <miklos@...redi.hu>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 0/6] kernfs: proposed locking and concurrency
improvement
On Tue, 2020-06-23 at 16:01 +0800, Ian Kent wrote:
> On Tue, 2020-06-23 at 08:02 +0200, Greg Kroah-Hartman wrote:
> > On Tue, Jun 23, 2020 at 01:09:08PM +0800, Ian Kent wrote:
> > > On Mon, 2020-06-22 at 20:03 +0200, Greg Kroah-Hartman wrote:
> > > > On Mon, Jun 22, 2020 at 01:48:45PM -0400, Tejun Heo wrote:
> > > > > Hello, Ian.
> > > > >
> > > > > On Sun, Jun 21, 2020 at 12:55:33PM +0800, Ian Kent wrote:
> > > > > > > > They are used for hotplugging and partitioning memory.
> > > > > > > > The
> > > > > > > > size of
> > > > > > > > the
> > > > > > > > segments (and thus the number of them) is dictated by
> > > > > > > > the
> > > > > > > > underlying
> > > > > > > > hardware.
> > > > > > >
> > > > > > > This sounds so bad. There gotta be a better interface for
> > > > > > > that,
> > > > > > > right?
> > > > > >
> > > > > > I'm still struggling a bit to grasp what your getting at
> > > > > > but
> > > > > > ...
> > > > >
> > > > > I was more trying to say that the sysfs device interface with
> > > > > per-
> > > > > object
> > > > > directory isn't the right interface for this sort of usage at
> > > > > all.
> > > > > Are these
> > > > > even real hardware pieces which can be plugged in and out?
> > > > > While
> > > > > being a
> > > > > discrete piece of hardware isn't a requirement to be a device
> > > > > model
> > > > > device,
> > > > > the whole thing is designed with such use cases on mind. It
> > > > > definitely isn't
> > > > > the right design for representing six digit number of logical
> > > > > entities.
> > > > >
> > > > > It should be obvious that representing each consecutive
> > > > > memory
> > > > > range with a
> > > > > separate directory entry is far from an optimal way of
> > > > > representing
> > > > > something like this. It's outright silly.
> > > >
> > > > I agree. And again, Ian, you are just "kicking the problem
> > > > down
> > > > the
> > > > road" if we accept these patches. Please fix this up properly
> > > > so
> > > > that
> > > > this interface is correctly fixed to not do looney things like
> > > > this.
> > >
> > > Fine, mitigating this problem isn't the end of the story, and you
> > > don't want to do accept a change to mitigate it because that
> > > could
> > > mean no further discussion on it and no further work toward
> > > solving
> > > it.
> > >
> > > But it seems to me a "proper" solution to this will cross a
> > > number
> > > of areas so this isn't just "my" problem and, as you point out,
> > > it's
> > > likely to become increasingly problematic over time.
> > >
> > > So what are your ideas and recommendations on how to handle
> > > hotplug
> > > memory at this granularity for this much RAM (and larger
> > > amounts)?
> >
> > First off, this is not my platform, and not my problem, so it's
> > funny
> > you ask me :)
>
> Sorry, but I don't think it's funny at all.
>
> It's not "my platform" either, I'm just the poor old sole that
> took this on because, on the face of it, it's a file system
> problem as claimed by others that looked at it and promptly
> washed their hands of it.
>
> I don't see how asking for your advice is out of order at all.
>
> > Anyway, as I have said before, my first guesses would be:
> > - increase the granularity size of the "memory chunks",
> > reducing
> > the number of devices you create.
>
> Yes, I didn't get that from your initial comments but you've said
> it a couple of times recently and I do get it now.
>
> I'll try and find someone appropriate to consult about that and
> see where it goes.
>
> > - delay creating the devices until way after booting, or do it
> > on a totally different path/thread/workqueue/whatever to
> > prevent delay at booting
>
> When you first said this it sounded like a ugly workaround to me.
> But perhaps it isn't (I'm not really convinced it is TBH), so it's
> probably worth trying to follow up on too.
>
> > And then there's always:
> > - don't create them at all, only only do so if userspace asks
> > you to.
>
> At first glance the impression I get from this is that it's an even
> uglier work around than delaying it but it might actually the most
> sensible way to handle this, as it's been called, silliness.
>
> We do have the inode flag S_AUTOMOUNT that will cause the dcache flag
> DCACHE_NEED_AUTOMOUNT to be set on the dentry and that will cause
> the dentry op ->d_automount() to be called on access so, from a path
> walk perspective, the dentries could just appear when needed.
>
> The question I'd need to answer is do the kernfs nodes exist so
> ->d_automount() can discover if the node lookup is valid, and I think
> the answer might be yes (but we would need to suppress udev
> notifications for S_AUTOMOUNT nodes).
Or maybe taking the notion of on-demand dentry creation further
is "nothing" more than not triggering udev notifications when
nodes are created letting the VFS create dentries on access alone
is all that would be needed.
I'm really not sure about how this would work yet ...
Ian
Powered by blists - more mailing lists