Message-ID: <CAGudoHHg-T+ZOTm0fSpW0Hztfxn=fpfnksz5Q3=3YeCeEPo7LQ@mail.gmail.com>
Date: Tue, 2 Jul 2024 19:58:02 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>, kernel test robot <oliver.sang@...el.com>, oe-lkp@...ts.linux.dev,
lkp@...el.com, Linux Memory Management List <linux-mm@...ck.org>, linux-kernel@...r.kernel.org,
ying.huang@...el.com, feng.tang@...el.com, fengwei.yin@...el.com
Subject: Re: [linux-next:master] [lockref] d042dae6ad: unixbench.throughput -33.7% regression
On Tue, Jul 2, 2024 at 7:46 PM Mateusz Guzik <mjguzik@...il.com> wrote:
>
> On Tue, Jul 2, 2024 at 7:28 PM Linus Torvalds
> <torvalds@...ux-foundation.org> wrote:
> >
> > On Tue, 2 Jul 2024 at 10:03, Mateusz Guzik <mjguzik@...il.com> wrote:
> > >
> > > I was thinking a different approach.
> > >
> > > A lookup variant which resolves everything and returns the dentry + an
> > > information whether this is rcu mode.
> >
> > That would work equally.
> >
> > But the end result ends up being very similar: you need to hook into
> > that final complete_walk() -> try_to_unlazy() -> legitimize_path() and
> > check a flag whether you actually then do "get_lockref_or_dead()" or
> > not.
> >
>
> Ye, the magic routine to validate if you can pretend the ref was taken
> would wrap it.
>
> > It really *shouldn't* be too bad, but this is just so subtle code that
> > it just takes a lot of care. Even if the patch itself ends up not
> > necessarily being very large.
> >
> > As mentioned, I've looked at it, but it always ended up being _just_
> > scary enough that I never really started doing it.
> >
>
> I implemented something like this as a demo in FreeBSD a few years back,
> and it did not blow up, at least. The work did not get committed though,
> because I could not be arsed to productize it.
>
> tbf, if anything, the only shady thing I see here is that stat et
> al. do their work without any locks held or seqc verification in the
> current kernel.
>
> In FreeBSD this operated directly on vnodes (here one can pretend
> they're inodes). In that system I added sequence counters to the vnode
> itself, and any state change like write, setattr, unlink or whatever
> would bump it. Then something like stat could safely read whatever it
> wants locklessly, with a final check for a matching seqc
> indicating nothing changed.
>
> Not having a "someone is messing with the inode" indicator (only with
> a dentry) in Linux is definitely worrisome when pushing RCU further,
> if that's what you meant.
>
> Again, I'm going to poke around if only for kicks when I find the time
> and we will see what happens.
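A minimal sketch of the per-object sequence counter scheme described in the quoted text for FreeBSD vnodes, written with C11 atomics rather than either kernel's own primitives. Every name here (demo_inode, demo_write_begin(), demo_stat_lockless() and so on) is hypothetical, writers are assumed to be serialized by a lock that is not shown, and the memory ordering follows the usual C11 seqlock pattern. A writer makes the counter odd around any state change; a lockless stat copies the fields and succeeds only if the counter was even and is unchanged at the end, otherwise the caller would retry or fall back to a locked path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inode-like object that carries its own sequence counter. */
struct demo_inode {
    atomic_uint seq;         /* even: stable, odd: a change is in progress */
    _Atomic long size;
    _Atomic unsigned mode;
    _Atomic unsigned nlink;
};

/* Snapshot handed back to a stat-like caller. */
struct demo_stat {
    long size;
    unsigned mode;
    unsigned nlink;
};

static void demo_inode_init(struct demo_inode *ip, long size,
                            unsigned mode, unsigned nlink)
{
    atomic_init(&ip->seq, 0);
    atomic_init(&ip->size, size);
    atomic_init(&ip->mode, mode);
    atomic_init(&ip->nlink, nlink);
}

/* Writer side; assumes writers are already serialized by a lock not shown. */
static void demo_write_begin(struct demo_inode *ip)
{
    unsigned s = atomic_load_explicit(&ip->seq, memory_order_relaxed);
    assert((s & 1) == 0);    /* no nested writers */
    atomic_store_explicit(&ip->seq, s + 1, memory_order_relaxed);
    /* Keep the odd counter ordered before the field stores that follow. */
    atomic_thread_fence(memory_order_release);
}

static void demo_write_end(struct demo_inode *ip)
{
    unsigned s = atomic_load_explicit(&ip->seq, memory_order_relaxed);
    assert(s & 1);
    /* Release: readers that see the even value also see the field stores. */
    atomic_store_explicit(&ip->seq, s + 1, memory_order_release);
}

/* Any state change (write, setattr, unlink, ...) is bracketed the same way. */
static void demo_setattr_size(struct demo_inode *ip, long size)
{
    demo_write_begin(ip);
    atomic_store_explicit(&ip->size, size, memory_order_relaxed);
    demo_write_end(ip);
}

/* Reader side: sample the counter before doing any lockless reads. */
static unsigned demo_read_begin(struct demo_inode *ip)
{
    return atomic_load_explicit(&ip->seq, memory_order_acquire);
}

/* True if no change started or finished since demo_read_begin() returned s. */
static bool demo_read_valid(struct demo_inode *ip, unsigned s)
{
    /* Keep the preceding field loads from moving past the re-check. */
    atomic_thread_fence(memory_order_acquire);
    return (s & 1) == 0 &&
           atomic_load_explicit(&ip->seq, memory_order_relaxed) == s;
}

/* Lockless stat: false means a change raced with us; retry or take the lock. */
static bool demo_stat_lockless(struct demo_inode *ip, struct demo_stat *st)
{
    unsigned s = demo_read_begin(ip);
    st->size = atomic_load_explicit(&ip->size, memory_order_relaxed);
    st->mode = atomic_load_explicit(&ip->mode, memory_order_relaxed);
    st->nlink = atomic_load_explicit(&ip->nlink, memory_order_relaxed);
    return demo_read_valid(ip, s);
}
```

Splitting the reader into demo_read_begin()/demo_read_valid() means a caller can sample the counter early, do arbitrary lockless work, and validate once at the end, which is the shape the proposal below relies on.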
Suppose the RCU fast-path lookup reads the dentry seqc and then does all
the legitimize_mnt() and other work: everything except modifying the
lockref. The caller is given a mnt to put (per-cpu, so scalable), the
dentry seqc read before any of the path validation, and an indication
that this is RCU mode.

Then, after whatever work is done, if the seqc still matches, this is the
same as if there had been a lockref get/put around it.

The only worry is pointers suddenly going NULL or similar as the
dentry/inode is looked at. That has to be worked out on a per-syscall
basis.

Unless I'm missing something.
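The proposal above can be illustrated with a single-threaded C11 model, purely as a sketch under assumptions. All names are hypothetical: demo_lookup_rcu() stands in for an RCU-mode lookup that pins the mount but not the dentry, and demo_lookup_valid() plays the role of the routine that decides whether the caller may pretend a lockref get/put happened. The mount count here is a single atomic, whereas the real mount refcount is per-CPU. The RCU read-side section, the actual path walk and the NULL-pointer concern are not modeled and would still need per-syscall care.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in; the kernel's mount refcount is per-CPU, this is not. */
struct demo_mount {
    atomic_long count;
};

/* Hypothetical stand-in for a dentry plus the inode state stat() reports. */
struct demo_dentry {
    atomic_uint seq;       /* odd while a change (rename, unlink, ...) runs */
    atomic_int lockref;    /* the reference count the fast path avoids */
    _Atomic long size;
};

/* Result of the RCU-mode lookup: the mount is pinned, the dentry is not. */
struct demo_lookup {
    struct demo_mount *mnt;      /* referenced; caller must demo_mnt_put() it */
    struct demo_dentry *dentry;  /* NOT referenced */
    unsigned seq;                /* dentry seqc sampled before validation */
    bool rcu;                    /* true: no dentry reference was taken */
};

static void demo_mnt_put(struct demo_mount *m)
{
    atomic_fetch_sub_explicit(&m->count, 1, memory_order_release);
}

/* Writer: any change to the dentry moves its counter (odd, then even). */
static void demo_dentry_change(struct demo_dentry *d, long new_size)
{
    unsigned s = atomic_load_explicit(&d->seq, memory_order_relaxed);
    atomic_store_explicit(&d->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&d->size, new_size, memory_order_relaxed);
    atomic_store_explicit(&d->seq, s + 2, memory_order_release);
}

/* Fast path: sample the seqc, pin the mount, leave the lockref alone. */
static struct demo_lookup demo_lookup_rcu(struct demo_mount *m,
                                          struct demo_dentry *d)
{
    struct demo_lookup res;

    res.seq = atomic_load_explicit(&d->seq, memory_order_acquire);
    atomic_fetch_add_explicit(&m->count, 1, memory_order_relaxed);
    res.mnt = m;
    res.dentry = d;
    res.rcu = (res.seq & 1) == 0;   /* change in flight: do not use RCU mode */
    return res;
}

/* Unchanged seqc: as good as having held a lockref get/put around the work. */
static bool demo_lookup_valid(const struct demo_lookup *res)
{
    atomic_thread_fence(memory_order_acquire);
    return res->rcu &&
           atomic_load_explicit(&res->dentry->seq, memory_order_relaxed) ==
               res->seq;
}

/* stat-like caller; returns true if the dentry lockref was never touched. */
static bool demo_stat_path(struct demo_mount *m, struct demo_dentry *d,
                           long *size)
{
    struct demo_lookup res = demo_lookup_rcu(m, d);
    bool fast = false;

    if (res.rcu) {
        *size = atomic_load_explicit(&d->size, memory_order_relaxed);
        fast = demo_lookup_valid(&res);
    }
    if (!fast) {
        /* Simplified fallback; a real one would redo the walk with refs. */
        atomic_fetch_add_explicit(&d->lockref, 1, memory_order_relaxed);
        *size = atomic_load_explicit(&d->size, memory_order_relaxed);
        atomic_fetch_sub_explicit(&d->lockref, 1, memory_order_release);
    }
    demo_mnt_put(res.mnt);
    return fast;
}
```

In this model the only shared write on the fast path is the mount reference; the dentry lockref (the subject of this thread) is touched only when validation fails.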
--
Mateusz Guzik <mjguzik gmail.com>