[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGudoHHuQ2PjmX5HG+E6WMeaaOhSNEhdinCssd75dM0P+3ZG8Q@mail.gmail.com>
Date: Fri, 29 Sep 2023 23:39:15 +0200
From: Mateusz Guzik <mjguzik@...il.com>
To: Christian Brauner <brauner@...nel.org>
Cc: Jann Horn <jannh@...gle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH v2] vfs: shave work on failed file open
On 9/29/23, Mateusz Guzik <mjguzik@...il.com> wrote:
> On 9/29/23, Christian Brauner <brauner@...nel.org> wrote:
>> On Fri, Sep 29, 2023 at 03:31:29PM +0200, Jann Horn wrote:
>>> On Fri, Sep 29, 2023 at 11:20 AM Christian Brauner <brauner@...nel.org>
>>> wrote:
>>> > > But yes, that protection would be broken by SLAB_TYPESAFE_BY_RCU,
>>> > > since then the "f_count is zero" is no longer a final thing.
>>> >
>>> > I've tried coming up with a patch that is simple enough so the pattern
>>> > is easy to follow and then converting all places to rely on a pattern
>>> > that combine lookup_fd_rcu() or similar with get_file_rcu(). The
>>> > obvious
>>> > thing is that we'll force a few places to now always acquire a
>>> > reference
>>> > when they don't really need one right now and that already may cause
>>> > performance issues.
>>>
>>> (Those places are probably used way less often than the hot
>>> open/fget/close paths though.)
>>>
>>> > We also can't fully get rid of plain get_file_rcu() uses itself
>>> > because
>>> > of users such as mm->exe_file. They don't go from one of the rcu
>>> > fdtable
>>> > lookup helpers to the struct file obviously. They rcu replace the file
>>> > pointer in their struct ofc so we could change get_file_rcu() to take
>>> > a
>>> > struct file __rcu **f and then comparing that the passed in pointer
>>> > hasn't changed before we managed to do atomic_long_inc_not_zero().
>>> > Which
>>> > afaict should work for such cases.
>>> >
>>> > But overall we would introduce a fairly big and at the same time
>>> > subtle
>>> > semantic change. The idea is pretty neat and it was fun to do but I'm
>>> > just not convinced we should do it given how ubiquitous struct file is
>>> > used and now to make the semanics even more special by allowing
>>> > refcounts.
>>> >
>>> > I've kept your original release_empty_file() proposal in vfs.misc
>>> > which
>>> > I think is a really nice change.
>>> >
>>> > Let me know if you all passionately disagree. ;)
>>
>> So I'm appending the patch I had played with and a fix from Jann on top.
>> @Linus, if you have an opinion, let me know what you think.
>>
>> Also available here:
>> https://gitlab.com/brauner/linux/-/commits/vfs.file.rcu
>>
>> Might be interesting if this could be perfed to see if there is any real
>> gain for workloads with massive numbers of fds.
>>
>
> I would feel safer with a guaranteed way to tell that the file was
> reallocated.
>
> I think this could track allocs/frees with a sequence counter embedded
> into the object, say odd means deallocated and even means allocated.
>
> Then you would know for a fact whether you raced with the file getting
> whacked and would never have to wonder if you double-checked
> everything you needed (like that f_mode) thing.
>
> This would also mean that consumers which get away with poking around
> the file without getting a ref could still do it, this is at least
> true for tid_fd_mode. All of them would need patching though.
>
> Extending struct file is not ideal by any means, but the good news is that:
> 1. there is a 4 byte hole in there, if one is fine with an int-sized
> counter
> 2. if one insists on 8 bytes, the struct is 232 bytes on my kernel
> (debian). still some room up to 256, so it may be tolerable?
>
So to be clear, obtaining the initial count would require a dedicated
accessor. First you would find the file obj, wait for the count to
reach "allocated" state, validate the source still has the right
pointer, validate the count did not change (with acq fences sprinkled
in there). At the end of it you know that the seq counter you got from
the file was there when the file was still "installed".
Then you can poke around and validate you poked around the right thing
by once more validating the counter.
Maybe I missed something, but the idea in general should work.
--
Mateusz Guzik <mjguzik gmail.com>
Powered by blists - more mailing lists