[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <968148ad-787e-4ccb-9d84-f32b5da88517@fastmail.fm>
Date: Mon, 9 Oct 2023 14:52:42 +0200
From: Bernd Schubert <bernd.schubert@...tmail.fm>
To: Krister Johansen <kjlx@...pleofstupid.com>
Cc: Miklos Szeredi <miklos@...redi.hu>, linux-fsdevel@...r.kernel.org,
Miklos Szeredi <mszeredi@...hat.com>,
linux-kernel@...r.kernel.org,
German Maglione <gmaglione@...hat.com>,
Greg Kurz <groug@...d.org>, Max Reitz <mreitz@...hat.com>
Subject: Re: [resend PATCH v2 2/2] fuse: ensure that submounts lookup their
parent
On 10/7/23 02:41, Krister Johansen wrote:
> On Fri, Oct 06, 2023 at 07:13:06PM +0200, Bernd Schubert wrote:
>>
>>
>> On 10/2/23 17:24, Krister Johansen wrote:
>>> The submount code uses the parent nodeid passed into the function in
>>> order to create the root dentry for the new submount. This nodeid does
>>> not get its remote reference count incremented by a lookup option.
>>>
>>> If the parent inode is evicted from its superblock, due to memory
>>> pressure for example, it can result in a forget opertation being sent to
>>> the server. Should this nodeid be forgotten while it is still in use in
>>> a submount, users of the submount get an error from the server on any
>>> subsequent access. In the author's case, this was an EBADF on all
>>> subsequent operations that needed to reference the root.
>>>
>>> Debugging the problem revealed that the dentry shrinker triggered a forget
>>> after killing the dentry with the last reference, despite the root
>>> dentry in another superblock still using the nodeid.
>>>
>>> As a result, a container that was also using this submount failed to
>>> access its filesystem because it had borrowed the reference instead of
>>> taking its own when setting up its superblock for the submount.
>>>
>>> This commit fixes the problem by having the new submount trigger a
>>> lookup for the parent as part of creating a new root dentry for the
>>> virtiofsd submount superblock. This allows each superblock to have its
>>> inodes removed by the shrinker when unreferenced, while keeping the
>>> nodeid reference count accurate and active with the server.
>>>
>>> Signed-off-by: Krister Johansen <kjlx@...pleofstupid.com>
>>> ---
>>> fs/fuse/dir.c | 10 +++++-----
>>> fs/fuse/fuse_i.h | 6 ++++++
>>> fs/fuse/inode.c | 43 +++++++++++++++++++++++++++++++++++++------
>>> 3 files changed, 48 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>>> index 5e01946d7531..333730c74619 100644
>>> --- a/fs/fuse/dir.c
>>> +++ b/fs/fuse/dir.c
>>> @@ -183,11 +183,11 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
>>> args->out_args[0].value = outarg;
>>> }
>>> -static int fuse_dentry_revalidate_lookup(struct fuse_mount *fm,
>>> - struct dentry *entry,
>>> - struct inode *inode,
>>> - struct fuse_entry_out *outarg,
>>> - bool *lookedup)
>>> +int fuse_dentry_revalidate_lookup(struct fuse_mount *fm,
>>> + struct dentry *entry,
>>> + struct inode *inode,
>>> + struct fuse_entry_out *outarg,
>>> + bool *lookedup)
>>> {
>>> struct dentry *parent;
>>> struct fuse_forget_link *forget;
>>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>>> index 405252bb51f2..a66fcf50a4cc 100644
>>> --- a/fs/fuse/fuse_i.h
>>> +++ b/fs/fuse/fuse_i.h
>>> @@ -1325,6 +1325,12 @@ void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>>> bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>>> void fuse_dax_cancel_work(struct fuse_conn *fc);
>>> +/* dir.c */
>>> +int fuse_dentry_revalidate_lookup(struct fuse_mount *fm, struct dentry *entry,
>>> + struct inode *inode,
>>> + struct fuse_entry_out *outarg,
>>> + bool *lookedup);
>>> +
>>> /* ioctl.c */
>>> long fuse_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
>>> long fuse_file_compat_ioctl(struct file *file, unsigned int cmd,
>>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>>> index 444418e240c8..79a31cb55512 100644
>>> --- a/fs/fuse/inode.c
>>> +++ b/fs/fuse/inode.c
>>> @@ -1464,7 +1464,13 @@ static int fuse_fill_super_submount(struct super_block *sb,
>>> struct fuse_mount *fm = get_fuse_mount_super(sb);
>>> struct super_block *parent_sb = parent_fi->inode.i_sb;
>>> struct fuse_attr root_attr;
>>> + struct fuse_inode *fi;
>>> struct inode *root;
>>> + struct inode *parent;
>>> + struct dentry *pdent;
>>> + struct fuse_entry_out outarg;
>>> + bool lookedup = false;
>>> + int ret;
>>> fuse_sb_defaults(sb);
>>> fm->sb = sb;
>>> @@ -1480,14 +1486,39 @@ static int fuse_fill_super_submount(struct super_block *sb,
>>> if (parent_sb->s_subtype && !sb->s_subtype)
>>> return -ENOMEM;
>>> - fuse_fill_attr_from_inode(&root_attr, parent_fi);
>>> - root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0);
>>> /*
>>> - * This inode is just a duplicate, so it is not looked up and
>>> - * its nlookup should not be incremented. fuse_iget() does
>>> - * that, though, so undo it here.
>>> + * It is necessary to lookup the parent_if->nodeid in case the dentry
>>> + * that triggered the automount of the submount is later evicted.
>>> + * If this dentry is evicted without the lookup count getting increased
>>> + * on the submount root, then the server can subsequently forget this
>>> + * nodeid which leads to errors when trying to access the root of the
>>> + * submount.
>>> */
>>> - get_fuse_inode(root)->nlookup--;
>>> + parent = &parent_fi->inode;
>>> + pdent = d_find_alias(parent);
>>> + if (!pdent)
>>> + return -EINVAL;
>>> +
>>> + ret = fuse_dentry_revalidate_lookup(fm, pdent, parent, &outarg,
>>> + &lookedup);
>>> + dput(pdent);
>>> + /*
>>> + * The new root owns this nlookup on success, and it is incremented by
>>> + * fuse_iget(). In the case the lookup succeeded but revalidate fails,
>>> + * ensure that the lookup count is tracked by the parent.
>>> + */
>>> + if (ret <= 0) {
>>> + if (lookedup) {
>>> + fi = get_fuse_inode(parent);
>>> + spin_lock(&fi->lock);
>>> + fi->nlookup++;
>>> + spin_unlock(&fi->lock);
>>> + }
>>
>> I might be wrong, but doesn't that mean that
>> "get_fuse_inode(root)->nlookup--" needs to be called?
>
> In the case where ret > 0, the nlookup on get_fuse_inode(root) is set to
> 1 by fuse_iget(). That ensures that the root is forgotten when later
> unmounted. The code that handles the forget uses the count of nlookup
> to tell the server-side how many references to forget. (That's in
> fuse_evict_inode()).
>
> However, if the fuse_dentry_revalidate_lookup() call performs a valid
> lookup but returns an error, this function will return before it fills
> out s_root in the superblock or calls fuse_iget(). If the superblock
> doesn't have a s_root set, then the code in generic_kill_super() won't
> dput() the root dentry and trigger the forget.
>
> The intention of this code was to handle the case where the lookup had
> succeeded, but the code determined it was still necessary to return an
> error. In that situation, the reference taken by the lookup has to be
> accounted somewhere, and the parent seemed like a plausible candidate.
Yeah sorry, I had just missed that fuse_iget() also moved and then
thought it would have increased fi->nlookup already.
>
> However, after writing up this response, I can see that there's still a
> problem here if d_make_root(root) returns NULL, because we'll also lose
> track of the nlookup in that case.
>
> If you agree that charging this to the parent on error makes sense, I'll
> re-work the error handling here so that the right thing happens when
> either fuse_dentry_revalidate_lookup() or d_make_root() encounter an
> error.
Oh yeah, I also missed that. Although, iput() calls iput_final, which
then calls evict and sends the fuse forget - isn't that the right action
already?
>
> Thanks for the feedback.
Well, false alarm from my side, sorry again!
Bernd
Powered by blists - more mailing lists