lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <968148ad-787e-4ccb-9d84-f32b5da88517@fastmail.fm>
Date:   Mon, 9 Oct 2023 14:52:42 +0200
From:   Bernd Schubert <bernd.schubert@...tmail.fm>
To:     Krister Johansen <kjlx@...pleofstupid.com>
Cc:     Miklos Szeredi <miklos@...redi.hu>, linux-fsdevel@...r.kernel.org,
        Miklos Szeredi <mszeredi@...hat.com>,
        linux-kernel@...r.kernel.org,
        German Maglione <gmaglione@...hat.com>,
        Greg Kurz <groug@...d.org>, Max Reitz <mreitz@...hat.com>
Subject: Re: [resend PATCH v2 2/2] fuse: ensure that submounts lookup their
 parent



On 10/7/23 02:41, Krister Johansen wrote:
> On Fri, Oct 06, 2023 at 07:13:06PM +0200, Bernd Schubert wrote:
>>
>>
>> On 10/2/23 17:24, Krister Johansen wrote:
>>> The submount code uses the parent nodeid passed into the function in
>>> order to create the root dentry for the new submount.  This nodeid does
>>> not get its remote reference count incremented by a lookup option.
>>>
>>> If the parent inode is evicted from its superblock, due to memory
>>> pressure for example, it can result in a forget opertation being sent to
>>> the server.  Should this nodeid be forgotten while it is still in use in
>>> a submount, users of the submount get an error from the server on any
>>> subsequent access.  In the author's case, this was an EBADF on all
>>> subsequent operations that needed to reference the root.
>>>
>>> Debugging the problem revealed that the dentry shrinker triggered a forget
>>> after killing the dentry with the last reference, despite the root
>>> dentry in another superblock still using the nodeid.
>>>
>>> As a result, a container that was also using this submount failed to
>>> access its filesystem because it had borrowed the reference instead of
>>> taking its own when setting up its superblock for the submount.
>>>
>>> This commit fixes the problem by having the new submount trigger a
>>> lookup for the parent as part of creating a new root dentry for the
>>> virtiofsd submount superblock.  This allows each superblock to have its
>>> inodes removed by the shrinker when unreferenced, while keeping the
>>> nodeid reference count accurate and active with the server.
>>>
>>> Signed-off-by: Krister Johansen <kjlx@...pleofstupid.com>
>>> ---
>>>    fs/fuse/dir.c    | 10 +++++-----
>>>    fs/fuse/fuse_i.h |  6 ++++++
>>>    fs/fuse/inode.c  | 43 +++++++++++++++++++++++++++++++++++++------
>>>    3 files changed, 48 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
>>> index 5e01946d7531..333730c74619 100644
>>> --- a/fs/fuse/dir.c
>>> +++ b/fs/fuse/dir.c
>>> @@ -183,11 +183,11 @@ static void fuse_lookup_init(struct fuse_conn *fc, struct fuse_args *args,
>>>    	args->out_args[0].value = outarg;
>>>    }
>>> -static int fuse_dentry_revalidate_lookup(struct fuse_mount *fm,
>>> -					 struct dentry *entry,
>>> -					 struct inode *inode,
>>> -					 struct fuse_entry_out *outarg,
>>> -					 bool *lookedup)
>>> +int fuse_dentry_revalidate_lookup(struct fuse_mount *fm,
>>> +				  struct dentry *entry,
>>> +				  struct inode *inode,
>>> +				  struct fuse_entry_out *outarg,
>>> +				  bool *lookedup)
>>>    {
>>>    	struct dentry *parent;
>>>    	struct fuse_forget_link *forget;
>>> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
>>> index 405252bb51f2..a66fcf50a4cc 100644
>>> --- a/fs/fuse/fuse_i.h
>>> +++ b/fs/fuse/fuse_i.h
>>> @@ -1325,6 +1325,12 @@ void fuse_dax_dontcache(struct inode *inode, unsigned int flags);
>>>    bool fuse_dax_check_alignment(struct fuse_conn *fc, unsigned int map_alignment);
>>>    void fuse_dax_cancel_work(struct fuse_conn *fc);
>>> +/* dir.c */
>>> +int fuse_dentry_revalidate_lookup(struct fuse_mount *fm, struct dentry *entry,
>>> +				  struct inode *inode,
>>> +				  struct fuse_entry_out *outarg,
>>> +				  bool *lookedup);
>>> +
>>>    /* ioctl.c */
>>>    long fuse_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
>>>    long fuse_file_compat_ioctl(struct file *file, unsigned int cmd,
>>> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
>>> index 444418e240c8..79a31cb55512 100644
>>> --- a/fs/fuse/inode.c
>>> +++ b/fs/fuse/inode.c
>>> @@ -1464,7 +1464,13 @@ static int fuse_fill_super_submount(struct super_block *sb,
>>>    	struct fuse_mount *fm = get_fuse_mount_super(sb);
>>>    	struct super_block *parent_sb = parent_fi->inode.i_sb;
>>>    	struct fuse_attr root_attr;
>>> +	struct fuse_inode *fi;
>>>    	struct inode *root;
>>> +	struct inode *parent;
>>> +	struct dentry *pdent;
>>> +	struct fuse_entry_out outarg;
>>> +	bool lookedup = false;
>>> +	int ret;
>>>    	fuse_sb_defaults(sb);
>>>    	fm->sb = sb;
>>> @@ -1480,14 +1486,39 @@ static int fuse_fill_super_submount(struct super_block *sb,
>>>    	if (parent_sb->s_subtype && !sb->s_subtype)
>>>    		return -ENOMEM;
>>> -	fuse_fill_attr_from_inode(&root_attr, parent_fi);
>>> -	root = fuse_iget(sb, parent_fi->nodeid, 0, &root_attr, 0, 0);
>>>    	/*
>>> -	 * This inode is just a duplicate, so it is not looked up and
>>> -	 * its nlookup should not be incremented.  fuse_iget() does
>>> -	 * that, though, so undo it here.
>>> +	 * It is necessary to lookup the parent_if->nodeid in case the dentry
>>> +	 * that triggered the automount of the submount is later evicted.
>>> +	 * If this dentry is evicted without the lookup count getting increased
>>> +	 * on the submount root, then the server can subsequently forget this
>>> +	 * nodeid which leads to errors when trying to access the root of the
>>> +	 * submount.
>>>    	 */
>>> -	get_fuse_inode(root)->nlookup--;
>>> +	parent = &parent_fi->inode;
>>> +	pdent = d_find_alias(parent);
>>> +	if (!pdent)
>>> +		return -EINVAL;
>>> +
>>> +	ret = fuse_dentry_revalidate_lookup(fm, pdent, parent, &outarg,
>>> +	    &lookedup);
>>> +	dput(pdent);
>>> +	/*
>>> +	 * The new root owns this nlookup on success, and it is incremented by
>>> +	 * fuse_iget().  In the case the lookup succeeded but revalidate fails,
>>> +	 * ensure that the lookup count is tracked by the parent.
>>> +	 */
>>> +	if (ret <= 0) {
>>> +		if (lookedup) {
>>> +			fi = get_fuse_inode(parent);
>>> +			spin_lock(&fi->lock);
>>> +			fi->nlookup++;
>>> +			spin_unlock(&fi->lock);
>>> +		}
>>
>> I might be wrong, but doesn't that mean that
>> "get_fuse_inode(root)->nlookup--" needs to be called?
> 
> In the case where ret > 0, the nlookup on get_fuse_inode(root) is set to
> 1 by fuse_iget().  That ensures that the root is forgotten when later
> unmounted.  The code that handles the forget uses the count of nlookup
> to tell the server-side how many references to forget.  (That's in
> fuse_evict_inode()).
> 
> However, if the fuse_dentry_revalidate_lookup() call performs a valid
> lookup but returns an error, this function will return before it fills
> out s_root in the superblock or calls fuse_iget().  If the superblock
> doesn't have a s_root set, then the code in generic_kill_super() won't
> dput() the root dentry and trigger the forget.
> 
> The intention of this code was to handle the case where the lookup had
> succeeded, but the code determined it was still necessary to return an
> error.  In that situation, the reference taken by the lookup has to be
> accounted somewhere, and the parent seemed like a plausible candidate.

Yeah sorry, I had just missed that fuse_iget() also moved and then 
thought it would have increased fi->nlookup already.

> 
> However, after writing up this response, I can see that there's still a
> problem here if d_make_root(root) returns NULL, because we'll also lose
> track of the nlookup in that case.
> 
> If you agree that charging this to the parent on error makes sense, I'll
> re-work the error handling here so that the right thing happens when
> either fuse_dentry_revalidate_lookup() or d_make_root() encounter an
> error.

Oh yeah, I also missed that. Although, iput() calls iput_final, which 
then calls evict and sends the fuse forget - isn't that the right action 
already?

> 
> Thanks for the feedback.

Well, false alarm from my side, sorry again!


Bernd

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ