linux-kernel - Re: [PATCH v2 2/2] btrfs: send: fix sending link commands for existing file paths

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220712201442.GC15169@twin.jikos.cz>
Date:   Tue, 12 Jul 2022 22:14:42 +0200
From:   David Sterba <dsterba@...e.cz>
To:     bingjingc <bingjingc@...ology.com>
Cc:     josef@...icpanda.com, dsterba@...e.com, clm@...com,
        linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org,
        fdmanana@...nel.org, robbieko@...ology.com, bxxxjxxg@...il.com
Subject: Re: [PATCH v2 2/2] btrfs: send: fix sending link commands for
 existing file paths

On Tue, Jul 12, 2022 at 09:36:32AM +0800, bingjingc wrote:
> From: BingJing Chang <bingjingc@...ology.com>
> 
> There is a bug sending link commands for existing file paths. When we're
> processing an inode, we go over all references. All the new file paths are
> added to the "new_refs" list. And all the deleted file paths are added to
> the "deleted_refs" list. In the end, when we finish processing the inode,
> we iterate over all the items in the "new_refs" list and send link commands
> for those file paths. After that, we go over all the items in the
> "deleted_refs" list and send unlink commands for them. If there are
> duplicated file paths in both lists, we will try to create them before we
> remove them. Then the receiver gets an -EEXIST error when trying the link
> operations.
> 
> Example for having duplicated file paths in both list:
> 
>   $ btrfs subvolume create vol
> 
>   # create a file and 2000 hard links to the same inode
>   $ touch vol/foo
>   $ for i in {1..2000}; do link vol/foo vol/$i ; done
> 
>   # take a snapshot for a parent snapshot
>   $ btrfs subvolume snapshot -r vol snap1
> 
>   # remove 2000 hard links and re-create the last 1000 links
>   $ for i in {1..2000}; do rm vol/$i; done;
>   $ for i in {1001..2000}; do link vol/foo vol/$i; done
> 
>   # take another one for a send snapshot
>   $ btrfs subvolume snapshot -r vol snap2
> 
>   $ mkdir receive_dir
>   $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
>   At subvol snap2
>   link 1238 -> foo
>   ERROR: link 1238 -> foo failed: File exists
> 
> In this case, we will have the same file paths added to both lists. In the
> parent snapshot, reference paths {1..1237} are stored in inode references,
> but reference paths {1238..2000} are stored in inode extended references.
> In the send snapshot, all reference paths {1001..2000} are stored in inode
> references. During the incremental send, we process their inode references
> first. In record_changed_ref(), we iterate all its inode references in the
> send/parent snapshot. For every inode reference, we also use find_iref() to
> check whether the same file path also appears in the parent/send snapshot
> or not. Inode references {1238..2000} which appear in the send snapshot but
> not in the parent snapshot are added to the "new_refs" list. On the other
> hand, Inode references {1..1000} which appear in the parent snapshot but
> not in the send snapshot are added to the "deleted_refs" list. Next, when
> we process their inode extended references, reference paths {1238..2000}
> are added to the "deleted_refs" list because all of them only appear in the
> parent snapshot. Now two lists contain items as below:
> "new_refs" list: {1238..2000}
> "deleted_refs" list: {1..1000}, {1238..2000}
> 
> Reference paths {1238..2000} appear in both lists. And as the processing
> order talked about before, the receiver gets an -EEXIST error when trying
> the link operations.
> 
> To fix the bug, the intuition is to process the "deleted_refs" list before
> the "new_refs" list. However, it's not easy to reshuffle the processing
> order. For one reason, if we do so, we may unlink all the existing paths
> first, there's no valid path anymore for links. And it's inefficient
> because we do a bunch of unlinks followed by links for the same paths.
> Moreover, it makes less sense to have duplications in both lists. A
> reference path cannot not only be regarded as new but also has been seen in
> the past, or we won't call it a new path. However, it's also not a good
> idea to make find_iref() to check a reference against all inode references
> and all inode extended references because it may result in large disk
> reads. So we introduce two rbtrees to make the references easier for
> lookups. And we also introduce __record_new_ref_if_needed() and
> __record_deleted_ref_if_needed() for changed_ref() to check and remove
> duplicated references early.
> 
> Reviewed-by: Robbie Ko <robbieko@...ology.com>
> Signed-off-by: BingJing Chang <bingjingc@...ology.com>

Added to misc-next with some fixups, thanks.

> +static int rbtree_ref_comp(const void *k, const struct rb_node *node)
> +{
> +	const struct recorded_ref *data = k;
> +	const struct recorded_ref *ref = rb_entry(node, struct recorded_ref, node);
> +	int result;
> +
> +	if (data->dir > ref->dir)
> +		return 1;
> +	if (data->dir < ref->dir)
> +		return -1;
> +	if (data->dir_gen > ref->dir_gen)
> +		return 1;
> +	if (data->dir_gen < ref->dir_gen)
> +		return -1;
> +	if (data->name_len > ref->name_len)
> +		return 1;
> +	if (data->name_len < ref->name_len)
> +		return -1;
> +	result = strcmp(data->name, ref->name);

I think this is the same as

	return strcmp(...);

but I have kept it as is for now.

> +	if (result > 0)
> +		return 1;
> +	if (result < 0)
> +		return -1;
> +	return 0;
> +}
> +
> +static int __record_new_ref_if_needed(int num, u64 dir, int index,

I've dropped the __ and updated all callers, it's not appropriate to be
used here.

> +				      struct fs_path *name, void *ctx)
> +{
> +	int ret = 0;
> +	struct send_ctx *sctx = ctx;
> +	struct rb_node *node = NULL;
> +	struct recorded_ref data;
> +	struct recorded_ref *ref;
> +	u64 dir_gen;
> +
> +	ret = get_inode_info(sctx->send_root, dir, NULL, &dir_gen, NULL,
> +			     NULL, NULL, NULL);

This function got a new parameter in "btrfs: send: add new command
FILEATTR for file attributes", so added NULL in all new instances.

> +	if (ret < 0)
> +		goto out;
> +
> +	data.dir = dir;
> +	data.dir_gen = dir_gen;
> +	set_ref_path(&data, name);
> +	node = rb_find(&data, &sctx->rbtree_deleted_refs, rbtree_ref_comp);
> +	if (node) {
> +		ref = rb_entry(node, struct recorded_ref, node);
> +		recorded_ref_free(ref);
> +	} else {
> +		ret = record_ref_in_tree(&sctx->rbtree_new_refs,
> +					 &sctx->new_refs, name, dir, dir_gen,
> +					 sctx);
> +	}
> +out:
> +	return ret;
> +}
> +
> +static int __record_deleted_ref_if_needed(int num, u64 dir, int index,

Same, dropped __

> +			    struct fs_path *name,
> +			    void *ctx)
> +{
> +	int ret = 0;
> +	struct send_ctx *sctx = ctx;
> +	struct rb_node *node = NULL;
> +	struct recorded_ref data;
> +	struct recorded_ref *ref;
> +	u64 dir_gen;
> +
> +	ret = get_inode_info(sctx->parent_root, dir, NULL, &dir_gen, NULL,
> +			     NULL, NULL, NULL);
> +	if (ret < 0)
> +		goto out;
> +
> +	data.dir = dir;
> +	data.dir_gen = dir_gen;
> +	set_ref_path(&data, name);
> +	node = rb_find(&data, &sctx->rbtree_new_refs, rbtree_ref_comp);
> +	if (node) {
> +		ref = rb_entry(node, struct recorded_ref, node);
> +		recorded_ref_free(ref);
> +	} else {
> +		ret = record_ref_in_tree(&sctx->rbtree_deleted_refs,
> +					 &sctx->deleted_refs, name, dir,
> +					 dir_gen, sctx);
> +	}
> +out:
> +	return ret;
> +}