Message-ID: <20200716185110.GB3703@twin.jikos.cz>
Date: Thu, 16 Jul 2020 20:51:10 +0200
From: David Sterba <dsterba@...e.cz>
To: Boris Burkov <boris@....io>
Cc: Chris Mason <clm@...com>, Josef Bacik <josef@...icpanda.com>,
David Sterba <dsterba@...e.com>, linux-btrfs@...r.kernel.org,
linux-kernel@...r.kernel.org, kernel-team@...com
Subject: Re: [PATCH v2] btrfs: fix mount failure caused by race with umount
On Fri, Jul 10, 2020 at 10:23:04AM -0700, Boris Burkov wrote:
> Here is the sequence laid out in greater detail:
>
> CPU0                                      CPU1
> down_write sb->s_umount
> btrfs_kill_super
>   kill_anon_super(sb)
>     generic_shutdown_super(sb);
>       shrink_dcache_for_umount(sb);
>       sync_filesystem(sb);
>       evict_inodes(sb); // SLOW
>
>                                           btrfs_mount_root
>                                             btrfs_scan_one_device
>                                             fs_devices = device->fs_devices
>                                             fs_info->fs_devices = fs_devices
>                                             // fs_devices->opened makes this a no-op
>                                             btrfs_open_devices(fs_devices, mode, fs_type)
>                                             s = sget(fs_type, test, set, flags, fs_info);
>                                               find sb in s_instances
>                                               grab_super(sb);
>                                                 down_write(&s->s_umount); // blocks
>
>       sop->put_super(sb)
>         // sb->fs_devices->opened == 2; no-op
>       spin_lock(&sb_lock);
>       hlist_del_init(&sb->s_instances);
>       spin_unlock(&sb_lock);
>       up_write(&sb->s_umount);
>                                                 return 0;
>                                               retry lookup
>                                               don't find sb in s_instances (deleted by CPU0)
>                                               s = alloc_super
>                                               return s;
>                                             btrfs_fill_super(s, fs_devices, data)
>                                               open_ctree // fs_devices total_rw_bytes improperly set!
>                                                 btrfs_read_chunk_tree
>                                                   read_one_dev // increment total_rw_bytes again!!
>                                                   super_total_bytes < fs_devices->total_rw_bytes // ERROR!!!

It seems weird that umount and mount can be mixed in such a way, but with
the VFS locks and structures it's valid, so the devices managed by btrfs
slipped through.

With the suggested fix, the bit BTRFS_DEV_STATE_IN_FS_METADATA becomes
quite important, and so does the synchronization of the device-related
data. The semantics seem quite subtle and inconsistent with the other uses
of set_bit or clear_bit and with total_rw_bytes.

I'm thinking about setting IN_FS_METADATA unconditionally, as it is now,
but recalculating total_rw_bytes outside of read_one_dev, in
btrfs_read_chunk_tree. There it should not matter if the bit was set by
the unmounted or the mounted filesystem, as long as the locking rules
for updating fs_devices hold. For that we have uuid_mutex and
fs_devices::device_list_mutex. These are used elsewhere, so fixing it
using existing mechanisms is IMHO a better way than relying on subtle
undocumented semantics of the state bit.
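
To illustrate the idea, here is a rough userspace sketch, not kernel code
and not a patch. All model_* names, the DEV_* flags and the device sizes are
made up for the example; a pthread mutex stands in for
fs_devices::device_list_mutex, and uuid_mutex is left out. It only compares
adding to the total on every device read with recomputing the total from
the device list once the devices have been read.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the device state bits (not the kernel values). */
enum { DEV_WRITEABLE = 1u << 0, DEV_IN_FS_METADATA = 1u << 1 };

struct model_device {
	uint64_t total_bytes;
	unsigned int state;
};

struct model_fs_devices {
	pthread_mutex_t device_list_mutex; /* models fs_devices::device_list_mutex */
	struct model_device *devices;
	size_t num_devices;
	uint64_t total_rw_bytes;
};

/* Accumulating variant: every read of a device item adds to the total. */
static void model_read_one_dev_accumulate(struct model_fs_devices *fs,
					   struct model_device *dev)
{
	dev->state |= DEV_IN_FS_METADATA;
	if (dev->state & DEV_WRITEABLE)
		fs->total_rw_bytes += dev->total_bytes;
}

/* Proposed variant: set the bit unconditionally, leave the total alone. */
static void model_read_one_dev_mark(struct model_device *dev)
{
	dev->state |= DEV_IN_FS_METADATA;
}

/* Recompute the total from scratch while holding the device list lock. */
static void model_recalc_total_rw_bytes(struct model_fs_devices *fs)
{
	uint64_t total = 0;

	pthread_mutex_lock(&fs->device_list_mutex);
	for (size_t i = 0; i < fs->num_devices; i++) {
		const struct model_device *dev = &fs->devices[i];

		if ((dev->state & DEV_WRITEABLE) &&
		    (dev->state & DEV_IN_FS_METADATA))
			total += dev->total_bytes;
	}
	fs->total_rw_bytes = total;
	pthread_mutex_unlock(&fs->device_list_mutex);
}

/*
 * Read the same two devices twice without resetting the shared structure,
 * which models the racing mount reusing fs_devices from the old mount.
 * Returns the resulting total_rw_bytes.
 */
uint64_t model_two_mount_passes(int use_recalc)
{
	struct model_device devs[2] = {
		{ .total_bytes = 1000, .state = DEV_WRITEABLE },
		{ .total_bytes = 2000, .state = DEV_WRITEABLE },
	};
	struct model_fs_devices fs = {
		.devices = devs, .num_devices = 2, .total_rw_bytes = 0,
	};

	pthread_mutex_init(&fs.device_list_mutex, NULL);
	for (int pass = 0; pass < 2; pass++) {
		for (size_t i = 0; i < fs.num_devices; i++) {
			if (use_recalc)
				model_read_one_dev_mark(&devs[i]);
			else
				model_read_one_dev_accumulate(&fs, &devs[i]);
		}
		if (use_recalc)
			model_recalc_total_rw_bytes(&fs);
	}
	pthread_mutex_destroy(&fs.device_list_mutex);
	return fs.total_rw_bytes;
}
```

In this model the accumulating variant ends up with twice the real size
after the second pass, while the recalculating variant stays at the sum of
the two device sizes no matter how many times the devices are read or who
set the bit.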