[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <A869A023-8695-48CC-B577-DAA1C75837C3@dilger.ca>
Date: Fri, 29 Jun 2018 22:36:06 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Jon Derrick <jonathan.derrick@...el.com>
Cc: Ext4 Developers List <linux-ext4@...r.kernel.org>,
Theodore Ts'o <tytso@....edu>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] ext4: Check superblock mapped prior to committing
On Jun 29, 2018, at 1:36 PM, Jon Derrick <jonathan.derrick@...el.com> wrote:
>
> This patch attempts to close a hole leading to a BUG seen with hot
> removals during writes [1].
>
> A block device (NVME namespace in this test case) is formatted to EXT4
> without partitions. It's mounted and write I/O is run to a file, then
> the device is hot removed from the slot. The superblock attempts to be
> written to the drive which is no longer present.
>
> The typical chain of events leading to the BUG:
> ext4_commit_super()
> __sync_dirty_buffer()
> submit_bh()
> submit_bh_wbc()
> BUG_ON(!buffer_mapped(bh));
>
> This fix checks for the superblock's buffer head being mapped prior to
> syncing.
>
> [1] https://www.spinics.net/lists/linux-ext4/msg56527.html
>
> Signed-off-by: Jon Derrick <jonathan.derrick@...el.com>
> ---
> fs/ext4/super.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 0c4c220..ee33233 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4736,6 +4736,14 @@ static int ext4_commit_super(struct super_block *sb, int sync)
>
> if (!sbh || block_device_ejected(sb))
> return error;
> +
> + /*
> + * The superblock bh should be mapped, but it might not be if the
> + * device was hot-removed. Not much we can do but fail the I/O.
> + */
> + if (!buffer_mapped(sbh))
> + return error;
This still looks a bit racy, based on the stack trace you posted.
There is already a "block_device_ejected()" check a line above,
which makes me think that the PCI device removal should be handled
like an ejected device, so that it is also handled elsewhere.
Even so, the check here in ext4_commit_super() can pass, and the
PCI card can be removed on the next instruction and still trigger
the BUG_ON().
That said, this is probably still an improvement on the existing
situation.
Cheers, Andreas
Download attachment "signature.asc" of type "application/pgp-signature" (874 bytes)
Powered by blists - more mailing lists