Message-ID: <87a8q4k6xn.fsf@openvz.org>
Date: Mon, 23 Nov 2015 19:37:56 +0300
From: Dmitry Monakhov <dmonakhov@...nvz.org>
To: linux-ext4@...r.kernel.org
Cc: jack@...e.cz, tytso@....edu
Subject: Re: [PATCH] ext4: fix race aio-dio vs freeze_fs
Dmitry Monakhov <dmonakhov@...nvz.org> writes:
> After freeze_fs was revoked (from Jan Kara) pages' write-back completion
> is deferred before unwritten conversion, so the explicit flush_unwritten_io()
> was removed here: c724585b62411
> But we may still face deferred conversion in the aio-dio case.
> # Trivial testcase
> for ((i=0;i<60;i++));do fsfreeze -f /mnt ;sleep 1;fsfreeze -u /mnt;done &
> fio --bs=4k --ioengine=libaio --iodepth=128 --size=1g --direct=1 \
> --runtime=60 --filename=/mnt/file --name=rand-write --rw=randwrite
> NOTE: A sane testcase should be integrated into xfstests, but it requires
> changes in common/* code, so let's use this test for the moment.
>
> In order to fix this race we have to guard the journal transaction with explicit
> sb_{start,end}_intwrite() calls, as we do in ext4_evict_inode here: 8e8ad8a5
To be honest, I'm not very happy with this fix, because it continues the bad
practice of ad hoc fixes for generic journal vs. freeze synchronization.
The ideal fix would be to move sb_start_intwrite()/sb_end_intwrite() into
ext4_journal_start()/ext4_journal_stop(), but this is not possible due to
limitations introduced by nojournal mode (described here: 8e8ad8a5).
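To make the intended pairing concrete, here is a toy userspace model (not kernel code) of the "ideal" placement, where journal start/stop take and drop the freeze guard themselves. All toy_* names are hypothetical. One deliberate simplification: the real sb_start_intwrite() blocks while the filesystem is frozen and freezing waits for writers to drain, whereas this sketch just returns false in both cases.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a superblock's freeze state (hypothetical, not kernel code). */
struct toy_sb {
	bool frozen;      /* set while the filesystem is frozen */
	int intwriters;   /* internal writers currently inside a guarded section */
};

/* Model of sb_start_intwrite(): the kernel would block; here we report failure. */
static bool toy_start_intwrite(struct toy_sb *sb)
{
	if (sb->frozen)
		return false;
	sb->intwriters++;
	return true;
}

/* Model of sb_end_intwrite(): leave the guarded section. */
static void toy_end_intwrite(struct toy_sb *sb)
{
	assert(sb->intwriters > 0);
	sb->intwriters--;
}

/* Model of freezing: the kernel waits for writers to drain; here we refuse. */
static bool toy_freeze(struct toy_sb *sb)
{
	if (sb->intwriters > 0)
		return false;
	sb->frozen = true;
	return true;
}

static void toy_thaw(struct toy_sb *sb)
{
	sb->frozen = false;
}

/* The "ideal" placement: every transaction is guarded by construction. */
static bool toy_journal_start(struct toy_sb *sb)
{
	return toy_start_intwrite(sb);
}

static void toy_journal_stop(struct toy_sb *sb)
{
	toy_end_intwrite(sb);
}
```

In this model a freeze cannot succeed between toy_journal_start() and toy_journal_stop(), and no transaction can start while frozen, without any caller having to remember the guard.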
So let's fix nojournal instead. To do that, we somehow have to
store a ref count and a pointer to the sb inside the nojournal handle.
There are two possible ways to do that.
1) Add a second journal-related field to task_struct and guard it with
a compile-time macro:
void *journal_info;
+ #ifdef CONFIG_EXTRA_JOURNAL_INFO
+ void *journal_info2;
+ #endif
2) Encode the ref count and the sb into a single long. This can be done by
aligning the ext4_sb_info pointer to 4096, so we can embed the ref count in
the lower bits, as follows:
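#define EXT4_NOJOURNAL_SHIFT 12
#define EXT4_NOJOURNAL_MAX_REF_COUNT (1UL << (EXT4_NOJOURNAL_SHIFT - 1))
#define EXT4_NOJOURNAL_MASK ((1UL << EXT4_NOJOURNAL_SHIFT) - 1)
/* High bits: pointer (must be 4096-byte aligned). Bits 1..11: ref count. Bit 0: nojournal flag. */
#define NOJOURNAL_SB(handle) ((unsigned long)(handle) & ~EXT4_NOJOURNAL_MASK)
#define NOJOURNAL_REF(handle) (((unsigned long)(handle) & EXT4_NOJOURNAL_MASK) >> 1)
static int ext4_handle_valid(handle_t *handle)
{
	return !((unsigned long)handle & 0x1);
}
static handle_t *get_nojournal_handle(struct super_block *sb)
{
	handle_t *handle = current->journal_info;
	struct super_block *old_sb = (struct super_block *)NOJOURNAL_SB(handle);
	unsigned long ref_cnt = NOJOURNAL_REF(handle);

	BUG_ON(old_sb && old_sb != sb);
	ref_cnt++;
	BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);
	/* Re-encode the pointer, the new ref count and the nojournal flag. */
	handle = (handle_t *)((unsigned long)sb | (ref_cnt << 1) | 0x1);
	current->journal_info = handle;
	return handle;
}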
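The encoding in option 2 can be tried out in userspace. The following is a minimal sketch, assuming a 4096-byte-aligned pointer, bit 0 as the "no journal" marker and bits 1..11 as the ref count. The noj_* names are hypothetical and not part of ext4, and noj_put() is an assumed counterpart for the stop path that the mail does not spell out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NOJ_SHIFT   12                                   /* pointer aligned to 1 << 12 */
#define NOJ_MASK    (((uintptr_t)1 << NOJ_SHIFT) - 1)    /* low 12 bits */
#define NOJ_FLAG    ((uintptr_t)1)                       /* bit 0: "no journal" marker */
#define NOJ_MAX_REF (NOJ_MASK >> 1)                      /* 11 bits of ref count */

/* Pack an aligned pointer and a ref count into one word. */
static uintptr_t noj_pack(void *sb, unsigned long ref)
{
	assert(((uintptr_t)sb & NOJ_MASK) == 0);  /* alignment leaves the low bits free */
	assert(ref <= NOJ_MAX_REF);
	return (uintptr_t)sb | ((uintptr_t)ref << 1) | NOJ_FLAG;
}

/* Recover the pointer: clear the low 12 bits. */
static void *noj_sb(uintptr_t h)
{
	return (void *)(h & ~NOJ_MASK);
}

/* Recover the ref count: keep the low bits, drop the flag bit. */
static unsigned long noj_ref(uintptr_t h)
{
	return (unsigned long)((h & NOJ_MASK) >> 1);
}

/* A word with bit 0 set is a nojournal handle, not a real journal handle. */
static int noj_handle_valid(uintptr_t h)
{
	return !(h & NOJ_FLAG);
}

/* Nested "start": bump the ref count, keep the pointer. 0 means no handle yet. */
static uintptr_t noj_get(uintptr_t cur, void *sb)
{
	assert(cur == 0 || noj_sb(cur) == sb);
	return noj_pack(sb, noj_ref(cur) + 1);
}

/* Matching "stop": drop one ref; return 0 once the last ref is gone. */
static uintptr_t noj_put(uintptr_t cur)
{
	unsigned long ref = noj_ref(cur);

	assert(ref > 0);
	return ref == 1 ? 0 : noj_pack(noj_sb(cur), ref - 1);
}
```

With this layout the stop path can recover the pointer from the handle alone, which is what a generic sb_end_intwrite() call in the stop path would need. The cost is a nesting limit of 2047 and the alignment requirement on the stored pointer.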
What do you think about this? Is there a better way to fix it?
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@...nvz.org>
> ---
> fs/ext4/extents.c | 7 +++++++
> 1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 3a6197a..4cba944 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -5040,6 +5040,12 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
> max_blocks = ((EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits) -
> map.m_lblk);
> /*
> + * Protect us against freezing - AIO-DIO case. Caller didn't have to
> + * have any protection against it
> + */
> + sb_start_intwrite(inode->i_sb);
> +
> + /*
> * This is somewhat ugly but the idea is clear: When transaction is
> * reserved, everything goes into it. Otherwise we rather start several
> * smaller transactions for conversion of each extent separately.
> @@ -5083,6 +5089,7 @@ int ext4_convert_unwritten_extents(handle_t *handle, struct inode *inode,
> }
> if (!credits)
> ret2 = ext4_journal_stop(handle);
> + sb_end_intwrite(inode->i_sb);
> return ret > 0 ? ret2 : ret;
> }
>
> --
> 1.7.1
>