Message-ID: <CAOOPZo43cwh5ujm3n-r9Bih=7gS7Oav0B=J_8AepWDgdeBRkYA@mail.gmail.com>
Date: Thu, 21 Oct 2021 10:21:55 +0800
From: Zhengyuan Liu <liuzhengyuang521@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: viro@...iv.linux.org.uk, Andrew Morton <akpm@...ux-foundation.org>,
tytso@....edu, linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
linux-ext4@...r.kernel.org,
刘云 <liuyun01@...inos.cn>,
Zhengyuan Liu <liuzhengyuan@...inos.cn>
Subject: Re: Problem with direct IO
On Thu, Oct 21, 2021 at 1:37 AM Jan Kara <jack@...e.cz> wrote:
>
> On Wed 13-10-21 09:46:46, Zhengyuan Liu wrote:
> > Hi, all
> >
> > we are encountering the following MySQL crash while importing tables:
> >
> > 2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL]
> > fsync() returned EIO, aborting.
> > 2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB]
> > Assertion failure: ut0ut.cc:555 thread 281472996733168
> >
> > At the same time, dmesg showed the following message:
> >
> > [ 4328.838972] Page cache invalidation failure on direct I/O.
> > Possible data corruption due to collision with buffered I/O!
> > [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID:
> > 625 Comm: kworker/42:1
> >
> > At first we suspected that MySQL was interleaving direct IO and
> > buffered IO on the file, but after some checking we found that it
> > only does direct IO, using aio. The problem comes from the direct-IO
> > write path (__generic_file_write_iter) itself.
> >
> > ssize_t __generic_file_write_iter()
> > {
> >         ...
> >         if (iocb->ki_flags & IOCB_DIRECT) {
> >                 loff_t pos, endbyte;
> >
> >                 written = generic_file_direct_write(iocb, from);
> >                 /*
> >                  * If the write stopped short of completing, fall back to
> >                  * buffered writes. Some filesystems do this for writes to
> >                  * holes, for example. For DAX files, a buffered write will
> >                  * not succeed (even if it did, DAX does not handle dirty
> >                  * page-cache pages correctly).
> >                  */
> >                 if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
> >                         goto out;
> >
> >                 status = generic_perform_write(file, from, pos = iocb->ki_pos);
> >                 ...
> > }
> >
> > From the code snippet above we can see that direct IO can fall back
> > to buffered IO under certain conditions, so even though MySQL only
> > issues direct IO, it can end up interleaved with buffered IO when the
> > fallback occurs. I don't yet know why the FS (ext3) failed the direct
> > IO, but it is strange that __generic_file_write_iter makes direct IO
> > fall back to buffered IO; that seems to break the semantics of direct IO.
> >
> > The reproduction environment is:
> > Platform: Kunpeng 920 (arm64)
> > Kernel: V5.15-rc
> > PAGESIZE: 64K
> > Mysql: V8.0
> > Innodb_page_size: default(16K)
>
> Thanks for the report. I agree this should not happen. How hard is this to
> reproduce? Any idea whether the fallback to buffered IO happens because
> iomap_dio_rw() returns -ENOTBLK or because it returns a short write?
It is easy to reproduce in my test environment. As I said in my previous
reply to Andrew, this problem is related to the kernel page size.
> Can you post output of "dumpe2fs -h <device>" for the filesystem where the
> problem happens? Thanks!
Sure, the output is:
# dumpe2fs -h /dev/sda3
dumpe2fs 1.45.3 (14-Jul-2019)
Filesystem volume name: <none>
Last mounted on: /data
Filesystem UUID: 09a51146-b325-48bb-be63-c9df539a90a1
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file
Filesystem flags: unsigned_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 11034624
Block count: 44138240
Reserved block count: 2206912
Free blocks: 43168100
Free inodes: 11034613
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1013
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Filesystem created: Thu Oct 21 09:42:03 2021
Last mount time: Thu Oct 21 09:43:36 2021
Last write time: Thu Oct 21 09:43:36 2021
Mount count: 1
Maximum mount count: -1
Last checked: Thu Oct 21 09:42:03 2021
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: a7b04e61-1209-496d-ab9d-a51009b51ddb
Journal backup: inode blocks
Journal features: journal_incompat_revoke
Journal size: 1024M
Journal length: 262144
Journal sequence: 0x00000002
Journal start: 1
BTW, we have also tested ext4 and XFS and didn't see the direct write fall
back to buffered IO there.
Thanks,