linux-ext4 - Re: Problem with direct IO

Message-ID: <CAOOPZo4HtGB5MYETpj_q++m+PvomNqasNdaPa65gp2hsQ5H67A@mail.gmail.com>
Date:   Tue, 19 Oct 2021 11:39:38 +0800
From:   Zhengyuan Liu <liuzhengyuang521@...il.com>
To:     Andrew Morton <akpm@...ux-foundation.org>
Cc:     viro@...iv.linux.org.uk, tytso@....edu,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        mysql@...ts.mysql.com, linux-ext4@...r.kernel.org,
        刘云 <liuyun01@...inos.cn>,
        Zhengyuan Liu <liuzhengyuan@...inos.cn>
Subject: Re: Problem with direct IO

On Tue, Oct 19, 2021 at 2:43 AM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Mon, 18 Oct 2021 09:09:06 +0800 Zhengyuan Liu <liuzhengyuang521@...il.com> wrote:
>
> > Ping.
> >
> > I think this problem is serious, and someone else may also encounter it in
> > the future.
> >
> >
> > On Wed, Oct 13, 2021 at 9:46 AM Zhengyuan Liu
> > <liuzhengyuang521@...il.com> wrote:
> > >
> > > Hi, all
> > >
> > > We are encountering the following MySQL crash while importing tables:
> > >
> > >     2021-09-26T11:22:17.825250Z 0 [ERROR] [MY-013622] [InnoDB] [FATAL]
> > >     fsync() returned EIO, aborting.
> > >     2021-09-26T11:22:17.825315Z 0 [ERROR] [MY-013183] [InnoDB]
> > >     Assertion failure: ut0ut.cc:555 thread 281472996733168
> > >
> > > At the same time, we found the following message in dmesg:
> > >
> > >     [ 4328.838972] Page cache invalidation failure on direct I/O.
> > >     Possible data corruption due to collision with buffered I/O!
> > >     [ 4328.850234] File: /data/mysql/data/sysbench/sbtest53.ibd PID:
> > >     625 Comm: kworker/42:1
> > >
> > > At first, we suspected that MySQL was operating on the file with direct
> > > IO and buffered IO interleaved, but after some checking we found it only
> > > does direct IO, using AIO. The problem comes from the direct IO path
> > > (__generic_file_write_iter) itself.
> > >
> > > ssize_t __generic_file_write_iter()
> > > {
> > > ...
> > >         if (iocb->ki_flags & IOCB_DIRECT) {
> > >                 loff_t pos, endbyte;
> > >
> > >                 written = generic_file_direct_write(iocb, from);
> > >                 /*
> > >                  * If the write stopped short of completing, fall back to
> > >                  * buffered writes.  Some filesystems do this for writes to
> > >                  * holes, for example.  For DAX files, a buffered write will
> > >                  * not succeed (even if it did, DAX does not handle dirty
> > >                  * page-cache pages correctly).
> > >                  */
> > >                 if (written < 0 || !iov_iter_count(from) || IS_DAX(inode))
> > >                         goto out;
> > >
> > >                 status = generic_perform_write(file, from, pos = iocb->ki_pos);
> > > ...
> > > }
> > >
> > > From the code snippet above we can see that direct IO can fall back to
> > > buffered IO under certain conditions, so even though MySQL only does
> > > direct IO, it can interleave with buffered IO when a fallback occurs. I
> > > have no idea why the FS (ext3) failed the direct IO here, but it is
> > > strange that __generic_file_write_iter makes direct IO fall back to
> > > buffered IO; it seems to break the semantics of direct IO.
>
> That makes sense.
>
> > > The reproduction environment is:
> > > Platform:  Kunpeng 920 (arm64)
> > > Kernel: V5.15-rc
> > > PAGESIZE: 64K
> > > Mysql:  V8.0
> > > Innodb_page_size: default(16K)
>
> This is all fairly mature code, I think.  Do you know if earlier
> kernels were OK, and if so which versions?

We have tested v4.18 and v4.19 and the problem is still there. Earlier
versions, such as those before v4.12, don't support arm64 well, so we can't
test them.

I think this problem is related to the page size: if we change the kernel
page size from 64K to 4K, or just set Innodb_page_size to 64K, we cannot
reproduce it. Typically 4K is used as both the kernel page size and the FS
block size, and if the database uses an IO unit of 4K or more, IOs never
share a page-cache page, since each one occupies one or more whole pages.
That means it is hard to trigger this problem on x86 or other platforms
using a 4K page size. Things change with the 64K page size on arm64: if the
database uses a smaller IO unit (16K DIO in our MySQL case), then two IOs can
share one page-cache page, and if one of them falls back to buffered IO it
can trigger the problem.

For example, AIO gets two direct IOs that write to the same page-cache page.
It dispatches the first one to storage and begins processing the second one
before the first one completes. If the second one falls back to buffered IO,
its data is copied into the page cache and the page is marked dirty. When the
first one then completes, it checks and invalidates its page-cache range; since
the page is dirty, the invalidation fails and the problem occurs.

If my analysis isn't correct, please point it out. Thanks.