Date:   Wed, 25 Mar 2020 15:31:02 +0100
From:   Jan Kara <jack@...e.cz>
To:     Xing Zhengjun <zhengjun.xing@...ux.intel.com>
Cc:     Rong Chen <rong.a.chen@...el.com>, Jan Kara <jack@...e.cz>,
        "Theodore Y. Ts'o" <tytso@....edu>,
        Matthew Bobrowski <mbobrowski@...browski.org>,
        Ritesh Harjani <riteshh@...ux.ibm.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        lkp@...ts.01.org
Subject: Re: [LKP] Re: [ext4] b1b4705d54: filebench.sum_bytes_mb/s -20.2%
 regression

On Wed 25-03-20 13:50:09, Xing Zhengjun wrote:
> ping...
> The issue still exists in v5.6-rc7.

So I have tried again to reproduce this so that I can look into the
regression. When observing what is actually happening in the system I have
to say that this workfile (or actually its implementation in filebench) is
pretty dubious. The problem is that filebench first creates the files by
writing them through ordinary write(2). Then it immediately starts reading
the files with direct IO reads. So what happens is that by the time the
direct IO reads are running, the system is still writing back the created
files, and depending on how the reads vs. writes get scheduled, you get
different results. Also, a direct IO read will first flush the range it is
going to read from the page cache, so to some extent this is actually a
parallel small-ranged fsync(2) benchmark. Finally, differences in how we
achieve integrity of direct IO reads against a dirty page cache are going
to impact this benchmark.
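
In other words, the benchmark boils down to something like the sketch
below. This is just a minimal illustration of the access pattern, not
filebench's actual code; the path and sizes are made up and error
handling is omitted:

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *path = "/mnt/test/bigfile";	/* hypothetical file */
	size_t bufsz = 1024 * 1024;
	char *wbuf = malloc(bufsz);
	void *rbuf;
	int fd, i;

	memset(wbuf, 'x', bufsz);

	/* Phase 1: create the file through the page cache with ordinary
	 * write(2); at this point the data is only dirty in memory and
	 * background writeback is still in flight. */
	fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
	for (i = 0; i < 1024; i++)
		write(fd, wbuf, bufsz);
	close(fd);

	/* Phase 2: immediately read the same file back with O_DIRECT.
	 * For each read the kernel first writes out the dirty page cache
	 * over that range to keep the direct read coherent, so this
	 * behaves like many small ranged fsync(2) calls racing with the
	 * background writeback from phase 1. */
	posix_memalign(&rbuf, 4096, bufsz);
	fd = open(path, O_RDONLY | O_DIRECT);
	while (read(fd, rbuf, bufsz) > 0)
		;
	close(fd);

	free(wbuf);
	free(rbuf);
	return 0;
}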

So overall I can now see why this commit makes a difference, but the
workload is IMHO largely irrelevant. What would make sense is to run
filebench once, then unmount & mount the fs to force the files to disk and
clear the page cache, and then run it again. Filebench will reuse the files
in this case, and then parallel direct IO readers without page cache are a
sensible workload. But I didn't see any difference in that case (even with
a rotating disk) on my machines.
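
For completeness, by "unmount & mount the fs" I mean roughly the
equivalent of the sketch below. Device, mount point and file name are
hypothetical, error handling is omitted, it needs root, and the second
filebench run is stood in for by a plain O_DIRECT read loop:

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <unistd.h>

int main(void)
{
	void *buf;
	size_t bufsz = 1024 * 1024;
	int fd;

	/* Cycle the mount so the files created by the first filebench run
	 * are fully on disk and their page cache is dropped. */
	umount("/mnt/test");
	mount("/dev/sdb1", "/mnt/test", "ext4", 0, NULL);

	/* Now the parallel O_DIRECT readers (one per file in the real
	 * benchmark) hit a clean, cache-cold file set. */
	posix_memalign(&buf, 4096, bufsz);
	fd = open("/mnt/test/bigfile", O_RDONLY | O_DIRECT);
	while (read(fd, buf, bufsz) > 0)
		;
	close(fd);
	free(buf);
	return 0;
}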

								Honza
> 
> On 3/4/2020 4:15 PM, Xing Zhengjun wrote:
> > Hi Matthew,
> > 
> >   We tested it in v5.6-rc4 and the issue still exists. Do you have time
> > to take a look at this? Thanks.
> > 
> > On 1/8/2020 10:31 AM, Rong Chen wrote:
> > > 
> > > 
> > > On 1/8/20 1:28 AM, Jan Kara wrote:
> > > > On Tue 07-01-20 11:57:08, Theodore Y. Ts'o wrote:
> > > > > On Tue, Jan 07, 2020 at 02:41:06PM +0100, Jan Kara wrote:
> > > > > > Hello,
> > > > > > 
> > > > > > On Tue 24-12-19 08:59:15, kernel test robot wrote:
> > > > > > > FYI, we noticed a -20.2% regression of
> > > > > > > filebench.sum_bytes_mb/s due to commit:
> > > > > > > 
> > > > > > > 
> > > > > > > commit: b1b4705d54abedfd69dcdf42779c521aa1e0fbd3
> > > > > > > ("ext4: introduce direct I/O read using iomap
> > > > > > > infrastructure")
> > > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
> > > > > > > master
> > > > > > > 
> > > > > > > in testcase: filebench
> > > > > > > on test machine: 8 threads Intel(R) Core(TM) i7-4770
> > > > > > > CPU @ 3.40GHz with 8G memory
> > > > > > > with following parameters:
> > > > > > > 
> > > > > > >     disk: 1HDD
> > > > > > >     fs: ext4
> > > > > > >     test: fivestreamreaddirect.f
> > > > > > >     cpufreq_governor: performance
> > > > > > >     ucode: 0x27
> > > > > > I was trying to reproduce this but I failed with my test VM. I had
> > > > > > a SATA SSD as a backing store, though, so maybe that's what makes
> > > > > > a difference. Maybe the new code results in somewhat more seeks
> > > > > > because the five threads which compete in submitting sequential IO
> > > > > > end up being more interleaved?
> > > > > A "-20.2% regression" should be read as a "20.2% performance
> > > > > improvement" in zero-day kernel speak.
> > > > Are you sure? I can see:
> > > > 
> > > >       58.30 ±  2%     -20.2%      46.53        filebench.sum_bytes_mb/s
> > > > 
> > > > which implies to me previously the throughput was 58 MB/s and after the
> > > > commit it was 46 MB/s?
> > > > 
> > > > Anyway, in my testing that commit made no difference in that benchmark
> > > > whatsoever (getting around 97 MB/s for each thread before and after the
> > > > commit).
> > > >                                 Honza
> > > 
> > > We're sorry for the misunderstanding: "-20.2%" is the change in
> > > filebench.sum_bytes_mb/s, and "regression" is LKP's interpretation of
> > > that change.
> > > 
> > > Best Regards,
> > > Rong Chen
> > > _______________________________________________
> > > LKP mailing list -- lkp@...ts.01.org
> > > To unsubscribe send an email to lkp-leave@...ts.01.org
> > 
> 
> -- 
> Zhengjun Xing
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
