linux-kernel - Re: [ext4] d3b6f23f71: stress-ng.fiemap.ops_per

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20200715110437.7D0A3AE051@d06av26.portsmouth.uk.ibm.com>
Date:   Wed, 15 Jul 2020 16:34:36 +0530
From:   Ritesh Harjani <riteshh@...ux.ibm.com>
To:     kernel test robot <rong.a.chen@...el.com>,
        Xing Zhengjun <zhengjun.xing@...ux.intel.com>
Cc:     "Theodore Ts'o" <tytso@....edu>, kbuild test robot <lkp@...el.com>,
        Jan Kara <jack@...e.cz>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org
Subject: Re: [ext4] d3b6f23f71: stress-ng.fiemap.ops_per_sec -60.5% regression

Hello Xing,

On 4/7/20 1:30 PM, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed a -60.5% regression of stress-ng.fiemap.ops_per_sec due to commit:
> 
> 
> commit: d3b6f23f71670007817a5d59f3fbafab2b794e8c ("ext4: move ext4_fiemap to use iomap framework")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: stress-ng
> on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
> with following parameters:
> 
> 	nr_threads: 10%
> 	disk: 1HDD
> 	testtime: 1s
> 	class: os
> 	cpufreq_governor: performance
> 	ucode: 0x500002c
> 	fs: ext4

I started looking into this issue. But with my unit testing, I didn't
find any perf issue with fiemap ioctl call. I haven't yet explored about
how stress-ng take fiemap performance numbers, it could be doing
something differently. But in my testing I just made sure to create a
file with large number of extents and used xfs_io -c "fiemap -v" cmd
to check how much time it takes to read all the entries in 1st
and subsequent iterations.

Setup comprised of qemu machine on x86_64 with latest linux branch.

1. created a file of 10G using fallocate. (this allocated unwritten
extents for this file).

2. Then I punched hole on every alternate block of file. This step took
a long time. And after sufficiently long time, I had to cancel it.
for i in $(seq 1 2 xxxxx); do echo $i; fallocate -p -o $(($i*4096)) -l 
4096; done

3. Then issued fiemap call via xfs_io and took the time measurement.
time xfs_io -c "fiemap -v" bigfile > /dev/null

Perf numbers on latest default kernel build for above cmd.

1st iteration
==============
real    0m31.684s
user    0m1.593s
sys     0m24.174s

2nd and subsequent iteration
============================
real    0m3.379s
user    0m1.300s
sys     0m2.080s

4. Then I reverted all the iomap_fiemap patches and re-tested this.
With this the older ext4_fiemap implementation will be tested:-

1st iteration
==============
real    0m31.591s
user    0m1.400s
sys     0m24.243s

2nd and subsequent iteration (had to cancel it since it was taking more 
time then 15m)
============================
^C^C
real    15m49.884s
user    0m0.032s
sys     15m49.722s

I guess the reason why 2nd iteration with older implementation is taking
too much time is since with previous implementation we never cached
extent entries in extent_status tree. And also in 1st iteration the page
cache may get filled with lot of buffer_head entries. So maybe page
reclaims are taking more time.

While with the latest implementation using iomap_fiemap(), the call to 
query extent blocks is done using ext4_map_blocks(). ext4_map_blocks()
by default will also cache the extent entries into extent_status tree.
Hence during 2nd iteration, we will directly read the entries from 
extent_status tree and will not do any disk I/O.

-ritesh