Message-ID: <YvXeuCAK780OuJPz@xsang-OptiPlex-9020>
Date:   Fri, 12 Aug 2022 13:01:44 +0800
From:   Oliver Sang <oliver.sang@...el.com>
To:     Damien Le Moal <damien.lemoal@...nsource.wdc.com>
CC:     John Garry <john.garry@...wei.com>, Christoph Hellwig <hch@....de>,
        "Martin K. Petersen" <martin.petersen@...cle.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Linux Memory Management List <linux-mm@...ck.org>,
        <linux-ide@...r.kernel.org>, <lkp@...ts.01.org>, <lkp@...el.com>,
        <ying.huang@...el.com>, <feng.tang@...el.com>,
        <zhengjun.xing@...ux.intel.com>, <fengwei.yin@...el.com>
Subject: Re: [ata] 0568e61225: stress-ng.copy-file.ops_per_sec -15.0%
 regression

Hi Damien,

On Tue, Aug 09, 2022 at 07:55:53AM -0700, Damien Le Moal wrote:
> On 2022/08/09 2:58, John Garry wrote:
> > On 08/08/2022 15:52, Damien Le Moal wrote:
> >> On 2022/08/05 1:05, kernel test robot wrote:
> >>>
> >>>
> >>> Greetings,
> >>>
> >>> FYI, we noticed a -15.0% regression of stress-ng.copy-file.ops_per_sec due to commit:
> >>>
> >>>
> >>> commit: 0568e6122574dcc1aded2979cd0245038efe22b6 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
> >>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>
> >>> in testcase: stress-ng
> >>> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
> >>> with following parameters:
> >>>
> >>> 	nr_threads: 10%
> >>> 	disk: 1HDD
> >>> 	testtime: 60s
> >>> 	fs: f2fs
> >>> 	class: filesystem
> >>> 	test: copy-file
> >>> 	cpufreq_governor: performance
> >>> 	ucode: 0xb000280
> >>
> >> Without knowing what the device adapter is, it is hard to say where the
> >> problem is. I suspect that with the patch applied, we may be ending up with
> >> a small default max_sectors value, causing overhead due to more commands
> >> than necessary.
> >>
> >> Will check what I see with my test rig.
> > 
> > As far as I can see, this patch should not make a difference unless the 
> > ATA shost driver is setting the max_sectors value unnecessarily low.
> 
> That is my hunch too, hence my question about which host driver is being used
> for this test... That is not apparent from the problem report.

We noticed that the commit is in mainline now, and in our tests there is
still a similar regression, also on other platforms.
Could you guide us on how to check "which host driver is being used for this
test"? We hope to supply some useful information.

> 
> > 
> >>
> >>>
> >>> If you fix the issue, kindly add the following tag:
> >>> Reported-by: kernel test robot <oliver.sang@...el.com>
> >>>
> >>>
> >>> Details are as below:
> >>> -------------------------------------------------------------------------------------------------->
> >>>
> >>>
> >>> To reproduce:
> >>>
> >>>          git clone https://github.com/intel/lkp-tests.git
> >>>          cd lkp-tests
> >>>          sudo bin/lkp install job.yaml           # job file is attached in this email
> >>>          bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
> >>>          sudo bin/lkp run generated-yaml-file
> >>>
> >>>          # if you come across any failure that blocks the test,
> >>>          # please remove ~/.lkp and /lkp dir to run from a clean state.
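
(For reference, a rough manual equivalent of this job without the LKP
harness; a sketch that assumes stress-ng's --copy-file, --temp-path and
--timeout options and skips the cpufreq-governor/ucode setup the job
performs:)

        # WARNING: mkfs is destructive; use the dedicated 1HDD test disk
        mkfs.f2fs /dev/sdX
        mount /dev/sdX /mnt/test
        stress-ng --copy-file 10 \        # ~10% of the 96 CPU threads
                  --temp-path /mnt/test \
                  --timeout 60s \
                  --metrics-brief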
> >>>
> >>> =========================================================================================
> >>> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
> >>>    filesystem/gcc-11/performance/1HDD/f2fs/x86_64-rhel-8.3/10%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp1/copy-file/stress-ng/60s/0xb000280
> >>>
> >>> commit:
> >>>    4cbfca5f77 ("scsi: scsi_transport_sas: cap shost opt_sectors according to DMA optimal limit")
> >>>    0568e61225 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors")
> >>>
> >>> 4cbfca5f7750520f 0568e6122574dcc1aded2979cd0
> >>> ---------------- ---------------------------
> >>>           %stddev     %change         %stddev
> >>>               \          |                \
> >>>        1627           -14.9%       1385        stress-ng.copy-file.ops
> >>>       27.01           -15.0%      22.96        stress-ng.copy-file.ops_per_sec
> >>>     8935079           -11.9%    7870629        stress-ng.time.file_system_outputs
> >>>       14.88 ±  5%     -31.8%      10.14 ±  3%  stress-ng.time.percent_of_cpu_this_job_got
> >>>       50912           -14.7%      43413        vmstat.io.bo
> >>>       93.78            +1.4%      95.10        iostat.cpu.idle
> >>>        3.89           -31.6%       2.66        iostat.cpu.iowait
> >>>        4.01            -1.3        2.74        mpstat.cpu.all.iowait%
> >>>        0.23 ±  9%      -0.1        0.17 ± 11%  mpstat.cpu.all.sys%
> >>>        1.66 ± 37%      -1.2        0.51 ± 55%  perf-profile.calltrace.cycles-pp.f2fs_write_end.generic_perform_write.f2fs_buffered_write_iter.f2fs_file_write_iter.do_iter_readv_writev
> >>>        1.66 ± 37%      -1.1        0.59 ± 25%  perf-profile.children.cycles-pp.f2fs_write_end
> >>>        1.51 ± 40%      -1.1        0.45 ± 26%  perf-profile.children.cycles-pp.f2fs_dirty_data_folio
> >>>        1.21 ± 49%      -1.0        0.23 ± 33%  perf-profile.children.cycles-pp.f2fs_update_dirty_folio
> >>>        0.88 ± 56%      -0.8        0.04 ±111%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
> >>>        0.14 ± 26%      +0.1        0.25 ± 28%  perf-profile.children.cycles-pp.page_cache_ra_unbounded
> >>>        0.88 ± 56%      -0.8        0.04 ±112%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
> >>>     3164876 ±  9%     -20.2%    2524713 ±  7%  perf-stat.i.cache-misses
> >>>   4.087e+08            -4.6%  3.899e+08        perf-stat.i.dTLB-loads
> >>>      313050 ± 10%     -18.4%     255410 ±  6%  perf-stat.i.node-loads
> >>>      972573 ±  9%     -16.4%     812873 ±  6%  perf-stat.i.node-stores
> >>>     3114748 ±  9%     -20.2%    2484807 ±  7%  perf-stat.ps.cache-misses
> >>>   4.022e+08            -4.6%  3.837e+08        perf-stat.ps.dTLB-loads
> >>>      308178 ± 10%     -18.4%     251418 ±  6%  perf-stat.ps.node-loads
> >>>      956996 ±  9%     -16.4%     799948 ±  6%  perf-stat.ps.node-stores
> >>>      358486            -8.3%     328694        proc-vmstat.nr_active_file
> >>>     1121620           -11.9%     987816        proc-vmstat.nr_dirtied
> >>>      179906            -6.7%     167912        proc-vmstat.nr_dirty
> >>>     1151201            -1.7%    1131322        proc-vmstat.nr_file_pages
> >>>      100181            +9.9%     110078 ±  2%  proc-vmstat.nr_inactive_file
> >>>      846362           -14.6%     722471        proc-vmstat.nr_written
> >>>      358486            -8.3%     328694        proc-vmstat.nr_zone_active_file
> >>>      100181            +9.9%     110078 ±  2%  proc-vmstat.nr_zone_inactive_file
> >>>      180668            -6.8%     168456        proc-vmstat.nr_zone_write_pending
> >>>      556469            -3.5%     536985        proc-vmstat.pgactivate
> >>>     3385454           -14.6%    2889953        proc-vmstat.pgpgout
> >>>
> >>> Disclaimer:
> >>> Results have been estimated based on internal Intel analysis and are provided
> >>> for informational purposes only. Any difference in system hardware or software
> >>> design or configuration may affect actual performance.
> 
> -- 
> Damien Le Moal
> Western Digital Research
