Date:   Mon, 11 Jan 2021 17:58:54 +0800
From:   Ming Lei <ming.lei@...hat.com>
To:     kernel test robot <oliver.sang@...el.com>
Cc:     Jens Axboe <axboe@...nel.dk>,
        Veronika Kabatova <vkabatov@...hat.com>,
        Christoph Hellwig <hch@....de>, Tejun Heo <tj@...nel.org>,
        Sagi Grimberg <sagi@...mberg.me>,
        Bart Van Assche <bvanassche@....org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...ts.01.org,
        lkp@...el.com, ying.huang@...el.com, feng.tang@...el.com,
        zhengjun.xing@...el.com
Subject: Re: [percpu_ref]  2b0d3d3e4f:  reaim.jobs_per_min -18.4% regression

On Sun, Jan 10, 2021 at 10:32:47PM +0800, kernel test robot wrote:
> 
> Greeting,
> 
> FYI, we noticed a -18.4% regression of reaim.jobs_per_min due to commit:
> 
> 
> commit: 2b0d3d3e4fcfb19d10f9a82910b8f0f05c56ee3e ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> 
> 
> in testcase: reaim
> on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
> with following parameters:
> 
> 	runtime: 300s
> 	nr_task: 100%
> 	test: short
> 	cpufreq_governor: performance
> 	ucode: 0x5002f01
> 
> test-description: REAIM is an updated and improved version of AIM 7 benchmark.
> test-url: https://sourceforge.net/projects/re-aim-7/
> 
> In addition to that, the commit also has significant impact on the following tests:
> 
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput -2.8% regression                |
> | test machine     | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> | test parameters  | cpufreq_governor=performance                                              |
> |                  | runtime=300s                                                              |
> |                  | test=lru-file-mmap-read-rand                                              |
> |                  | ucode=0x5003003                                                           |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops 14.5% improvement            |
> | test machine     | 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory    |
> | test parameters  | cpufreq_governor=performance                                              |
> |                  | mode=process                                                              |
> |                  | nr_task=50%                                                               |
> |                  | test=page_fault2                                                          |
> |                  | ucode=0x16                                                                |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -13.0% regression            |
> | test machine     | 104 threads Skylake with 192G memory                                      |
> | test parameters  | cpufreq_governor=performance                                              |
> |                  | mode=process                                                              |
> |                  | nr_task=50%                                                               |
> |                  | test=malloc1                                                              |
> |                  | ucode=0x2006906                                                           |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | vm-scalability: vm-scalability.throughput -2.3% regression                |
> | test machine     | 96 threads Intel(R) Xeon(R) CPU @ 2.30GHz with 128G memory                |
> | test parameters  | cpufreq_governor=performance                                              |
> |                  | runtime=300s                                                              |
> |                  | test=lru-file-mmap-read-rand                                              |
> |                  | ucode=0x5002f01                                                           |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | fio-basic: fio.read_iops -4.8% regression                                 |
> | test machine     | 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> | test parameters  | bs=4k                                                                     |
> |                  | cpufreq_governor=performance                                              |
> |                  | disk=2pmem                                                                |
> |                  | fs=xfs                                                                    |
> |                  | ioengine=libaio                                                           |
> |                  | nr_task=50%                                                               |
> |                  | runtime=200s                                                              |
> |                  | rw=randread                                                               |
> |                  | test_size=200G                                                            |
> |                  | time_based=tb                                                             |
> |                  | ucode=0x5002f01                                                           |
> +------------------+---------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.stackmmap.ops_per_sec -45.4% regression              |
> | test machine     | 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory      |
> | test parameters  | class=memory                                                              |
> |                  | cpufreq_governor=performance                                              |
> |                  | disk=1HDD                                                                 |
> |                  | nr_threads=100%                                                           |
> |                  | testtime=10s                                                              |
> |                  | ucode=0x5002f01                                                           |
> +------------------+---------------------------------------------------------------------------+

I just ran a quick test of the last two (fio and stress-ng) on 2b0d3d3e4fcf
("percpu_ref: reduce memory footprint of percpu_ref in fast path") and
cf785af19319 ("block: warn if !__GFP_DIRECT_RECLAIM in bio_crypt_set_ctx()").

I don't see any difference between the two kernels (fio on null_blk with
224 hw queues, and 'stress-ng --stackmmap-ops') on a 224-core, dual-socket
system.

BTW, this patch itself doesn't touch fast-path code, so it isn't supposed
to affect performance.

Can you double-check whether the test itself is good?

Note: cf785af19319 is 2b0d3d3e4fcf^
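
For anyone repeating the comparison between the parent (cf785af19319) and
the commit (2b0d3d3e4fcf), the relative change can be computed along the
lines below. This is only a sketch: the function name and the sample
numbers are made up for illustration and are not measurements from either
test run.

```python
def percent_change(parent: float, commit: float) -> float:
    """Relative change of the commit's result vs. the parent's, in percent.

    Negative values mean the commit is slower (a regression) when the
    metric is a throughput such as ops/sec or jobs/min.
    """
    if parent == 0:
        raise ValueError("parent result must be non-zero")
    return (commit - parent) / parent * 100.0


if __name__ == "__main__":
    # Hypothetical throughput numbers, NOT results from this thread.
    parent_ops = 1000.0   # e.g. ops/sec measured on the parent commit
    commit_ops = 816.0    # e.g. ops/sec measured on the commit under test
    print(f"{percent_change(parent_ops, commit_ops):+.1f}%")
```

Averaging several runs per kernel before comparing would make a small
difference easier to tell apart from run-to-run noise.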



Thanks,
Ming
