Date:   Tue, 27 Feb 2018 15:42:33 +0800
From:   kemi <kemi.wang@...el.com>
To:     Jeff Layton <jlayton@...hat.com>,
        Ye Xiaolong <xiaolong.ye@...el.com>
Cc:     David Howells <dhowells@...hat.com>, lkp@...org,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0%
 regression



On 2018-02-26 20:33, Jeff Layton wrote:
> On Mon, 2018-02-26 at 06:43 -0500, Jeff Layton wrote:
>> On Mon, 2018-02-26 at 16:38 +0800, Ye Xiaolong wrote:
>>> On 02/25, Jeff Layton wrote:
>>>> On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote:
>>>>> Greeting,
>>>>>
>>>>> FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to commit:
>>>>>
>>>>>
>>>>> commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make inode_cmp_iversion{+raw} return bool instead of s64")
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>>>>>
>>>>> in testcase: aim7
>>>>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>>>>> with following parameters:
>>>>>
>>>>> 	disk: 4BRD_12G
>>>>> 	md: RAID0
>>>>> 	fs: xfs
>>>>> 	test: disk_src
>>>>> 	load: 3000
>>>>> 	cpufreq_governor: performance
>>>>>
>>>>> test-description: AIM7 is a traditional UNIX system-level benchmark suite used to test and measure the performance of multiuser systems.
>>>>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>>>>>
>>>>>
>>>>
>>>> I'm a bit suspicious of this result.
>>>>
>>>> This patch only changes inode_cmp_iversion{+raw} (since renamed to
>>>> inode_eq_iversion{+raw}), and neither of those should ever be called
>>>> from xfs. The patch is fairly trivial too, and I wouldn't expect a big
>>>> performance hit.
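
For reference, a minimal userspace sketch of the kind of change being
discussed: the comparison helper going from returning a signed difference
(s64) to returning a plain bool equality test. The struct and function
names below are illustrative stand-ins, not the actual <linux/iversion.h>
code.

/* Self-contained model of the s64-vs-bool comparison change. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct fake_inode {
        uint64_t i_version;     /* stand-in for the real i_version field */
};

/* old style: signed difference, caller checks sign / zero */
static int64_t cmp_iversion_old(const struct fake_inode *inode, uint64_t old)
{
        return (int64_t)(inode->i_version - old);
}

/* new style: plain equality test returning bool */
static bool eq_iversion_new(const struct fake_inode *inode, uint64_t old)
{
        return inode->i_version == old;
}

int main(void)
{
        struct fake_inode ino = { .i_version = 42 };

        printf("old-style cmp: %lld\n", (long long)cmp_iversion_old(&ino, 40));
        printf("new-style eq : %d\n", eq_iversion_new(&ino, 42));
        return 0;
}
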
>>>
>>> I queued 4 more test runs for both commit c0cef30e4f and its parent;
>>> the results seem quite stable.
>>>
>>> c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
>>>  "aim7.jobs-per-min": [
>>>     32964.01,
>>>     32938.68,
>>>     33068.18,
>>>     32886.32,
>>>     32843.72,
>>>     32798.83,
>>>     32898.34,
>>>     32952.55
>>>   ],
>>>
>>> 3da90b159b146672f830bcd2489dd3a1f4e9e089:
>>>   "aim7.jobs-per-min": [
>>>     40239.65,
>>>     40163.33,
>>>     40353.32,
>>>     39976.9,
>>>     40185.75,
>>>     40411.3,
>>>     40213.58,
>>>     39900.69
>>>   ],
>>>
>>> Any other test data you may need?
>>>
>>>>
>>>> Is IMA involved here at all? I didn't see any evidence of it, but the
>>>> kernel config did have it enabled.
>>>>
>>>
>>> Sorry, I'm not quite familiar with IMA; could you tell me more about how to check it?
>>>
>>
>> Thanks for retesting it, but I'm at a loss for why we're seeing this:
>>
>> IMA is the integrity measurement subsystem. It uses the i_version
>> field to decide whether a file needs to be remeasured. It looks like
>> the kernel config has it enabled, but based on the info in the initial
>> report it doesn't appear to be in use.
>>
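
A rough illustration of the IMA-style usage just described: cache the
version seen at the last measurement and only remeasure when it has moved
on. All names below are hypothetical, not the actual IMA code.

/* Userspace model of "remeasure only if the version changed". */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct fake_inode {
        uint64_t i_version;
};

struct fake_integrity_state {
        uint64_t measured_version;      /* i_version at last measurement */
};

static bool needs_remeasure(const struct fake_inode *inode,
                            const struct fake_integrity_state *state)
{
        /* the in-kernel equivalent would be an iversion comparison */
        return inode->i_version != state->measured_version;
}

int main(void)
{
        struct fake_inode ino = { .i_version = 7 };
        struct fake_integrity_state st = { .measured_version = 7 };

        printf("remeasure before write: %d\n", needs_remeasure(&ino, &st));
        ino.i_version++;                /* file contents changed */
        printf("remeasure after write : %d\n", needs_remeasure(&ino, &st));
        return 0;
}
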
>> This patch only affects two inlined functions, inode_cmp_iversion and
>> inode_cmp_iversion_raw. The patch is pretty trivial (as Linus points
>> out). These functions are only called from IMA and fs-specific code
>> (usually in readdir implementations to detect directory changes).
>>
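
And a similar sketch of the readdir-style usage: a reader caches the
directory's version alongside its cached state and rescans only when the
version has changed. Again, all names are illustrative.

/* Userspace model of directory-change detection via a version counter. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct fake_dir_inode {
        uint64_t i_version;     /* bumped whenever the directory changes */
};

struct fake_dir_reader {
        uint64_t f_version;     /* version seen when the cache was built */
        bool cache_valid;
};

static void read_dir(struct fake_dir_reader *rd, const struct fake_dir_inode *dir)
{
        if (!rd->cache_valid || rd->f_version != dir->i_version) {
                /* directory changed under us: rebuild the cached state */
                printf("rescanning directory\n");
                rd->f_version = dir->i_version;
                rd->cache_valid = true;
        } else {
                printf("reusing cached position\n");
        }
}

int main(void)
{
        struct fake_dir_inode dir = { .i_version = 1 };
        struct fake_dir_reader rd = { 0 };

        read_dir(&rd, &dir);    /* first read: rescan */
        read_dir(&rd, &dir);    /* unchanged: reuse cache */
        dir.i_version++;        /* e.g. a file was created in the directory */
        read_dir(&rd, &dir);    /* changed: rescan again */
        return 0;
}
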
>> XFS does not call either of these functions, however, so I'm a little
>> unclear on how this patch could slow anything down on this test. The
>> only thing I can think to do here would be to profile this and see what
>> stands out.
>>
>> Note that we do need to keep this in perspective too. This 18%
>> regression on this test follows a ~230% improvement that occurred
>> when we merged the bulk of these patches. It should still be quite a
>> bit faster than v4.15 in this regard.
>>
>> Still, it'd be good to understand what's going on here.
>>
>>
> 
> Could we see the dmesg from this boot? It'd be good to confirm that IMA
> is not involved here, as that's the only place I can see that would
> call into this code at all.
> 

See the attachments for the dmesg, perf-profile, and compare_result info.
Feel free to let Xiaolong or me know if there is anything else you would like to check.

> Thanks,
> Jeff
> 
> 
>>> Thanks,
>>> Xiaolong
>>>>
>>>>>
>>>>> Details are as below:
>>>>> -------------------------------------------------------------------------------------------------->
>>>>>
>>>>>
>>>>> To reproduce:
>>>>>
>>>>>         git clone https://github.com/intel/lkp-tests.git
>>>>>         cd lkp-tests
>>>>>         bin/lkp install job.yaml  # job file is attached in this email
>>>>>         bin/lkp run     job.yaml
>>>>>
>>>>> =========================================================================================
>>>>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>>>>   gcc-7/performance/4BRD_12G/xfs/x86_64-rhel-7.2/3000/RAID0/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_src/aim7
>>>>>
>>>>> commit: 
>>>>>   3da90b159b (" f2fs-for-4.16-rc1")
>>>>>   c0cef30e4f ("iversion: make inode_cmp_iversion{+raw} return bool instead of s64")
>>>>>
>>>>> 3da90b159b146672 c0cef30e4ff0dc025f4a1660b8 
>>>>> ---------------- -------------------------- 
>>>>>          %stddev     %change         %stddev
>>>>>              \          |                \  
>>>>>      40183           -18.0%      32964        aim7.jobs-per-min
>>>>>     448.60           +21.9%     546.68        aim7.time.elapsed_time
>>>>>     448.60           +21.9%     546.68        aim7.time.elapsed_time.max
>>>>>       5615 ±  5%     +33.4%       7489 ±  4%  aim7.time.involuntary_context_switches
>>>>>       3086           +14.0%       3518        aim7.time.system_time
>>>>>   19439782            -5.6%   18359474        aim7.time.voluntary_context_switches
>>>>>     199333           +14.3%     227794 ±  2%  interrupts.CAL:Function_call_interrupts
>>>>>       0.59            -0.1        0.50        mpstat.cpu.usr%
>>>>>    2839401           +16.0%    3293688        softirqs.SCHED
>>>>>    7600068           +15.1%    8747820        softirqs.TIMER
>>>>>     118.00 ± 43%     +98.7%     234.50 ± 15%  vmstat.io.bo
>>>>>      87840           -22.4%      68154        vmstat.system.cs
>>>>>     552798 ±  6%     +15.8%     640107 ±  4%  numa-numastat.node0.local_node
>>>>>     557345 ±  6%     +15.7%     644666 ±  4%  numa-numastat.node0.numa_hit
>>>>>     528341 ±  7%     +21.7%     642933 ±  4%  numa-numastat.node1.local_node
>>>>>     531604 ±  7%     +21.6%     646209 ±  4%  numa-numastat.node1.numa_hit
>>>>>  2.147e+09           -12.4%   1.88e+09        cpuidle.C1.time
>>>>>   13702041           -14.7%   11683737        cpuidle.C1.usage
>>>>>  2.082e+08 ±  4%     +28.1%  2.667e+08 ±  5%  cpuidle.C1E.time
>>>>>  4.719e+08 ±  2%     +23.1%  5.807e+08 ±  4%  cpuidle.C3.time
>>>>>  1.141e+10           +31.0%  1.496e+10        cpuidle.C6.time
>>>>>   15672622           +27.8%   20031028        cpuidle.C6.usage
>>>>>   13520572 ±  3%     +29.5%   17514398 ±  9%  cpuidle.POLL.time
>>>>>     278.25 ±  5%     -46.0%     150.25 ± 73%  numa-vmstat.node0.nr_dirtied
>>>>>       3200 ± 14%     -20.6%       2542 ± 19%  numa-vmstat.node0.nr_mapped
>>>>>     277.75 ±  5%     -46.2%     149.50 ± 73%  numa-vmstat.node0.nr_written
>>>>>      28.50 ± 52%    +448.2%     156.25 ± 70%  numa-vmstat.node1.nr_dirtied
>>>>>       2577 ± 19%     +26.3%       3255 ± 15%  numa-vmstat.node1.nr_mapped
>>>>>     634338 ±  4%      +7.8%     683959 ±  4%  numa-vmstat.node1.numa_hit
>>>>>     457411 ±  6%     +10.8%     506800 ±  5%  numa-vmstat.node1.numa_local
>>>>>       3734 ±  8%     -11.5%       3306 ±  6%  proc-vmstat.numa_hint_faults_local
>>>>>    1114538           +18.3%    1318978        proc-vmstat.numa_hit
>>>>>    1106722           +18.5%    1311136        proc-vmstat.numa_local
>>>>>      22100            +7.5%      23753 ±  4%  proc-vmstat.numa_pages_migrated
>>>>>    1174556           +18.0%    1386359        proc-vmstat.pgalloc_normal
>>>>>    1241445           +18.1%    1466086        proc-vmstat.pgfault
>>>>>    1138310           +19.3%    1358132        proc-vmstat.pgfree
>>>>>      22100            +7.5%      23753 ±  4%  proc-vmstat.pgmigrate_success
>>>>>      53332 ± 43%    +143.0%     129617 ± 14%  proc-vmstat.pgpgout
>>>>>       1.42 ±  2%      +1.7        3.07        perf-stat.branch-miss-rate%
>>>>>  1.064e+10          +123.3%  2.375e+10        perf-stat.branch-misses
>>>>>      10.79            +0.6       11.43        perf-stat.cache-miss-rate%
>>>>>  5.583e+09            +5.9%  5.915e+09        perf-stat.cache-misses
>>>>>   39652092            -5.0%   37662545        perf-stat.context-switches
>>>>>       1.29           +11.7%       1.44        perf-stat.cpi
>>>>>  4.637e+12           +12.8%   5.23e+12        perf-stat.cpu-cycles
>>>>>  8.653e+11            +9.8%  9.498e+11 ±  2%  perf-stat.dTLB-loads
>>>>>  3.654e+11           +12.4%  4.109e+11        perf-stat.dTLB-stores
>>>>>       0.78           -10.5%       0.70        perf-stat.ipc
>>>>>    1214932           +17.9%    1432266        perf-stat.minor-faults
>>>>>  1.334e+09            -1.8%   1.31e+09        perf-stat.node-store-misses
>>>>>  1.651e+09            -1.8%   1.62e+09        perf-stat.node-stores
>>>>>    1214954           +17.9%    1432313        perf-stat.page-faults
>>>>>     256.75          -100.0%       0.00        turbostat.Avg_MHz
>>>>>      21.39           -21.4        0.00        turbostat.Busy%
>>>>>       1200          -100.0%       0.00        turbostat.Bzy_MHz
>>>>>   13695007          -100.0%       0.00        turbostat.C1
>>>>>      11.92           -11.9        0.00        turbostat.C1%
>>>>>    2116683 ±  2%    -100.0%       0.00        turbostat.C1E
>>>>>       1.16 ±  4%      -1.2        0.00        turbostat.C1E%
>>>>>    3112269          -100.0%       0.00        turbostat.C3
>>>>>       2.62 ±  2%      -2.6        0.00        turbostat.C3%
>>>>>   15671277          -100.0%       0.00        turbostat.C6
>>>>>      63.38           -63.4        0.00        turbostat.C6%
>>>>>      49.46          -100.0%       0.00        turbostat.CPU%c1
>>>>>       1.42 ±  2%    -100.0%       0.00        turbostat.CPU%c3
>>>>>      27.73          -100.0%       0.00        turbostat.CPU%c6
>>>>>      31.41          -100.0%       0.00        turbostat.CorWatt
>>>>>      63.25          -100.0%       0.00        turbostat.CoreTmp
>>>>>   18919351          -100.0%       0.00        turbostat.IRQ
>>>>>       1.21 ± 18%    -100.0%       0.00        turbostat.Pkg%pc2
>>>>>       0.67 ± 31%    -100.0%       0.00        turbostat.Pkg%pc6
>>>>>      63.25          -100.0%       0.00        turbostat.PkgTmp
>>>>>      57.63          -100.0%       0.00        turbostat.PkgWatt
>>>>>      30.73          -100.0%       0.00        turbostat.RAMWatt
>>>>>      36030          -100.0%       0.00        turbostat.SMI
>>>>>       3000          -100.0%       0.00        turbostat.TSC_MHz
>>>>>
>>>>>                                                                                 
>>>>>                                   aim7.jobs-per-min                             
>>>>>                                                                                 
>>>>>   41000 +-+-----------------------------------------------------------------+   
>>>>>         |  ..+....+....  ..+....+....+....   ..+....+....+...+....          |   
>>>>>   40000 +-+            +.                 +..                     +         |   
>>>>>   39000 +-+                                                                 |   
>>>>>         |                                                                   |   
>>>>>   38000 +-+                                                                 |   
>>>>>   37000 +-+                                                                 |   
>>>>>         |                                                                   |   
>>>>>   36000 +-+                                                                 |   
>>>>>   35000 +-+                                                                 |   
>>>>>         |                                                                   |   
>>>>>   34000 +-+                                                                 |   
>>>>>   33000 +-+                                         O                  O    |   
>>>>>         O    O    O    O   O    O    O    O    O         O   O    O         O   
>>>>>   32000 +-+-----------------------------------------------------------------+   
>>>>>                                                                                 
>>>>>                                                                                                                                                                                                                                         
>>>>>                                                                                 
>>>>> [*] bisect-good sample
>>>>> [O] bisect-bad  sample
>>>>>
>>>>>
>>>>>
>>>>> Disclaimer:
>>>>> Results have been estimated based on internal Intel analysis and are provided
>>>>> for informational purposes only. Any difference in system hardware or software
>>>>> design or configuration may affect actual performance.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Xiaolong
>>>>
>>>> -- 
>>>> Jeff Layton <jlayton@...hat.com>
>>
>>
> 

View attachment "compare_result.txt" of type "text/plain" (22589 bytes)

Download attachment "3da90b159b146672f830bcd2489dd3a1f4e9e089_dmesg.xz" of type "application/x-xz" (22044 bytes)

Download attachment "3da90b159b146672f830bcd2489dd3a1f4e9e089_perf-profile.gz" of type "application/gzip" (13635 bytes)

Download attachment "c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e_dmesg.xz" of type "application/x-xz" (22096 bytes)

Download attachment "c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e_perf-profile.gz" of type "application/gzip" (12608 bytes)
