lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1519645434.4443.15.camel@redhat.com>
Date:   Mon, 26 Feb 2018 06:43:54 -0500
From:   Jeff Layton <jlayton@...hat.com>
To:     Ye Xiaolong <xiaolong.ye@...el.com>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: Re: [lkp-robot] [iversion]  c0cef30e4f:  aim7.jobs-per-min -18.0%
 regression

On Mon, 2018-02-26 at 16:38 +0800, Ye Xiaolong wrote:
> On 02/25, Jeff Layton wrote:
> > On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote:
> > > Greeting,
> > > 
> > > FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to commit:
> > > 
> > > 
> > > commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make inode_cmp_iversion{+raw} return bool instead of s64")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > 
> > > in testcase: aim7
> > > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> > > with following parameters:
> > > 
> > > 	disk: 4BRD_12G
> > > 	md: RAID0
> > > 	fs: xfs
> > > 	test: disk_src
> > > 	load: 3000
> > > 	cpufreq_governor: performance
> > > 
> > > test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
> > > test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
> > > 
> > > 
> > 
> > I'm a bit suspicious of this result.
> > 
> > This patch only changes inode_cmp_iversion{+raw} (since renamed to
> > inode_eq_iversion{+raw}), and that neither should ever be called from
> > xfs. The patch is fairly trivial too, and I wouldn't expect a big
> > performance hit.
> 
> I tried to queue 4 more times test for both commit c0cef30e4f and its parent,
> the result seems quite stable.
> 
> c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
>  "aim7.jobs-per-min": [
>     32964.01,
>     32938.68,
>     33068.18,
>     32886.32,
>     32843.72,
>     32798.83,
>     32898.34,
>     32952.55
>   ],
> 
> 3da90b159b146672f830bcd2489dd3a1f4e9e089:
>   "aim7.jobs-per-min": [
>     40239.65,
>     40163.33,
>     40353.32,
>     39976.9,
>     40185.75,
>     40411.3,
>     40213.58,
>     39900.69
>   ],
> 
> Any other test data you may need?
> 
> > 
> > Is IMA involved here at all? I didn't see any evidence of it, but the
> > kernel config did have it enabled.
> > 
> 
> Sorry, not quite familiar with IMA, could you tell more about how to check it?
> 

Thanks for retesting it, but I'm at a loss for why we're seeing this:

IMA is the the integrity management subsystem. It will use the iversion
field to determine whether to remeasure files during remeasurement.  It
looks like the kernel config has it enabled, but it doesn't look like
it's in use, based on the info in the initial report.

This patch only affects two inlined functions inode_cmp_iversion and
inode_cmp_iversion_raw. The patch is pretty trivial (as Linus points
out). These functions are only called from IMA and fs-specific code
(usually in readdir implementations to detect directory changes).

XFS does not call either of these functions however, so I'm a little
unclear on how this patch could slow anything down on this test. The
only thing I can think to do here would be to profile this and see what
stands out.

Note that we do need to keep this in perspective too. This 18%
regression on this test follows around a ~230% improvement that occurred
when we merged the bulk of these patches. It's should still be quite a
bit faster than the v4.15 in this regard.

Still, it'd be good to understand what's going on here.


> Thanks,
> Xiaolong
> > 
> > > 
> > > Details are as below:
> > > -------------------------------------------------------------------------------------------------->
> > > 
> > > 
> > > To reproduce:
> > > 
> > >         git clone https://github.com/intel/lkp-tests.git
> > >         cd lkp-tests
> > >         bin/lkp install job.yaml  # job file is attached in this email
> > >         bin/lkp run     job.yaml
> > > 
> > > =========================================================================================
> > > compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
> > >   gcc-7/performance/4BRD_12G/xfs/x86_64-rhel-7.2/3000/RAID0/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/disk_src/aim7
> > > 
> > > commit: 
> > >   3da90b159b (" f2fs-for-4.16-rc1")
> > >   c0cef30e4f ("iversion: make inode_cmp_iversion{+raw} return bool instead of s64")
> > > 
> > > 3da90b159b146672 c0cef30e4ff0dc025f4a1660b8 
> > > ---------------- -------------------------- 
> > >          %stddev     %change         %stddev
> > >              \          |                \  
> > >      40183           -18.0%      32964        aim7.jobs-per-min
> > >     448.60           +21.9%     546.68        aim7.time.elapsed_time
> > >     448.60           +21.9%     546.68        aim7.time.elapsed_time.max
> > >       5615 ±  5%     +33.4%       7489 ±  4%  aim7.time.involuntary_context_switches
> > >       3086           +14.0%       3518        aim7.time.system_time
> > >   19439782            -5.6%   18359474        aim7.time.voluntary_context_switches
> > >     199333           +14.3%     227794 ±  2%  interrupts.CAL:Function_call_interrupts
> > >       0.59            -0.1        0.50        mpstat.cpu.usr%
> > >    2839401           +16.0%    3293688        softirqs.SCHED
> > >    7600068           +15.1%    8747820        softirqs.TIMER
> > >     118.00 ± 43%     +98.7%     234.50 ± 15%  vmstat.io.bo
> > >      87840           -22.4%      68154        vmstat.system.cs
> > >     552798 ±  6%     +15.8%     640107 ±  4%  numa-numastat.node0.local_node
> > >     557345 ±  6%     +15.7%     644666 ±  4%  numa-numastat.node0.numa_hit
> > >     528341 ±  7%     +21.7%     642933 ±  4%  numa-numastat.node1.local_node
> > >     531604 ±  7%     +21.6%     646209 ±  4%  numa-numastat.node1.numa_hit
> > >  2.147e+09           -12.4%   1.88e+09        cpuidle.C1.time
> > >   13702041           -14.7%   11683737        cpuidle.C1.usage
> > >  2.082e+08 ±  4%     +28.1%  2.667e+08 ±  5%  cpuidle.C1E.time
> > >  4.719e+08 ±  2%     +23.1%  5.807e+08 ±  4%  cpuidle.C3.time
> > >  1.141e+10           +31.0%  1.496e+10        cpuidle.C6.time
> > >   15672622           +27.8%   20031028        cpuidle.C6.usage
> > >   13520572 ±  3%     +29.5%   17514398 ±  9%  cpuidle.POLL.time
> > >     278.25 ±  5%     -46.0%     150.25 ± 73%  numa-vmstat.node0.nr_dirtied
> > >       3200 ± 14%     -20.6%       2542 ± 19%  numa-vmstat.node0.nr_mapped
> > >     277.75 ±  5%     -46.2%     149.50 ± 73%  numa-vmstat.node0.nr_written
> > >      28.50 ± 52%    +448.2%     156.25 ± 70%  numa-vmstat.node1.nr_dirtied
> > >       2577 ± 19%     +26.3%       3255 ± 15%  numa-vmstat.node1.nr_mapped
> > >     634338 ±  4%      +7.8%     683959 ±  4%  numa-vmstat.node1.numa_hit
> > >     457411 ±  6%     +10.8%     506800 ±  5%  numa-vmstat.node1.numa_local
> > >       3734 ±  8%     -11.5%       3306 ±  6%  proc-vmstat.numa_hint_faults_local
> > >    1114538           +18.3%    1318978        proc-vmstat.numa_hit
> > >    1106722           +18.5%    1311136        proc-vmstat.numa_local
> > >      22100            +7.5%      23753 ±  4%  proc-vmstat.numa_pages_migrated
> > >    1174556           +18.0%    1386359        proc-vmstat.pgalloc_normal
> > >    1241445           +18.1%    1466086        proc-vmstat.pgfault
> > >    1138310           +19.3%    1358132        proc-vmstat.pgfree
> > >      22100            +7.5%      23753 ±  4%  proc-vmstat.pgmigrate_success
> > >      53332 ± 43%    +143.0%     129617 ± 14%  proc-vmstat.pgpgout
> > >       1.42 ±  2%      +1.7        3.07        perf-stat.branch-miss-rate%
> > >  1.064e+10          +123.3%  2.375e+10        perf-stat.branch-misses
> > >      10.79            +0.6       11.43        perf-stat.cache-miss-rate%
> > >  5.583e+09            +5.9%  5.915e+09        perf-stat.cache-misses
> > >   39652092            -5.0%   37662545        perf-stat.context-switches
> > >       1.29           +11.7%       1.44        perf-stat.cpi
> > >  4.637e+12           +12.8%   5.23e+12        perf-stat.cpu-cycles
> > >  8.653e+11            +9.8%  9.498e+11 ±  2%  perf-stat.dTLB-loads
> > >  3.654e+11           +12.4%  4.109e+11        perf-stat.dTLB-stores
> > >       0.78           -10.5%       0.70        perf-stat.ipc
> > >    1214932           +17.9%    1432266        perf-stat.minor-faults
> > >  1.334e+09            -1.8%   1.31e+09        perf-stat.node-store-misses
> > >  1.651e+09            -1.8%   1.62e+09        perf-stat.node-stores
> > >    1214954           +17.9%    1432313        perf-stat.page-faults
> > >     256.75          -100.0%       0.00        turbostat.Avg_MHz
> > >      21.39           -21.4        0.00        turbostat.Busy%
> > >       1200          -100.0%       0.00        turbostat.Bzy_MHz
> > >   13695007          -100.0%       0.00        turbostat.C1
> > >      11.92           -11.9        0.00        turbostat.C1%
> > >    2116683 ±  2%    -100.0%       0.00        turbostat.C1E
> > >       1.16 ±  4%      -1.2        0.00        turbostat.C1E%
> > >    3112269          -100.0%       0.00        turbostat.C3
> > >       2.62 ±  2%      -2.6        0.00        turbostat.C3%
> > >   15671277          -100.0%       0.00        turbostat.C6
> > >      63.38           -63.4        0.00        turbostat.C6%
> > >      49.46          -100.0%       0.00        turbostat.CPU%c1
> > >       1.42 ±  2%    -100.0%       0.00        turbostat.CPU%c3
> > >      27.73          -100.0%       0.00        turbostat.CPU%c6
> > >      31.41          -100.0%       0.00        turbostat.CorWatt
> > >      63.25          -100.0%       0.00        turbostat.CoreTmp
> > >   18919351          -100.0%       0.00        turbostat.IRQ
> > >       1.21 ± 18%    -100.0%       0.00        turbostat.Pkg%pc2
> > >       0.67 ± 31%    -100.0%       0.00        turbostat.Pkg%pc6
> > >      63.25          -100.0%       0.00        turbostat.PkgTmp
> > >      57.63          -100.0%       0.00        turbostat.PkgWatt
> > >      30.73          -100.0%       0.00        turbostat.RAMWatt
> > >      36030          -100.0%       0.00        turbostat.SMI
> > >       3000          -100.0%       0.00        turbostat.TSC_MHz
> > > 
> > >                                                                                 
> > >                                   aim7.jobs-per-min                             
> > >                                                                                 
> > >   41000 +-+-----------------------------------------------------------------+   
> > >         |  ..+....+....  ..+....+....+....   ..+....+....+...+....          |   
> > >   40000 +-+            +.                 +..                     +         |   
> > >   39000 +-+                                                                 |   
> > >         |                                                                   |   
> > >   38000 +-+                                                                 |   
> > >   37000 +-+                                                                 |   
> > >         |                                                                   |   
> > >   36000 +-+                                                                 |   
> > >   35000 +-+                                                                 |   
> > >         |                                                                   |   
> > >   34000 +-+                                                                 |   
> > >   33000 +-+                                         O                  O    |   
> > >         O    O    O    O   O    O    O    O    O         O   O    O         O   
> > >   32000 +-+-----------------------------------------------------------------+   
> > >                                                                                 
> > >                                                                                                                                                                                                                                         
> > >                                                                                 
> > > [*] bisect-good sample
> > > [O] bisect-bad  sample
> > > 
> > > 
> > > 
> > > Disclaimer:
> > > Results have been estimated based on internal Intel analysis and are provided
> > > for informational purposes only. Any difference in system hardware or software
> > > design or configuration may affect actual performance.
> > > 
> > > 
> > > Thanks,
> > > Xiaolong
> > 
> > -- 
> > Jeff Layton <jlayton@...hat.com>

-- 
Jeff Layton <jlayton@...hat.com>

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ