Message-ID: <20170223151908.z7disw2es7jlnf7b@suse.de>
Date:   Thu, 23 Feb 2017 15:19:08 +0000
From:   Mel Gorman <mgorman@...e.de>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Ye Xiaolong <xiaolong.ye@...el.com>,
        Stephen Rothwell <sfr@...b.auug.org.au>,
        Minchan Kim <minchan@...nel.org>,
        Hillf Danton <hillf.zj@...baba-inc.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: Re: [lkp-robot] [mm, vmscan]  5e56dfbd83:  fsmark.files_per_sec
 -11.1% regression

On Thu, Feb 23, 2017 at 08:35:45AM +0100, Michal Hocko wrote:
> >      57.60 ±  0%      -11.1%      51.20 ±  0%  fsmark.files_per_sec
> >     607.84 ±  0%       +9.0%     662.24 ±  1%  fsmark.time.elapsed_time
> >     607.84 ±  0%       +9.0%     662.24 ±  1%  fsmark.time.elapsed_time.max
> >      14317 ±  6%      -12.2%      12568 ±  7%  fsmark.time.involuntary_context_switches
> >       1864 ±  0%       +0.5%       1873 ±  0%  fsmark.time.maximum_resident_set_size
> >      12425 ±  0%      +23.3%      15320 ±  3%  fsmark.time.minor_page_faults
> >      33.00 ±  3%      -33.9%      21.80 ±  1%  fsmark.time.percent_of_cpu_this_job_got
> >     203.49 ±  3%      -28.1%     146.31 ±  1%  fsmark.time.system_time
> >     605701 ±  0%       +3.6%     627486 ±  0%  fsmark.time.voluntary_context_switches
> >     307106 ±  2%      +20.2%     368992 ±  9%  interrupts.CAL:Function_call_interrupts
> >     183040 ±  0%      +23.2%     225559 ±  3%  softirqs.BLOCK
> >      12203 ± 57%     +236.4%      41056 ±103%  softirqs.NET_RX
> >     186118 ±  0%      +21.9%     226922 ±  2%  softirqs.TASKLET
> >      14317 ±  6%      -12.2%      12568 ±  7%  time.involuntary_context_switches
> >      12425 ±  0%      +23.3%      15320 ±  3%  time.minor_page_faults
> >      33.00 ±  3%      -33.9%      21.80 ±  1%  time.percent_of_cpu_this_job_got
> >     203.49 ±  3%      -28.1%     146.31 ±  1%  time.system_time
> >       3.47 ±  3%      -13.0%       3.02 ±  1%  turbostat.%Busy
> >      99.60 ±  1%       -9.6%      90.00 ±  1%  turbostat.Avg_MHz
> >      78.69 ±  1%       +1.7%      80.01 ±  0%  turbostat.CorWatt
> >       3.56 ± 61%      -91.7%       0.30 ± 76%  turbostat.Pkg%pc2
> >     207790 ±  0%       -8.2%     190654 ±  1%  vmstat.io.bo
> >   30667691 ±  0%      +65.9%   50890669 ±  1%  vmstat.memory.cache
> >   34549892 ±  0%      -58.4%   14378939 ±  4%  vmstat.memory.free
> >       6768 ±  0%       -1.3%       6681 ±  1%  vmstat.system.cs
> >  1.089e+10 ±  2%      +13.4%  1.236e+10 ±  3%  cpuidle.C1E-IVT.time
> >   11475304 ±  2%      +13.4%   13007849 ±  3%  cpuidle.C1E-IVT.usage
> >    2.7e+09 ±  6%      +13.2%  3.057e+09 ±  3%  cpuidle.C3-IVT.time
> >    2954294 ±  6%      +14.3%    3375966 ±  3%  cpuidle.C3-IVT.usage
> >   96963295 ± 14%      +17.5%  1.139e+08 ± 12%  cpuidle.POLL.time
> >       8761 ±  7%      +17.6%      10299 ±  9%  cpuidle.POLL.usage
> >   30454483 ±  0%      +66.4%   50666102 ±  1%  meminfo.Cached
> > 
> > Do you see what's happening?
> 
> not really. All I could see in the previous data was that the memory
> locality was different (and better) with my patch, which I cannot
> explain either because get_scan_count is always a per-node thing. Moreover
> the change shouldn't make any difference for normal GFP_KERNEL requests
> on 64b systems because the reclaim index covers all zones so there is
> nothing to skip over.
> 
> > Or is there anything we can do to improve fsmark benchmark setup to
> > make it more reasonable?
> 
> Unfortunately I am not an expert on this benchmark. Maybe Mel knows
> better.

There is not much to be an expert on with that benchmark. It creates a
bunch of files of the requested size for a number of iterations. In async
configurations, it can be heavily skewed by the first few iterations until
dirty limits are hit. Once that point is reached, the files/sec figure drops
rapidly to some value below the write speed of the underlying device.
Hence, looking at its average performance is risky and very sensitive to
exact timing unless the warm-up phase is properly accounted for.
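
As a concrete illustration, here is a minimal sketch (Python, with entirely
made-up per-iteration numbers and an arbitrary warm-up cut-off, so treat it
as illustrative only, not as measured data) of how much those first
iterations can distort the mean:

  #!/usr/bin/env python3
  # Compare the raw mean files/sec against a mean that discards the first
  # few iterations taken before dirty limits are hit. The warm-up cut-off
  # of 3 iterations is a placeholder, not a measured value.

  def summarise(files_per_sec, warmup=3):
      steady = files_per_sec[warmup:]
      raw_mean = sum(files_per_sec) / len(files_per_sec)
      steady_mean = sum(steady) / len(steady) if steady else float("nan")
      return raw_mean, steady_mean

  # Hypothetical per-iteration figures: fast until dirty limits are hit,
  # then dropping to something below the write speed of the device.
  samples = [310.0, 295.0, 240.0, 75.0, 58.0, 55.0, 57.0, 54.0, 56.0, 55.0]
  raw, steady = summarise(samples)
  print("raw mean: %.1f files/sec, steady-state mean: %.1f files/sec"
        % (raw, steady))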

In async configurations, stalls are dominated by balance_dirty_pages and
by filesystem details such as whether it needs to wait for space in a
transaction log. That also limits the overall performance of the workload.
Once the stable phase is reached, there will still be quite a lot of
variability due to the timing of the writeback threads, which causes some
jitter, as well as the usual concerns with multiple threads writing to
different parts of the disk.
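
If it helps to see roughly where that throttling kicks in on a given machine,
something along these lines will do. Note this is only a crude sketch: the
kernel computes the real thresholds against dirtyable memory per node, and
dirty_bytes/dirty_background_bytes override the ratios if set, none of which
is modelled here.

  #!/usr/bin/env python3
  # Rough estimate of the dirty thresholds balance_dirty_pages works
  # against. Ignores dirty_bytes overrides and the per-node calculation
  # the kernel actually does, so treat the output as an approximation.

  def read_int(path):
      with open(path) as f:
          return int(f.read().split()[0])

  def meminfo_kb(field):
      with open("/proc/meminfo") as f:
          for line in f:
              if line.startswith(field + ":"):
                  return int(line.split()[1])
      raise KeyError(field)

  dirty_ratio = read_int("/proc/sys/vm/dirty_ratio")
  background_ratio = read_int("/proc/sys/vm/dirty_background_ratio")
  available_kb = meminfo_kb("MemAvailable")

  print("background writeback starts around %d MB dirty"
        % (available_kb * background_ratio / 100 / 1024))
  print("writers are throttled around %d MB dirty"
        % (available_kb * dirty_ratio / 100 / 1024))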

When NUMA is taken into account, it is important to consider the size of
the NUMA nodes, as asymmetric sizes will affect when remote memory is used
and, to a lesser extent, when balance_dirty_pages is triggered.
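
A quick way to see how asymmetric the nodes are before interpreting the
numbers is to read the standard per-node meminfo files from sysfs. A small
sketch, nothing fsmark-specific about it:

  #!/usr/bin/env python3
  # Report per-node MemTotal/MemFree so asymmetric node sizes are obvious.
  import glob

  for path in sorted(glob.glob("/sys/devices/system/node/node*/meminfo")):
      totals = {}
      with open(path) as f:
          for line in f:
              # Lines look like: "Node 0 MemTotal:  65930508 kB"
              fields = line.split()
              totals[fields[2].rstrip(":")] = int(fields[3])
      node = path.split("/")[-2]
      print("%s: MemTotal %d MB, MemFree %d MB"
            % (node, totals["MemTotal"] // 1024, totals["MemFree"] // 1024))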

The benchmark is what it is. You can force it to generate stable figures,
but it won't have the same behaviour, so it all depends on how you define
"reasonable".

At the very minimum, take into account that an average of multiple
iterations will be skewed early in the workload's lifetime by the fact that
it has not yet hit dirty limits.

-- 
Mel Gorman
SUSE Labs
