Message-ID: <20160602160706.GA24004@cmpxchg.org>
Date:	Thu, 2 Jun 2016 12:07:06 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	kernel test robot <xiaolong.ye@...el.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: Re: [lkp] [mm] 795ae7a0de: pixz.throughput -9.1% regression

Hi,

On Thu, Jun 02, 2016 at 02:45:07PM +0800, kernel test robot wrote:
> FYI, we noticed pixz.throughput -9.1% regression due to commit:
> 
> commit 795ae7a0de6b834a0cc202aa55c190ef81496665 ("mm: scale kswapd watermarks in proportion to memory")
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: pixz
> on test machine: ivb43: 48 threads Ivytown Ivy Bridge-EP with 64G memory with following parameters: cpufreq_governor=performance/nr_threads=100%

Xiaolong, thanks for the report.

It looks like the regression stems from a change in NUMA placement:

> 3ed3a4f0ddffece9 795ae7a0de6b834a0cc202aa55
> ---------------- -------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>   78505362 ±  0%      -9.1%   71324131 ±  0%  pixz.throughput
>       4530 ±  0%      +1.0%       4575 ±  0%  pixz.time.percent_of_cpu_this_job_got
>      14911 ±  0%      +2.3%      15251 ±  0%  pixz.time.user_time
>    6586930 ±  0%      -7.5%    6093751 ±  1%  pixz.time.voluntary_context_switches
>      49869 ±  1%      -9.0%      45401 ±  0%  vmstat.system.cs
>      26406 ±  4%      -9.4%      23922 ±  5%  numa-meminfo.node0.SReclaimable
>       4803 ± 85%     -87.0%     625.25 ± 16%  numa-meminfo.node1.Inactive(anon)
>     946.75 ±  3%    +775.4%       8288 ±  1%  proc-vmstat.nr_alloc_batch
>    2403080 ±  2%     -58.4%     999765 ±  0%  proc-vmstat.pgalloc_dma32

The effect is a bit clearer in the will-it-scale report:

> 3ed3a4f0ddffece9 795ae7a0de6b834a0cc202aa55 
> ---------------- -------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>     442409 ±  0%      -8.5%     404670 ±  0%  will-it-scale.per_process_ops
>     397397 ±  0%      -6.2%     372741 ±  0%  will-it-scale.per_thread_ops
>       0.11 ±  1%     -15.1%       0.10 ±  0%  will-it-scale.scalability
>       9933 ± 10%     +17.8%      11696 ±  4%  will-it-scale.time.involuntary_context_switches
>    5158470 ±  3%      +5.4%    5438873 ±  0%  will-it-scale.time.maximum_resident_set_size
>   10701739 ±  0%     -11.6%    9456315 ±  0%  will-it-scale.time.minor_page_faults
>     825.00 ±  0%      +7.8%     889.75 ±  0%  will-it-scale.time.percent_of_cpu_this_job_got
>       2484 ±  0%      +7.8%       2678 ±  0%  will-it-scale.time.system_time
>      81.98 ±  0%      +8.7%      89.08 ±  0%  will-it-scale.time.user_time
>     848972 ±  1%     -13.3%     735967 ±  0%  will-it-scale.time.voluntary_context_switches
>   19395253 ±  0%     -20.0%   15511908 ±  0%  numa-numastat.node0.local_node
>   19400671 ±  0%     -20.0%   15518877 ±  0%  numa-numastat.node0.numa_hit

Given the way this test is set up (in-memory compression across 48
threads), I'm surprised we spill over to the remote node at all, even
with the higher watermarks.
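
For reference, the watermark change in that commit amounts to the
low/high gap being at least 0.1% of the zone's managed pages (the new
watermark_scale_factor sysctl, default 10) instead of always min/4. A
quick user-space sketch of the arithmetic, with made-up zone numbers
rather than values from the ivb43 machine:

/*
 * User-space sketch of the watermark formula in 795ae7a0de ("mm:
 * scale kswapd watermarks in proportion to memory").  Zone size and
 * the zone's min_free_kbytes share are illustrative values only.
 */
#include <stdio.h>

#define WATERMARK_SCALE_FACTOR	10	/* new sysctl, default 10 == 0.1% */

int main(void)
{
	unsigned long managed = 8UL << 20;	/* ~32G zone in 4K pages */
	unsigned long min = 11264;	/* zone's share of min_free_kbytes, in pages */
	unsigned long old_gap, new_gap;

	old_gap = min >> 2;			/* pre-patch: gap is min/4 */
	new_gap = managed * WATERMARK_SCALE_FACTOR / 10000;
	if (new_gap < old_gap)			/* min/4 remains the floor */
		new_gap = old_gap;

	printf("old: min=%lu low=%lu high=%lu\n",
	       min, min + old_gap, min + 2 * old_gap);
	printf("new: min=%lu low=%lu high=%lu\n",
	       min, min + new_gap, min + 2 * new_gap);
	return 0;
}

So min stays unchanged, but low and high move up by a few thousand
pages per zone; that is the extra headroom the fast path now wants to
see locally before it serves an allocation from that zone.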

Xiaolong, could you provide the full /proc/zoneinfo of that machine
right before the test runs? I wonder if the local node is mostly
filled with cache, and the increase in watermarks causes a higher
portion of the anon allocs and frees to spill to the remote node, but
never enough to enter the allocator slowpath and wake kswapd to fix
it.
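
To make the hypothesis concrete, here is a toy user-space model of
the fast path I have in mind; zone names, free counts and watermarks
are invented for illustration, the point being only that the remote
zone satisfies the request before we ever reach the slowpath that
would wake kswapd:

/*
 * Toy model of the spill hypothesis: the local zone sits just under
 * its (raised) low watermark, the remote zone passes, so the request
 * is served from the fast path and kswapd is never woken.  All the
 * numbers are invented.
 */
#include <stdio.h>
#include <stdbool.h>

struct zone {
	const char *name;
	unsigned long free;	/* free pages */
	unsigned long low;	/* low watermark, pages */
};

/* crude stand-in for the fast path's zone_watermark_ok() check */
static bool watermark_ok(const struct zone *z)
{
	return z->free > z->low;
}

int main(void)
{
	/* node 0 mostly filled with cache, node 1 nearly idle */
	struct zone zonelist[] = {
		{ "node0/Normal",  18000, 19652 },
		{ "node1/Normal", 500000, 19652 },
	};
	int i;

	for (i = 0; i < 2; i++) {
		if (watermark_ok(&zonelist[i])) {
			printf("served from %s%s\n", zonelist[i].name,
			       i ? " (remote spill, kswapd never woken)" : "");
			return 0;
		}
	}

	/* only if every zone fails do we enter the slowpath and wake kswapd */
	printf("fast path failed -> slowpath, wake kswapd\n");
	return 0;
}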

Another suspect is the fair zone allocator, whose allocation batches
increased as well. It shouldn't affect NUMA placement, but I wonder
if there is a bug in there that causes spurious spilling to foreign
nodes, bounded only by the allocation batch of the foreign zone.
Mel, does such a symptom sound familiar in any way?
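
As far as I recall the per-zone batch is refilled from the high-low
watermark gap, which would explain the nr_alloc_batch jump above. A
toy model of the bound I mean, with a made-up batch size, the
batch-reset path omitted, and the foreign-zone spill purely
hypothetical:

/*
 * Toy model of the fair-batch bound: every allocation from a zone
 * decrements its batch, so a (hypothetical) bug that lets the fair
 * pass touch a remote zone would keep spilling there until that
 * zone's batch runs out.  Batch sizes are made up and the batch
 * reset path is omitted.
 */
#include <stdio.h>

struct zone {
	const char *name;
	long batch;	/* NR_ALLOC_BATCH-style counter */
	int remote;
};

int main(void)
{
	struct zone zones[] = {
		{ "node0/Normal", 0,    0 },	/* local batch already depleted */
		{ "node1/Normal", 8388, 1 },	/* remote zone, big post-patch batch */
	};
	long alloc, remote_spills = 0;
	int i;

	/* fair pass: take the first zone whose batch is not yet depleted */
	for (alloc = 0; alloc < 20000; alloc++) {
		for (i = 0; i < 2; i++) {
			if (zones[i].batch > 0) {
				zones[i].batch--;
				remote_spills += zones[i].remote;
				break;
			}
		}
	}

	printf("allocations spilled to the remote node: %ld\n", remote_spills);
	return 0;
}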

I'll continue to investigate.
