Message-ID: <20160603022531.GB3271@yexl-desktop>
Date:	Fri, 3 Jun 2016 10:25:31 +0800
From:	Ye Xiaolong <xiaolong.ye@...el.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: Re: [lkp] [mm] 795ae7a0de: pixz.throughput -9.1% regression

On Thu, Jun 02, 2016 at 12:07:06PM -0400, Johannes Weiner wrote:
>Hi,
>
>On Thu, Jun 02, 2016 at 02:45:07PM +0800, kernel test robot wrote:
>> FYI, we noticed pixz.throughput -9.1% regression due to commit:
>> 
>> commit 795ae7a0de6b834a0cc202aa55c190ef81496665 ("mm: scale kswapd watermarks in proportion to memory")
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
>> 
>> in testcase: pixz
>> on test machine: ivb43: 48 threads Ivytown Ivy Bridge-EP with 64G memory with following parameters: cpufreq_governor=performance/nr_threads=100%
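
(For context: commit 795ae7a0de6b adds a watermark_scale_factor sysctl and
sizes the gap between the low and high kswapd watermarks in proportion to
zone size, instead of deriving it purely from min_free_kbytes.  The following
is a rough userspace model of that arithmetic, assuming the post-commit
__setup_per_zone_wmarks() logic and ignoring highmem, lowmem_reserve and
rounding details; it is a sketch, not the kernel's code.)

/*
 * Back-of-the-envelope model of the watermark computation after commit
 * 795ae7a0de6b ("mm: scale kswapd watermarks in proportion to memory").
 * Simplified: ignores highmem, lowmem_reserve and per-zone rounding.
 */
#include <stdio.h>

int main(void)
{
	/* Example zone: one node's worth of a 64G/2-node machine, 4K pages. */
	unsigned long long managed_pages = 32ULL << 18;	/* 32G / 4K */
	unsigned long long min_free_pages = 67584 >> 2;	/* min_free_kbytes in pages */
	int watermark_scale_factor = 10;		/* new sysctl, default 10 */

	/* This zone's proportional share of min_free_kbytes (2 equal zones). */
	unsigned long long min = min_free_pages / 2;

	/*
	 * The commit widens the low/high steps to at least a fraction of
	 * the zone: max(min / 4, managed_pages * wsf / 10000).
	 */
	unsigned long long tmp = min >> 2;
	unsigned long long scaled = managed_pages * watermark_scale_factor / 10000;
	if (scaled > tmp)
		tmp = scaled;

	printf("min  = %llu pages\n", min);
	printf("low  = %llu pages\n", min + tmp);
	printf("high = %llu pages\n", min + 2 * tmp);
	return 0;
}

(With these example numbers the low-to-high step grows from min/4, about
2112 pages, to roughly 0.1% of the zone, about 8388 pages, i.e. around 4x.)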
>
>Xiaolong, thanks for the report.
>
>It looks like the regression stems from a change in NUMA placement:
>
>> 3ed3a4f0ddffece9 795ae7a0de6b834a0cc202aa55
>> ---------------- -------------------------- 
>>          %stddev     %change         %stddev
>>              \          |                \  
>>   78505362 ±  0%      -9.1%   71324131 ±  0%  pixz.throughput
>>       4530 ±  0%      +1.0%       4575 ±  0%  pixz.time.percent_of_cpu_this_job_got
>>      14911 ±  0%      +2.3%      15251 ±  0%  pixz.time.user_time
>>    6586930 ±  0%      -7.5%    6093751 ±  1%  pixz.time.voluntary_context_switches
>>      49869 ±  1%      -9.0%      45401 ±  0%  vmstat.system.cs
>>      26406 ±  4%      -9.4%      23922 ±  5%  numa-meminfo.node0.SReclaimable
>>       4803 ± 85%     -87.0%     625.25 ± 16%  numa-meminfo.node1.Inactive(anon)
>>     946.75 ±  3%    +775.4%       8288 ±  1%  proc-vmstat.nr_alloc_batch
>>    2403080 ±  2%     -58.4%     999765 ±  0%  proc-vmstat.pgalloc_dma32
>
>It's a bit clearer in the will-it-scale report:
>
>> 3ed3a4f0ddffece9 795ae7a0de6b834a0cc202aa55 
>> ---------------- -------------------------- 
>>          %stddev     %change         %stddev
>>              \          |                \  
>>     442409 ±  0%      -8.5%     404670 ±  0%  will-it-scale.per_process_ops
>>     397397 ±  0%      -6.2%     372741 ±  0%  will-it-scale.per_thread_ops
>>       0.11 ±  1%     -15.1%       0.10 ±  0%  will-it-scale.scalability
>>       9933 ± 10%     +17.8%      11696 ±  4%  will-it-scale.time.involuntary_context_switches
>>    5158470 ±  3%      +5.4%    5438873 ±  0%  will-it-scale.time.maximum_resident_set_size
>>   10701739 ±  0%     -11.6%    9456315 ±  0%  will-it-scale.time.minor_page_faults
>>     825.00 ±  0%      +7.8%     889.75 ±  0%  will-it-scale.time.percent_of_cpu_this_job_got
>>       2484 ±  0%      +7.8%       2678 ±  0%  will-it-scale.time.system_time
>>      81.98 ±  0%      +8.7%      89.08 ±  0%  will-it-scale.time.user_time
>>     848972 ±  1%     -13.3%     735967 ±  0%  will-it-scale.time.voluntary_context_switches
>>   19395253 ±  0%     -20.0%   15511908 ±  0%  numa-numastat.node0.local_node
>>   19400671 ±  0%     -20.0%   15518877 ±  0%  numa-numastat.node0.numa_hit
>
>The way this test is set up (in-memory compression across 48 threads),
>I'm surprised we spill over, though, even with the higher watermarks.
>
>Xiaolong, could you provide the full /proc/zoneinfo of that machine
>right before the test is running? I wonder if it's mostly filled with
>cache, and the increase in watermarks causes a higher portion of the
>anon allocs and frees to spill to the remote node, but never enough to
>enter the allocator slowpath and wake kswapd to fix it.

Hi Johannes,

Please refer to the attached proc-zoneinfo file, which was obtained by
running cat /proc/zoneinfo on the test machine ivb43 right before the
pixz test was launched.

Thanks,
Xiaolong
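
P.S. In case it is useful to anyone reproducing this: a tiny filter
(hypothetical; plain cat is what was actually used above) to pull just the
zone headers and the free/min/low/high watermark lines out of /proc/zoneinfo:

/*
 * Minimal filter for /proc/zoneinfo: print each zone header plus its
 * free/min/low/high lines.  Convenience sketch only; "cat /proc/zoneinfo"
 * captures the same data in full.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/proc/zoneinfo", "r");
	char line[256];

	if (!f) {
		perror("/proc/zoneinfo");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "Node", 4) ||	/* "Node 0, zone Normal" */
		    strstr(line, "pages free") ||
		    strstr(line, " min ") ||
		    strstr(line, " low ") ||
		    strstr(line, " high "))
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}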
>
>Another suspect is the fair zone allocator, whose allocation batches
>increased as well. It shouldn't affect NUMA placement, but I wonder if
>there is a bug in there that causes false spilling to foreign nodes
>that was only bounded by the allocation batch of the foreign zone.
>Mel, does such a symptom sound familiar in any way?
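
(Noting for readers following the numbers above: each zone's fair allocation
batch, NR_ALLOC_BATCH, is sized from the low-to-high watermark gap, which is
presumably why proc-vmstat.nr_alloc_batch jumped 946 -> 8288.  Below is a toy
model of the suspected failure mode; it is purely hypothetical and far
simpler than the real allocator.)

/*
 * Toy model of fair-zone allocation batches (NR_ALLOC_BATCH).  In the
 * suspected failure mode, a remote zone takes part in the round-robin,
 * so every round leaks a burst of allocations to the foreign node,
 * bounded only by that zone's (now much larger) batch.  Hypothetical
 * simplification, not the kernel's actual code.
 */
#include <stdio.h>

#define NZONES 2			/* zone 0 = local, zone 1 = remote */

static long batch[NZONES], batch_size[NZONES], served[NZONES];

static void reset_batches(void)
{
	for (int i = 0; i < NZONES; i++)
		batch[i] = batch_size[i];
}

static int alloc_fair(void)
{
	for (int i = 0; i < NZONES; i++) {
		if (batch[i] > 0) {		/* first zone with batch left */
			batch[i]--;
			served[i]++;
			return i;
		}
	}
	/*
	 * Correct behaviour would reset once the *local* batches are
	 * drained; here the remote batch must drain too, so the remote
	 * node absorbs batch_size[1] pages per round.
	 */
	reset_batches();
	return alloc_fair();
}

int main(void)
{
	batch_size[0] = 8288;		/* cf. nr_alloc_batch above */
	batch_size[1] = 8288;
	reset_batches();

	for (long n = 0; n < 1000000; n++)
		alloc_fair();

	printf("local %ld / remote %ld\n", served[0], served[1]);
	return 0;
}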
>
>I'll continue to investigate.

View attachment "proc-zoneinfo" of type "text/plain" (25977 bytes)
