Message-ID: <20160602160706.GA24004@cmpxchg.org>
Date:	Thu, 2 Jun 2016 12:07:06 -0400
From:	Johannes Weiner <hannes@...xchg.org>
To:	kernel test robot <xiaolong.ye@...el.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mel Gorman <mgorman@...e.de>, Rik van Riel <riel@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Joonsoo Kim <iamjoonsoo.kim@....com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, lkp@...org
Subject: Re: [lkp] [mm] 795ae7a0de: pixz.throughput -9.1% regression

Hi,

On Thu, Jun 02, 2016 at 02:45:07PM +0800, kernel test robot wrote:
> FYI, we noticed pixz.throughput -9.1% regression due to commit:
> 
> commit 795ae7a0de6b834a0cc202aa55c190ef81496665 ("mm: scale kswapd watermarks in proportion to memory")
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> in testcase: pixz
> on test machine: ivb43: 48 threads Ivytown Ivy Bridge-EP with 64G memory with following parameters: cpufreq_governor=performance/nr_threads=100%

Xiaolong, thanks for the report.

It looks like the regression stems from a change in NUMA placement:

> 3ed3a4f0ddffece9 795ae7a0de6b834a0cc202aa55
> ---------------- -------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>   78505362 ±  0%      -9.1%   71324131 ±  0%  pixz.throughput
>       4530 ±  0%      +1.0%       4575 ±  0%  pixz.time.percent_of_cpu_this_job_got
>      14911 ±  0%      +2.3%      15251 ±  0%  pixz.time.user_time
>    6586930 ±  0%      -7.5%    6093751 ±  1%  pixz.time.voluntary_context_switches
>      49869 ±  1%      -9.0%      45401 ±  0%  vmstat.system.cs
>      26406 ±  4%      -9.4%      23922 ±  5%  numa-meminfo.node0.SReclaimable
>       4803 ± 85%     -87.0%     625.25 ± 16%  numa-meminfo.node1.Inactive(anon)
>     946.75 ±  3%    +775.4%       8288 ±  1%  proc-vmstat.nr_alloc_batch
>    2403080 ±  2%     -58.4%     999765 ±  0%  proc-vmstat.pgalloc_dma32

The effect is a bit clearer in the will-it-scale report:

> 3ed3a4f0ddffece9 795ae7a0de6b834a0cc202aa55 
> ---------------- -------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>     442409 ±  0%      -8.5%     404670 ±  0%  will-it-scale.per_process_ops
>     397397 ±  0%      -6.2%     372741 ±  0%  will-it-scale.per_thread_ops
>       0.11 ±  1%     -15.1%       0.10 ±  0%  will-it-scale.scalability
>       9933 ± 10%     +17.8%      11696 ±  4%  will-it-scale.time.involuntary_context_switches
>    5158470 ±  3%      +5.4%    5438873 ±  0%  will-it-scale.time.maximum_resident_set_size
>   10701739 ±  0%     -11.6%    9456315 ±  0%  will-it-scale.time.minor_page_faults
>     825.00 ±  0%      +7.8%     889.75 ±  0%  will-it-scale.time.percent_of_cpu_this_job_got
>       2484 ±  0%      +7.8%       2678 ±  0%  will-it-scale.time.system_time
>      81.98 ±  0%      +8.7%      89.08 ±  0%  will-it-scale.time.user_time
>     848972 ±  1%     -13.3%     735967 ±  0%  will-it-scale.time.voluntary_context_switches
>   19395253 ±  0%     -20.0%   15511908 ±  0%  numa-numastat.node0.local_node
>   19400671 ±  0%     -20.0%   15518877 ±  0%  numa-numastat.node0.numa_hit

Given the way this test is set up (in-memory compression across 48
threads), I'm surprised we spill over to the remote node at all, even
with the higher watermarks.
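
For reference, the watermark change in that commit amounts to the
low/high gap being at least 0.1% of the zone's managed pages (the new
watermark_scale_factor sysctl, default 10) instead of always min/4. A
quick user-space sketch of the arithmetic, with made-up zone numbers
rather than values from the ivb43 machine:

/*
 * User-space sketch of the watermark formula in 795ae7a0de ("mm:
 * scale kswapd watermarks in proportion to memory").  Zone size and
 * the zone's min_free_kbytes share are illustrative values only.
 */
#include <stdio.h>

#define WATERMARK_SCALE_FACTOR	10	/* new sysctl, default 10 == 0.1% */

int main(void)
{
	unsigned long managed = 8UL << 20;	/* ~32G zone in 4K pages */
	unsigned long min = 11264;	/* zone's share of min_free_kbytes, in pages */
	unsigned long old_gap, new_gap;

	old_gap = min >> 2;			/* pre-patch: gap is min/4 */
	new_gap = managed * WATERMARK_SCALE_FACTOR / 10000;
	if (new_gap < old_gap)			/* min/4 remains the floor */
		new_gap = old_gap;

	printf("old: min=%lu low=%lu high=%lu\n",
	       min, min + old_gap, min + 2 * old_gap);
	printf("new: min=%lu low=%lu high=%lu\n",
	       min, min + new_gap, min + 2 * new_gap);
	return 0;
}

So min stays unchanged, but low and high move up by a few thousand
pages per zone; that is the extra headroom the fast path now wants to
see locally before it serves an allocation from that zone.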

Xiaolong, could you provide the full /proc/zoneinfo of that machine
right before the test runs? I wonder if the local node is mostly
filled with cache, and the increase in watermarks causes a higher
portion of the anon allocs and frees to spill to the remote node, but
never enough to enter the allocator slowpath and wake kswapd to fix
it.
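
To make the hypothesis concrete, here is a toy user-space model of
the fast path I have in mind; zone names, free counts and watermarks
are invented for illustration, the point being only that the remote
zone satisfies the request before we ever reach the slowpath that
would wake kswapd:

/*
 * Toy model of the spill hypothesis: the local zone sits just under
 * its (raised) low watermark, the remote zone passes, so the request
 * is served from the fast path and kswapd is never woken.  All the
 * numbers are invented.
 */
#include <stdio.h>
#include <stdbool.h>

struct zone {
	const char *name;
	unsigned long free;	/* free pages */
	unsigned long low;	/* low watermark, pages */
};

/* crude stand-in for the fast path's zone_watermark_ok() check */
static bool watermark_ok(const struct zone *z)
{
	return z->free > z->low;
}

int main(void)
{
	/* node 0 mostly filled with cache, node 1 nearly idle */
	struct zone zonelist[] = {
		{ "node0/Normal",  18000, 19652 },
		{ "node1/Normal", 500000, 19652 },
	};
	int i;

	for (i = 0; i < 2; i++) {
		if (watermark_ok(&zonelist[i])) {
			printf("served from %s%s\n", zonelist[i].name,
			       i ? " (remote spill, kswapd never woken)" : "");
			return 0;
		}
	}

	/* only if every zone fails do we enter the slowpath and wake kswapd */
	printf("fast path failed -> slowpath, wake kswapd\n");
	return 0;
}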

Another suspect is the fair zone allocator, whose allocation batches
increased as well. It shouldn't affect NUMA placement, but I wonder
if there is a bug in there that causes spurious spilling to foreign
nodes, bounded only by the allocation batch of the foreign zone.
Mel, does such a symptom sound familiar in any way?
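
As far as I recall the per-zone batch is refilled from the high-low
watermark gap, which would explain the nr_alloc_batch jump above. A
toy model of the bound I mean, with a made-up batch size, the
batch-reset path omitted, and the foreign-zone spill purely
hypothetical:

/*
 * Toy model of the fair-batch bound: every allocation from a zone
 * decrements its batch, so a (hypothetical) bug that lets the fair
 * pass touch a remote zone would keep spilling there until that
 * zone's batch runs out.  Batch sizes are made up and the batch
 * reset path is omitted.
 */
#include <stdio.h>

struct zone {
	const char *name;
	long batch;	/* NR_ALLOC_BATCH-style counter */
	int remote;
};

int main(void)
{
	struct zone zones[] = {
		{ "node0/Normal", 0,    0 },	/* local batch already depleted */
		{ "node1/Normal", 8388, 1 },	/* remote zone, big post-patch batch */
	};
	long alloc, remote_spills = 0;
	int i;

	/* fair pass: take the first zone whose batch is not yet depleted */
	for (alloc = 0; alloc < 20000; alloc++) {
		for (i = 0; i < 2; i++) {
			if (zones[i].batch > 0) {
				zones[i].batch--;
				remote_spills += zones[i].remote;
				break;
			}
		}
	}

	printf("allocations spilled to the remote node: %ld\n", remote_spills);
	return 0;
}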

I'll continue to investigate.
