Date: Mon, 18 Sep 2017 18:33:20 +0300
From: Tariq Toukan <tariqt@...lanox.com>
To: Aaron Lu <aaron.lu@...el.com>, Tariq Toukan <tariqt@...lanox.com>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>, David Miller <davem@...emloft.net>,
	Mel Gorman <mgorman@...hsingularity.net>, Eric Dumazet <eric.dumazet@...il.com>,
	Alexei Starovoitov <ast@...com>, Saeed Mahameed <saeedm@...lanox.com>,
	Eran Ben Elisha <eranbe@...lanox.com>,
	Linux Kernel Network Developers <netdev@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>, Michal Hocko <mhocko@...e.com>,
	linux-mm <linux-mm@...ck.org>, Dave Hansen <dave.hansen@...el.com>
Subject: Re: Page allocator bottleneck

On 18/09/2017 10:44 AM, Aaron Lu wrote:
> On Mon, Sep 18, 2017 at 03:34:47PM +0800, Aaron Lu wrote:
>> On Sun, Sep 17, 2017 at 07:16:15PM +0300, Tariq Toukan wrote:
>>>
>>> It's nice to have the option to dynamically play with the parameter.
>>> But maybe we should also think of changing the default fraction
>>> guaranteed to the PCP, so that unaware admins of networking servers
>>> would also benefit.
>>
>> I collected some performance data with will-it-scale/page_fault1
>> process mode on different machines with different pcp->batch sizes,
>> starting from the default 31 (calculated by zone_batchsize(); 31 is
>> the standard value for any zone that has more than 1/2 GiB of memory),
>> then incremented in steps of 31 up to 527. The PCP's upper limit is
>> 6*batch.
>>
>> An image is plotted and attached: batch_full.png (full here means the
>> number of processes started equals the number of CPUs).
>
> To be clear: the X-axis is the value of the batch size (31, 62, 93,
> ..., 527), the Y-axis is the value of per_process_ops as reported by
> will-it-scale; higher is better.
>
>>
>> From the image:
>> - The EX machines all see throughput increase with increased batch
>>   size, peak at around batch_size=310, then fall;
>> - Among the EP machines, Haswell-EP and Broadwell-EP also see
>>   throughput increase with increased batch size, peak at
>>   batch_size=279, then fall; batch_size=310 also delivers a pretty
>>   good result. Skylake-EP is quite different in that it doesn't see
>>   any obvious throughput increase after batch_size=93; the trend is
>>   still upward, but only slightly, and it finally peaks at
>>   batch_size=403, then falls. Ivybridge-EP behaves much like the
>>   desktop ones.
>> - The desktop machines do not see any obvious change with increased
>>   batch_size.
>>
>> So the default batch size (31) doesn't deliver a good enough result;
>> we probably should change the default value.

Thanks Aaron for sharing your experiment results. That's a good analysis
of the effect of the batch value. I agree with your conclusion.

From a networking perspective, we should reconsider the defaults so that
we can keep up with the increasing NIC line rates. Not only for
pcp->batch, but also for pcp->high.

Regards,
Tariq
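
For reference, the default of 31 discussed above comes out of
zone_batchsize() in mm/page_alloc.c. Below is a condensed user-space
sketch of that calculation, using the v4.13-era logic; the standalone
main(), the function parameters, and the rounddown helper are
illustrative stand-ins, not the kernel's actual interfaces:

#include <stdio.h>

/* User-space stand-in for the kernel's rounddown_pow_of_two(). */
static unsigned long rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;
	while (p * 2 <= n)
		p *= 2;
	return p;
}

/* Condensed sketch of mm/page_alloc.c:zone_batchsize() (v4.13 era):
 * pcp->batch is roughly 1/1024th of the zone, capped so one batch is
 * at most 512 KiB worth of pages, scaled down, then clamped to a
 * 2^n - 1 value to avoid cache-aliasing pathologies. */
static int zone_batchsize(unsigned long managed_pages, unsigned long page_size)
{
	unsigned long batch = managed_pages / 1024;

	if (batch * page_size > 512 * 1024)
		batch = (512 * 1024) / page_size;	/* cap: 128 pages at 4 KiB */
	batch /= 4;
	if (batch < 1)
		batch = 1;

	return rounddown_pow_of_two(batch + batch / 2) - 1;
}

int main(void)
{
	/* Any zone past the cap (> 1/2 GiB with 4 KiB pages) gets 31. */
	printf("batch = %d\n", zone_batchsize((4UL << 30) / 4096, 4096));
	/* pcp->high was then derived as 6 * batch = 186 pages per CPU. */
	return 0;
}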
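The benchmark used for the numbers above is will-it-scale's page_fault1
testcase in process mode. A simplified sketch of its hot loop follows
(the real testcase also publishes an iteration counter to the harness,
which becomes the per_process_ops metric; the helper name and the fixed
128 MiB region size here are illustrative): each iteration faults in
anonymous pages one by one, then frees them all at once, which is
exactly the allocate/free pattern that exercises the pcp->batch and
pcp->high refill/drain paths.

#include <sys/mman.h>
#include <unistd.h>

#define MEMSIZE (128UL * 1024 * 1024)	/* region faulted per iteration */

/* One iteration: every first write to a fresh anonymous page triggers
 * a minor fault and an order-0 page allocation; munmap() then frees
 * all of the pages back through the per-cpu page lists. */
static void page_fault1_iteration(long pagesize)
{
	char *c = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (c == MAP_FAILED)
		return;

	for (unsigned long i = 0; i < MEMSIZE; i += pagesize)
		c[i] = 0;	/* first touch: fault + page allocation */

	munmap(c, MEMSIZE);	/* bulk free back to the page allocator */
}

int main(void)
{
	long pagesize = sysconf(_SC_PAGESIZE);

	for (;;)	/* the harness counts completed iterations per second */
		page_fault1_iteration(pagesize);
	return 0;
}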