Date:   Tue, 12 Dec 2017 09:11:26 +0100
From:   Michal Hocko
To:     kemi
Cc:     Greg Kroah-Hartman,
        Andrew Morton,
        Vlastimil Babka,
        Mel Gorman,
        Johannes Weiner,
        Christopher Lameter,
        Andrey Ryabinin,
        Nikolay Borisov,
        Pavel Tatashin,
        David Rientjes,
        Sebastian Andrzej Siewior,
        Dave,
        Andi Kleen,
        Tim Chen,
        Jesper Dangaard Brouer,
        Ying Huang,
        Aaron Lu, Aubrey Li,
        Linux MM,
        Linux Kernel
Subject: Re: [PATCH 1/2] mm: NUMA stats code cleanup and enhancement

On Tue 12-12-17 10:05:26, kemi wrote:
> On 2017-12-08 16:47, Michal Hocko wrote:
> > On Fri 08-12-17 16:38:46, kemi wrote:
> >>
> >>
> >> On 2017-11-30 17:45, Michal Hocko wrote:
> >>> On Thu 30-11-17 17:32:08, kemi wrote:
> >>
> >> After thinking about how to optimize our per-node stats more gracefully,
> >> we may add u64 vm_numa_stat_diff[] to struct per_cpu_nodestat, so that
> >> we can keep everything in per-CPU counters and sum them up when /proc
> >> or /sys is read for NUMA stats.
> >> What do you think about that? Thanks.
> > 
> > I would like to see a strong argument why we cannot make it a _standard_
> > node counter.
> > 
> All right.
> This issue was first reported and discussed at the 2017 MM summit; see
> the presentation "Provoking and fixing memory bottlenecks - Focused on
> the page allocator" given by Jesper.
> 2017-JesperBrouer.pdf (slide 15/16)
> As you know, the page allocator is too slow and has become a bottleneck
> in high-speed networking.
> Jesper also showed some data in that presentation: with a micro benchmark
> that stresses the order-0 fast path (per-CPU pages), *32%* extra CPU
> cycles (143->97) come from CONFIG_NUMA.
> When I looked into this, I reproduced it and got a result similar to
> Jesper's. Furthermore, with help from Jesper, the overhead was root
> caused: it comes from an extra level of function calls such as
> zone_statistics() (*10%*, nearly 1/3, including __inc_numa_state),
> policy_zonelist, get_task_policy(), policy_nodemask, etc. (perf
> profiling of CPU cycles). zone_statistics() is the biggest cost
> introduced by CONFIG_NUMA in the fast path that we can do something
> about to optimize the page allocator. In addition, the overhead of
> zone_statistics() increases significantly with more CPU cores and
> nodes due to cache bouncing.
> Therefore, we previously submitted a patch to mitigate the overhead of
> zone_statistics() by reducing the global NUMA counter update frequency
> (enlarging the threshold size, as suggested by Dave Hansen). I would
> also like to have an implementation of a "_standard_ node counter" for
> NUMA stats, but I wonder how we can keep the performance gain at the
> same time.

I understand all that. But we do have a way to get rid of all that
overhead by disabling the stats altogether. I presume that CPU cycle
sensitive workloads would simply use that option because the stats are
quite limited in their usefulness anyway IMHO. So we are back to: Do
normal workloads care all that much to have a 3rd way to account for
events? I haven't heard a sound argument for that.

Michal Hocko
