[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20171219162029.GD2787@dhcp22.suse.cz>
Date: Tue, 19 Dec 2017 17:20:29 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Christopher Lameter <cl@...ux.com>
Cc: Kemi Wang <kemi.wang@...el.com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>,
Mel Gorman <mgorman@...hsingularity.net>,
Johannes Weiner <hannes@...xchg.org>,
YASUAKI ISHIMATSU <yasu.isimatu@...il.com>,
Andrey Ryabinin <aryabinin@...tuozzo.com>,
Nikolay Borisov <nborisov@...e.com>,
Pavel Tatashin <pasha.tatashin@...cle.com>,
David Rientjes <rientjes@...gle.com>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
Dave <dave.hansen@...ux.intel.com>,
Andi Kleen <andi.kleen@...el.com>,
Tim Chen <tim.c.chen@...el.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Ying Huang <ying.huang@...el.com>,
Aaron Lu <aaron.lu@...el.com>, Aubrey Li <aubrey.li@...el.com>,
Linux MM <linux-mm@...ck.org>,
Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 2/5] mm: Extends local cpu counter vm_diff_nodestat
from s8 to s16
On Tue 19-12-17 10:05:48, Cristopher Lameter wrote:
> On Tue, 19 Dec 2017, Kemi Wang wrote:
>
> > The type s8 used for vm_diff_nodestat[] as local cpu counters has the
> > limitation of global counters update frequency, especially for those
> > monotone increasing type of counters like NUMA counters with more and more
> > cpus/nodes. This patch extends the type of vm_diff_nodestat from s8 to s16
> > without any functionality change.
>
> Well the reason for s8 was to keep the data structures small so that they
> fit in the higher level cpu caches. The large these structures become the
> more cachelines are used by the counters and the larger the performance
> influence on the code that should not be impacted by the overhead.
I am not sure I understand. We usually do not access more counters in
the single code path (well, PGALLOC and NUMA counteres is more of an
exception). So it is rarely an advantage that the whole array is in the
same cache line. Besides that this is allocated by the percpu allocator
aligns to the type size rather than cache lines AFAICS.
Maybe it used to be all different back then when the code has been added
but arguing about cache lines seems to be a bit problematic here. Maybe
you have some specific workloads which can prove me wrong?
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists