lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <94187fd5-ad70-eba7-2724-0fe5bed750d6@intel.com>
Date:   Wed, 20 Dec 2017 14:07:35 +0800
From:   kemi <kemi.wang@...el.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Johannes Weiner <hannes@...xchg.org>,
        Christopher Lameter <cl@...ux.com>,
        YASUAKI ISHIMATSU <yasu.isimatu@...il.com>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Nikolay Borisov <nborisov@...e.com>,
        Pavel Tatashin <pasha.tatashin@...cle.com>,
        David Rientjes <rientjes@...gle.com>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        Dave <dave.hansen@...ux.intel.com>,
        Andi Kleen <andi.kleen@...el.com>,
        Tim Chen <tim.c.chen@...el.com>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Ying Huang <ying.huang@...el.com>,
        Aaron Lu <aaron.lu@...el.com>, Aubrey Li <aubrey.li@...el.com>,
        Linux MM <linux-mm@...ck.org>,
        Linux Kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 4/5] mm: use node_page_state_snapshot to avoid
 deviation



On 2017年12月19日 20:43, Michal Hocko wrote:
> On Tue 19-12-17 14:39:25, Kemi Wang wrote:
>> To avoid deviation, this patch uses node_page_state_snapshot instead of
>> node_page_state for node page stats query.
>> e.g. cat /proc/zoneinfo
>>      cat /sys/devices/system/node/node*/vmstat
>>      cat /sys/devices/system/node/node*/numastat
>>
>> As it is a slow path and would not be read frequently, I would worry about
>> it.
> 
> The changelog doesn't explain why these counters needs any special
> treatment. _snapshot variants where used only for internal handling
> where the precision really mattered. We do not have any in-tree user and
> Jack has removed this by http://lkml.kernel.org/r/20171122094416.26019-1-jack@suse.cz
> which is already sitting in the mmotm tree. We can re-add it but that
> would really require a _very good_ reason.
> 

Assume we have *nr* cpus, and threshold size is *t*. Thus, the maximum deviation is nr*t.
Currently, Skylake platform has hundreds of CPUs numbers and the number is still 
increasing. Also, even the threshold size is kept to 125 at maximum (32765 
for NUMA counters now), the deviation is just a little too big as I have mentioned in 
the log. I tend to sum the number in local cpus up when query the global stats.

Also, node_page_state_snapshot is only called in slow path and I don't think that
would be a big problem. 

Anyway, it is a matter of taste. I just think it's better to have.

>> Signed-off-by: Kemi Wang <kemi.wang@...el.com>
>> ---
>>  drivers/base/node.c | 17 ++++++++++-------
>>  mm/vmstat.c         |  2 +-
>>  2 files changed, 11 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/base/node.c b/drivers/base/node.c
>> index a045ea1..cf303f8 100644
>> --- a/drivers/base/node.c
>> +++ b/drivers/base/node.c
>> @@ -169,12 +169,15 @@ static ssize_t node_read_numastat(struct device *dev,
>>  		       "interleave_hit %lu\n"
>>  		       "local_node %lu\n"
>>  		       "other_node %lu\n",
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_HIT),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_MISS),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_FOREIGN),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_INTERLEAVE_HIT),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_LOCAL),
>> -		       node_page_state(NODE_DATA(dev->id), NUMA_OTHER));
>> +		       node_page_state_snapshot(NODE_DATA(dev->id), NUMA_HIT),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id), NUMA_MISS),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id),
>> +			       NUMA_FOREIGN),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id),
>> +			       NUMA_INTERLEAVE_HIT),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id), NUMA_LOCAL),
>> +		       node_page_state_snapshot(NODE_DATA(dev->id),
>> +			       NUMA_OTHER));
>>  }
>>  
>>  static DEVICE_ATTR(numastat, S_IRUGO, node_read_numastat, NULL);
>> @@ -194,7 +197,7 @@ static ssize_t node_read_vmstat(struct device *dev,
>>  	for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++)
>>  		n += sprintf(buf+n, "%s %lu\n",
>>  			     vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
>> -			     node_page_state(pgdat, i));
>> +			     node_page_state_snapshot(pgdat, i));
>>  
>>  	return n;
>>  }
>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>> index 64e08ae..d65f28d 100644
>> --- a/mm/vmstat.c
>> +++ b/mm/vmstat.c
>> @@ -1466,7 +1466,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
>>  		for (i = 0; i < NR_VM_NODE_STAT_ITEMS; i++) {
>>  			seq_printf(m, "\n      %-12s %lu",
>>  				vmstat_text[i + NR_VM_ZONE_STAT_ITEMS],
>> -				node_page_state(pgdat, i));
>> +				node_page_state_snapshot(pgdat, i));
>>  		}
>>  	}
>>  	seq_printf(m,
>> -- 
>> 2.7.4
>>
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ