Message-ID: <20200720202021.GE139672@carbon.dhcp.thefacebook.com>
Date:   Mon, 20 Jul 2020 13:20:21 -0700
From:   Roman Gushchin <guro@...com>
To:     Michal Hocko <mhocko@...nel.org>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>, <linux-mm@...ck.org>,
        <kernel-team@...com>, <linux-kernel@...r.kernel.org>,
        Hugh Dickins <hughd@...gle.com>
Subject: Re: [PATCH v2] mm: vmstat: fix /proc/sys/vm/stat_refresh generating
 false warnings

On Mon, Jul 20, 2020 at 10:03:49AM +0200, Michal Hocko wrote:
> On Tue 14-07-20 10:39:20, Roman Gushchin wrote:
> > I've noticed a number of warnings like "vmstat_refresh: nr_free_cma
> > -5" or "vmstat_refresh: nr_zone_write_pending -11" on our production
> > hosts. The numbers of these warnings were relatively low and stable,
> > so it didn't look like we are systematically leaking the counters.
> > The corresponding vmstat counters also looked sane.
> > 
> > These warnings are generated by the vmstat_refresh() function, which
> > assumes that atomic zone and numa counters can't go below zero.
> > However, on an SMP machine that's not quite right: due to per-cpu
> > caching they can in theory be as low as -(zone threshold) * NR_CPUs.
> > 
> > For instance, let's say all cma pages are in use and NR_FREE_CMA_PAGES
> > reached 0. Then we've reclaimed a small number of cma pages on each
> > CPU except CPU0, so that most percpu NR_FREE_CMA_PAGES counters are
> > slightly positive (the atomic counter is still 0). Then somebody on
> > CPU0 consumes all these pages. The number of pages can easily exceed
> > the threshold and a negative value will be committed to the atomic
> > counter.
> > 
> > To fix the problem and avoid generating false warnings, let's just
> > relax the condition and warn only if the value is less than minus
> > the maximum theoretically possible drift value, which is 125 *
> > number of online CPUs. This will still catch systematic leaks,
> > but will not generate bogus warnings.
> > 
> > Signed-off-by: Roman Gushchin <guro@...com>
> > Cc: Hugh Dickins <hughd@...gle.com>
> 
> Acked-by: Michal Hocko <mhocko@...e.com>
> 
> One minor nit which can be handled by a separate patch but now that you
> are touching this code...

Thank you!

> 
> > ---
> >  Documentation/admin-guide/sysctl/vm.rst |  4 ++--
> >  mm/vmstat.c                             | 30 ++++++++++++++++---------
> >  2 files changed, 21 insertions(+), 13 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> > index 4b9d2e8e9142..95fb80d0c606 100644
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -822,8 +822,8 @@ e.g. cat /proc/sys/vm/stat_refresh /proc/meminfo
> >  
> >  As a side-effect, it also checks for negative totals (elsewhere reported
> >  as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
> > -(At time of writing, a few stats are known sometimes to be found negative,
> > -with no ill effects: errors and warnings on these stats are suppressed.)
> > +(On a SMP machine some stats can temporarily become negative, with no ill
> > +effects: errors and warnings on these stats are suppressed.)
> >  
> >  
> >  numa_stat
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index a21140373edb..8f0ef8aaf8ee 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -169,6 +169,8 @@ EXPORT_SYMBOL(vm_node_stat);
> >  
> >  #ifdef CONFIG_SMP
> >  
> > +#define MAX_THRESHOLD 125
> 
> This would deserve a comment. 88f5acf88ae6a didn't really explain why
> this specific value has been selected but the specific value shouldn't
> really matter much. I would go with the following at least.
> "
> Maximum sync threshold for per-cpu vmstat counters. 
> "

Agreed. Below is the diff to be squashed into the original patch.

Thanks!

--

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 08e415e0a15d..ddc59b533599 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -167,6 +167,9 @@ EXPORT_SYMBOL(vm_zone_stat);
 EXPORT_SYMBOL(vm_numa_stat);
 EXPORT_SYMBOL(vm_node_stat);
 
+/*
+ * Maximum sync threshold for per-cpu vmstat counters.
+ */
 #ifdef CONFIG_SMP
 #define MAX_THRESHOLD 125
 #else
