lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aEgCA3lJDWVu32Gc@tiehlicka>
Date: Tue, 10 Jun 2025 11:59:31 +0200
From: Michal Hocko <mhocko@...e.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
	Vlastimil Babka <vbabka@...e.cz>,
	"Ritesh Harjani (IBM)" <ritesh.list@...il.com>,
	Baolin Wang <baolin.wang@...ux.alibaba.com>, david@...hat.com,
	lorenzo.stoakes@...cle.com, Liam.Howlett@...cle.com,
	rppt@...nel.org, surenb@...gle.com, donettom@...ux.ibm.com,
	aboorvad@...ux.ibm.com, sj@...nel.org, linux-mm@...ck.org,
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] mm: fix the inaccurate memory statistics issue for
 users

On Mon 09-06-25 17:45:05, Shakeel Butt wrote:
> On Mon, Jun 09, 2025 at 05:17:58PM -0700, Andrew Morton wrote:
> > On Mon, 9 Jun 2025 10:56:46 +0200 Vlastimil Babka <vbabka@...e.cz> wrote:
> > 
> > > On 6/9/25 10:52 AM, Vlastimil Babka wrote:
> > > > On 6/9/25 10:31 AM, Ritesh Harjani (IBM) wrote:
> > > >> Baolin Wang <baolin.wang@...ux.alibaba.com> writes:
> > > >>
> > > >>> On 2025/6/9 15:35, Michal Hocko wrote:
> > > >>>> On Mon 09-06-25 10:57:41, Ritesh Harjani wrote:
> > > >>>>>
> > > >>>>> Any reason why we dropped the Fixes tag? I see there were a series of
> > > >>>>> discussion on v1 and it got concluded that the fix was correct, then why
> > > >>>>> drop the fixes tag?
> > > >>>>
> > > >>>> This seems more like an improvement than a bug fix.
> > > >>>
> > > >>> Yes. I don't have a strong opinion on this, but we (Alibaba) will 
> > > >>> backport it manually,
> > > >>>
> > > >>> because some of user-space monitoring tools depend 
> > > >>> on these statistics.
> > > >>
> > > >> That sounds like a regression then, isn't it?
> > > > 
> > > > Hm if counters were accurate before f1a7941243c1 and not afterwards, and
> > > > this is making them accurate again, and some userspace depends on it,
> > > > then Fixes: and stable is probably warranted then. If this was just a
> > > > perf improvement, then not. But AFAIU f1a7941243c1 was the perf
> > > > improvement...
> > > 
> > > Dang, should have re-read the commit log of f1a7941243c1 first. It seems
> > > like the error margin due to batching existed also before f1a7941243c1.
> > > 
> > > " This patch converts the rss_stats into percpu_counter to convert the
> > > error  margin from (nr_threads * 64) to approximately (nr_cpus ^ 2)."
> > > 
> > > so if on some systems this means worse margin than before, the above
> > > "if" chain of thought might still hold.
> > 
> > f1a7941243c1 seems like a good enough place to tell -stable
> > maintainers where to insert the patch (why does this sound rude).
> > 
> > The patch is simple enough.  I'll add fixes:f1a7941243c1 and cc:stable
> > and, as the problem has been there for years, I'll leave the patch in
> > mm-unstable so it will eventually get into LTS, in a well tested state.
> 
> One thing f1a7941243c1 noted was that the percpu counter conversion
> enabled us to get more accurate stats with some cpu cost and in this
> patch Baolin has shown that the cpu cost of accurate stats is
> reasonable, so seems safe for stable backport. Also it seems like
> multiple users are impacted by this issue, so I am fine with stable
> backport.

Fair point.

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ