linux-kernel - Re: [PATCH v3 00/35] Memory allocation profiling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240213231115.GF6184@frogsfrogsfrogs>
Date: Tue, 13 Feb 2024 15:11:15 -0800
From: "Darrick J. Wong" <djwong@...nel.org>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: David Hildenbrand <david@...hat.com>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Michal Hocko <mhocko@...e.com>, akpm@...ux-foundation.org,
	vbabka@...e.cz, hannes@...xchg.org, roman.gushchin@...ux.dev,
	mgorman@...e.de, dave@...olabs.net, willy@...radead.org,
	liam.howlett@...cle.com, corbet@....net, void@...ifault.com,
	peterz@...radead.org, juri.lelli@...hat.com,
	catalin.marinas@....com, will@...nel.org, arnd@...db.de,
	tglx@...utronix.de, mingo@...hat.com, dave.hansen@...ux.intel.com,
	x86@...nel.org, peterx@...hat.com, axboe@...nel.dk,
	mcgrof@...nel.org, masahiroy@...nel.org, nathan@...nel.org,
	dennis@...nel.org, tj@...nel.org, muchun.song@...ux.dev,
	rppt@...nel.org, paulmck@...nel.org, pasha.tatashin@...een.com,
	yosryahmed@...gle.com, yuzhao@...gle.com, dhowells@...hat.com,
	hughd@...gle.com, andreyknvl@...il.com, keescook@...omium.org,
	ndesaulniers@...gle.com, vvvvvv@...gle.com,
	gregkh@...uxfoundation.org, ebiggers@...gle.com, ytcoode@...il.com,
	vincent.guittot@...aro.org, dietmar.eggemann@....com,
	rostedt@...dmis.org, bsegall@...gle.com, bristot@...hat.com,
	vschneid@...hat.com, cl@...ux.com, penberg@...nel.org,
	iamjoonsoo.kim@....com, 42.hyeyoo@...il.com, glider@...gle.com,
	elver@...gle.com, dvyukov@...gle.com, shakeelb@...gle.com,
	songmuchun@...edance.com, jbaron@...mai.com, rientjes@...gle.com,
	minchan@...gle.com, kaleshsingh@...gle.com, kernel-team@...roid.com,
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
	iommu@...ts.linux.dev, linux-arch@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
	linux-modules@...r.kernel.org, kasan-dev@...glegroups.com,
	cgroups@...r.kernel.org
Subject: Re: [PATCH v3 00/35] Memory allocation profiling

On Tue, Feb 13, 2024 at 05:29:03PM -0500, Kent Overstreet wrote:
> On Tue, Feb 13, 2024 at 11:17:32PM +0100, David Hildenbrand wrote:
> > On 13.02.24 23:09, Kent Overstreet wrote:
> > > On Tue, Feb 13, 2024 at 11:04:58PM +0100, David Hildenbrand wrote:
> > > > On 13.02.24 22:58, Suren Baghdasaryan wrote:
> > > > > On Tue, Feb 13, 2024 at 4:24 AM Michal Hocko <mhocko@...e.com> wrote:
> > > > > > 
> > > > > > On Mon 12-02-24 13:38:46, Suren Baghdasaryan wrote:
> > > > > > [...]
> > > > > > > We're aiming to get this in the next merge window, for 6.9. The feedback
> > > > > > > we've gotten has been that even out of tree this patchset has already
> > > > > > > been useful, and there's a significant amount of other work gated on the
> > > > > > > code tagging functionality included in this patchset [2].
> > > > > > 
> > > > > > I suspect it will not come as a surprise that I really dislike the
> > > > > > implementation proposed here. I will not repeat my arguments, I have
> > > > > > done so on several occasions already.
> > > > > > 
> > > > > > Anyway, I didn't go as far as to nak it even though I _strongly_ believe
> > > > > > this debugging feature will add a maintenance overhead for a very long
> > > > > > time. I can live with all the downsides of the proposed implementation
> > > > > > _as long as_ there is a wider agreement from the MM community as this is
> > > > > > where the maintenance cost will be payed. So far I have not seen (m)any
> > > > > > acks by MM developers so aiming into the next merge window is more than
> > > > > > little rushed.
> > > > >
> > > > > We tried other previously proposed approaches and all have their
> > > > > downsides without making maintenance much easier. Your position is
> > > > > understandable and I think it's fair. Let's see if others see more
> > > > > benefit than cost here.
> > > > 
> > > > Would it make sense to discuss that at LSF/MM once again, especially
> > > > covering why proposed alternatives did not work out? LSF/MM is not "too far"
> > > > away (May).

You want to stall this effort for *three months* to schedule a meeting?

I would love to have better profiling of memory allocations inside XFS
so that we can answer questions like "What's going on with these
allocation stalls?" or "What memory is getting used, and where?" more
quickly than we can now.

Right now I get to stare at tracepoints and printk crap until my eyes
bleed, and maybe dchinner comes to my rescue and figures out what's
going on sooner than I do.  More often we just never figure it out
because only the customer can reproduce the problem, the reams of data
produced by ftrace is unmanageable, and BPF isn't always available.

I'm not thrilled by the large increase in macro crap in the allocation
paths, but I don't know of a better way to instrument things.  Our
attempts to use _RET_IP in XFS to instrument its code paths have never
worked quite right w.r.t. inlined functions and whatnot.

> > > > I recall that the last LSF/MM session on this topic was a bit unfortunate
> > > > (IMHO not as productive as it could have been). Maybe we can finally reach a
> > > > consensus on this.

>From my outsider's perspective, nine months have gone by since the last
LSF.  Who has come up with a cleaner/better/faster way to do what Suren
and Kent have done?  Were those code changes integrated into this
patchset?  Or why not?

Most of what I saw in 2023 involved compiler changes (great; now I have
to wait until RHEL 11/Debian 14 to use it) and/or still utilize fugly
macros.

Recalling all the way back to suggestions made in 2022, who wrote the
prototype for doing this via ftrace?  Or BPF?  How well did that go for
counting allocation events and the like?  I saw Tejun saying something
about how they use BPF aggressively inside Meta, but that was about it.

Were any of those solutions significantly better than what's in front of
us here?

I get it, a giant patch forcing everyone to know the difference between
alloc_foo and alloc_foo_noperf offends my (yours?) stylistic
sensibilities.  On the other side, making analysis easier during
customer escalations means we kernel people get data, answers, and
solutions sooner instead of watching all our time get eaten up on L4
support and backporting hell.

> > > I'd rather not delay for more bikeshedding. Before agreeing to LSF I'd
> > > need to see a serious proposl - what we had at the last LSF was people
> > > jumping in with half baked alternative proposals that very much hadn't
> > > been thought through, and I see no need to repeat that.
> > > 
> > > Like I mentioned, there's other work gated on this patchset; if people
> > > want to hold this up for more discussion they better be putting forth
> > > something to discuss.
> > 
> > I'm thinking of ways on how to achieve Michal's request: "as long as there
> > is a wider agreement from the MM community". If we can achieve that without
> > LSF, great! (a bi-weekly MM meeting might also be an option)
> 
> A meeting wouldn't be out of the question, _if_ there is an agenda, but:
> 
> What's that coffeee mug say? I just survived another meeting that
> could've been an email?

I congratulate you on your memory of my kitchen mug.  Yes, that's what
it says.

> What exactly is the outcome we're looking for?
> 
> Is there info that people are looking for? I think we summed things up
> pretty well in the cover letter; if there are specifics that people
> want to discuss, that's why we emailed the series out.
> 
> There's people in this thread who've used this patchset in production
> and diagnosed real issues (gigabytes of memory gone missing, I heard the
> other day); I'm personally looking for them to chime in on this thread
> (Johannes, Pasha).
> 
> If it's just grumbling about "maintenance overhead" we need to get past
> - well, people are going to have to accept that we can't deliver
> features without writing code, and I'm confident that the hooking in
> particular is about as clean as it's going to get, _regardless_ of
> toolchain support; and moreover it addresses what's been historically a
> pretty gaping hole in our ability to profile and understand the code we
> write.

Are you and Suren willing to pay whatever maintenance overhead there is?

--D