Date:   Tue, 26 Apr 2022 02:45:22 -0400
From:   Kent Overstreet <kent.overstreet@...il.com>
To:     Dave Chinner <dchinner@...hat.com>
Cc:     Roman Gushchin <roman.gushchin@...ux.dev>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Yang Shi <shy828301@...il.com>,
        Hillf Danton <hdanton@...a.com>
Subject: Re: [PATCH v2 0/7] mm: introduce shrinker debugfs interface

On Tue, Apr 26, 2022 at 04:02:19PM +1000, Dave Chinner wrote:
> This just seems like a solution looking for a problem to solve.
> Can you please describe the problem this infrastructure is going
> to solve?

A point I was making over VC is that memcg is completely irrelevant to debugging
most of these issues; all the issues we've been talking about can be easily
reproduced in a single test VM without memcg.

Yet we don't even have the tooling to debug the simple stuff.

Why are we trying to make big and complicated stuff when we can't even debug the
simple cases? And I've been getting _really_ tired of the stock answer of "that
use case isn't interesting to the big cloud providers".

A: If you're a Linux kernel developer at this level, you have earned a great
deal of trust and it is incumbent upon you to be a good steward of the code you
have been entrusted with, instead of just spending all your time chasing fat
bonuses from your employer while ignoring what's good for the codebase as a
whole. That's pissing all over the commons that came long before you and will
hopefully still be around long after you.

B: Even aside from that, it's incredibly shortsighted and a poor use of time and
resources. When I was at Google I saw, over and over again, people rushing to do
something big and complicated and new because that was how they could get a
promotion, instead of working on basic stuff like refactoring core IO paths (and
it's been my experience over and over again that when you just try to make code
saner and more understandable, you almost always find big performance
improvements along the way... but that's not as exciting as rushing to find the
biggest coolest optimization or all-the-bells-and-whistles interface).

So yeah, this patchset screams of someone looking for a promotion to me.

Meanwhile, the status of visibility into the _basics_ of what goes on in MM is
utter dogshit. There are just too many _basic_ questions that are a pain in the
ass to answer - even just profiling memory usage by file:line number is a
shitshow.

One thing that I run into a lot is people rush to say "tracepoints!" for a lot
of problems - but tracepoints aren't a good answer for a lot of problems because
having them on all the time is problematic.

What I would like to see is more lightweight collection of statistics, and
some basic library code for things like latency measurements of important
operations broken out by quantiles, with rate & frequency - this is something
that's helped in bcachefs. If anyone's interested, the code for that starts
here:

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/bcachefs.h#n322
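
To make the "latency broken out by quantiles" idea concrete, here is a minimal
user-space sketch. It is not the bcachefs code linked above; it assumes a plain
log2 histogram, and every name in it (lat_hist, lat_hist_record,
lat_hist_quantile) is hypothetical. Recording a sample is one increment, which
is the kind of cost that can stay enabled all the time.

```c
#include <stdint.h>

#define LAT_BUCKETS 32

/*
 * Hypothetical log2 latency histogram. bucket[i] counts samples whose
 * duration in ns falls in [2^i, 2^(i+1) - 1]; 0 ns lands in bucket 0 and
 * anything too large saturates into the last bucket.
 */
struct lat_hist {
    uint64_t count;
    uint64_t bucket[LAT_BUCKETS];
};

/* floor(log2(ns)), clamped to the last bucket. */
static unsigned lat_bucket(uint64_t ns)
{
    unsigned b = 0;

    while (ns > 1 && b < LAT_BUCKETS - 1) {
        ns >>= 1;
        b++;
    }
    return b;
}

/* Record one latency sample: two increments, no locking shown. */
static void lat_hist_record(struct lat_hist *h, uint64_t ns)
{
    h->bucket[lat_bucket(ns)]++;
    h->count++;
}

/*
 * Upper bound (in ns) of the bucket that contains the pct-th percentile,
 * pct in 1..100. Returns 0 for an empty histogram.
 */
static uint64_t lat_hist_quantile(const struct lat_hist *h, unsigned pct)
{
    uint64_t target = (h->count * pct + 99) / 100; /* round up */
    uint64_t seen = 0;

    for (unsigned i = 0; i < LAT_BUCKETS; i++) {
        seen += h->bucket[i];
        if (target && seen >= target)
            return ((uint64_t)2 << i) - 1;
    }
    return 0;
}
```

The answer is only accurate to a power of two, which is usually enough to tell
a 1 us operation from a 1 ms stall. A real kernel version would presumably
want per-CPU counters; that part is left out here.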

Specifically for shrinkers, I'd like it if we had rolling averages over the past
few seconds for e.g. _rate_ of objects requested to be freed vs. actually freed.
If we collect those kinds of rate measurements (and perhaps latency too, to show
stalls) at various places in the MM code, perhaps we'd be able to see what's
getting stuck when we OOM.
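
As a rough illustration of the requested-vs-freed rate idea, the sketch below
keeps an exponentially weighted moving average per counter. It is a user-space
sketch under stated assumptions, not existing MM code: the struct and function
names are hypothetical, the caller is assumed to invoke the tick function
periodically, and the smoothing weight dt / (window + dt) is just one simple
choice.

```c
#include <stdint.h>

/* Hypothetical smoothed rate: events per second over a rolling window. */
struct rate_avg {
    double   per_sec;   /* smoothed rate */
    uint64_t pending;   /* events counted since the last tick */
};

/* One pair of rates per shrinker: what was asked for vs. what was freed. */
struct shrinker_rates {
    struct rate_avg requested;
    struct rate_avg freed;
};

/* Fold the events seen during the last dt seconds into the average. */
static void rate_avg_tick(struct rate_avg *r, double dt, double window)
{
    double inst  = (double)r->pending / dt;   /* rate over this interval */
    double alpha = dt / (window + dt);        /* weight of the new sample */

    r->per_sec += alpha * (inst - r->per_sec);
    r->pending = 0;
}

/* Called from the (hypothetical) scan path after each shrinker invocation. */
static void shrinker_rates_account(struct shrinker_rates *s,
                                   uint64_t requested, uint64_t freed)
{
    s->requested.pending += requested;
    s->freed.pending     += freed;
}

/* Called periodically, e.g. once a second, with a window of a few seconds. */
static void shrinker_rates_tick(struct shrinker_rates *s,
                                double dt, double window)
{
    rate_avg_tick(&s->requested, dt, window);
    rate_avg_tick(&s->freed, dt, window);
}

/* Fraction of requested objects actually freed; 0 if nothing was requested. */
static double shrinker_rates_efficiency(const struct shrinker_rates *s)
{
    if (s->requested.per_sec <= 0.0)
        return 0.0;
    return s->freed.per_sec / s->requested.per_sec;
}
```

A shrinker whose requested rate stays high while its freed rate sits near zero
is the sort of thing this would make visible at a glance. In-kernel code would
likely use fixed-point arithmetic rather than doubles.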

We should have rate of objects getting added, too, and we should be collecting
data from the list_lru code as well, like you were mentioning the other night.

And if we collect this data in such a way that it can be displayed in sysfs, but
done with the to_text() methods I've been talking about, it'll also be trivial
to include that in the show_mem() report when we OOM.
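
The point about sharing one formatter can be sketched as follows. This is not
the actual to_text() interface being proposed; it assumes a plain
snprintf-style function, and the struct, field, and function names are
hypothetical. The idea is only that a sysfs show routine and an OOM report
would both call the same function.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-shrinker snapshot of the rates discussed above. */
struct shrinker_stats {
    const char    *name;
    unsigned long  requested_per_sec;
    unsigned long  freed_per_sec;
};

/*
 * Single formatter for one shrinker. Returns what snprintf() returns:
 * the length the full line needs, excluding the trailing NUL.
 */
static int shrinker_stats_to_text(char *buf, size_t len,
                                  const struct shrinker_stats *s)
{
    return snprintf(buf, len, "%s: requested %lu/s freed %lu/s\n",
                    s->name, s->requested_per_sec, s->freed_per_sec);
}

/* Stand-in for an OOM-time report: walk all shrinkers, reuse the formatter. */
static void report_all(FILE *out, const struct shrinker_stats *v, size_t n)
{
    char line[128];

    for (size_t i = 0; i < n; i++) {
        shrinker_stats_to_text(line, sizeof(line), &v[i]);
        fputs(line, out);
    }
}
```

With this shape, adding a field to the output is a one-place change that shows
up in both the sysfs file and the OOM report.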

Anyways, that's my two cents.... I can't claim to have any brilliant insights
here, but I hope Roman will start taking ideas from more people (and Dave's been
a real wealth of information on this topic! I'd pick his brain if I were you,
Roman).
