lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20231130041444.7cbhbvkvwpcbbhyd@moria.home.lan>
Date:   Wed, 29 Nov 2023 23:14:44 -0500
From:   Kent Overstreet <kent.overstreet@...ux.dev>
To:     Qi Zheng <zhengqi.arch@...edance.com>
Cc:     Michal Hocko <mhocko@...e.com>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>,
        Linux-MM <linux-mm@...ck.org>, linux-kernel@...r.kernel.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH 2/7] mm: shrinker: Add a .to_text() method for shrinkers

On Thu, Nov 30, 2023 at 11:42:45AM +0800, Qi Zheng wrote:
> > Similarly for shrinkers, we're not going to be printing all of them -
> > the patchset picks the top 10 by objects and prints those. Could
> > probably be ~4, there's fewer shrinkers than slabs; also if we can get
> > shrinkers to report on memory owned in bytes, that will help too with
> > deciding what information is pertinent.
> 
> I'm not worried about the shrinker's general data. What I'm worried
> about is the shrinker's private data. Except for the corresponding
> developers, others don't know the meaning of the private statistical
> data, and we have no control over the printing quantity and form of
> the private data. This may indeed cause OOM log confusion and failure
> to automatically parse. For this, any thoughts?

If a shrinker is responsible for the OOM, then that private state is
exactly what is needed to debug the issue.

I explained this earlier - shrinkers can skip reclaiming objects for a
variety of reasons; in bcachefs's case that could be trylock() failure,
an IO in flight, the node being dirty, and more. Unlock the system inode
shrinker, it's much too expensive to move objects on and off the
shrinker list whenever they're touched. Again, this all comes from real
world experience.

The show_mem report is already full of numbers with zero explanation of
how they're relevant for debugging OOMS; we really need to improve how
that is presented as well.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ