lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 6 Jul 2017 17:43:12 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Debabrata Banerjee <dbavatar@...il.com>
Cc:     Roman Gushchin <guro@...com>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...e.com>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Rik van Riel <riel@...hat.com>, kernel-team@...com,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: make allocation counters per-order

On Thu, Jul 06, 2017 at 12:12:47PM -0400, Debabrata Banerjee wrote:
> On Thu, Jul 6, 2017 at 11:51 AM, Mel Gorman <mgorman@...hsingularity.net> wrote:
> >
> > These counters do not actually help you solve that particular problem.
> > Knowing how many allocations happened since the system booted doesn't tell
> > you much about how many failed or why they failed. You don't even know
> > what frequency they occured at unless you monitor it constantly so you're
> > back to square one whether this information is available from proc or not.
> > There even is a tracepoint that can be used to track information related
> > to events that degrade fragmentation (trace_mm_page_alloc_extfrag) although
> > the primary thing it tells you is that "the probability that an allocation
> > will fail due to fragmentation in the future is potentially higher".
> 
> I agree these counters don't have enough information, but there a
> start to a first order approximation of the current state of memory.

That incurs a universal cost on the off-chance of debugging and ultimately
the debugging is only useful in combination with developing kernel patches
in which case it could be behind a kconfig option.

> buddyinfo and pagetypeinfo basically show no information now, because

They can be used to calculate a fragmentation index at a given point in
time. Admittedly, building a bigger picture requires a full scan of memory
(and that's what was required when fragmentation avoidance was first
being implemented).

> they only involve the small amount of free memory under the watermark
> and all our machines are in this state. As second order approximation,
> it would be nice to be able to get answers like: "There are
> reclaimable high order allocations of at least this order" and "None
> of this order allocation can become available due to unmovable and
> unreclaimable allocations"

Which this patch doesn't provide as what you are looking for requires
a full scan of memory to determine. I've done it in the past using a
severe abuse of systemtap to load a module that scans all of memory with
a variation of PAGE_OWNER to identify stack traces of pages that "don't
belonw" within a pageblock.

Even *with* that information, your options for tuning an unmodified kernel
are basically limited to increasing min_free_kbytes, altering THP's level
of aggression when compacting or brute forcing with either drop_caches,
compact_node or both. All other options after that require kernel patches
-- altering annotations, altering fallback mechanisms, altering compaction,
improving support for pages that can be migrated etc.

-- 
Mel Gorman
SUSE Labs

Powered by blists - more mailing lists