linux-kernel - Re: [PATCH] mm: warn about allocations which stall for too long

Message-ID: <20160926081200.GB27030@dhcp22.suse.cz>
Date:   Mon, 26 Sep 2016 10:12:01 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...e.de>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: warn about allocations which stall for too long

On Fri 23-09-16 10:34:01, Dave Hansen wrote:
> On 09/23/2016 01:15 AM, Michal Hocko wrote:
> > +	/* Make sure we know about allocations which stall for too long */
> > +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> > +		pr_warn("%s: page allocation stalls for %ums: order:%u mode:%#x(%pGg)\n",
> > +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> > +				order, gfp_mask, &gfp_mask);
> > +		stall_timeout += 10 * HZ;
> > +		dump_stack();
> > +	}
> 
> This would make an awesome tracepoint.  There's probably still plenty of
> value to having it in dmesg, but the configurability of tracepoints is
> hard to beat.

Currently we only have trace_mm_page_alloc in __alloc_pages_nodemask. I
think we want to add another one to mark the beginning of the allocation
so that we can track allocation latencies per allocation context and
ideally break them down by source - congestion waits, reclaim path,
slab reclaim etc. Janani Ravichandran is working on a script to do that:
http://lkml.kernel.org/r/20160911222411.GA2854@janani-Inspiron-3521
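
To make the per-source breakdown concrete, here is a rough user-space
sketch of the bookkeeping such begin/end markers would allow. It is not
kernel code and not what the script above does; the struct, the enum and
all function names are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stall sources, named after the ones listed in the text. */
enum stall_source {
	SRC_CONGESTION_WAIT,
	SRC_RECLAIM,
	SRC_SLAB_RECLAIM,
	SRC_COUNT
};

/* Per-allocation record filled in between the begin and end markers. */
struct alloc_latency {
	uint64_t begin_ns;
	uint64_t end_ns;
	uint64_t by_source_ns[SRC_COUNT];
};

/* Corresponds to the proposed "beginning of the allocation" marker. */
static void alloc_begin(struct alloc_latency *l, uint64_t now_ns)
{
	assert(l != NULL);
	memset(l, 0, sizeof(*l));
	l->begin_ns = now_ns;
}

/* Charge the interval [from_ns, to_ns] to one source. */
static void alloc_account(struct alloc_latency *l, enum stall_source src,
			  uint64_t from_ns, uint64_t to_ns)
{
	assert(l != NULL && src < SRC_COUNT && to_ns >= from_ns);
	l->by_source_ns[src] += to_ns - from_ns;
}

/* Corresponds to the existing end-of-allocation event. */
static void alloc_end(struct alloc_latency *l, uint64_t now_ns)
{
	assert(l != NULL && now_ns >= l->begin_ns);
	l->end_ns = now_ns;
}

static uint64_t alloc_total_ns(const struct alloc_latency *l)
{
	return l->end_ns - l->begin_ns;
}

/* Time spent in the allocation that no source claimed. */
static uint64_t alloc_unattributed_ns(const struct alloc_latency *l)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < SRC_COUNT; i++)
		sum += l->by_source_ns[i];
	return alloc_total_ns(l) - sum;
}
```

The unattributed remainder is the interesting part in practice: if it is
large, the set of instrumented sources is incomplete.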

But this sounds a bit orthogonal to my proposal here because I would
really like to warn unconditionally when an allocation stalls for an
unreasonably long time. Tracepoints are not an ideal tool for that because
you have to start collecting tracing output before these situations
happen. Moreover, in my experience I often had to replace my local
debugging trace_printks with regular printks because the former just
got lost under heavy memory pressure.
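
For reference, the behaviour of the quoted hunk can be modelled in user
space roughly as below. This is only a sketch: SKETCH_HZ, the struct and
the function names are made up for illustration, the initial 10 second
threshold is an assumption (the hunk only shows the increment), and
fprintf stands in for pr_warn/dump_stack. tick_after() uses the same
wrap-safe comparison idea as the kernel's time_after().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical tick rate for this sketch; the kernel's HZ is a build option. */
#define SKETCH_HZ 100UL

/* Wrap-safe "a is after b", same idea as the kernel's time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

struct stall_watch {
	unsigned long start;	/* tick at which the allocation began */
	unsigned long timeout;	/* ticks after start at which to warn next */
	unsigned int warnings;	/* number of warnings emitted so far */
};

static void stall_watch_init(struct stall_watch *w, unsigned long now)
{
	assert(w != NULL);
	w->start = now;
	w->timeout = 10 * SKETCH_HZ;	/* assumed initial threshold */
	w->warnings = 0;
}

/*
 * Call on every retry of the allocation loop. Returns true when a
 * warning was emitted. nowarn models the __GFP_NOWARN check.
 */
static bool stall_watch_check(struct stall_watch *w, unsigned long now,
			      bool nowarn)
{
	assert(w != NULL);
	if (nowarn || !tick_after(now, w->start + w->timeout))
		return false;

	fprintf(stderr, "allocation stalls for %lu ticks\n", now - w->start);
	/* Push the threshold out so the next warning comes 10s later. */
	w->timeout += 10 * SKETCH_HZ;
	w->warnings++;
	return true;
}
```

Bumping the threshold after each report means a stuck allocation is
reported about once every 10 seconds instead of on every retry, and
nothing has to be enabled in advance for the report to show up.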
-- 
Michal Hocko
SUSE Labs