linux-kernel - Re: [PATCH] mm: warn about allocations which stall for too long

Message-ID: <20160926081314.GC27030@dhcp22.suse.cz>
Date:   Mon, 26 Sep 2016 10:13:14 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Balbir Singh <bsingharora@...il.com>
Cc:     Dave Hansen <dave.hansen@...el.com>, linux-mm@...ck.org,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Mel Gorman <mgorman@...e.de>,
        Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: warn about allocations which stall for too long

On Sat 24-09-16 23:19:04, Balbir Singh wrote:
> 
> 
> On 24/09/16 03:34, Dave Hansen wrote:
> > On 09/23/2016 01:15 AM, Michal Hocko wrote:
> >> +	/* Make sure we know about allocations which stall for too long */
> >> +	if (!(gfp_mask & __GFP_NOWARN) && time_after(jiffies, alloc_start + stall_timeout)) {
> >> +		pr_warn("%s: page allocation stalls for %ums: order:%u mode:%#x(%pGg)\n",
> >> +				current->comm, jiffies_to_msecs(jiffies-alloc_start),
> >> +				order, gfp_mask, &gfp_mask);
> >> +		stall_timeout += 10 * HZ;
> >> +		dump_stack();
> >> +	}
> > 
> > This would make an awesome tracepoint.  There's probably still plenty of
> > value to having it in dmesg, but the configurability of tracepoints is
> > hard to beat.
> 
> An awesome tracepoint and a great place to trigger other tracepoints. With stall timeout
> increasing every time, do we only care about the first instance when we exceeded stall_timeout?
> Do we debug just that instance?

I am not sure I understand you here. The stall_timeout is increased to
see whether the situation is permanent or ephemeral. This is similar to
RCU lockup reports.
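
To illustrate the escalation, here is a minimal userspace sketch (not the
kernel code itself). It assumes an initial stall_timeout of 10 * HZ, which
the quoted hunk does not show, and an HZ of 100. The names ticks_after(),
struct stall_state, stall_check() and count_warnings() are hypothetical
helpers made up for this example; ticks_after() only mirrors the idea
behind time_after().

```c
#include <stdbool.h>

#define HZ 100UL	/* assumed tick rate for this sketch */

/* Wrap-safe "a is later than b", the same idea as the kernel's time_after(). */
static bool ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical per-allocation state: when it started, when to warn next. */
struct stall_state {
	unsigned long alloc_start;
	unsigned long stall_timeout;
};

static void stall_init(struct stall_state *s, unsigned long now)
{
	s->alloc_start = now;
	s->stall_timeout = 10 * HZ;	/* assumption: first warning after 10s */
}

/*
 * Returns true when a warning would be printed at tick 'now'.
 * Each warning pushes the next one a further 10 * HZ away, so a
 * persistent stall is reported repeatedly while a short one that
 * resolves is reported at most once.
 */
static bool stall_check(struct stall_state *s, unsigned long now)
{
	if (!ticks_after(now, s->alloc_start + s->stall_timeout))
		return false;
	s->stall_timeout += 10 * HZ;
	return true;
}

/* Count warnings for an allocation that stalls for 'stalled_ticks' ticks. */
static unsigned int count_warnings(unsigned long start,
				   unsigned long stalled_ticks)
{
	struct stall_state s;
	unsigned int warnings = 0;
	unsigned long t;

	stall_init(&s, start);
	for (t = 0; t <= stalled_ticks; t++)
		if (stall_check(&s, start + t))
			warnings++;
	return warnings;
}
```

Under these assumptions a stall that clears within 10s produces no report,
one that lasts a bit over 10s produces one, and one that lasts 35s produces
three, so repeated reports from the same task indicate a persistent stall
rather than a blip.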
-- 
Michal Hocko
SUSE Labs