linux-kernel - Re: [PATCH] mm: warn about allocations which stall for too long

Message-ID: <20160929084815.GD408@dhcp22.suse.cz>
Date:   Thu, 29 Sep 2016 10:48:15 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:     linux-mm@...ck.org, akpm@...ux-foundation.org, hannes@...xchg.org,
        mgorman@...e.de, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: warn about allocations which stall for too long

On Tue 27-09-16 21:57:26, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > > > ) rather than by line number, and surround __warn_memalloc_stall() call with
> > > > > mutex in order to serialize warning messages because it is possible that
> > > > > multiple allocation requests are stalling?
> > > > 
> > > > We do not use any lock in warn_alloc_failed, so why should this be any
> > > > different?
> > > 
> > > warn_alloc_failed() is called for both __GFP_DIRECT_RECLAIM and
> > > !__GFP_DIRECT_RECLAIM allocation requests, and it is not allowed
> > > to sleep if !__GFP_DIRECT_RECLAIM. Thus, we have to tolerate that
> > > concurrent memory allocation failure messages make dmesg output
> > > unreadable. But __warn_memalloc_stall() is called for only
> > > __GFP_DIRECT_RECLAIM allocation requests. Thus, we are allowed to
> > > sleep in order to serialize concurrent memory allocation stall
> > > messages.
> > 
> > I still do not see a point. A single line about the warning and locked
> > dump_stack sounds sufficient to me.
> 
> printk() is a slow operation. It is possible that two allocation requests
> start within the time period needed to complete warn_alloc_failed().
> It is possible that multiple concurrent allocations are stalling when
> one of them cannot be satisfied. The consequence is multiple concurrent
> timeouts corrupting dmesg.
> http://I-love.SAKURA.ne.jp/tmp/serial-20160927-nolock.txt.xz
> (Please ignore Oops at do_task_stat(); it is irrelevant to this topic.)
> 
> If we guard it with mutex_lock(&oom_lock)/mutex_unlock(&oom_lock),
> there is no corruption.
> http://I-love.SAKURA.ne.jp/tmp/serial-20160927-lock.txt.xz

I have just posted v2, which reuses the warn_alloc_failed infrastructure. If
we want to have a lock there, then it should be a separate patch imho,
ideally with an example from your kernel log above.
-- 
Michal Hocko
SUSE Labs
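
The serialization argument in the quoted text can be illustrated outside the
kernel. The following is a minimal userspace sketch, not kernel code and not
taken from either patch: it assumes POSIX threads, and the names report_lock,
report_stall(), run_reporters() and reports_are_contiguous() are hypothetical.
Each thread stands in for a stalling allocation that emits a multi-line
report, and a mutex held across the whole report plays the role suggested for
oom_lock, so that lines from different reports cannot interleave.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_TASKS    8   /* concurrent "stalling allocations" (assumed value) */
#define REPORT_LINES 4   /* lines per warning report (assumed value) */

struct log_entry {
    int task;            /* which reporter wrote this line */
    int line;            /* position of the line inside its report */
};

/* Hypothetical stand-ins for the log buffer and the serializing lock. */
static pthread_mutex_t report_lock = PTHREAD_MUTEX_INITIALIZER;
static struct log_entry log_buf[NUM_TASKS * REPORT_LINES];
static int log_len;

/* Emit one complete report while holding the lock for its whole duration. */
static void *report_stall(void *arg)
{
    int task = *(int *)arg;

    pthread_mutex_lock(&report_lock);
    for (int line = 0; line < REPORT_LINES; line++) {
        log_buf[log_len].task = task;
        log_buf[log_len].line = line;
        log_len++;
    }
    pthread_mutex_unlock(&report_lock);
    return NULL;
}

/* Start all reporters, wait for them, and return the number of lines logged. */
static int run_reporters(void)
{
    pthread_t threads[NUM_TASKS];
    int ids[NUM_TASKS];
    int started = 0;

    log_len = 0;
    for (int i = 0; i < NUM_TASKS; i++) {
        ids[i] = i;
        if (pthread_create(&threads[i], NULL, report_stall, &ids[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return log_len;
}

/* Return 1 if every report occupies REPORT_LINES adjacent, ordered slots. */
static int reports_are_contiguous(void)
{
    if (log_len % REPORT_LINES != 0)
        return 0;
    for (int i = 0; i < log_len; i++) {
        int first = i - i % REPORT_LINES;

        if (log_buf[i].task != log_buf[first].task ||
            log_buf[i].line != i % REPORT_LINES)
            return 0;
    }
    return 1;
}
```

Calling run_reporters() and then reports_are_contiguous() checks that each
report landed as one unbroken block. The order of the reports still depends
on scheduling; only the interleaving within a report is ruled out. A sleeping
lock like this is only an option where the caller may sleep, which is the
distinction drawn above between __GFP_DIRECT_RECLAIM and
!__GFP_DIRECT_RECLAIM requests.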