linux-kernel - Re: [RFC] panic_on_oom

Message-Id: <201506112345.HBE32188.LJMOOFtVHOFSQF@I-love.SAKURA.ne.jp>
Date:	Thu, 11 Jun 2015 23:45:26 +0900
From:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:	mhocko@...e.cz
Cc:	linux-mm@...ck.org, rientjes@...gle.com, hannes@...xchg.org,
	tj@...nel.org, akpm@...ux-foundation.org,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC] panic_on_oom_timeout

Michal Hocko wrote:
> On Thu 11-06-15 22:12:40, Tetsuo Handa wrote:
> > Michal Hocko wrote:
> [...]
> > > > The moom_work used by SysRq-f sometimes cannot be executed
> > > > because some work which is processed before the moom_work is processed is
> > > > stalled for an unbounded amount of time due to looping inside the memory
> > > > allocator.
> > > 
> > > Wouldn't wq code pick up another worker thread to execute the work?
> > > There is also a rescuer thread as the last resort AFAIR.
> > > 
> > 
> > Below is an example of moom_work lockup in v4.1-rc7 from
> > http://I-love.SAKURA.ne.jp/tmp/serial-20150611.txt.xz
> > 
> > ----------
> > [  171.710406] sysrq: SysRq : Manual OOM execution
> > [  171.720193] kworker/2:9 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
> > [  171.722699] kworker/2:9 cpuset=/ mems_allowed=0
> > [  171.724603] CPU: 2 PID: 11016 Comm: kworker/2:9 Not tainted 4.1.0-rc7 #3
> > [  171.726817] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
> > [  171.729727] Workqueue: events moom_callback
> > (...snipped...)
> > [  258.302016] sysrq: SysRq : Manual OOM execution
> 
> Wow, this is a _lot_. I was aware that workqueues might be overloaded.
> We have seen that in real loads and that led to
> http://marc.info/?l=linux-kernel&m=141456398425553 where the rescuer
> didn't handle pending work properly. I thought that the fix helped in
> the end. But 1.5 minutes is indeed unexpected for me.

Excuse me, but you misunderstood the log. The logs for uptime = 171 and
uptime = 258 are cases where SysRq-f (indicated by "sysrq: SysRq : Manual
OOM execution" message) immediately invoked the OOM killer (indicated by
"invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0" message).

What you should check is uptime > 301. Until I do SysRq-b at uptime = 707,
the "sysrq: SysRq : Manual OOM execution" message is printed but the
"invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0" message is not
printed. During this period (so far 5 minutes, presumably forever),
moom_callback() remained pending.
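
The way of reading the log described above can be sketched as a small
script: a SysRq-f request counts as handled only if an "invoked oom-killer"
line follows it shortly afterwards. This is a rough illustration, not a tool
used in this thread. The function name find_pending_sysrq_oom, the 1-second
window, and the third sample line (with its timestamp) are assumptions made
up for the example; only the first two sample lines come from the log
excerpt quoted above.

```python
import re

# Matches a printk timestamp such as "[  171.710406] message",
# even when the line carries a "> > " quote prefix.
LINE_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(.*)")

SYSRQ_MSG = "sysrq: SysRq : Manual OOM execution"
OOM_MSG = "invoked oom-killer"


def find_pending_sysrq_oom(lines, window=1.0):
    """Return timestamps of SysRq-f requests that have no OOM killer
    invocation logged within `window` seconds afterwards.

    `window` is an arbitrary threshold chosen for this sketch.
    """
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a timestamped kernel message
        ts, msg = float(m.group(1)), m.group(2)
        if SYSRQ_MSG in msg:
            events.append((ts, "sysrq"))
        elif OOM_MSG in msg:
            events.append((ts, "oom"))

    pending = []
    for i, (ts, kind) in enumerate(events):
        if kind != "sysrq":
            continue
        # Look only at events logged after this SysRq-f request.
        answered = any(
            k == "oom" and ts <= t <= ts + window
            for t, k in events[i + 1:]
        )
        if not answered:
            pending.append(ts)
    return pending


sample = [
    "[  171.710406] sysrq: SysRq : Manual OOM execution",
    "[  171.720193] kworker/2:9 invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0",
    # Hypothetical line: the timestamp is made up for illustration.
    "[  301.000000] sysrq: SysRq : Manual OOM execution",
]

print(find_pending_sysrq_oom(sample))
```

Note that this check is only a heuristic: an OOM killer invocation triggered
by some other allocation inside the window would also be counted as an
answer to the SysRq-f request.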

> 
> This of course disqualifies DELAYED_WORK for anything that has at least
> reasonable time expectations which is the case here.