linux-kernel - Re: [PATCH] locking/hung_task: Defer showing held locks


Date:   Tue, 28 Feb 2017 11:05:06 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
To:     vegard.nossum@...il.com
Cc:     linux-kernel@...r.kernel.org, vegard.nossum@...cle.com,
        akpm@...ux-foundation.org, torvalds@...ux-foundation.org,
        msb@...omium.org, paulmck@...ux.vnet.ibm.com, peterz@...radead.org,
        tglx@...utronix.de, mingo@...nel.org
Subject: Re: [PATCH] locking/hung_task: Defer showing held locks

Vegard, can I send this patch?

Tetsuo Handa wrote:
> Vegard Nossum wrote:
> > On 13 December 2016 at 15:45, Tetsuo Handa
> > <penguin-kernel@...ove.sakura.ne.jp> wrote:
> > > When I was running my testcase, which may block hundreds of threads
> > > on fs locks, I got a lockup due to the output from debug_show_all_locks()
> > > added by commit b2d4c2edb2e4f89a ("locking/hung_task: Show all locks").
> > >
> > > I think we don't need to call debug_show_all_locks() for each blocked
> > > thread. Let's defer calling debug_show_all_locks() until just before
> > > panic() or until after leaving the for_each_process_thread() loop.
> >
> > First of all, sorry for not answering earlier.
> 
> No problem.
> 
> >
> > I'm not sure I fully understand the problem. You say the "output from
> > debug_show_all_locks()" caused a lockup, but was the problem simply
> > that the amount of output caused it to stall for a long time?
> 
> In Linux 4.9, warn_alloc() was added in order to tell the administrator
> that something might be wrong with memory allocation: it calls printk()
> periodically when a memory allocation is stalling for too long. However,
> since printk() waits, using cond_resched(), until all pending data is sent
> to the console, printk() keeps waiting as long as somebody else calls
> printk() whenever cond_resched() is called. This is problematic in an OOM
> situation.
> 
> Since the OOM killer calls printk() with oom_lock held, a printk() called
> from the OOM killer could end up never returning, because warn_alloc()
> keeps calling printk() periodically for as long as the OOM killer holds
> oom_lock.
> 
> It also turned out that khungtaskd is another source that calls printk()
> periodically when threads are blocked on fs locks waiting for memory
> allocation. debug_show_all_locks() generates far more output than
> warn_alloc() if debug_show_all_locks() is called for each thread blocked
> on fs locks waiting for memory allocation. Therefore, we should avoid
> calling debug_show_all_locks() for each blocked thread.
> 
> The full story starts at http://lkml.kernel.org/r/1481020439-5867-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp but
> I would appreciate it if you could join the discussion at http://lkml.kernel.org/r/1478416501-10104-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp .
> 
> >
> > Could we instead
> >
> > 1) move the debug_show_all_locks() into the if
> > (sysctl_hung_task_panic) bit unconditionally
> >
> > 2) call something (touch_nmi_watchdog()?) inside debug_show_all_locks()
> >
> > 3) in another way make debug_show_all_locks() more robust so it doesn't "lockup"
> >
> > ?
> 
> Yes, that might be an improvement. But it is not needed for this patch.
>
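
The deferral described in the quoted patch description can be illustrated
with a small user-space sketch. This is not the kernel patch itself: struct
fake_task, fake_show_all_locks() and both check functions are hypothetical
stand-ins invented for illustration, and a plain array walk takes the place
of for_each_process_thread(). It only contrasts dumping all locks once per
blocked thread with setting a flag inside the loop and dumping once after it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a task; not the kernel's struct task_struct. */
struct fake_task {
    const char *comm;   /* task name */
    bool hung;          /* true if the task looks blocked for too long */
};

/* Counts how many times the expensive lock dump has run. */
static int lock_dumps;

/* Hypothetical stand-in for debug_show_all_locks(): large, slow output. */
static void fake_show_all_locks(void)
{
    lock_dumps++;
    printf("Showing all locks held in the system (dump #%d)\n", lock_dumps);
}

/* Eager variant: one full lock dump for every blocked task found. */
static int check_hung_tasks_eager(const struct fake_task *tasks, size_t n)
{
    int reported = 0;

    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].hung)
            continue;
        printf("INFO: task %s blocked\n", tasks[i].comm);
        fake_show_all_locks();
        reported++;
    }
    return reported;
}

/* Deferred variant: remember that a dump is wanted, emit it once at the end. */
static int check_hung_tasks_deferred(const struct fake_task *tasks, size_t n)
{
    bool show_locks = false;
    int reported = 0;

    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].hung)
            continue;
        printf("INFO: task %s blocked\n", tasks[i].comm);
        show_locks = true;      /* defer the expensive output */
        reported++;
    }
    if (show_locks)
        fake_show_all_locks();  /* at most one dump per scan */
    return reported;
}
```

With hundreds of blocked threads, the eager variant produces hundreds of
full lock dumps per scan, while the deferred variant produces at most one,
which is the reduction in printk() volume the patch is after.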