lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180730145425.GE1206094@devbig004.ftw2.facebook.com>
Date:   Mon, 30 Jul 2018 07:54:25 -0700
From:   Tejun Heo <tj@...nel.org>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
        Roman Gushchin <guro@...com>,
        Johannes Weiner <hannes@...xchg.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        David Rientjes <rientjes@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
 should_reclaim_retry().

Hello,

On Mon, Jul 30, 2018 at 04:46:47PM +0200, Michal Hocko wrote:
> On Mon 30-07-18 23:34:23, Tetsuo Handa wrote:
> > On 2018/07/30 18:32, Michal Hocko wrote:
> [...]
> > > This one is waiting for draining and we are in mm_percpu_wq WQ context
> > > which has its rescuer so no other activity can block us for ever. So
> > > this certainly shouldn't deadlock. It can be dead slow but well, this is
> > > what you will get when your shoot your system to death.
> > 
> > We need schedule_timeout_*() to allow such WQ_MEM_RECLAIM workqueues to wake up. (Tejun,
> > is my understanding correct?) Lack of schedule_timeout_*() does block WQ_MEM_RECLAIM
> > workqueues forever.
> 
> Hmm. This doesn't match my understanding of what WQ_MEM_RECLAIM actually
> guarantees. If you are right then the whole thing sounds quite fragile
> to me TBH.

Workqueue doesn't think the cpu is stalled as long as one of the
per-cpu kworkers is running.  The assumption is that kernel threads
are not supposed to be busy-looping indefinitely (and they really
shouldn't).  We can add timeout mechanism to workqueue so that it
kicks off other kworkers if one of them is in running state for too
long, but idk, if there's an indefinite busy loop condition in kernel
threads, we really should get rid of them and hung task watchdog is
pretty effective at finding these cases (at least with preemption
disabled).

Thanks.

-- 
tejun

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ