lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201109075058.GC12240@dhcp22.suse.cz>
Date:   Mon, 9 Nov 2020 08:50:58 +0100
From:   Michal Hocko <mhocko@...e.com>
To:     NeilBrown <neilb@...e.de>
Cc:     Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Trond Myklebust <trondmy@...merspace.com>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH rfc] workqueue: honour cond_resched() more effectively.

On Mon 09-11-20 13:54:59, Neil Brown wrote:
> 
> If a worker task for a normal (bound, non-CPU-intensive) calls
> cond_resched(), this allows other non-workqueue processes to run, but
> does *not* allow other workqueue workers to run.  This is because
> workqueue will only attempt to run one task at a time on each CPU,
> unless the current task actually sleeps.
> 
> This is already a problem for should_reclaim_retry() in mm/page_alloc.c,
> which inserts a small sleep just to convince workqueue to allow other
> workers to run.
> 
> It can also be a problem for NFS when closing a very large file (e.g.
> 100 million pages in memory).  NFS can call the final iput() from a
> workqueue, which can then take long enough to trigger a workqueue-lockup
> warning, and long enough for performance problems to be observed.
> 
> I added a WARN_ON_ONCE() in cond_resched() to report when it is run from
> a workqueue, and got about 20 hits during boot, many of system_wq (aka
> "events") which suggests there is a real chance that worker are being
> delayed unnecessarily be tasks which are written to politely relinquish
> the CPU.
> 
> I think that once a worker calls cond_resched(), it should be treated as
> though it was run from a WQ_CPU_INTENSIVE queue, because only cpu-intensive
> tasks need to call cond_resched().  This would allow other workers to be
> scheduled.
> 
> The following patch achieves this I believe.

I cannot really judge the implementation because my understanding of the
WQ concurrency control is very superficial but I echo that the existing
behavior is really nonintuitive. It certainly burnt me for the oom
situations where the page allocator cannot make much progress to reclaim
memory and it has to retry really hard. Having to handle worker context
explicitly/differently is error prone and as your example of final iput
in NFS shows that the allocator is not the only path affected so having
a general solution is better.

That being said I would really love to see cond_resched to work
transparently.

Thanks!
-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ