linux-kernel - Re: [PATCH] [RFC] vmscan.c: add a sysctl entry for controlling memory reclaim IO congestion


Message-ID: <20190919082208.GB15782@dhcp22.suse.cz>
Date:   Thu, 19 Sep 2019 10:22:08 +0200
From:   Michal Hocko <mhocko@...nel.org>
To:     Lin Feng <linf@...gsu.com>
Cc:     Matthew Wilcox <willy@...radead.org>, corbet@....net,
        mcgrof@...nel.org, akpm@...ux-foundation.org,
        linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        keescook@...omium.org, mchehab+samsung@...nel.org,
        mgorman@...hsingularity.net, vbabka@...e.cz, ktkhai@...tuozzo.com,
        hannes@...xchg.org, Jens Axboe <axboe@...nel.dk>,
        Omar Sandoval <osandov@...com>, Ming Lei <ming.lei@...hat.com>
Subject: Re: [PATCH] [RFC] vmscan.c: add a sysctl entry for controlling
 memory reclaim IO congestion_wait length

On Thu 19-09-19 15:46:11, Lin Feng wrote:
> 
> 
> On 9/19/19 11:49, Matthew Wilcox wrote:
> > On Thu, Sep 19, 2019 at 10:33:10AM +0800, Lin Feng wrote:
> > > On 9/18/19 20:33, Michal Hocko wrote:
> > > > I absolutely agree here. From your changelog it is also not clear what
> > > > the underlying problem is. Both congestion_wait and wait_iff_congested
> > > > should wake up early if the congestion is handled. Is this not the case?
> > > 
> > > For now I don't know why; the code looks like it should work as you
> > > said, so maybe I need to trace more of the internals.
> > > But the weird thing is that once I set the people-disliked tunable, iowait
> > > drops instantly, which contradicts the code design.
> > 
> > Yes, this is quite strange.  If setting a smaller timeout makes a
> > difference, that indicates we're not waking up soon enough.  I see
> > two possibilities; one is that a wakeup is missing somewhere -- ie the
> > conditions under which we call clear_wb_congested() are wrong.  Or we
> > need to wake up sooner.
> > 
> > Umm.  We have clear_wb_congested() called from exactly one spot --
> > clear_bdi_congested().  That is only called from:
> > 
> > drivers/block/pktcdvd.c
> > fs/ceph/addr.c
> > fs/fuse/control.c
> > fs/fuse/dev.c
> > fs/nfs/write.c
> > 
> > Jens, is something supposed to be calling clear_bdi_congested() in the
> > block layer?  blk_clear_congested() used to exist until October 29th
> > last year.  Or is something else supposed to be waking up tasks that
> > are sleeping on congestion?
> > 
> 
> IIUC it looks like after commit a1ce35fa49852db60fc6e268038530be533c5b15,

This is something for Jens to comment on. Not waking up on congestion
indeed sounds like a bug.

> besides those *.c places you mentioned above, the vmscan code will always
> wait the full 100ms and nobody wakes it up.

Yes, this is true, but you should realize that this path is triggered only
under heavy memory reclaim, where there is nothing to reclaim because
too many pages are already isolated and we are waiting for reclaimers
to make some progress on them. It is also possible that there are
simply no reclaimable pages at all and we are heading toward an OOM
situation. In both cases waiting a bit shouldn't be critical because
this is really a cold path. It would be much better to have a mechanism
to wake up earlier, but this is likely to be non-trivial and I am not
sure it is worth the effort considering how rare this should be.
-- 
Michal Hocko
SUSE Labs
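
The reasoning in this thread can be illustrated with a small model: a
timed wait returns at the timeout unless something wakes it earlier, so
if no caller ever signals that congestion has cleared, a shorter timeout
directly shortens every wait. The sketch below is a hypothetical
user-space model and not kernel code; the struct, its fields, and
effective_sleep_ms() are invented for illustration, and only the 100ms
figure comes from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not kernel code: how long a timed wait sleeps. */
struct wait_model {
    long timeout_ms;    /* upper bound on the sleep, e.g. the 100ms discussed above */
    bool has_waker;     /* does anything ever signal that congestion cleared? */
    long cleared_at_ms; /* when congestion clears, measured from the start of the wait */
};

/*
 * Return the modelled sleep duration in milliseconds.
 * With a waker, the sleep ends when congestion clears, if that happens
 * before the timeout. Without a waker, the full timeout always elapses.
 */
static long effective_sleep_ms(const struct wait_model *m)
{
    assert(m->timeout_ms >= 0 && m->cleared_at_ms >= 0);

    if (m->has_waker && m->cleared_at_ms < m->timeout_ms)
        return m->cleared_at_ms;
    return m->timeout_ms;
}
```

In this model, a wait with a 100ms timeout whose congestion clears after
5ms sleeps 5ms when a waker exists and 100ms when it does not. Lowering
the timeout to 10ms without a waker gives 10ms, which would be consistent
with the reported drop in iowait if the wakeup is indeed missing.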