linux-kernel - Re: [RFC PATCH 0/5] Remove dependency on congestion


Message-ID: <20210921204621.GY2361455@dread.disaster.area>
Date:   Wed, 22 Sep 2021 06:46:21 +1000
From:   Dave Chinner <david@...morbit.com>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Linux-MM <linux-mm@...ck.org>, NeilBrown <neilb@...e.de>,
        Theodore Ts'o <tytso@....edu>,
        Andreas Dilger <adilger.kernel@...ger.ca>,
        "Darrick J . Wong" <djwong@...nel.org>,
        Matthew Wilcox <willy@...radead.org>,
        Michal Hocko <mhocko@...e.com>,
        Rik van Riel <riel@...riel.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Johannes Weiner <hannes@...xchg.org>,
        Jonathan Corbet <corbet@....net>,
        Linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/5] Remove dependency on congestion_wait in mm/

On Mon, Sep 20, 2021 at 09:54:31AM +0100, Mel Gorman wrote:
> Cc list similar to "congestion_wait() and GFP_NOFAIL" as they're loosely
> related.
> 
> This is a prototype series that removes all calls to congestion_wait
> in mm/ and deletes wait_iff_congested. It's not a clever
> implementation but congestion_wait has been broken for a long time
> (https://lore.kernel.org/linux-mm/45d8b7a6-8548-65f5-cccf-9f451d4ae3d4@kernel.dk/).
> Even if it worked, it was never a great idea. While excessive
> dirty/writeback pages at the tail of the LRU is one possible reason
> that reclaim may be slow, there is also the problem of reclaim
> failing for other reasons (elevated references, too many pages
> isolated, excessive LRU contention etc).
> 
> This series replaces the reclaim conditions with event-driven ones:
> 
> o If there are too many dirty/writeback pages, sleep until a timeout
>   or enough pages get cleaned
> o If too many pages are isolated, sleep until enough isolated pages
>   are either reclaimed or put back on the LRU
> o If no progress is being made, let direct reclaim tasks sleep until
>   another task makes progress
> 
> This has been lightly tested only and the testing was useless as the
> relevant code was not executed. The workload configurations I had that
> used to trigger these corner cases no longer work (yey?) and I'll need
> to implement a new synthetic workload. If someone is aware of a realistic
> workload that forces reclaim activity to the point where reclaim stalls
> then kindly share the details.

Do you have a git tree pointer so I can pull it into a test kernel
and see what impact it has on behaviour before I try to make sense
of the code?

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com