[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210921204621.GY2361455@dread.disaster.area>
Date: Wed, 22 Sep 2021 06:46:21 +1000
From: Dave Chinner <david@...morbit.com>
To: Mel Gorman <mgorman@...hsingularity.net>
Cc: Linux-MM <linux-mm@...ck.org>, NeilBrown <neilb@...e.de>,
Theodore Ts'o <tytso@....edu>,
Andreas Dilger <adilger.kernel@...ger.ca>,
"Darrick J . Wong" <djwong@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
Michal Hocko <mhocko@...e.com>,
Rik van Riel <riel@...riel.com>,
Vlastimil Babka <vbabka@...e.cz>,
Johannes Weiner <hannes@...xchg.org>,
Jonathan Corbet <corbet@....net>,
Linux-fsdevel <linux-fsdevel@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [RFC PATCH 0/5] Remove dependency on congestion_wait in mm/
On Mon, Sep 20, 2021 at 09:54:31AM +0100, Mel Gorman wrote:
> Cc list similar to "congestion_wait() and GFP_NOFAIL" as they're loosely
> related.
>
> This is a prototype series that removes all calls to congestion_wait
> in mm/ and deletes wait_iff_congested. It's not a clever
> implementation but congestion_wait has been broken for a long time
> (https://lore.kernel.org/linux-mm/45d8b7a6-8548-65f5-cccf-9f451d4ae3d4@kernel.dk/).
> Even if it worked, it was never a great idea. While excessive
> dirty/writeback pages at the tail of the LRU is one possibility that
> reclaim may be slow, there is also the problem of too many pages being
> isolated and reclaim failing for other reasons (elevated references,
> too many pages isolated, excessive LRU contention etc).
>
> This series replaces the reclaim conditions with event driven ones
>
> o If there are too many dirty/writeback pages, sleep until a timeout
> or enough pages get cleaned
> o If too many pages are isolated, sleep until enough isolated pages
> are either reclaimed or put back on the LRU
> o If no progress is being made, let direct reclaim tasks sleep until
> another task makes progress
>
> This has been lightly tested only and the testing was useless as the
> relevant code was not executed. The workload configurations I had that
> used to trigger these corner cases no longer work (yey?) and I'll need
> to implement a new synthetic workload. If someone is aware of a realistic
> workload that forces reclaim activity to the point where reclaim stalls
> then kindly share the details.
Got a git tree pointer so I can pull it into a test kernel so I can
see what impact it has on behaviour before I try to make sense of
the code?
Cheers,
Dave.
--
Dave Chinner
david@...morbit.com
Powered by blists - more mailing lists