linux-kernel - Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending full flush

Message-ID: <728d4141-8d73-97fb-de08-90671c2897da@kernel.dk>
Date:   Thu, 21 Sep 2017 09:36:45 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Christoph Hellwig <hch@...radead.org>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, hannes@...xchg.org, clm@...com, jack@...e.cz
Subject: Re: [PATCH 7/7] fs-writeback: only allow one inflight and pending
 full flush

On 09/21/2017 09:05 AM, Christoph Hellwig wrote:
> On Wed, Sep 20, 2017 at 09:33:02AM -0600, Jens Axboe wrote:
>> When someone calls wakeup_flusher_threads() or
>> wakeup_flusher_threads_bdi(), they schedule writeback of all dirty
>> pages in the system (or on that bdi). If we are tight on memory, we
>> can get tons of these queued from kswapd/vmscan. This causes (at
>> least) two problems:
>>
>> 1) We consume a ton of memory just allocating writeback work items.
>> 2) We spend so much time processing these work items, that we
>>    introduce a softlockup in writeback processing.
>>
>> Fix this by adding a 'start_all' bit to the writeback structure, and
>> set that when someone attempts to flush all dirty pages.  The bit is
>> cleared when we start writeback on that work item. If the bit is
>> already set when we attempt to queue !nr_pages writeback, then we
>> simply ignore it.
>>
>> This provides us one full flush in flight, with one pending as well,
>> and makes for more efficient handling of this type of writeback.
>>
>> Acked-by: Johannes Weiner <hannes@...xchg.org>
>> Tested-by: Chris Mason <clm@...com>
>> Reviewed-by: Jan Kara <jack@...e.cz>
>> Signed-off-by: Jens Axboe <axboe@...nel.dk>
>> ---
>>  fs/fs-writeback.c                | 24 ++++++++++++++++++++++++
>>  include/linux/backing-dev-defs.h |  1 +
>>  2 files changed, 25 insertions(+)
>>
>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 3916ea2484ae..6205319d0c24 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -53,6 +53,7 @@ struct wb_writeback_work {
>>  	unsigned int for_background:1;
>>  	unsigned int for_sync:1;	/* sync(2) WB_SYNC_ALL writeback */
>>  	unsigned int auto_free:1;	/* free on completion */
>> +	unsigned int start_all:1;	/* nr_pages == 0 (all) writeback */
>>  	enum wb_reason reason;		/* why was writeback initiated? */
>>  
>>  	struct list_head list;		/* pending work list */
>> @@ -953,12 +954,26 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic,
>>  		return;
>>  
>>  	/*
>> +	 * All callers of this function want to start writeback of all
>> +	 * dirty pages. Places like vmscan can call this at a very
>> +	 * high frequency, causing pointless allocations of tons of
>> +	 * work items and keeping the flusher threads busy retrieving
>> +	 * that work. Ensure that we only allow one of them pending and
>> +	 * inflight at the time
>> +	 */
>> +	if (test_bit(WB_start_all, &wb->state))
>> +		return;
>> +
>> +	set_bit(WB_start_all, &wb->state);
> 
> This should be test_and_set_bit here..

That's on purpose. It doesn't matter if we race here, and if we're
being hammered with flusher thread wakeups, we don't want to turn
that unlocked test into a locked instruction.
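
For illustration only, here is a minimal userspace sketch of the two
patterns being compared, using C11 atomics rather than the kernel's
bitops. The names fake_wb, START_ALL_BIT, request_full_flush(),
request_full_flush_tas() and flush_started() are hypothetical and do
not appear in the patch. The first variant checks the flag with a plain
load and performs an atomic read-modify-write only when the flag looks
clear. The second always performs the read-modify-write, as
test_and_set_bit() would.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the WB_start_all bit in wb->state. */
#define START_ALL_BIT 0x1UL

/* Hypothetical stand-in for struct bdi_writeback. */
struct fake_wb {
	atomic_ulong state;	/* flag word */
	atomic_ulong queued;	/* work items we pretended to queue */
};

static void fake_wb_init(struct fake_wb *wb)
{
	assert(wb);
	atomic_init(&wb->state, 0UL);
	atomic_init(&wb->queued, 0UL);
}

/*
 * Pattern used in the patch: plain load first, atomic RMW only when
 * the flag looks clear. Two racing callers may both pass the test and
 * both queue; that is the race being tolerated.
 */
static bool request_full_flush(struct fake_wb *wb)
{
	if (atomic_load_explicit(&wb->state, memory_order_relaxed) &
	    START_ALL_BIT)
		return false;	/* common case under load: read only */

	atomic_fetch_or(&wb->state, START_ALL_BIT);
	atomic_fetch_add(&wb->queued, 1UL);
	return true;
}

/*
 * Reviewer's alternative: one atomic RMW on every call. Exactly one
 * racing caller wins, but every call pays for the RMW.
 */
static bool request_full_flush_tas(struct fake_wb *wb)
{
	if (atomic_fetch_or(&wb->state, START_ALL_BIT) & START_ALL_BIT)
		return false;

	atomic_fetch_add(&wb->queued, 1UL);
	return true;
}

/* Models the flusher clearing the bit when it starts the work item. */
static void flush_started(struct fake_wb *wb)
{
	atomic_fetch_and(&wb->state, ~START_ALL_BIT);
}
```

In the first variant, a burst of callers that all find the bit set only
read the flag word. The cost is that two callers racing past the test
can each queue a work item, which is the race described above as
harmless.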

> But more importantly once we are not guaranteed that we only have
> a single global wb_writeback_work per bdi_writeback we should just
> embed that into struct bdi_writeback instead of dynamically
> allocating it.

We could do this as a follow-up. But right now the logic is that we
can have one started (inflight), and still have one new one queued.
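
To make the "one inflight, one pending" behaviour concrete, here is a
small single-threaded model of the flag's lifecycle. It is a sketch
under assumptions, not kernel code: flush_model, queue_full_flush(),
start_next() and finish_inflight() are hypothetical names, and the
flusher is assumed to process one work item at a time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one bdi_writeback's full-flush state. */
struct flush_model {
	bool start_all;	/* stands in for the WB_start_all bit */
	int pending;	/* full flushes queued but not yet started */
	int inflight;	/* full flushes currently being processed */
};

/* A caller asks for a full flush; dropped if one is already pending. */
static bool queue_full_flush(struct flush_model *m)
{
	assert(m);
	if (m->start_all)
		return false;	/* request ignored */
	m->start_all = true;
	m->pending++;
	return true;
}

/*
 * The flusher picks up the pending item. The flag is cleared when the
 * work starts, not when it finishes, so one more request can be queued
 * while this one is still running.
 */
static bool start_next(struct flush_model *m)
{
	assert(m);
	if (m->pending == 0 || m->inflight > 0)
		return false;	/* nothing queued, or worker busy */
	m->pending--;
	m->inflight++;
	m->start_all = false;
	return true;
}

/* The flusher finishes the item it was working on. */
static void finish_inflight(struct flush_model *m)
{
	assert(m);
	if (m->inflight > 0)
		m->inflight--;
}
```

Queueing twice, starting the first item, then queueing twice more
leaves the model with one item inflight and one pending; the second
request in each pair is dropped.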

-- 
Jens Axboe