linux-kernel - Re: [PATCH 10/12] writeback: only allow one inflight and pending full flush

Message-Id: <20170928144100.e11801ef742521e0e3f4b8df@linux-foundation.org>
Date:   Thu, 28 Sep 2017 14:41:00 -0700
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Jens Axboe <axboe@...nel.dk>
Cc:     linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        hannes@...xchg.org, jack@...e.cz, torvalds@...ux-foundation.org
Subject: Re: [PATCH 10/12] writeback: only allow one inflight and pending
 full flush

On Wed, 27 Sep 2017 14:13:57 -0600 Jens Axboe <axboe@...nel.dk> wrote:

> When someone calls wakeup_flusher_threads() or
> wakeup_flusher_threads_bdi(), they schedule writeback of all dirty
> pages in the system (or on that bdi). If we are tight on memory, we
> can get tons of these queued from kswapd/vmscan. This causes (at
> least) two problems:
> 
> 1) We consume a ton of memory just allocating writeback work items.
>    We've seen as much as 600 million of these writeback work items
>    pending. That's a lot of memory to pointlessly hold hostage,
>    while the box is under memory pressure.
> 
> 2) We spend so much time processing these work items that we
>    introduce a softlockup in writeback processing. This is because
>    each of the writeback work items doesn't end up doing any work (it's
>    hard when you have millions of identical ones coming in to the
>    flush machinery), so we just sit in a tight loop pulling work
>    items and deleting/freeing them.
> 
> Fix this by adding a 'start_all' bit to the writeback structure, and
> set that when someone attempts to flush all dirty pages. The bit is
> cleared when we start writeback on that work item. If the bit is
> already set when we attempt to queue !nr_pages writeback, then we
> simply ignore it.
> 
> This provides us one full flush in flight, with one pending as well,
> and makes for more efficient handling of this type of writeback.
> 
> ...
>
> @@ -953,12 +954,27 @@ static void wb_start_writeback(struct bdi_writeback *wb, bool range_cyclic,
>  		return;
>  
>  	/*
> +	 * All callers of this function want to start writeback of all
> +	 * dirty pages. Places like vmscan can call this at a very
> +	 * high frequency, causing pointless allocations of tons of
> +	 * work items and keeping the flusher threads busy retrieving
> +	 * that work. Ensure that we only allow one of them pending and
> +	 * inflight at the time. It doesn't matter if we race a little
> +	 * bit on this, so use the faster separate test/set bit variants.
> +	 */
> +	if (test_bit(WB_start_all, &wb->state))
> +		return;
> +
> +	set_bit(WB_start_all, &wb->state);

test_and_set_bit()?
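
For readers following along, here is a minimal userspace sketch of the
distinction the question points at. It is not kernel code and not part
of the patch: C11 atomics stand in for the kernel bitops, and
struct wb_sketch, WB_START_ALL, claim_racy(), claim_atomic() and
start_work() are hypothetical names chosen for illustration. The
separate test/set form leaves a window in which two callers can both
see the bit clear; the combined form closes that window by returning
the old bit value from a single atomic operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the WB_start_all bit in wb->state. */
#define WB_START_ALL (1UL << 0)

/* Hypothetical, stripped-down analog of struct bdi_writeback. */
struct wb_sketch {
	atomic_ulong state;
};

/*
 * Separate test, then set (the shape used in the quoted hunk).
 * Two concurrent callers can both observe the bit clear and both
 * return true, so more than one full flush may get queued.
 */
static bool claim_racy(struct wb_sketch *wb)
{
	assert(wb != NULL);
	if (atomic_load(&wb->state) & WB_START_ALL)
		return false;
	atomic_fetch_or(&wb->state, WB_START_ALL);
	return true;
}

/*
 * Combined test-and-set: one atomic read-modify-write returns the
 * previous value, so exactly one caller sees the bit as clear.
 */
static bool claim_atomic(struct wb_sketch *wb)
{
	assert(wb != NULL);
	unsigned long old = atomic_fetch_or(&wb->state, WB_START_ALL);
	return !(old & WB_START_ALL);
}

/* Clear the bit when writeback on the work item starts. */
static void start_work(struct wb_sketch *wb)
{
	assert(wb != NULL);
	atomic_fetch_and(&wb->state, ~WB_START_ALL);
}
```

Called twice in a row on a zeroed wb_sketch, either claim function
succeeds the first time and is refused the second time, until
start_work() clears the bit. The two differ only under concurrency,
which is the race the patch comment says it is willing to tolerate.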