Date:	Thu, 27 May 2010 14:33:41 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Dave Chinner <david@...morbit.com>
Cc:	linux-kernel@...r.kernel.org, xfs@....sgi.com,
	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
	tytso@....edu, jens.axboe@...cle.com
Subject: Re: [PATCH 6/6] writeback: limit write_cache_pages integrity
 scanning to current EOF

On Tue, 25 May 2010 20:54:12 +1000
Dave Chinner <david@...morbit.com> wrote:

> From: Dave Chinner <dchinner@...hat.com>
> 
> sync can currently take a really long time if a concurrent writer is
> extending a file. The problem is that the dirty pages on the address
> space grow in the same direction as write_cache_pages scans, so if
> the writer keeps ahead of writeback, the writeback will not
> terminate until the writer stops adding dirty pages.

<looks at Jens>

That really was a pretty basic bug.  It's writeback 101 to test that case :(

> For a data integrity sync, we only need to write the pages dirty at
> the time we start the writeback, so we can stop scanning once we get
> to the page that was at the end of the file at the time the scan
> started.
> 
> This will prevent operations like copying a large file preventing
> sync from completing as it will not write back pages that were
> dirtied after the sync was started. This does not impact the
> existing integrity guarantees, as any dirty page (old or new)
> within the EOF range at the start of the scan will still be
> captured.
> 
> Signed-off-by: Dave Chinner <dchinner@...hat.com>
> ---
>  mm/page-writeback.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 0fe713d..c97e973 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -855,7 +855,22 @@ int write_cache_pages(struct address_space *mapping,
>  		if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX)
>  			range_whole = 1;
>  		cycled = 1; /* ignore range_cyclic tests */
> +
> +		/*
> +		 * If this is a data integrity sync, cap the writeback to the
> +		 * current end of file. Any extension to the file that occurs
> +		 * after this is a new write and we don't need to write those
> +		 * pages out to fulfil our data integrity requirements. If we
> +		 * try to write them out, we can get stuck in this scan until
> +		 * the concurrent writer stops adding dirty pages and extending
> +		 * EOF.
> +		 */
> +		if (wbc->sync_mode == WB_SYNC_ALL &&
> +		    wbc->range_end == LLONG_MAX) {
> +			end = i_size_read(mapping->host) >> PAGE_CACHE_SHIFT;
> +		}
>  	}
> +

This is somewhat inefficient.  It's really trivial and fast to find the
highest-index dirty page by walking straight down the
PAGECACHE_TAG_DIRTY-tagged nodes.

However pagevec_lookup_tag(..., PAGECACHE_TAG_DIRTY) should do a pretty
good job of skipping over the (millions of) pages between the (last
dirty page before `end') and (`end').  So it _should_ be OK.  Some thought
and runtime testing would be good.
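
To illustrate why the highest dirty index is cheap to find when interior
nodes carry a dirty tag, here is a userspace sketch.  It is a toy two-level
tree, not the kernel's radix tree, and every name in it (toy_tree,
toy_set_dirty, toy_highest_dirty) is made up for the example.  The lookup
inspects one tag word per level instead of every page.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS 64  /* fan-out per node in this toy model (an assumption) */

/* Two-level toy tree: root tag bit i is set iff leaf i holds a dirty page. */
struct toy_tree {
	uint64_t root_tags;
	uint64_t leaf_tags[SLOTS];
};

/* Mark a page dirty and propagate the tag up to the root. */
static void toy_set_dirty(struct toy_tree *t, unsigned int index)
{
	assert(index < SLOTS * SLOTS);
	t->leaf_tags[index / SLOTS] |= UINT64_C(1) << (index % SLOTS);
	t->root_tags |= UINT64_C(1) << (index / SLOTS);
}

/* Position of the highest set bit; the caller guarantees word != 0. */
static unsigned int toy_highest_bit(uint64_t word)
{
	unsigned int pos = 0;

	assert(word != 0);
	while (word >>= 1)
		pos++;
	return pos;
}

/* Return the highest dirty index, or -1 if nothing is dirty. */
static long toy_highest_dirty(const struct toy_tree *t)
{
	unsigned int leaf;

	if (t->root_tags == 0)
		return -1;
	leaf = toy_highest_bit(t->root_tags);
	return (long)leaf * SLOTS + toy_highest_bit(t->leaf_tags[leaf]);
}
```

With pages 5, 70 and 1000 dirtied, toy_highest_dirty() returns 1000 after
looking at two tag words, regardless of how many clean pages sit in between.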



That being said, I think the patch is insufficient.  If I create an
enormous (possibly sparse) file with a 16TB hole (or a run of clean
pages) in the middle and then start busily writing into that hole (run
of clean pages), the problem will still occur.
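
A userspace toy model of that scenario, as a sketch only: the scan is
capped at EOF, but a hypothetical writer keeps dirtying pages inside the
file ahead of the scanner.  The names (toy_scan_to_eof, writer_pos) and the
one-page-per-step writer are assumptions of the model, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Scan [0, eof) in index order, "writing" (cleaning) each dirty page.
 * After every page the scanner visits, a hypothetical concurrent writer
 * dirties one more page at *writer_pos (while that is still below eof)
 * and advances.  Returns the number of pages written.
 */
static size_t toy_scan_to_eof(bool *dirty, size_t eof, size_t *writer_pos)
{
	size_t written = 0;

	assert(dirty != NULL && writer_pos != NULL);
	for (size_t i = 0; i < eof; i++) {
		if (dirty[i]) {
			dirty[i] = false;
			written++;
		}
		if (*writer_pos < eof)
			dirty[(*writer_pos)++] = true;
	}
	return written;
}
```

With a 100-page file, pages 0-2 dirty at the start and the writer starting
at page 50, the scan writes 53 pages rather than 3: the EOF cap never comes
into play because the new dirty pages are all below it.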

One obvious fix for that (a) would be to add another radix-tree tag and
do two passes across the radix-tree.
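
Roughly what I have in mind for (a), as a userspace sketch rather than a
patch.  The "towrite" flag stands in for the extra radix-tree tag, and all
of the names here are hypothetical.  Pass one snapshots the dirty set by
tagging it; pass two writes only tagged pages, so pages dirtied after the
snapshot are left alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool dirty;
	bool towrite;	/* hypothetical second tag */
};

/* Pass 1: tag every page that is dirty right now.  Returns the count. */
static size_t toy_tag_for_writeback(struct toy_page *pages, size_t npages)
{
	size_t tagged = 0;

	assert(pages != NULL);
	for (size_t i = 0; i < npages; i++) {
		if (pages[i].dirty) {
			pages[i].towrite = true;
			tagged++;
		}
	}
	return tagged;
}

/* Pass 2, per page: write it only if pass 1 tagged it. */
static bool toy_write_if_tagged(struct toy_page *page)
{
	assert(page != NULL);
	if (!page->towrite)
		return false;
	page->towrite = false;
	page->dirty = false;	/* "written back" in this model */
	return true;
}
```

In this model a writer that dirties new pages during pass two cannot extend
the scan: those pages are dirty but untagged, so toy_write_if_tagged()
skips them.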

Another fix (b) would be to track the number of dirty pages per
address_space, and only write that number of pages.
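
A userspace sketch of (b), under the same toy assumptions as before (one
page dirtied by a hypothetical writer per scanner step, made-up names).
The dirty count is computed here with a linear pass purely for brevity; a
real implementation would presumably maintain it per address_space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the pages that are dirty when the sync starts. */
static size_t toy_count_dirty(const bool *dirty, size_t npages)
{
	size_t count = 0;

	assert(dirty != NULL);
	for (size_t i = 0; i < npages; i++)
		if (dirty[i])
			count++;
	return count;
}

/*
 * Scan [0, eof) but stop once `budget` pages have been written.  After
 * each visited page the hypothetical writer dirties one more page at
 * *writer_pos.  Returns the number of pages written.
 */
static size_t toy_scan_with_budget(bool *dirty, size_t eof, size_t budget,
				   size_t *writer_pos)
{
	size_t written = 0;

	assert(dirty != NULL && writer_pos != NULL);
	for (size_t i = 0; i < eof && written < budget; i++) {
		if (dirty[i]) {
			dirty[i] = false;
			written++;
		}
		if (*writer_pos < eof)
			dirty[(*writer_pos)++] = true;
	}
	return written;
}
```

With pages 0-2 dirty at the start, the budget is 3 and the scan stops after
three writes no matter how fast the writer runs.  Note that in this model
the budget only bounds the amount of work; it does not by itself guarantee
that the pages written are the ones that were dirty at the start.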

Another fix would be to work out how the code handled this situation
before we broke it, and restore that in some fashion.  I guess fix (b)
above kinda does that.


