Date:	Tue, 23 Nov 2010 09:49:49 -0500
From:	Ben Gamari <bgamari.foss@...il.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Minchan Kim <minchan.kim@...il.com>, rsync@...ts.samba.org
Cc:	linux-kernel@...r.kernel.org, linux-mm@...ck.org,
	Rik van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Johannes Weiner <hannes@...xchg.org>,
	Nick Piggin <npiggin@...nel.dk>
Subject: [RFC PATCH] fadvise support in rsync


Warning for kernel folks: I'm not much of an mm person; let me know if I got
anything horribly wrong.

Many folks use rsync in their nightly backup jobs. In these applications, speed
is of minimal concern and should be sacrificed in order to minimize the effect
of rsync on the rest of the machine. When rsync is working on a large directory
it can quickly fill the page cache with written data, displacing the rest of
the system's working set. The solution for this is to inform the kernel that
our written pages will no longer be needed and should be expelled to disk. The
POSIX interface for this, posix_fadvise, has existed for some time, but there
has been no usable implementation in any of the major operating systems.

Attempts have been made in the past[1] to use the fadvise interface, but kernel
limitations have made this quite messy. In particular, the kernel supports
FADV_DONTNEED as a single-shot hint; i.e. if the page is clean when the hint is
given it will be freed, otherwise the hint is ignored. For this reason it is
necessary to fdatasync() against dirtied pages before giving the hint. This,
however, requires that rsync do some accounting, calling fdatasync() and
fadvise() only after giving the kernel an opportunity to flush the data itself.
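For concreteness, the flush-then-hint sequence described above can be sketched
roughly as follows. This is only an illustration and not code from the patch;
the helper name write_and_drop() is made up here, and it hints the whole file
in one go rather than doing the per-range accounting mentioned above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of rsync): write len bytes from buf to fd,
 * flush them, then ask the kernel to drop the file's pages from the page
 * cache. Returns 0 on success, -1 on error.
 */
int write_and_drop(int fd, const char *buf, size_t len)
{
    assert(fd >= 0 && buf != NULL);

    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0)
            return -1;
        done += (size_t)n;
    }

    /* Dirty pages are not dropped by FADV_DONTNEED, so flush them first. */
    if (fdatasync(fd) != 0)
        return -1;

    /*
     * offset 0 with len 0 means "from the start to the end of the file".
     * posix_fadvise() returns an error number directly instead of setting
     * errno, so compare against 0.
     */
    if (posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0)
        return -1;

    return 0;
}
```

Calling fdatasync() after every write like this forces synchronous I/O, which
is exactly the cost the accounting is meant to avoid.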

Moreover, fadvise(DONTNEED) frees pages regardless of whether the hinting
process is the only referrer. For this reason, the previous fadvise patch also
used mincore to identify which pages are needed by other processes. Altogether,
this makes using fadvise very expensive from a complexity standpoint. This is
very unfortunate, since the interface could be quite usable with a few minor
changes.
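To give an idea of what the mincore() side involves, here is a rough sketch of
checking how much of a file is already resident in the page cache. It is not
taken from the earlier patch; resident_pages() is a hypothetical helper, and a
real implementation would presumably keep the per-page vector around so that
only pages which were not resident beforehand get hinted away afterwards.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of the file at path are
 * currently resident in the page cache. Returns the count, or -1 on error.
 */
long resident_pages(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {          /* mmap() rejects zero-length maps */
        close(fd);
        return 0;
    }

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = (size_t)st.st_size;
    size_t npages = (len + page - 1) / page;

    long count = -1;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    if (map != MAP_FAILED) {
        unsigned char *vec = malloc(npages);
        /* mincore() sets the low bit of vec[i] if page i is resident. */
        if (vec != NULL && mincore(map, len, vec) == 0) {
            count = 0;
            for (size_t i = 0; i < npages; i++)
                count += vec[i] & 1;
        }
        free(vec);
        munmap(map, len);
    }
    close(fd);
    return count;
}
```

Mapping the file does not by itself fault its pages in, so the check should
not disturb the state it is measuring.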

I recently asked about this on the LKML[2], where Minchan Kim was nice enough
to put together a patch improving support for the FADV_DONTNEED hint. His patch
adds pages flagged for invalidation to the inactive list. This obviates the need for
fdatasync() since the page will be reclaimed by the kernel in the standard
inactive reclaim path. Moreover, by adding hinted pages to the head of the
inactive list, other processes are given ample time to call the pages back to
the active list, eliminating the need for the previous mincore() hack.

Here is my attempt at adding fadvise support to rsync (against v3.0.7). I do
this in both the sender (hinting after match_sums()) and the receiver (hinting
after receive_data()). In principle we could get better granularity if this was
hooked up within match_sums() (or even the map_ptr() interface) and the receive
loop in receive_data(), but I wanted to keep things simple at first (any
comments on these ideas?). At the moment this is for little more than testing.
Considering the potential negative effects of using FADV_DONTNEED on older
kernels, it is likely we will want this functionality off by default with a
command line flag to enable.
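
As a rough illustration of the finer-grained variant (hinting inside the
transfer loop rather than once per file), something along these lines could
work. This is a standalone sketch and not the rsync code: copy_with_hints(),
HINT_CHUNK and the drop_cache_enabled switch are all made up here, the switch
merely stands in for whatever command line flag ends up being used, and both
descriptors are assumed to start at offset 0. It also assumes a kernel with
the improved FADV_DONTNEED behaviour, since it does not fdatasync() before
hinting the written range.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define HINT_CHUNK (1 << 20)    /* hint after roughly every 1 MiB; arbitrary */

/* Hypothetical stand-in for a command line flag; off by default. */
static int drop_cache_enabled = 0;

/* The hint is advisory, so a failure here is deliberately ignored. */
static void hint_range(int fd, off_t start, off_t len)
{
    (void)posix_fadvise(fd, start, len, POSIX_FADV_DONTNEED);
}

/*
 * Copy everything from in_fd to out_fd, hinting away each chunk once it has
 * been processed. Returns the number of bytes copied, or -1 on error.
 */
long long copy_with_hints(int in_fd, int out_fd)
{
    assert(in_fd >= 0 && out_fd >= 0);

    char buf[64 * 1024];
    off_t copied = 0, hinted = 0;
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        copied += n;

        if (drop_cache_enabled && copied - hinted >= HINT_CHUNK) {
            hint_range(in_fd, hinted, copied - hinted);
            hint_range(out_fd, hinted, copied - hinted);
            hinted = copied;
        }
    }
    if (n < 0)
        return -1;

    /* Hint whatever is left over after the last full chunk. */
    if (drop_cache_enabled && copied > hinted) {
        hint_range(in_fd, hinted, copied - hinted);
        hint_range(out_fd, hinted, copied - hinted);
    }
    return copied;
}
```

With the switch left at 0 this is a plain copy loop, which matches the idea of
keeping the behaviour off unless explicitly requested.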

Cheers,

- Ben


[1] http://insights.oetiker.ch/linux/fadvise.html
[2] http://lkml.org/lkml/2010/11/21/59

