Message-Id: <20120622142627.b6184eda.akpm@linux-foundation.org>
Date:	Fri, 22 Jun 2012 14:26:27 -0700
From:	Andrew Morton <akpm@...ux-foundation.org>
To:	Paolo Bonzini <pbonzini@...hat.com>
Cc:	linux-kernel@...r.kernel.org, Hugh Dickins <hughd@...gle.com>,
	Chris Friesen <chris.friesen@...band.com>
Subject: Re: [PATCH v2 2/2] msync: start async writeout when MS_ASYNC

On Fri, 15 Jun 2012 17:12:59 +0200
Paolo Bonzini <pbonzini@...hat.com> wrote:

> msync.c says that applications had better use fsync() or fadvise(FADV_DONTNEED)
> instead of MS_ASYNC.  Both pieces of advice are really bad:
> 
> * fsync() can be a replacement for MS_SYNC, not for MS_ASYNC;
> 
> * fadvise(FADV_DONTNEED) invalidates the pages completely, which will make
>   later accesses expensive.
> 
> Even sync_file_range would not be a replacement, because the writeout is
> done synchronously and can block for an extended period of time.

This is just wrong.  sync_file_range() is, within limits, asynchronous
when SYNC_FILE_RANGE_WAIT_* are not used.
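The asynchronous use of sync_file_range() described above can be sketched as follows. This is a minimal illustration, not code from the thread: the helper names start_writeout() and demo_start_writeout() and the temporary-file path are hypothetical. The only flag passed is SYNC_FILE_RANGE_WRITE, so the call queues writeback and returns without waiting; "within limits" still applies, since the call may block if the device's request queue is full.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: queue writeback of [offset, offset + nbytes) and
 * return without waiting for it.  Passing only SYNC_FILE_RANGE_WRITE
 * (neither SYNC_FILE_RANGE_WAIT_BEFORE nor SYNC_FILE_RANGE_WAIT_AFTER)
 * keeps the call largely asynchronous; it can still block if the
 * device's request queue is full.  nbytes == 0 means "to end of file". */
int start_writeout(int fd, off_t offset, off_t nbytes)
{
    return sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
}

/* Hypothetical demo: dirty a temporary file, start writeout, and return
 * 0 on success or -1 on failure.  Nothing here waits for the I/O. */
int demo_start_writeout(void)
{
    char path[] = "/tmp/sfr-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file is removed once fd is closed */

    static const char buf[] = "dirty data\n";
    ssize_t n = write(fd, buf, sizeof buf - 1);
    int rc = (n == (ssize_t)(sizeof buf - 1)) ? start_writeout(fd, 0, 0) : -1;

    close(fd);
    return rc == 0 ? 0 : -1;
}
```

An application wanting MS_SYNC-like durability later can still call fsync() or add the WAIT flags; the point is that the "start I/O now, don't wait" behaviour is already available without changing MS_ASYNC.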

> Having the possibility to schedule a writeback immediately is an advantage
> for the applications.

Having this forced upon them is also a disadvantage.  The syscall will
now take longer, consuming more CPU: starting all that IO will add
latency.  It also moves work away from the flusher threads and into the
calling process thus increasing overall runtime and reducing SMP
utilisation.

And as bdi_write_congested() is a best-effort, sometimes-gets-it-wrong
thing, the patch will introduce rare but very long delays where
msync(MS_ASYNC) waits on IO.

>  They can do the same thing that fadvise does,
> but without the invalidation part.  The implementation is also similar
> to fadvise, but with tag-and-write enabled.
> 
> One example is if you are implementing a persistent dirty bitmap.
> Whenever you set bits to 1 you need to synchronize it with MS_SYNC, so
> that dirtiness is reported properly after a host crash.  If you have set
> any bits to 0, getting them to disk is not needed for correctness, but
> it is still desirable to save some work after a host crash.  You could
> simply use MS_SYNC in a separate thread, but MS_ASYNC provides exactly
> the desired semantics and is easily done in the kernel.
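The persistent dirty bitmap scenario can be sketched roughly like this, assuming a bitmap stored in a file and accessed through a MAP_SHARED mapping. All names (struct dirty_bitmap, bitmap_open(), bitmap_set(), bitmap_clear(), demo_bitmap()) are hypothetical. Setting a bit is followed by msync(MS_SYNC) because correctness depends on it; clearing a bit is followed by msync(MS_ASYNC) because it is only an optimisation. On kernels without this patch, msync(MS_ASYNC) starts no I/O itself; the dirty page is written back later by the flusher threads.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical persistent dirty bitmap backed by a shared file mapping. */
struct dirty_bitmap {
    uint8_t *bits;              /* start of the mmap'ed bitmap (page-aligned) */
    size_t   len;               /* mapping length in bytes */
};

int bitmap_open(struct dirty_bitmap *bm, int fd, size_t len)
{
    if (ftruncate(fd, (off_t)len) != 0)
        return -1;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    bm->bits = p;
    bm->len = len;
    return 0;
}

/* msync() needs a page-aligned address: flush the page holding `bit`. */
static int flush_page_of(struct dirty_bitmap *bm, size_t bit, int flags)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    size_t off = (bit / 8) & ~(pg - 1);
    return msync(bm->bits + off, pg, flags);
}

/* Setting a bit must reach disk before we proceed: use MS_SYNC. */
int bitmap_set(struct dirty_bitmap *bm, size_t bit)
{
    assert(bit / 8 < bm->len);
    bm->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    return flush_page_of(bm, bit, MS_SYNC);
}

/* Clearing a bit only saves work after a crash: MS_ASYNC is enough. */
int bitmap_clear(struct dirty_bitmap *bm, size_t bit)
{
    assert(bit / 8 < bm->len);
    bm->bits[bit / 8] &= (uint8_t)~(1u << (bit % 8));
    return flush_page_of(bm, bit, MS_ASYNC);
}

int bitmap_test(const struct dirty_bitmap *bm, size_t bit)
{
    return (bm->bits[bit / 8] >> (bit % 8)) & 1;
}

/* Hypothetical demo on a one-page temporary file; returns 0 on success. */
int demo_bitmap(void)
{
    char path[] = "/tmp/bitmap-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    struct dirty_bitmap bm;
    int rc = -1;
    if (bitmap_open(&bm, fd, (size_t)sysconf(_SC_PAGESIZE)) == 0) {
        if (bitmap_set(&bm, 42) == 0 && bitmap_test(&bm, 42) == 1 &&
            bitmap_clear(&bm, 42) == 0 && bitmap_test(&bm, 42) == 0)
            rc = 0;
        munmap(bm.bits, bm.len);
    }
    close(fd);
    return rc;
}
```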

This is already the case.  The current msync(MS_ASYNC) will mark the
pages for writeout within a dirty_expire_centisecs period (default 30
seconds).  This has always been why we consider the current MS_ASYNC
implementation to be standards-compliant.
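The 30-second default corresponds to a dirty_expire_centisecs value of 3000, because the sysctl is expressed in hundredths of a second. A small sketch of reading and converting it follows; the function names are hypothetical, and the procfs path /proc/sys/vm/dirty_expire_centisecs is the standard location of this sysctl on Linux.

```c
#include <assert.h>
#include <stdio.h>

/* dirty_expire_centisecs is in hundredths of a second. */
double centisecs_to_seconds(long centisecs)
{
    return centisecs / 100.0;
}

/* Hypothetical helper: read the current value from procfs.
 * Returns -1 if the file is missing or unreadable. */
long read_dirty_expire_centisecs(void)
{
    FILE *f = fopen("/proc/sys/vm/dirty_expire_centisecs", "r");
    if (!f)
        return -1;
    long v = -1;
    if (fscanf(f, "%ld", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}
```

For example, centisecs_to_seconds(3000) yields 30.0, the default period quoted above.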

If you think that some applications will *benefit* from having that 30
seconds changed to zero seconds under their feet then please describe
the reasoning.

> If the application does not want to start I/O, it can simply call msync
> with flags equal to MS_INVALIDATE.  This one remains a no-op, as it should
> be on a reasonable implementation.

Using MS_INVALIDATE is a bit of a hack.


I'm just not seeing it, sorry.  The change has risks and downsides and
forces the application to do things which it could already have done,
had it so chosen.
