Message-ID: <Pine.LNX.4.64.0806020906001.9534@blonde.site>
Date:	Mon, 2 Jun 2008 09:43:05 +0100 (BST)
From:	Hugh Dickins <hugh@...itas.com>
To:	Andrew Morton <akpm@...ux-foundation.org>
cc:	Pavel Machek <pavel@...e.cz>, mtk.manpages@...il.com,
	kernel list <linux-kernel@...r.kernel.org>,
	"Rafael J. Wysocki" <rjw@...k.pl>
Subject: Re: sync_file_range(SYNC_FILE_RANGE_WRITE) blocks?

On Sun, 1 Jun 2008, Andrew Morton wrote:
> On Mon, 2 Jun 2008 01:00:40 +0200 Pavel Machek <pavel@...e.cz> wrote:
> > > How about this:
> > > 
> > > - Add a new SYNC_FILE_RANGE_NON_BLOCKING
> > > 
> > > - If userspace set that flag, turn on writeback_control.nonblocking
> > >   in __filemap_fdatawrite_range().
> > > 
> > > - test it a lot.
> > 
> > Works for me. Is the expectation that I code this? I can certainly
> > provide testing ;-).
> 
> Something like this:

Though this fits very easily into the current kernel implementation,
I don't think it's the right interface for userspace.

If we do go this kind of a way, then I'd say SYNC_FILE_RANGE_NON_BLOCKING
needs to tell the caller how far it got before giving up, rather than just
success or failure.  Why? um, um, because it feels right; and would help
the caller help the kernel by not overloading it with needlessly repeated
loop ranges - any stronger reasons?  But sync_file_range() was defined
to return int rather than ssize_t, so that becomes awkward.
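[Editorial sketch, not part of the original message.] The value of reporting progress can be seen in a toy model. Nothing below is a real kernel interface: struct toy_queue, toy_flush_nonblocking() and the refill parameter are hypothetical names invented for illustration, and the "queue" is just a byte budget. The sketch only shows how a caller that is told how far the call got can resume from that point instead of resubmitting the whole range.

```c
#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical stand-in for a device queue: it accepts at most
 * `budget` more bytes before it counts as congested. */
struct toy_queue {
    size_t budget;
};

/* Hypothetical progress-reporting call: returns how many bytes of
 * [offset, offset + nbytes) were queued.  A short count (or 0) means
 * it gave up because the toy queue was congested. */
static ssize_t toy_flush_nonblocking(struct toy_queue *q,
                                     size_t offset, size_t nbytes)
{
    (void)offset;   /* the toy model only tracks byte counts */
    size_t take = nbytes < q->budget ? nbytes : q->budget;
    q->budget -= take;
    return (ssize_t)take;
}

/* Caller side: resume from where the previous call stopped.
 * `refill` models the queue draining between attempts.
 * Returns the number of calls that were needed. */
static unsigned flush_all(struct toy_queue *q, size_t offset,
                          size_t nbytes, size_t refill)
{
    unsigned calls = 0;

    while (nbytes > 0) {
        ssize_t got = toy_flush_nonblocking(q, offset, nbytes);
        calls++;
        if (got == 0 && refill == 0)
            break;                  /* no progress possible in this model */
        offset += (size_t)got;      /* skip what was already queued */
        nbytes -= (size_t)got;
        if (nbytes > 0)
            q->budget += refill;    /* pretend the queue drained a bit */
    }
    return calls;
}
```

With only a success/failure result the caller would have to resubmit the full range each time. The count is typed ssize_t here because a plain int cannot represent large byte counts on 64-bit systems, which is the awkwardness noted above.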

Never mind, I don't think it is the right way anyway.  We don't need
additions to the existing sync_file_range() interface, we just need it
to behave as naive people like Pavel and I expected it to behave in the
first place: SYNC_FILE_RANGE_WRITE should be nonblocking (with respect
to queue congestion, and maybe page locking also).

I was imagining that where the existing nonblocking code just gives up,
the SYNC_FILE_RANGE_WRITE case should schedule the remaining work to be
done a little later: possibly by poking and/or leaving info for pdflush.

I guess there may be some resource fairness issues, with either approach:
it ought not to be unreasonable for a process to proceed by writing a page
and then issuing SYNC_FILE_RANGE_WRITE on that page, page by page, should it?
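[Editorial sketch, not part of the original message.] The page-by-page pattern described above might look roughly like this from userspace. sync_file_range() and SYNC_FILE_RANGE_WRITE are the real Linux interface; the helper name write_and_start_writeback(), the chunk parameter and the flush_errors counter are assumptions made for this example. Failures of the writeback request are counted rather than treated as fatal, on the assumption that the call is only a hint to start I/O.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>       /* open(), sync_file_range(), SYNC_FILE_RANGE_WRITE */
#include <sys/types.h>   /* ssize_t, off_t */
#include <unistd.h>      /* pwrite(), lseek(), close(), unlink() */

/*
 * Write `len` bytes from `buf` to `fd` starting at file offset 0, in
 * pieces of at most `chunk` bytes.  After each piece, ask the kernel to
 * start writeback of just that piece.
 *
 * Returns the number of bytes written, or -1 if pwrite() fails.
 * If `flush_errors` is non-NULL it receives the number of
 * sync_file_range() calls that failed.
 */
static ssize_t write_and_start_writeback(int fd, const char *buf, size_t len,
                                         size_t chunk, unsigned *flush_errors)
{
    size_t done = 0;

    assert(chunk > 0);
    if (flush_errors)
        *flush_errors = 0;

    while (done < len) {
        size_t want = (len - done < chunk) ? len - done : chunk;
        ssize_t n = pwrite(fd, buf + done, want, (off_t)done);

        if (n < 0)
            return -1;
        if (n == 0)
            break;              /* no progress; stop rather than spin */

        /* Request writeback of the bytes just written.  As this thread
         * discusses, the call may block if the device queue is congested. */
        if (sync_file_range(fd, (off64_t)done, (off64_t)n,
                            SYNC_FILE_RANGE_WRITE) != 0 && flush_errors)
            (*flush_errors)++;

        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

A caller would typically pass the system page size as chunk, for example the value of sysconf(_SC_PAGESIZE).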

But once we claim nonblocking at the user interface, I expect we'll
come up against the raciness in the current nonblocking treatment:
just because bdi is not congested when it's tested doesn't mean we
won't block when the write is submitted.  Perhaps a BIO_RW_NONBLOCK
could fix that up?

Hugh