lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 19 Mar 2013 01:38:05 +0000
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Jan Kara <jack@...e.cz>
Cc:	David Howells <dhowells@...hat.com>,
	Miklos Szeredi <miklos@...redi.hu>,
	torvalds@...ux-foundation.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, hch@...radead.org,
	akpm@...ux-foundation.org, apw@...onical.com, nbd@...nwrt.org,
	neilb@...e.de, jordipujolp@...il.com, ezk@....cs.sunysb.edu,
	sedat.dilek@...glemail.com, hooanon05@...oo.co.jp, mszeredi@...e.cz
Subject: Re: [PATCH 2/9] vfs: export do_splice_direct() to modules

On Mon, Mar 18, 2013 at 11:01:03PM +0000, Al Viro wrote:

> I'm looking at the existing callers and I really wonder if we ought to
> push sb_start_write() from ->splice_write()/->aio_write()/etc. into the
> callers.
> 
> Something like file_start_write()/file_end_write(), with check for file
> being regular one might be a good starting point.  As it is, copyup is
> really fucked both in unionmount and overlayfs...

BTW, I wonder what's the right locking for that sucker; overlayfs is probably
too heavy - we are talking about copying a file from one fs to another, which
can obviously take quite a while, so holding ->i_mutex on _parent_ all along
is asking for very serious contention.  OTOH, there's a pile of unpleasant
races that need to be dealt with; consider e.g. chmod("lower_file", 0644) vs.
unlink("lower_file").  The former means creating a copy of lower_file in the
covering layer (and chmod of that copy once it's finished).  The latter
means creating a whiteout in the covering layer.  No matter which comes first,
the result *must* be whiteout in directory + no stray copies left in covering
layer.  chmod() may legitimately return -ENOENT or 0, but resulting state of
fs is unambiguous.  Holding ->i_mutex on parent would, of course, suffice to
serialize them, but it's not particulary nice thing to do wrt contention.

Another fun issue is copyup vs. copyup - we want to sit and wait for copyup
attempt in progress to complete, rather than start another one in parallel.
And whoever comes the second must check if copyup has succeeded, obviously -
it's possible to have user run into 5% limit and fail copyup, followed by
root doing it successfully.

Another one: overwriting rename() vs. copyup.  Similar to unlink() vs. copyup().

Another one: read-only open() vs. copyup().  Hell knows - we obviously don't
want to open a partially copied file; might want to wait or pretend that this
open() has come first and give it the underlying file.  The same goes for
stat() vs. copyup().

FWIW, something like "lock parent, ->create(), ->unlink(), unlock parent,
copy data and metadata, lock parent, allocate a new dentry in covering layer
and do ->lookup() on it, verify that it is negative and not a whiteout, lock
child, use ->link() to put it into directory, unlock everything" would
probably DTRT for unlink/copyup and rename/copyup.  The rest... hell knows;
->i_mutex on child is a non-starter, since we couldn't use the normal ->write()
or ->splice_write() under it.  One possibility is to allocate a structure for
copyup in progress, make it easily located (hash by lower dentry/upper sb
pair) and serialize on that.  Hell knows...

One potential unpleasantness here is dcache lookup coming between ->create()
and ->unlink(); OTOH, there had been fairly reasonable requests for something
like atomic combination of open and unlink and it's not like _that_ would be
hard to implement as a new method - pretty much everything local could just
take part of ->create() and that would be it.  Which might make sense on its
own - open() flag creating a new temporary file, unlinked from the very
beginning and thus killed when we close the damn thing.  Objections?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ