[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20161029161230.GR19539@ZenIV.linux.org.uk>
Date: Sat, 29 Oct 2016 17:12:30 +0100
From: Al Viro <viro@...IV.linux.org.uk>
To: Christoph Hellwig <hch@....de>
Cc: torvalds@...ux-foundation.org, jack@...e.cz, dmonakhov@...nvz.org,
jmoyer@...hat.com, linux-fsdevel@...r.kernel.org,
linux-aio@...ck.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] aio: fix a user triggered use after free (and fix freeze
protection of aio writes)
On Sat, Oct 29, 2016 at 05:20:17PM +0200, Christoph Hellwig wrote:
> On Sat, Oct 29, 2016 at 01:24:51PM +0100, Al Viro wrote:
> > How about taking this chunk (i.e. telling lockdep that we are not holding this
> > thing) past the iter_op() call, where file_end_write() used to be?
>
> We can't as that would not fix the use after free (at least for the lockdep
> case - otherwise the call is a no-op). Once iter_op returns aio_complete
> might have dropped our reference to the file, and another thread might
> have closed the fd so that the fput from aio_complete was the last one.
>
> This is something that xfstests/323 can reproduce under the right conditions.
Eww... IOW, as soon as we'd submitted an async iocb, nobody can so much as
look at struct file *or* iocb, right? Or underlying inode, or any fs-private
data structures attached to it...
I certainly agree that it's a bug, but IMO you are just papering over it.
Just look at e.g.
written = mapping->a_ops->direct_IO(iocb, &data);
/*
* Finally, try again to invalidate clean pages which might have been
* cached by non-direct readahead, or faulted in by get_user_pages()
* if the source of the write was an mmap'ed region of the file
* we're writing. Either one is a pretty crazy thing to do,
* so we don't support it 100%. If this invalidation
* fails, tough, the write still worked...
*/
if (mapping->nrpages) {
invalidate_inode_pages2_range(mapping,
pos >> PAGE_SHIFT, end);
}
in generic_file_direct_write() - who said that mapping doesn't point into
freed memory by that point? Or that
inode_lock(inode);
ret = generic_write_checks(iocb, from);
if (ret > 0)
ret = __generic_file_write_iter(iocb, from);
inode_unlock(inode);
in generic_file_write_iter() won't blow up on inode_unlock() of a freed inode?
Or that anything of that sort won't happen in other ->write_iter instances,
for that matter?
NAK, with apologies for not having looked at that earlier. The bug is real,
all right, but this is not a solution - both incomplete and far too brittle.
Why do we play that kind of insane games, anyway? Why not e.g. refcount
the (async) iocb and have kiocb_free() drop the reference, with io_submit_one()
holding one across the call of aio_run_iocb()? Cacheline bouncing issues?
Anything more subtle?
Powered by blists - more mailing lists