linux-kernel - Re: [PATCH] aio: fix a user triggered use after free (and fix freeze protection of aio writes)

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20161029161230.GR19539@ZenIV.linux.org.uk>
Date:   Sat, 29 Oct 2016 17:12:30 +0100
From:   Al Viro <viro@...IV.linux.org.uk>
To:     Christoph Hellwig <hch@....de>
Cc:     torvalds@...ux-foundation.org, jack@...e.cz, dmonakhov@...nvz.org,
        jmoyer@...hat.com, linux-fsdevel@...r.kernel.org,
        linux-aio@...ck.org, linux-kernel@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH] aio: fix a user triggered use after free (and fix freeze
 protection of aio writes)

On Sat, Oct 29, 2016 at 05:20:17PM +0200, Christoph Hellwig wrote:
> On Sat, Oct 29, 2016 at 01:24:51PM +0100, Al Viro wrote:
> > How about taking this chunk (i.e. telling lockdep that we are not holding this
> > thing) past the iter_op() call, where file_end_write() used to be?
> 
> We can't as that would not fix the use after free (at least for the lockdep
> case - otherwise the call is a no-op).  Once iter_op returns aio_complete
> might have dropped our reference to the file, and another thread might
> have closed the fd so that the fput from aio_complete was the last one.
> 
> This is something that xfstests/323 can reproduce under the right conditions.

Eww...  IOW, as soon as we'd submitted an async iocb, nobody can so much as
look at struct file *or* iocb, right?  Or underlying inode, or any fs-private
data structures attached to it...

I certainly agree that it's a bug, but IMO you are just papering over it.
Just look at e.g.
        written = mapping->a_ops->direct_IO(iocb, &data);

        /*
         * Finally, try again to invalidate clean pages which might have been
         * cached by non-direct readahead, or faulted in by get_user_pages()
         * if the source of the write was an mmap'ed region of the file
         * we're writing.  Either one is a pretty crazy thing to do,
         * so we don't support it 100%.  If this invalidation
         * fails, tough, the write still worked...
         */
        if (mapping->nrpages) {
                invalidate_inode_pages2_range(mapping,
                                              pos >> PAGE_SHIFT, end);
        }
in generic_file_direct_write() - who said that mapping doesn't point into
freed memory by that point?  Or that
        inode_lock(inode);
        ret = generic_write_checks(iocb, from);
        if (ret > 0)
                ret = __generic_file_write_iter(iocb, from);
        inode_unlock(inode);
in generic_file_write_iter() won't blow up on inode_unlock() of a freed inode?
Or that anything of that sort won't happen in other ->write_iter instances,
for that matter?

NAK, with apologies for not having looked at that earlier.  The bug is real,
all right, but this is not a solution - both incomplete and far too brittle.

Why do we play that kind of insane games, anyway?  Why not e.g. refcount
the (async) iocb and have kiocb_free() drop the reference, with io_submit_one()
holding one across the call of aio_run_iocb()?  Cacheline bouncing issues?
Anything more subtle?