linux-kernel - Re: [PATCH 2/2] dax: fix data corruption due to stale mmap reads

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170425225936.GA29655@linux.intel.com>
Date:   Tue, 25 Apr 2017 16:59:36 -0600
From:   Ross Zwisler <ross.zwisler@...ux.intel.com>
To:     Jan Kara <jack@...e.cz>
Cc:     Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        linux-kernel@...r.kernel.org,
        Alexander Viro <viro@...iv.linux.org.uk>,
        Alexey Kuznetsov <kuznet@...tuozzo.com>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>,
        Anna Schumaker <anna.schumaker@...app.com>,
        Christoph Hellwig <hch@....de>,
        Dan Williams <dan.j.williams@...el.com>,
        "Darrick J. Wong" <darrick.wong@...cle.com>,
        Eric Van Hensbergen <ericvh@...il.com>,
        Jens Axboe <axboe@...nel.dk>,
        Johannes Weiner <hannes@...xchg.org>,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        Latchesar Ionkov <lucho@...kov.net>,
        linux-cifs@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-mm@...ck.org, linux-nfs@...r.kernel.org,
        linux-nvdimm@...ts.01.org, Matthew Wilcox <mawilcox@...rosoft.com>,
        Ron Minnich <rminnich@...dia.gov>,
        samba-technical@...ts.samba.org, Steve French <sfrench@...ba.org>,
        Trond Myklebust <trond.myklebust@...marydata.com>,
        v9fs-developer@...ts.sourceforge.net
Subject: Re: [PATCH 2/2] dax: fix data corruption due to stale mmap reads

On Tue, Apr 25, 2017 at 01:10:43PM +0200, Jan Kara wrote:
<>
> Hum, but now thinking more about it I have hard time figuring out why write
> vs fault cannot actually still race:
> 
> CPU1 - write(2)				CPU2 - read fault
> 
> 					dax_iomap_pte_fault()
> 					  ->iomap_begin() - sees hole
> dax_iomap_rw()
>   iomap_apply()
>     ->iomap_begin - allocates blocks
>     dax_iomap_actor()
>       invalidate_inode_pages2_range()
>         - there's nothing to invalidate
> 					  grab_mapping_entry()
> 					  - we add zero page in the radix
> 					    tree & map it to page tables
> 
> Similarly read vs write fault may end up racing in a wrong way and try to
> replace already existing exceptional entry with a hole page?

Yep, this race seems real to me, too.  This seems very much like the issues
that exist when a thread is doing direct I/O.  One thread is doing I/O to an
intermediate buffer (page cache for direct I/O case, zero page for us), and
the other is going around it directly to media, and they can get out of sync.

IIRC the direct I/O code looked something like:

1/ invalidate existing mappings
2/ do direct I/O to media
3/ invalidate mappings again, just in case.  Should be cheap if there weren't
   any conflicting faults.  This makes sure any new allocations we made are
   faulted in.

I guess one option would be to replicate that logic in the DAX I/O path, or we
could try and enhance our locking so page faults can't race with I/O since
both can allocate blocks.

I'm not sure, but will think on it.