linux-kernel - Re: [PATCH 3/3] dax: Handle write faults more efficiently

Message-ID: <CALCETrXdCWtsLxhx_DqNPEma4mo71iTXF-FTVzSOfD9HDaiqhg@mail.gmail.com>
Date:	Tue, 26 Jan 2016 22:01:08 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	Matthew Wilcox <willy@...ux.intel.com>
Cc:	Matthew Wilcox <matthew.r.wilcox@...el.com>,
	Ingo Molnar <mingo@...hat.com>,
	Kees Cook <keescook@...omium.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>
Subject: Re: [PATCH 3/3] dax: Handle write faults more efficiently

On Tue, Jan 26, 2016 at 8:17 PM, Matthew Wilcox <willy@...ux.intel.com> wrote:
> On Mon, Jan 25, 2016 at 09:38:19AM -0800, Andy Lutomirski wrote:
>> On Mon, Jan 25, 2016 at 9:25 AM, Matthew Wilcox
>> <matthew.r.wilcox@...el.com> wrote:
>> > From: Matthew Wilcox <willy@...ux.intel.com>
>> >
>> > When we handle a write-fault on a DAX mapping, we currently insert a
>> > read-only mapping and then take the page fault again to convert it to
>> > a writable mapping.  This is necessary for the case where we cover a
>> > hole with a read-only zero page, but when we have a data block already
>> > allocated, it is inefficient.
>> >
>> > Use the recently added vmf_insert_pfn_prot() to insert a writable mapping,
>> > even though the default VM flags say to use a read-only mapping.
>>
>> Conceptually, I like this.  Do you need to make sure to do all the
>> do_wp_page work, though?  (E.g. we currently update mtime in there.
>> Some day I'll fix that, but it'll be replaced with a set_bit to force
>> a deferred mtime update.)
>
> We update mtime in the ->fault handler of filesystems which support DAX
> like this:
>
>         if (vmf->flags & FAULT_FLAG_WRITE) {
>                 sb_start_pagefault(inode->i_sb);
>                 file_update_time(vma->vm_file);
>         }
>
> so I think we're covered.

A question that came up on IRC: if the page is a reflinked page on XFS
(whenever that feature lands), then presumably XFS has real work to do
in page_mkwrite.  If so, what ensures that page_mkwrite gets called?

As a half-baked alternative to this patch, there's a generic
optimization for this case.  do_shared_fault normally calls
do_page_mkwrite and installs the resulting page with the writable bit
set.  But if __do_fault returns VM_FAULT_NOPAGE, then this
optimization is skipped.  Could we add VM_FAULT_NOPAGE_READONLY (or
VM_FAULT_NOPAGE | VM_FAULT_READONLY) as a hint that a page was
installed but that it was installed readonly?  If we did that, then
do_shared_fault could check that bit and go through the wp_page logic
rather than returning to userspace.
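
To make the control flow concrete, here is a toy userspace model of the
idea.  It is only a sketch and not kernel code: every toy_* name, the
TOY_FAULT_* values and the PTE struct are invented for illustration, and
TOY_FAULT_READONLY stands in for the hypothetical hint suggested above.
The real do_shared_fault has locking, error and retry handling that this
ignores entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; these values are NOT the kernel's VM_FAULT_* bits. */
#define TOY_FAULT_NOPAGE   0x1u  /* handler installed the PTE itself */
#define TOY_FAULT_READONLY 0x2u  /* hypothetical hint: it was installed read-only */

struct toy_pte   { bool present; bool writable; };
struct toy_stats { int traps; int mkwrite_calls; };

/* Models a DAX-style ->fault handler that installs a read-only PTE itself. */
static unsigned toy_fault_handler(struct toy_pte *pte, bool give_hint)
{
	pte->present = true;
	pte->writable = false;
	return give_hint ? (TOY_FAULT_NOPAGE | TOY_FAULT_READONLY)
			 : TOY_FAULT_NOPAGE;
}

/* Models the write-protect path: notify the filesystem, then allow writes. */
static void toy_wp_page(struct toy_pte *pte, struct toy_stats *st)
{
	assert(pte->present && !pte->writable);
	st->mkwrite_calls++;		/* stands in for page_mkwrite work */
	pte->writable = true;
}

/* One trap into the fault path caused by a write access. */
static void toy_handle_write_fault(struct toy_pte *pte, bool give_hint,
				   struct toy_stats *st)
{
	st->traps++;
	if (pte->present) {
		/* Present but read-only: the ordinary wp fault. */
		toy_wp_page(pte, st);
		return;
	}
	unsigned ret = toy_fault_handler(pte, give_hint);
	if ((ret & TOY_FAULT_NOPAGE) && (ret & TOY_FAULT_READONLY))
		toy_wp_page(pte, st);	/* proposed: upgrade before returning */
	/* Otherwise return to "userspace" with a read-only PTE in place. */
}

/* Retries a single write until the PTE is writable and reports the cost. */
static struct toy_stats toy_write_access(bool give_hint)
{
	struct toy_pte pte = { false, false };
	struct toy_stats st = { 0, 0 };

	while (!pte.writable)
		toy_handle_write_fault(&pte, give_hint, &st);
	return st;
}
```

In this model toy_write_access(false) takes two traps and
toy_write_access(true) takes one, and in both cases the page_mkwrite
stand-in runs exactly once, which is the property the reflink question
above cares about.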

--Andy