linux-kernel - Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for byte-addressable updates to pmem

Message-ID: <CALCETrVY38h2ajpod2U_2pdHSp8zO4mG2p19h=OnnHmhGTairw@mail.gmail.com>
Date:   Sat, 17 Jun 2017 22:05:45 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     Dan Williams <dan.j.williams@...el.com>,
        Ross Zwisler <ross.zwisler@...ux.intel.com>,
        andy.rudoff@...el.com
Cc:     Andy Lutomirski <luto@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jan Kara <jack@...e.cz>,
        linux-nvdimm <linux-nvdimm@...ts.01.org>,
        Linux API <linux-api@...r.kernel.org>,
        Dave Chinner <david@...morbit.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        Jeff Moyer <jmoyer@...hat.com>,
        Linux FS Devel <linux-fsdevel@...r.kernel.org>,
        Christoph Hellwig <hch@....de>
Subject: Re: [RFC PATCH 2/2] mm, fs: daxfile, an interface for
 byte-addressable updates to pmem

On Sat, Jun 17, 2017 at 8:15 PM, Dan Williams <dan.j.williams@...el.com> wrote:
> On Sat, Jun 17, 2017 at 4:50 PM, Andy Lutomirski <luto@...nel.org> wrote:
>> My other objection is that the syscall intentionally leaks a reference
>> to the file.  This means it needs overflow protection and it probably
>> shouldn't ever be allowed to use it without privilege.
>
> We only hold the one reference while S_DAXFILE is set, so I think the
> protection is there, and per Dave's original proposal this requires
> CAP_LINUX_IMMUTABLE.
>
>> Why can't the underlying issue be easily fixed, though?  Could
>> .page_mkwrite just make sure that metadata is synced when the FS uses
>> DAX?
>
> Yes, it most definitely could and that idea has been floated.
>
>> On a DAX fs, syncing metadata should be extremely fast.  This
>> could be conditioned on an madvise or mmap flag if performance might
>> be an issue.  As far as I know, this change alone should be
>> sufficient.
>
> The hang up is that it requires per-fs enabling as it needs to be
> careful to manage mmap_sem vs fs journal locks for example. I know the
> in-development NOVA [1] filesystem is planning to support this out of
> the gate. ext4 would be open to implementing it, but I think xfs is
> cold on the idea. Christoph originally proposed it here [2], before
> Dave went on to propose immutable semantics.

Hmm.  Given a choice between a very clean API that works without
privilege but is awkward to implement on XFS and an awkward-to-use
API, I'd personally choose the former.

Dave, even with the lock ordering issue, couldn't XFS implement
MAP_PMEM_AWARE by having .page_mkwrite work roughly like this:

if (metadata is dirty) {
  up_read(&mmap_sem);    /* the fault path holds mmap_sem for read */
  sync the metadata;     /* done with mmap_sem dropped */
  down_read(&mmap_sem);
  return 0;  /* retry the fault */
} else {
  return whatever success code;
}

This might require returning VM_FAULT_RETRY instead of 0 and it might
require auditing the core mm code to make sure that it can handle
mmap_sem being dropped like this.  I don't see why it couldn't work in
principle, though.
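
To make the control flow concrete, here is a small userspace toy model of the retry variant, not kernel code. Every name in it (toy_mm, toy_file, toy_page_mkwrite, TOY_FAULT_RETRY, and so on) is hypothetical, and the "lock" is only a reader counter in a single-threaded program. It assumes a convention in which returning the retry code means the handler has already dropped mmap_sem, so the caller retakes it before trying again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
enum toy_fault { TOY_FAULT_DONE, TOY_FAULT_RETRY };

struct toy_mm {
    int mmap_sem_readers;   /* models mmap_sem being held for read */
};

struct toy_file {
    bool metadata_dirty;
    int  metadata_syncs;    /* how many times metadata was synced */
};

static void toy_down_read(struct toy_mm *mm)
{
    mm->mmap_sem_readers++;
}

static void toy_up_read(struct toy_mm *mm)
{
    assert(mm->mmap_sem_readers > 0);
    mm->mmap_sem_readers--;
}

static void toy_sync_metadata(struct toy_mm *mm, struct toy_file *f)
{
    /* The point of the scheme: the sync never runs under mmap_sem. */
    assert(mm->mmap_sem_readers == 0);
    f->metadata_dirty = false;
    f->metadata_syncs++;
}

/* Called with mmap_sem held for read. On TOY_FAULT_RETRY the lock has
 * been dropped; on TOY_FAULT_DONE it is still held. */
static enum toy_fault toy_page_mkwrite(struct toy_mm *mm, struct toy_file *f)
{
    if (f->metadata_dirty) {
        toy_up_read(mm);
        toy_sync_metadata(mm, f);
        return TOY_FAULT_RETRY;
    }
    return TOY_FAULT_DONE;
}

/* Models the fault entry point; returns the number of attempts made. */
static int toy_handle_write_fault(struct toy_mm *mm, struct toy_file *f)
{
    int attempts = 0;

    for (;;) {
        toy_down_read(mm);
        attempts++;
        if (toy_page_mkwrite(mm, f) == TOY_FAULT_DONE) {
            toy_up_read(mm);
            return attempts;
        }
        /* TOY_FAULT_RETRY: the handler already dropped the lock. */
    }
}
```

In this model a write fault on a file with dirty metadata takes two attempts and one sync, and a later fault on the same file takes one attempt and no sync. What the toy cannot show is the hard part: whether the real core mm code tolerates mmap_sem being dropped at this point.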