Message-ID: <20161221004031.GF9865@birch.djwong.org>
Date:   Tue, 20 Dec 2016 16:40:32 -0800
From:   "Darrick J. Wong" <darrick.wong@...cle.com>
To:     Dan Williams <dan.j.williams@...el.com>
Cc:     Ross Zwisler <ross.zwisler@...ux.intel.com>,
        Nicholas Piggin <npiggin@...il.com>,
        Dave Chinner <david@...morbit.com>, Jan Kara <jack@...e.cz>,
        Jeff Moyer <jmoyer@...hat.com>,
        Yumei Huang <yuhuang@...hat.com>,
        Michal Hocko <mhocko@...e.com>,
        Xiao Guangrong <guangrong.xiao@...ux.intel.com>,
        KVM list <kvm@...r.kernel.org>,
        Dave Hansen <dave.hansen@...el.com>,
        Gleb Natapov <gleb@...nel.org>,
        "linux-nvdimm@...ts.01.org" <linux-nvdimm@...1.01.org>,
        mtosatti@...hat.com,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Christoph Hellwig <hch@...radead.org>,
        Linux MM <linux-mm@...ck.org>,
        Stefan Hajnoczi <stefanha@...hat.com>,
        linux-fsdevel <linux-fsdevel@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: DAX mapping detection (was: Re: [PATCH] Fix region lost in
 /proc/self/smaps)

On Mon, Dec 19, 2016 at 05:18:40PM -0800, Dan Williams wrote:
> On Mon, Dec 19, 2016 at 5:09 PM, Darrick J. Wong
> <darrick.wong@...cle.com> wrote:
> > On Mon, Dec 19, 2016 at 02:11:49PM -0700, Ross Zwisler wrote:
> >> On Fri, Sep 16, 2016 at 03:54:05PM +1000, Nicholas Piggin wrote:
> >> <>
> >> > Definitely the first step would be your simple preallocated per
> >> > inode approach until it is shown to be insufficient.
> >>
> >> Reviving this thread a few months later...
> >>
> >> Dave, we're interested in taking a serious look at what it would take to get
> >> PMEM_IMMUTABLE working.  Do you still hold the opinion that this is (or could
> >> become, with some amount of work) a workable solution?
> >>
> >> We're happy to do the grunt work for this feature, but we will probably need
> >> guidance from someone with more XFS experience.  With you out on extended leave
> >> the first half of 2017, who would be the best person to ask for this guidance?
> >> Darrick?
> >
> > Yes, probably. :)
> >
> > I think where we left off with this (on the XFS side) is some sort of
> > fallocate mode that would allocate blocks, zero them, and then set the
> > DAX and PMEM_IMMUTABLE on-disk inode flags.  After that, you'd mmap the
> > > file and thereby gain the ability to control write persistence behavior
> > without having to worry about fs metadata updates.  As an added plus, I
> > think zeroing the pmem also clears media errors, or something like that.
> >
> > <shrug> Is that a reasonable starting point?  My memory is a little foggy.
> >
> > Hmm, I see Dan just posted something about blockdev fallocate.  I'll go
> > read that.
> 
> That's for device-dax, which is basically a poor man's PMEM_IMMUTABLE
> via a character device interface. It's useful for cases where you want
> an entire nvdimm namespace/volume in "no fs-metadata to worry about"
> mode.  But, for sub-allocations of a namespace and support for
> existing tooling, PMEM_IMMUTABLE is much more usable.

Well sure... but otoh I was thinking that it'd be pretty neat if we
could use the same code regardless of whether the target file was a
dax-device or an xfs file:

fd = open("<some path>", O_RDWR);
fstat(fd, &statbuf);
/* FALLOC_FL_PMEM_IMMUTABLE is the proposed mode; it does not exist yet */
fallocate(fd, FALLOC_FL_PMEM_IMMUTABLE, 0, statbuf.st_size);
p = mmap(NULL, statbuf.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

*(p + 42) = 0xDEADBEEF;
asm volatile("clflush %0" : "+m" (*(p + 42))); /* or whatever */

...so perhaps it would be a good idea to design the fallocate primitive
around "prepare this fd for mmap-only pmem semantics" and let the
backend do zeroing and inode flag changes as necessary to make it
happen.  We'd need to do some bikeshedding about what the other falloc
flags mean when we're dealing with pmem files and devices, but I think
we should try to keep the userland presentation the same unless there's
a really good reason not to.
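
To make the "same userland presentation" idea a bit more concrete, here is
a rough sketch of what a caller-side helper might look like.  Everything in
it is an assumption for illustration: map_prepared() and persist_range()
are hypothetical names, posix_fallocate() merely stands in for the proposed
pmem fallocate mode (which does not exist), and msync() stands in for
whatever cache flushing primitive real pmem code would use.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: prepare `path` for mmap-only access and map it.
 * The caller does not care whether `path` is a regular file or a device.
 */
static uint32_t *map_prepared(const char *path, size_t len)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return NULL;

    /*
     * Stand-in for the proposed mode: plain preallocation only.
     * posix_fallocate() returns an errno value on failure, not -1.
     */
    if (posix_fallocate(fd, 0, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical helper: portable stand-in for a cache line flush. */
static int persist_range(void *p, size_t len)
{
    return msync(p, len, MS_SYNC);
}
```

The point is only that the caller sees one open/prepare/mmap sequence; the
zeroing and inode flag changes would happen behind the prepare step.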

--D
