Message-ID: <20150929021807.GB27164@dastard>
Date:	Tue, 29 Sep 2015 12:18:08 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Dan Williams <dan.j.williams@...el.com>
Cc:	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Alexander Viro <viro@...iv.linux.org.uk>,
	Matthew Wilcox <willy@...ux.intel.com>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
	Jan Kara <jack@...e.cz>
Subject: Re: [PATCH] dax: fix deadlock in __dax_fault

On Mon, Sep 28, 2015 at 03:57:29PM -0700, Dan Williams wrote:
> On Mon, Sep 28, 2015 at 2:35 PM, Dave Chinner <david@...morbit.com> wrote:
> > On Mon, Sep 28, 2015 at 05:13:50AM -0700, Dan Williams wrote:
> >> On Sun, Sep 27, 2015 at 5:59 PM, Dave Chinner <david@...morbit.com> wrote:
> >> > On Fri, Sep 25, 2015 at 09:17:45PM -0600, Ross Zwisler wrote:
> >> >> On Fri, Sep 25, 2015 at 12:53:57PM +1000, Dave Chinner wrote:
> >> [..]
> >> >> Does this sound like a reasonable path forward for v4.3?  Dave, and Jan, can
> >> >> you guys provide guidance and code reviews for the XFS and ext4 bits?
> >> >
> >> > IMO, it's way too much to get into 4.3. I'd much prefer we revert
> >> > the bad changes in 4.3, and then work towards fixing this for the
> >> > 4.4 merge window. If someone needs this for 4.3, then they can
> >> > backport the 4.4 code to 4.3-stable.
> >> >
> >>
> >> If the proposal is to step back and get a running start at these fixes
> >> for 4.4, then it is worth considering what the state of allocating
> >> pages for DAX mappings will be in 4.4.
> >
> > Oh, do tell. I haven't seen any published design, code, etc,
> 
> This is via the devm_memremap_pages() api that went into 4.2 [1] and
> my v1 (RFC quality) series using it for dax get_user_pages() [2].
> 
> [1]: https://lkml.org/lkml/2015/8/25/841
> [2]: https://lkml.org/lkml/2015/9/23/11

I'll have a look at some point when I'm not trying to put out fires.

> > And, quite frankly, I'm not enabling any new DAX behaviour/subsystem
> > in XFS until I've had time to review, test and fix it so it works
> > without deadlocking or corrupting data.
> 
> I'm in violent agreement, to the point where I'm pondering whether
> CONFIG_FS_DAX should just depend on CONFIG_BROKEN in 4.3 until we've
> convinced ourselves of all the fixes in 4.4.  It's not clear to me
> that we have a stable baseline to which we can revert this "still in
> development" implementation, did you have one in mind?

XFS warns that DAX is experimental when you mount with that option,
so there is no need to do that:

[  686.055780] XFS (ram0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[  686.058464] XFS (ram0): Mounting V5 Filesystem
[  686.062857] XFS (ram0): Ending clean mount

> >> It's already the case that
> >> allocating struct page for DAX mappings is the only solution on the
> >> horizon for enabling a get_user_pages() solution for persistent
> >> memory.  We of course need to get the page-less DAX path fixed up, but
> >> the near-term path to full functionality and safety is when struct
> >> page is available to enable the typical synchronization mechanics.
> >
> > And we do so at the expense of medium to long term complexity and
> > maintenance. I'm no fan of using struct pages to track terabytes to
> > petabytes of persistent memory, and I'm even less of a fan of having
> > to simultaneously support both struct page and pfn based DAX
> > subsystems...
> 
> I'm no fan of tracking petabytes of persistent memory with struct
> page, but we're in the near term space (hardware technology-wise) of
> how to enable DMA/RDMA to 100s of gigabytes to a few terabytes of
> persistent memory.

Don't think I don't know that - as I said to someone a few hours
ago on IRC:

[29/09/15 07:41] <dchinner> I'm sure they do, but they have a hard requirement to support RDMA from persistent memory
[29/09/15 07:41] <dchinner> and that's what seems to be driving the "we need to use struct pages" design

> A page-less solution to that problem is not on the
> horizon as far as I can tell.  In short, I am concerned we are
> spending time working around the lack of struct page to get to a
> stable page-less solution that is still missing support for the use
> cases that are expected to "just work".

I'm concerned with making what we have work before we go and change
everything. You might want to move really quickly, but without sane
filesystem support you can't ship anything worth a damn. There are all
sorts of issues here, and introducing struct pages doesn't solve all
of them.

Let's concentrate on ensuring the basic operation of DAX is robust
first - get the page fault vs extent manipulations serialised, sane
and scalable before we start changing anything else. If we don't
solve these problems, then nothing else we do will be reliable, and
the problems exist regardless of whether we are using struct pages
or not. Hence these are the critical problems we need to fix before
anything else.

Once we have these issues sorted out, switching between struct page
and pfn should be much simpler because we don't have to worry about
different locking strategies to protect against truncate, racing
page faults, etc.

> I do not think introducing page-backed persistent memory sets us back to
> square 1.  Instead, given the functionality that is enabled when pages
> are present I think it is safe to assume most platforms will arrange
> for page backed persistent memory.

Sure, but it will take a little time to get there. Moving fast
doesn't help us here - it only results in stuff we have to revert or
redo in the near future and that means progress is much slower than
it should be. Let's solve the DAX problems in the right order - it
will make things simpler and faster down the road.

Cheers,

Dave
-- 
Dave Chinner
david@...morbit.com