Message-ID: <20150319181725.GA17411@infradead.org>
Date: Thu, 19 Mar 2015 11:17:25 -0700
From: Christoph Hellwig <hch@...radead.org>
To: Matthew Wilcox <willy@...ux.intel.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Dan Williams <dan.j.williams@...el.com>,
linux-kernel@...r.kernel.org, linux-arch@...r.kernel.org,
axboe@...nel.dk, riel@...hat.com, linux-nvdimm@...1.01.org,
Dave Hansen <dave.hansen@...ux.intel.com>,
linux-raid@...r.kernel.org, mgorman@...e.de, hch@...radead.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [RFC PATCH 0/7] evacuate struct page from the block layer
On Thu, Mar 19, 2015 at 09:43:13AM -0400, Matthew Wilcox wrote:
> Dan missed "Support O_DIRECT to a mapped DAX file". More generally, if we
> want to be able to do any kind of I/O directly to persistent memory,
> and I think we do, we need to do one of:
>
> 1. Construct struct pages for persistent memory
> 1a. Permanently
> 1b. While the pages are under I/O
> 2. Teach the I/O layers to deal in PFNs instead of struct pages
> 3. Replace struct page with some other structure that can represent both
> DRAM and PMEM
>
> I'm personally a fan of #3, and I was looking at the scatterlist as
> my preferred data structure. I now believe the scatterlist as it is
> currently defined isn't sufficient, so we probably end up needing a new
> data structure. I think Dan's preferred method of replacing struct
> pages with PFNs is actually less intrusive, but doesn't give us as
> much advantage (an entirely new data structure would let us move to an
> extent based system at the same time, instead of sticking with an array
> of pages). Clearly Boaz prefers 1a, which works well enough for the
> 8GB NV-DIMMs, but not well enough for the 400GB NV-DIMMs.
>
> What's your preference? I guess option 0 is "force all I/O to go
> through the page cache and then get copied", but that feels like a nasty
> performance hit.

In addition to the options, there's also a timeline. At least for the
short term, where we want to get something going, 1a seems like the
absolute best option. It works perfectly fine for the many small-capacity,
DRAM-like NVDIMMs, and it works functionally fine for the special huge
ones, although its resource use is highly annoying. If it turns out to
be too annoying, we can also offer a no-I/O-possible option for them in
the short run.

In the long run, option 2) sounds like a good plan to me, not as a
parallel I/O path but as the main one. Doing so will in fact give us
options to experiment with 3). Given that we're moving towards an
increasingly huge-page-based world, replacing the good old struct page
with something extent-like and/or temporary might be needed for DRAM
as well in the future.