linux-kernel - Re: [PATCH v2 00/10] evacuate struct page from the block layer, introduce __pfn


Date:	Thu, 7 May 2015 08:00:05 -0700
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Dan Williams <dan.j.williams@...el.com>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Boaz Harrosh <boaz@...xistor.com>, Jan Kara <jack@...e.cz>,
	Mike Snitzer <snitzer@...hat.com>, Neil Brown <neilb@...e.de>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Heiko Carstens <heiko.carstens@...ibm.com>,
	Chris Mason <clm@...com>, Paul Mackerras <paulus@...ba.org>,
	"H. Peter Anvin" <hpa@...or.com>, Christoph Hellwig <hch@....de>,
	Alasdair Kergon <agk@...hat.com>,
	"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
	Ingo Molnar <mingo@...nel.org>, Mel Gorman <mgorman@...e.de>,
	Matthew Wilcox <willy@...ux.intel.com>,
	Ross Zwisler <ross.zwisler@...ux.intel.com>,
	Rik van Riel <riel@...hat.com>,
	Martin Schwidefsky <schwidefsky@...ibm.com>,
	Jens Axboe <axboe@...nel.dk>, "Theodore Ts'o" <tytso@....edu>,
	"Martin K. Petersen" <martin.petersen@...cle.com>,
	Julia Lawall <Julia.Lawall@...6.fr>, Tejun Heo <tj@...nel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer,
 introduce __pfn_t

On Wed, May 6, 2015 at 7:36 PM, Dan Williams <dan.j.williams@...el.com> wrote:
>
> My pet concrete example is covered by __pfn_t.  Referencing persistent
> memory in an md/dm hierarchical storage configuration.  Setting aside
> the thrash to get existing block users to do "bvec_set_page(page)"
> instead of "bvec->page = page" the onus is on that md/dm
> implementation and backing storage device driver to operate on
> __pfn_t.  That use case is simple because there is no use of page
> locking or refcounting in that path, just dma_map_page() and
> kmap_atomic().
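
The "bvec_set_page(page)" versus "bvec->page = page" distinction in the
quoted text can be illustrated with a small user-space sketch. This is a
toy model under stated assumptions, not the patch set's code: toy_page,
toy_bvec, toy_mem_map and every helper below are hypothetical names, and
the real struct bio_vec and __pfn_t are not reproduced here. The only
point is that once callers go through accessors, the vector is free to
store a frame number instead of a page pointer.

```c
/* Toy model only: all names are hypothetical, none are kernel APIs. */
#include <assert.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

struct toy_page { int unused; };

/* Pretend memory map: entry i describes page frame number i. */
struct toy_page toy_mem_map[TOY_NR_PAGES];

/* The vector stores a frame number rather than a page pointer. */
struct toy_bvec {
    unsigned long pfn;
    unsigned int len;
    unsigned int offset;
};

unsigned long toy_page_to_pfn(const struct toy_page *page)
{
    return (unsigned long)(page - toy_mem_map);
}

struct toy_page *toy_pfn_to_page(unsigned long pfn)
{
    assert(pfn < TOY_NR_PAGES);
    return &toy_mem_map[pfn];
}

/* Callers use this instead of assigning to a ->page field directly. */
void toy_bvec_set_page(struct toy_bvec *bv, struct toy_page *page)
{
    bv->pfn = toy_page_to_pfn(page);
}

/* Only valid when the frame is actually backed by a toy_page. */
struct toy_page *toy_bvec_page(const struct toy_bvec *bv)
{
    return toy_pfn_to_page(bv->pfn);
}
```

In this sketch a caller that only ever needs the frame number never has
to touch a page structure at all, which is the property the quoted
md/dm use case relies on.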

So clarify for me: are you trying to make the IO stack in general be
able to use the persistent memory as a source (or destination) for IO
to _other_ devices, or are you talking about just internally shuffling
things around for something like RAID on top of persistent memory?

Because I think those are two very different things.

For example, one of the things I worry about is for people doing IO
from persistent memory directly to some "slow stable storage" (aka
disk). That was what I thought you were aiming for: infrastructure so
that you can make a bio for a *disk* device contain a page list that
is the persistent memory.

And I think that is a very dangerous operation to do, because the
persistent memory itself is going to have some filesystem on it, so
anything that looks up the persistent memory pages is *not* going to
have a stable pfn: the pfn will point to a fixed part of the
persistent memory, but the file that was there may be deleted and the
memory reassigned to something else.

That's the kind of thing that "struct page" helps with for normal IO
devices. It's both a source of serialization and indirection, so that
when somebody does a "truncate()" on a file, we don't end up doing IO
to random stale locations on the disk that got reassigned to another
file.

So "struct page" is very fundamental. It's *not* just a "this is the
physical source/drain of the data you are doing IO on".
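
The hazard described above can be sketched as a toy user-space model.
It is an illustration under stated assumptions, not kernel code: the
block array, the toy_* helpers and the "pin" counter are all
hypothetical, with the pin only loosely standing in for the reference
that a "struct page" lets an in-flight IO hold. Scenario A remembers
just a raw frame number across a truncate; scenario B holds a pin, so
the allocator cannot hand the block to another file while the IO is
outstanding.

```c
/* Toy model only: all names are hypothetical, none are kernel APIs. */
#include <assert.h>
#include <stdbool.h>

#define NBLOCKS  4
#define NO_OWNER (-1)

/* One block of pretend persistent memory. */
struct toy_block {
    int owner;  /* id of the file that owns the block, or NO_OWNER */
    int pins;   /* number of in-flight IOs holding a reference */
};

struct toy_block pmem[NBLOCKS];

void toy_init(void)
{
    for (int i = 0; i < NBLOCKS; i++) {
        pmem[i].owner = NO_OWNER;
        pmem[i].pins = 0;
    }
}

/* Give the first free, unpinned block to `file`; return its index or -1. */
int toy_alloc(int file)
{
    for (int i = 0; i < NBLOCKS; i++) {
        if (pmem[i].owner == NO_OWNER && pmem[i].pins == 0) {
            pmem[i].owner = file;
            return i;
        }
    }
    return -1;
}

/* Truncate: the file gives up the block. */
void toy_truncate(int pfn) { pmem[pfn].owner = NO_OWNER; }

void toy_pin(int pfn) { pmem[pfn].pins++; }

void toy_unpin(int pfn)
{
    assert(pmem[pfn].pins > 0);
    pmem[pfn].pins--;
}

/* Scenario A: the IO keeps only a raw pfn.
 * Returns true if that pfn ends up holding another file's data. */
bool scenario_raw_pfn(void)
{
    toy_init();
    int pfn = toy_alloc(1);    /* file 1 owns the block */
    /* ... an IO is queued that remembers only `pfn` ... */
    toy_truncate(pfn);         /* file 1 is truncated */
    int again = toy_alloc(2);  /* file 2 is handed the same block */
    return again == pfn && pmem[pfn].owner == 2;
}

/* Scenario B: the IO pins the block first.
 * Returns true if the block was reused while the IO was in flight. */
bool scenario_pinned(void)
{
    toy_init();
    int pfn = toy_alloc(1);
    toy_pin(pfn);              /* the IO takes a reference */
    toy_truncate(pfn);
    int again = toy_alloc(2);  /* the allocator must skip the pinned block */
    bool reused = (again == pfn);
    toy_unpin(pfn);            /* IO completes, block becomes reusable */
    return reused;
}
```

In this model scenario_raw_pfn() reports the stale reuse and
scenario_pinned() does not. The pin is a deliberately crude stand-in;
the real serialization also involves page locking and the filesystem's
own truncate handling, none of which is modelled here.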

So if you are looking at some kind of "zero-copy IO", where you can do
IO from a filesystem on persistent storage to *another* filesystem (on,
say, a big rotational disk used for long-term storage) by just doing
a bio that targets the disk, but has the persistent memory as the
source memory, I really want to understand how you are going to
serialize this.

So *that* is what I meant by "What is the primary thing that is
driving this need? Do we have a very concrete example?"

I absolutely do *not* want to teach the bio subsystem to just
randomly be able to take the source/destination of the IO as being
some random pfn without knowing what the actual uses are and how these
IO's are generated in the first place.

I was assuming that you wanted to do something where you mmap() the
persistent memory, and then write it out to another device (possibly
using aio_write()). But that really does require some kind of
serialization at a higher level, because you can't just look up the
pfn's in the page table and assume they are stable: they are *not*
stable.

                         Linus