Message-ID: <20150507201815.GD5966@gmail.com>
Date: Thu, 7 May 2015 16:18:17 -0400
From: Jerome Glisse <j.glisse@...il.com>
To: Ingo Molnar <mingo@...nel.org>
Cc: Dave Hansen <dave.hansen@...ux.intel.com>,
Dan Williams <dan.j.williams@...el.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Boaz Harrosh <boaz@...xistor.com>, Jan Kara <jack@...e.cz>,
Mike Snitzer <snitzer@...hat.com>, Neil Brown <neilb@...e.de>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>,
Heiko Carstens <heiko.carstens@...ibm.com>,
Chris Mason <clm@...com>, Paul Mackerras <paulus@...ba.org>,
"H. Peter Anvin" <hpa@...or.com>, Christoph Hellwig <hch@....de>,
Alasdair Kergon <agk@...hat.com>,
"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
Mel Gorman <mgorman@...e.de>,
Matthew Wilcox <willy@...ux.intel.com>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
Rik van Riel <riel@...hat.com>,
Martin Schwidefsky <schwidefsky@...ibm.com>,
Jens Axboe <axboe@...nel.dk>, Theodore Ts'o <tytso@....edu>,
"Martin K. Petersen" <martin.petersen@...cle.com>,
Julia Lawall <Julia.Lawall@...6.fr>, Tejun Heo <tj@...nel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
paulmck@...ux.vnet.ibm.com
Subject: Re: [PATCH v2 00/10] evacuate struct page from the block layer,
introduce __pfn_t
On Thu, May 07, 2015 at 09:53:13PM +0200, Ingo Molnar wrote:
>
> * Ingo Molnar <mingo@...nel.org> wrote:
>
> > > Is handling kernel pagefault on the vmemmap completely out of the
> > > picture? So we would carve out a chunk of kernel address space
> > > for those pfn and use it for vmemmap and handle pagefault on it.
> >
> > That's pretty clever. The page fault doesn't even have to do remote
> > TLB shootdown, because it only establishes mappings - so it's pretty
> > atomic, a bit like the minor vmalloc() area faults we are doing.
> >
> > Some sort of LRA (least recently allocated) scheme could unmap the
> > area in chunks if it's beyond a certain size, to keep a limit on
> > size. Done from the same context and would use remote TLB shootdown.
> >
> > The only limitation I can see is that such faults would have to be
> > able to sleep, to do the allocation. So pfn_to_page() could not be
> > used in arbitrary contexts.
>
> So another complication would be that we cannot just unmap such pages
> when we want to recycle them, because the struct page in them might be
> in use - so all struct page uses would have to refcount the underlying
> page. We don't really do that today: code just looks up struct pages
> and assumes they never go away.
I still think this is doable. Like I said in another email, I think we
should introduce a special pfn_to_page_dev|pmem|waffle|somethingyoulike()
for places that are allowed to allocate the underlying struct page.

For instance, we can use a default page to back this whole special
vmemmap range, filled with specially crafted struct pages that say this
is invalid memory (make this mapping read-only so that all writes to
these special struct pages are forbidden).

Now once an authorized user comes along and needs a real struct page, it
triggers a page allocation that replaces the page full of fake invalid
struct pages with a page of correct, valid struct pages that can be
manipulated by other parts of the kernel.

So regular pfn_to_page() would test against the special vmemmap and, if
the pfn falls in it, test the content of the struct page for some flag.
If it's the invalid page flag, it returns 0.

But once a proper struct page is allocated, pfn_to_page() would return
the struct page as expected.
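
To make the two lookup paths concrete, here is a rough userspace model
of the idea. It is only a sketch, not kernel code: every name in it
(page_model, PG_INVALID, pfn_to_page_model(), pfn_to_page_dev_model(),
the chunk sizes) is made up for illustration, a plain array of pointers
stands in for the vmemmap page tables, and the read-only mapping of the
default page is only noted in a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* All names below are hypothetical; this models the scheme in userspace. */
#define PAGES_PER_CHUNK 64UL   /* struct pages per backing page (assumed) */
#define NR_CHUNKS       4UL    /* size of the special range (assumed) */
#define PG_INVALID      0x1u   /* "this struct page is a placeholder" */

struct page_model {
    unsigned int flags;
    unsigned int refcount;
};

/*
 * One shared default chunk full of invalid entries. In the real scheme
 * this would be a single page mapped read-only behind the whole range,
 * so a stray write would fault instead of silently succeeding.
 */
static struct page_model default_chunk[PAGES_PER_CHUNK];

/* Stand-in for the page tables backing the special vmemmap range. */
static struct page_model *vmemmap_model[NR_CHUNKS];
static bool model_ready;

static void model_init(void)
{
    unsigned long i;

    if (model_ready)
        return;
    for (i = 0; i < PAGES_PER_CHUNK; i++)
        default_chunk[i].flags = PG_INVALID;
    for (i = 0; i < NR_CHUNKS; i++)
        vmemmap_model[i] = default_chunk;
    model_ready = true;
}

static struct page_model *slot(unsigned long pfn)
{
    assert(pfn < NR_CHUNKS * PAGES_PER_CHUNK);
    model_init();
    return &vmemmap_model[pfn / PAGES_PER_CHUNK][pfn % PAGES_PER_CHUNK];
}

/* Regular lookup: never allocates, returns NULL for a placeholder. */
struct page_model *pfn_to_page_model(unsigned long pfn)
{
    struct page_model *page = slot(pfn);

    return (page->flags & PG_INVALID) ? NULL : page;
}

/* Authorized lookup: may replace the default chunk with a real one. */
struct page_model *pfn_to_page_dev_model(unsigned long pfn)
{
    unsigned long chunk = pfn / PAGES_PER_CHUNK;

    slot(pfn);  /* bounds check and lazy init */
    if (vmemmap_model[chunk] == default_chunk) {
        /* calloc() zeroes the flags, so the new entries are valid. */
        struct page_model *real = calloc(PAGES_PER_CHUNK, sizeof(*real));

        if (!real)
            return NULL;
        vmemmap_model[chunk] = real;
    }
    return &vmemmap_model[chunk][pfn % PAGES_PER_CHUNK];
}
```

In this model, pfn_to_page_model(70) returns NULL until some caller has
gone through pfn_to_page_dev_model(70); after that both return the same
pointer, while pfns in untouched chunks still return NULL. Note that the
swap happens a whole chunk at a time, so neighbouring entries in the same
chunk become valid together. The model also ignores locking and the
refcounting needed to ever unmap a chunk again.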

That way you will catch all invalid users of such pages, i.e. users that
use the page after its lifetime is done. You will also limit the
creation of the underlying proper struct page to only the code that can
legitimately ask for a proper struct page for a given pfn.

Also, you would get a kernel write fault on the page full of fake struct
pages, and that would allow catching further wrong uses.

Anyway, this is how I envision this, and I think it would work for my
use case too (GPU it is for me :))

Cheers,
Jérôme