Message-ID: <1448316903.19320.46.camel@hpe.com>
Date: Mon, 23 Nov 2015 15:15:03 -0700
From: Toshi Kani <toshi.kani@....com>
To: Dan Williams <dan.j.williams@...el.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
Matthew Wilcox <willy@...ux.intel.com>,
Ross Zwisler <ross.zwisler@...ux.intel.com>,
mauricio.porto@....com, Linux MM <linux-mm@...ck.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
"linux-nvdimm@...ts.01.org" <linux-nvdimm@...ts.01.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping
On Mon, 2015-11-23 at 12:53 -0800, Dan Williams wrote:
> On Mon, Nov 23, 2015 at 12:04 PM, Toshi Kani <toshi.kani@....com> wrote:
> > The following oops was observed when mmap() with MAP_POPULATE
> > pre-faulted pmd mappings of a DAX file. follow_trans_huge_pmd()
> > expects that a target address has a struct page.
> >
> > BUG: unable to handle kernel paging request at ffffea0012220000
> > follow_trans_huge_pmd+0xba/0x390
> > follow_page_mask+0x33d/0x420
> > __get_user_pages+0xdc/0x800
> > populate_vma_page_range+0xb5/0xe0
> > __mm_populate+0xc5/0x150
> > vm_mmap_pgoff+0xd5/0xe0
> > SyS_mmap_pgoff+0x1c1/0x290
> > SyS_mmap+0x1b/0x30
> >
> > Fix it by making the PMD pre-fault handling consistent with the PTE
> > case. After the entry has been pre-faulted in faultin_page(),
> > follow_page_mask() calls follow_trans_huge_pmd(), which is changed to
> > call follow_pfn_pmd() for VM_PFNMAP or VM_MIXEDMAP vmas.
> > follow_pfn_pmd() handles FOLL_TOUCH and returns -EEXIST.
>
> As of 4.4-rc2, DAX pmd mappings are disabled. So we have time to do
> something more comprehensive in 4.5.
Yes, I noticed during my testing that I could not use pmd...
> > Reported-by: Mauricio Porto <mauricio.porto@....com>
> > Signed-off-by: Toshi Kani <toshi.kani@....com>
> > Cc: Andrew Morton <akpm@...ux-foundation.org>
> > Cc: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> > Cc: Matthew Wilcox <willy@...ux.intel.com>
> > Cc: Dan Williams <dan.j.williams@...el.com>
> > Cc: Ross Zwisler <ross.zwisler@...ux.intel.com>
> > ---
> > mm/huge_memory.c | 34 ++++++++++++++++++++++++++++++++++
> > 1 file changed, 34 insertions(+)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d5b8920..f56e034 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> [..]
> > @@ -1288,6 +1315,13 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
> > if ((flags & FOLL_NUMA) && pmd_protnone(*pmd))
> > goto out;
> >
> > + /* pfn map does not have a struct page */
> > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)) {
> > + ret = follow_pfn_pmd(vma, addr, pmd, flags);
> > + page = ERR_PTR(ret);
> > + goto out;
> > + }
> > +
> > page = pmd_page(*pmd);
> > VM_BUG_ON_PAGE(!PageHead(page), page);
> > if (flags & FOLL_TOUCH) {
>
> I think it is already problematic that dax pmd mappings are getting
> confused with transparent huge pages.
We had the same issue with the dax pte mapping [1], and this change extends the
pfn map handling to pmd. So this problem is not specific to pmd.
[1] https://lkml.org/lkml/2015/6/23/181
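For reference, the new helper mirrors follow_pfn_pte() in mm/gup.c from [1]:
it refuses FOLL_GET (there is no struct page to take a reference on), honors
FOLL_TOUCH by setting the young/dirty bits on the pmd, and returns -EEXIST so
that __get_user_pages() treats the entry as already populated. A rough sketch
(the exact body is in the part of the diff elided as [..] above, so treat the
details here as an approximation):

/*
 * Sketch only -- modeled on follow_pfn_pte(); it would live in
 * mm/huge_memory.c next to follow_trans_huge_pmd().
 */
static int follow_pfn_pmd(struct vm_area_struct *vma, unsigned long address,
		pmd_t *pmd, unsigned int flags)
{
	/* No struct page, so there is nothing to take a reference on. */
	if (flags & FOLL_GET)
		return -EFAULT;

	if (flags & FOLL_TOUCH) {
		pmd_t entry = pmd_mkyoung(*pmd);

		if (flags & FOLL_WRITE)
			entry = pmd_mkdirty(entry);
		if (pmdp_set_access_flags(vma, address & HPAGE_PMD_MASK,
				pmd, entry, flags & FOLL_WRITE))
			update_mmu_cache_pmd(vma, address, pmd);
	}

	return -EEXIST;
}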
> They're more closely related to
> a hugetlbfs pmd mappings in that they are mapping an explicit
> allocation. I have some pending patches to address this dax-pmd vs
> hugetlb-pmd vs thp-pmd classification that I will post shortly.
Not sure which way is better, but I am certainly interested in your changes.
> By the way, I'm collecting DAX pmd regression tests [1], is this just
> a simple crash upon using MAP_POPULATE?
>
> [1]: https://github.com/pmem/ndctl/blob/master/lib/test-dax-pmd.c
Yes, this issue is easy to reproduce with MAP_POPULATE. In case it helps,
attached are the tests I used for testing the patches. Sorry, the code is messy
since it was only intended for my internal use...
- The test was originally written for the pte change [1], and the comments in
test.sh (e.g. "mlock fail", "ok") reflect the results without the pte change.
- For the pmd test, I modified test-mmap.c to call posix_memalign() before
mmap(). The 2MB-aligned buffer from posix_memalign() is free()d right away, and
its address is then passed to mmap() as a hint, which keeps the mmap'd address
aligned on 2MB (a rough sketch of this trick follows after these notes).
- I created the test file(s) with dd (i.e. all blocks written).
- The other infinite loop issue (fixed by my other patch) was found by the test
case with option "-LMSr".
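In case a quick reproducer is useful, below is a minimal sketch of the
alignment trick plus the MAP_POPULATE mapping (the DAX file path and sizes are
placeholders; the attached test-mmap.c is what I actually ran and differs in
detail):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define MB(x)	((size_t)(x) << 20)

int main(void)
{
	size_t len = MB(16);	/* a multiple of the 2MB pmd size */
	void *hint, *addr;
	int fd;

	/*
	 * Reserve a 2MB-aligned region, then free it so its address can be
	 * handed to mmap() as a hint.  The kernel may ignore the hint, so
	 * the alignment is re-checked after mmap().
	 */
	if (posix_memalign(&hint, MB(2), len))
		return 1;
	free(hint);

	fd = open("/mnt/dax/file", O_RDWR);	/* placeholder DAX file */
	if (fd < 0)
		return 1;

	/*
	 * MAP_POPULATE pre-faults the pmd mappings, which is what hit the
	 * oops in follow_trans_huge_pmd() before this fix.
	 */
	addr = mmap(hint, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED | MAP_POPULATE, fd, 0);
	if (addr == MAP_FAILED || ((unsigned long)addr & (MB(2) - 1)))
		return 1;

	printf("mapped %zu bytes at %p\n", len, addr);
	munmap(addr, len);
	close(fd);
	return 0;
}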
Thanks,
-Toshi
Attachments: "test.sh" (application/x-shellscript, 2586 bytes),
"test-mmap.c" (text/x-csrc, 4334 bytes)