Message-ID: <1520548101.2693.106.camel@hpe.com>
Date: Thu, 8 Mar 2018 21:43:25 +0000
From: "Kani, Toshi" <toshi.kani@....com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"gratian.crisan@...com" <gratian.crisan@...com>
CC: "mingo@...nel.org" <mingo@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"julia.cartwright@...com" <julia.cartwright@...com>,
"torvalds@...ux-foundation.org" <torvalds@...ux-foundation.org>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"bp@...e.de" <bp@...e.de>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"hpa@...or.com" <hpa@...or.com>,
"brgerst@...il.com" <brgerst@...il.com>,
"luto@...nel.org" <luto@...nel.org>,
"dave.hansen@...el.com" <dave.hansen@...el.com>,
"dvlasenk@...hat.com" <dvlasenk@...hat.com>,
"gratian@...il.com" <gratian@...il.com>
Subject: Re: Kernel page fault in vmalloc_fault() after a preempted ioremap

On Thu, 2018-03-08 at 14:34 -0600, Gratian Crisan wrote:
> Hi all,
>
> We are seeing kernel page faults happening on module loads with certain
> drivers like the i915 video driver[1]. This was initially discovered on
> a 4.9 PREEMPT_RT kernel. It takes 5 days on average to reproduce using a
> simple reboot loop test. Looking at the code paths involved I believe
> the issue is still present in the latest vanilla kernel.
>
> Some relevant points are:
>
> * x86_64 CPU: Intel Atom E3940
>
> * CONFIG_HUGETLBFS is not set (which also gates CONFIG_HUGETLB_PAGE)
>
> Based on the function traces I was able to gather, the sequence of events is:
>
> 1. Driver starts an ioremap operation for a region that is PMD_SIZE in
> size (or PUD_SIZE).
>
> 2. The ioremap() operation is preempted while it's in the middle of
> setting up the page mappings:
> ioremap_page_range->...->ioremap_pmd_range->pmd_set_huge <<preempted>>
>
> 3. Unrelated tasks run. Traces also include some cross core scheduling
> IPI calls.
>
> 4. Driver resumes execution, finishes the ioremap operation and tries to
> access the newly mapped IO region. This triggers a vmalloc fault.
>
> 5. The vmalloc_fault() function hits a kernel page fault when trying to
> dereference a non-existent *pte_ref.
>
> The reason this happens is the code paths called from ioremap_page_range()
> make different assumptions about when a large page (pud/pmd) mapping can be
> used versus the code paths in vmalloc_fault().
>
> Using the PMD sized ioremap case as an example (the PUD case is similar):
> ioremap_pmd_range() calls ioremap_pmd_enabled() which is gated by
> CONFIG_HAVE_ARCH_HUGE_VMAP. On x86_64 this will return true unless the
> "nohugeiomap" kernel boot parameter is passed in.
>
> On the other hand, in the rare case when a page fault happens in the
> ioremap'ed region, vmalloc_fault() calls the pmd_huge() function to check
> if a PMD page is marked huge or if it should go on and get a reference to
> the PTE. However, pmd_huge() is conditionally compiled based on the
> user-configured CONFIG_HUGETLB_PAGE, which is selected by CONFIG_HUGETLBFS.
> If the CONFIG_HUGETLBFS option is not enabled, pmd_huge() is always defined
> to be 0.
>
> The end result is an OOPS in vmalloc_fault() when the non-existent pte_ref
> is dereferenced, because pmd_huge() returned 0 for a PMD that is in fact
> mapped as a huge page.
>
> Commit f4eafd8bcd52 ("x86/mm: Fix vmalloc_fault() to handle large pages
> properly") attempted to fix the mismatch between ioremap() and
> vmalloc_fault() with regard to huge page handling, but it missed this use
> case.
>
> I am working on a simpler reproducing case; however, so far I've been
> unsuccessful in re-creating the conditions that trigger the vmalloc fault
> in the first place. Adding explicit scheduling points in
> ioremap_pmd_range/pmd_set_huge doesn't seem to be sufficient. Ideas
> appreciated.
>
> Any thoughts on what a correct fix would look like? Should the ioremap
> code paths respect the HUGETLBFS config, or would it be better for the
> vmalloc fault code paths to match the tests used in ioremap and not rely
> on the HUGETLBFS option being enabled?

Thanks for the report and analysis! I believe pud_large() and
pmd_large() should have been used in vmalloc_fault() instead of
pud_huge() and pmd_huge(). I will try to reproduce the issue and
verify the fix.
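
To illustrate the mismatch described in the report, here is a minimal
user-space sketch in C. It is not kernel code: every model_* name is
hypothetical, the flag layout assumes the usual x86 bits (present = bit 0,
PSE = bit 7), and the two predicates only imitate how pmd_huge() (with
CONFIG_HUGETLB_PAGE unset) and pmd_large() are described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86 page-table flag bits for this model. */
#define MODEL_PAGE_PRESENT (UINT64_C(1) << 0)
#define MODEL_PAGE_PSE     (UINT64_C(1) << 7)

static_assert(MODEL_PAGE_PSE == 0x80, "PSE is modelled as bit 7");

/* Hypothetical stand-in for a PMD entry. */
typedef struct { uint64_t val; } model_pmd_t;

/* Imitates pmd_huge() when CONFIG_HUGETLB_PAGE is not set: always 0. */
static bool model_pmd_huge_no_hugetlb(model_pmd_t pmd)
{
    (void)pmd;
    return false;
}

/* Imitates pmd_large(): checks the PSE bit, independent of any config. */
static bool model_pmd_large(model_pmd_t pmd)
{
    return (pmd.val & MODEL_PAGE_PSE) != 0;
}

/*
 * True if a fault handler using the given huge-page test would continue
 * on to look up a PTE below this PMD.
 */
static bool model_walks_to_pte(model_pmd_t pmd, bool (*is_huge)(model_pmd_t))
{
    return !is_huge(pmd);
}

/* A PMD as a huge ioremap mapping would leave it: present with PSE set. */
static model_pmd_t model_huge_ioremap_pmd(uint64_t phys_2m_aligned)
{
    model_pmd_t pmd = { phys_2m_aligned | MODEL_PAGE_PRESENT | MODEL_PAGE_PSE };
    return pmd;
}
```

In this model, a PMD built by model_huge_ioremap_pmd() makes
model_walks_to_pte() return true with model_pmd_huge_no_hugetlb (the
handler would go looking for a PTE that does not exist) and false with
model_pmd_large (the handler stops at the PMD level).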
-Toshi