Message-ID: <CAA9_cmf7=aGXKoQFkzS_UJtznfRtWofitDpV2AyGwpaRGKyQkg@mail.gmail.com>
Date: Thu, 20 Apr 2017 14:46:51 -0700
From: Dan Williams <dan.j.williams@...el.com>
To: Catalin Marinas <catalin.marinas@....com>,
aneesh.kumar@...ux.vnet.ibm.com, steve.capper@...aro.org,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, dave.hansen@...el.com,
Borislav Petkov <bp@...en8.de>, Rik van Riel <riel@...hat.com>,
dann.frazier@...onical.com,
Linus Torvalds <torvalds@...ux-foundation.org>,
Michal Hocko <mhocko@...e.cz>
Cc: linux-tip-commits@...r.kernel.org
Subject: Re: [tip:x86/mm] x86/mm/gup: Switch GUP to the generic
get_user_page_fast() implementation
On Sat, Mar 18, 2017 at 2:52 AM, tip-bot for Kirill A. Shutemov
<tipbot@...or.com> wrote:
> Commit-ID: 2947ba054a4dabbd82848728d765346886050029
> Gitweb: http://git.kernel.org/tip/2947ba054a4dabbd82848728d765346886050029
> Author: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> AuthorDate: Fri, 17 Mar 2017 00:39:06 +0300
> Committer: Ingo Molnar <mingo@...nel.org>
> CommitDate: Sat, 18 Mar 2017 09:48:03 +0100
>
> x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation
>
> This patch provides all required callbacks required by the generic
> get_user_pages_fast() code and switches x86 over - and removes
> the platform specific implementation.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@...ux.intel.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Aneesh Kumar K . V <aneesh.kumar@...ux.vnet.ibm.com>
> Cc: Borislav Petkov <bp@...en8.de>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Dann Frazier <dann.frazier@...onical.com>
> Cc: Dave Hansen <dave.hansen@...el.com>
> Cc: H. Peter Anvin <hpa@...or.com>
> Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Rik van Riel <riel@...hat.com>
> Cc: Steve Capper <steve.capper@...aro.org>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: linux-arch@...r.kernel.org
> Cc: linux-mm@...ck.org
> Link: http://lkml.kernel.org/r/20170316213906.89528-1-kirill.shutemov@linux.intel.com
> [ Minor readability edits. ]
> Signed-off-by: Ingo Molnar <mingo@...nel.org>
I'm still trying to spot the bug, but bisect points to this patch as
the point at which my unit tests start failing with the following
signature:
[ 35.423841] WARNING: CPU: 8 PID: 245 at lib/percpu-refcount.c:155
percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
[ 35.425328] percpu ref (dax_pmem_percpu_release [dax_pmem]) <= 0
(0) after switching to atomic
[ 35.425329] Modules linked in: ip6t_rpfilter ip6t_REJECT
nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6
ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
iptable_mangle iptable_raw iptable_security ebtable_filter ebtables
ip6table_filter ip6_tables crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel nd_pmem(O) dax_pmem(O) nd_btt(O) dax(O) serio_raw
nfit(O) nd_e820(O) libnvdimm(O) tpm_tis tpm_tis_core
tpm nfit_test_iomap(O) nfsd nfs_acl
[ 35.433683] CPU: 8 PID: 245 Comm: rcuos/29 Tainted: G O
4.11.0-rc2+ #55
[ 35.435538] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.9.3-1.fc25 04/01/2014
[ 35.437500] Call Trace:
[ 35.438270] dump_stack+0x86/0xc3
[ 35.439156] __warn+0xcb/0xf0
[ 35.439995] warn_slowpath_fmt+0x5f/0x80
[ 35.440962] ? rcu_nocb_kthread+0x27a/0x500
[ 35.441957] ? dax_pmem_percpu_exit+0x50/0x50 [dax_pmem]
[ 35.443107] percpu_ref_switch_to_atomic_rcu+0x1f5/0x200
[ 35.444251] ? percpu_ref_exit+0x60/0x60
[ 35.445206] rcu_nocb_kthread+0x327/0x500
[ 35.446186] ? rcu_nocb_kthread+0x27a/0x500
[ 35.447188] kthread+0x10c/0x140
[ 35.448058] ? rcu_eqs_enter+0x50/0x50
[ 35.448990] ? kthread_create_on_node+0x60/0x60
[ 35.450038] ret_from_fork+0x31/0x40
[ 35.450976] ---[ end trace eaa40898a09519b5 ]---
This is similar to the backtrace we saw when pud faults were not being
handled properly, which was fixed by commit 220ced1676c4 ("mm: fix
get_user_pages() vs device-dax pud mappings").
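For anyone not familiar with this warning: going by its text, it fires
when the reference count is <= 0 once the per-CPU counters have been
folded into the single atomic counter. Below is a toy userspace sketch of
how one put without a matching get could give the "(0)" reading above. It
is not kernel code, every toy_* name is hypothetical, and it assumes the
owner holds exactly one initial reference at the time of the switch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code. All names are hypothetical. */
#define TOY_NR_CPUS 4

struct toy_ref {
	long percpu[TOY_NR_CPUS];	/* per-CPU deltas, may go negative */
	long atomic_count;		/* assumed: 1 == owner's initial ref */
};

static void toy_ref_init(struct toy_ref *ref)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		ref->percpu[cpu] = 0;
	ref->atomic_count = 1;
}

/* Take a reference on the given CPU's counter. */
static void toy_ref_get(struct toy_ref *ref, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	ref->percpu[cpu]++;
}

/* Drop a reference; may run on a different CPU than the get. */
static void toy_ref_put(struct toy_ref *ref, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	ref->percpu[cpu]--;
}

/*
 * Fold the per-CPU deltas into the single counter.
 * Returns true when the result would trip a "<= 0" sanity check.
 */
static bool toy_ref_switch_to_atomic(struct toy_ref *ref)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		ref->atomic_count += ref->percpu[cpu];
		ref->percpu[cpu] = 0;
	}
	return ref->atomic_count <= 0;
}
```

With balanced get/put pairs the folded count stays at 1 and nothing
trips. A single stray put leaves it at 0, which is the kind of imbalance
a path that drops a reference it never took would produce.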
I've found some missing _devmap checks in the generic
get_user_pages_fast() path, but adding them does not fix the regression:
diff --git a/mm/gup.c b/mm/gup.c
index 2559a3987de7..89156cd59cbc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1475,7 +1475,8 @@ static int gup_pmd_range(pud_t pud, unsigned
long addr, unsigned long end,
if (pmd_none(pmd))
return 0;
- if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd))) {
+ if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd)
+ || pmd_devmap(pmd))) {
/*
* NUMA hinting faults need to be handled in the GUP
* slowpath for accounting purposes and so that they
@@ -1516,7 +1517,7 @@ static int gup_pud_range(p4d_t p4d, unsigned
long addr, unsigned long end,
next = pud_addr_end(addr, end);
if (pud_none(pud))
return 0;
- if (unlikely(pud_huge(pud))) {
+ if (unlikely(pud_huge(pud) || pud_devmap(pud))) {
if (!gup_huge_pud(pud, pudp, addr, next, write,
pages, nr))
return 0;
...more hunting tomorrow.