Message-Id: <1345647560-30387-6-git-send-email-aarcange@redhat.com>
Date: Wed, 22 Aug 2012 16:58:49 +0200
From: Andrea Arcangeli <aarcange@...hat.com>
To: linux-kernel@...r.kernel.org, linux-mm@...ck.org
Cc: Hillf Danton <dhillf@...il.com>, Dan Smith <danms@...ibm.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...e.hu>, Paul Turner <pjt@...gle.com>,
Suresh Siddha <suresh.b.siddha@...el.com>,
Mike Galbraith <efault@....de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Lai Jiangshan <laijs@...fujitsu.com>,
Bharata B Rao <bharata.rao@...il.com>,
Lee Schermerhorn <Lee.Schermerhorn@...com>,
Rik van Riel <riel@...hat.com>,
Johannes Weiner <hannes@...xchg.org>,
Srivatsa Vaddagiri <vatsa@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux.com>,
Alex Shi <alex.shi@...el.com>,
Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>,
Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
Don Morris <don.morris@...com>,
Benjamin Herrenschmidt <benh@...nel.crashing.org>
Subject: [PATCH 05/36] autonuma: teach gup_fast about pmd_numa
In the special "pmd" mode of knuma_scand
(/sys/kernel/mm/autonuma/knuma_scand/pmd == 1), the pmd may be of numa
type (_PAGE_PRESENT not set) while the ptes it points to are still
present. Therefore, gup_pmd_range() must return 0 in this case to
avoid losing a NUMA hinting page fault during gup_fast.
Note: gup_fast will skip over non-present ptes (like numa types), so
no explicit check is needed for the pte_numa case. gup_fast will also
skip over THP when the trans huge pmd is non-present. So, the trans
huge pmd_numa case will also be correctly skipped with no additional
code changes required.
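As a rough illustration of the decision described above, here is a
minimal user-space sketch in C. It is not kernel code: the MODEL_* flag
values, the model_pmd_t type and the model_* helpers are all
hypothetical names invented for this example, and the bit layout is an
assumption that does not match x86. It only models the rule that the
fast path bails out (the equivalent of returning 0) when the pmd is
none, is being split, or is of numa type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this model only; NOT the kernel's bit layout. */
#define MODEL_PRESENT   (1u << 0)  /* stands in for _PAGE_PRESENT */
#define MODEL_NUMA      (1u << 1)  /* marks a NUMA hinting entry */
#define MODEL_SPLITTING (1u << 2)  /* stands in for a THP being split */

typedef uint32_t model_pmd_t;      /* hypothetical stand-in for pmd_t */

static bool model_pmd_none(model_pmd_t pmd)
{
	return pmd == 0;
}

static bool model_pmd_splitting(model_pmd_t pmd)
{
	return (pmd & MODEL_SPLITTING) != 0;
}

/* "numa type" in this model: NUMA marker set and present bit clear. */
static bool model_pmd_numa(model_pmd_t pmd)
{
	return (pmd & (MODEL_NUMA | MODEL_PRESENT)) == MODEL_NUMA;
}

/*
 * Returns true when the modelled fast path may descend to the pte level,
 * false when it must bail out so the slow path can take the fault.
 */
static bool model_fast_path_may_descend(model_pmd_t pmd)
{
	if (model_pmd_none(pmd) || model_pmd_splitting(pmd) ||
	    model_pmd_numa(pmd))
		return false;
	return true;
}

/* Small self-check showing the decisions the model is expected to make. */
static void model_self_check(void)
{
	assert(model_fast_path_may_descend(MODEL_PRESENT));
	assert(!model_fast_path_may_descend(MODEL_NUMA));
}
```

In this model a pmd carrying only MODEL_NUMA is rejected even though
nothing is said about the ptes below it, which is the point of the
check: without it, the pte walk would never look at the pmd's present
bit.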
Acked-by: Rik van Riel <riel@...hat.com>
Signed-off-by: Andrea Arcangeli <aarcange@...hat.com>
---
arch/x86/mm/gup.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index dd74e46..02c5ec5 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -163,8 +163,19 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
* can't because it has irq disabled and
* wait_split_huge_page() would never return as the
* tlb flush IPI wouldn't run.
+ *
+ * The pmd_numa() check is needed because the code
+ * doesn't check the _PAGE_PRESENT bit of the pmd if
+ * the gup_pte_range() path is taken. NOTE: not all
+ * gup_fast users will access the page contents
+ * using the CPU through the NUMA memory channels like
+ * KVM does. So we're forced to trigger NUMA hinting
+ * page faults unconditionally for all gup_fast users
+ * even though NUMA hinting page faults aren't useful
+ * to I/O drivers that will access the page with DMA
+ * and not with the CPU.
*/
- if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+ if (pmd_none(pmd) || pmd_trans_splitting(pmd) || pmd_numa(pmd))
return 0;
if (unlikely(pmd_large(pmd))) {
if (!gup_huge_pmd(pmd, addr, next, write, pages, nr))
--