lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140114154457.GD4963@suse.de>
Date:	Tue, 14 Jan 2014 15:44:57 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Alex Thorlton <athorlton@....com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	"Kirill A. Shutemov" <kirill@...temov.name>, linux-mm@...ck.org,
	Ingo Molnar <mingo@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Rik van Riel <riel@...hat.com>,
	Naoya Horiguchi <n-horiguchi@...jp.nec.com>,
	Oleg Nesterov <oleg@...hat.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Andy Lutomirski <luto@...capital.net>,
	Al Viro <viro@...iv.linux.org.uk>,
	Kees Cook <keescook@...omium.org>,
	Andrea Arcangeli <aarcange@...hat.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] mm: thp: Add per-mm_struct flag to control THP

On Fri, Jan 10, 2014 at 04:39:09PM -0600, Alex Thorlton wrote:
> On Fri, Jan 10, 2014 at 11:10:10PM +0100, Peter Zijlstra wrote:
> > We already have the information to determine if a page is shared across
> > nodes, Mel even had some prototype code to do splits under those
> > conditions.
> 
> I'm aware that we can determine if pages are shared across nodes, but I
> thought that Mel's code to split pages under these conditions had some
> performance issues.  I know I've seen the code that Mel wrote to do
> this, but I can't seem to dig it up right now.  Could you point me to
> it?
> 

It was a lot of revisions ago! The git branches no longer exist but the
diff from the monolithic patches is below. The baseline was v3.10 and
this will no longer apply but you'll see the two places where I added a
split_huge_page and prevented khugepaged collapsing them again. At the
time, the performance with it applied was much worse but it was a 10
minute patch as a distraction. There was a range of basic problems that
had to be tackled before there was any point looking at splitting THP due
to locality. I did not pursue it further and have not revisited it since.

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c8b25a8..2b80abe 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1317,6 +1317,23 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	last_nidpid = page_nidpid_last(page);
 	target_nid = mpol_misplaced(page, vma, haddr);
 	if (target_nid == -1) {
+		int last_pid = nidpid_to_pid(last_nidpid);
+
+		/*
+		 * If the fault failed to pass the two-stage filter but is on
+		 * a remote node then it could be due to false sharing of the
+		 * THP page. Remote accesses are more expensive than base
+		 * page TLB accesses so split the huge page and return to
+		 * retry the fault.
+		 */
+		if (!nidpid_nid_unset(last_nidpid) &&
+		    src_nid != page_to_nid(page) &&
+		    last_pid != (current->pid & LAST__PID_MASK)) {
+			spin_unlock(&mm->page_table_lock);
+			split_huge_page(page);
+			put_page(page);
+			return 0;
+		}
 		put_page(page);
 		goto clear_pmdnuma;
 	}
@@ -2398,6 +2415,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	unsigned long _address;
 	spinlock_t *ptl;
 	int node = NUMA_NO_NODE;
+	int hint_node = NUMA_NO_NODE;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
@@ -2427,8 +2445,20 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		 * be more sophisticated and look at more pages,
 		 * but isn't for now.
 		 */
-		if (node == NUMA_NO_NODE)
+		if (node == NUMA_NO_NODE) {
+			int nidpid = page_nidpid_last(page);
 			node = page_to_nid(page);
+			hint_node = nidpid_to_nid(nidpid);
+		}
+
+		/*
+		 * If the range is receiving hinting faults from CPUs on
+		 * different nodes then prioritise locality over TLB
+		 * misses
+		 */
+		if (nidpid_to_nid(page_nidpid_last(page)) != hint_node)
+			goto out_unmap;
+
 		VM_BUG_ON(PageCompound(page));
 		if (!PageLRU(page) || PageLocked(page) || !PageAnon(page))
 			goto out_unmap;

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ