linux-kernel - Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures occur

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150319232326.GM10105@dastard>
Date:	Fri, 20 Mar 2015 10:23:26 +1100
From:	Dave Chinner <david@...morbit.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Mel Gorman <mgorman@...e.de>, Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Aneesh Kumar <aneesh.kumar@...ux.vnet.ibm.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linux-MM <linux-mm@...ck.org>, xfs@....sgi.com,
	ppc-dev <linuxppc-dev@...ts.ozlabs.org>
Subject: Re: [PATCH 4/4] mm: numa: Slow PTE scan rate if migration failures
 occur

On Thu, Mar 19, 2015 at 04:05:46PM -0700, Linus Torvalds wrote:
> On Thu, Mar 19, 2015 at 3:41 PM, Dave Chinner <david@...morbit.com> wrote:
> >
> > My recollection wasn't faulty - I pulled it from an earlier email.
> > That said, the original measurement might have been faulty. I ran
> > the numbers again on the 3.19 kernel I saved away from the original
> > testing. That came up at 235k, which is pretty much the same as
> > yesterday's test. The runtime,however, is unchanged from my original
> > measurements of 4m54s (pte_hack came in at 5m20s).
> 
> Ok. Good. So the "more than an order of magnitude difference" was
> really about measurement differences, not quite as real. Looks like
> more a "factor of two" than a factor of 20.
> 
> Did you do the profiles the same way? Because that would explain the
> differences in the TLB flush percentages too (the "1.4% from
> tlb_invalidate_range()" vs "pretty much everything from migration").

No, the profiles all came from steady state. The profiles from the
initial startup phase hammer the mmap_sem because of page fault vs
mprotect contention (glibc runs mprotect() on every chunk of
memory it allocates). It's not until the cache reaches "full" and it
starts recycling old buffers rather than allocating new ones that
the tlb flush problem dominates the profiles.

> The runtime variation does show that there's some *big* subtle
> difference for the numa balancing in the exact TNF_NO_GROUP details.
> It must be *very* unstable for it to make that big of a difference.
> But I feel at least a *bit* better about "unstable algorithm changes a
> small varioation into a factor-of-two" vs that crazy factor-of-20.
> 
> Can you try Mel's change to make it use
> 
>         if (!(vma->vm_flags & VM_WRITE))
> 
> instead of the pte details? Again, on otherwise plain 3.19, just so
> that we have a baseline. I'd be *so* much happer with checking the vma
> details over per-pte details, especially ones that change over the
> lifetime of the pte entry, and the NUMA code explicitly mucks with.

Yup, will do. might take an hour or two before I get to it, though...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/