Date:	Thu, 23 Jul 2015 17:16:45 +0100
From:	Catalin Marinas <catalin.marinas@....com>
To:	Dave Hansen <dave.hansen@...el.com>
Cc:	Andrea Arcangeli <aarcange@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	linux-mm <linux-mm@...ck.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] mm: Flush the TLB for a single address in a huge page

On Thu, Jul 23, 2015 at 03:41:24PM +0100, Dave Hansen wrote:
> On 07/23/2015 07:13 AM, Andrea Arcangeli wrote:
> > On Thu, Jul 23, 2015 at 11:49:38AM +0100, Catalin Marinas wrote:
> >> On Thu, Jul 23, 2015 at 12:05:21AM +0100, Dave Hansen wrote:
> >>> On 07/22/2015 03:48 PM, Catalin Marinas wrote:
> >>>> You are right, on x86 the tlb_single_page_flush_ceiling seems to be
> >>>> 33, so for an HPAGE_SIZE range the code does a local_flush_tlb()
> >>>> always. I would say a single page TLB flush is more efficient than a
> >>>> whole TLB flush but I'm not familiar enough with x86.
> >>>
> >>> The last time I looked, the instruction to invalidate a single page is
> >>> more expensive than the instruction to flush the entire TLB. 
> >>
> >> I was thinking of the overall cost of re-populating the TLB after being
> >> nuked rather than the instruction itself.
> > 
> > Unless there are timing differences I'm not aware of between
> > flushing 2MB TLB entries and flushing 4KB TLB entries with invlpg,
> > the benchmarks that have been run to tune the optimal
> > tlb_single_page_flush_ceiling value should already guarantee that
> > this is a valid optimization (as we flush just one entry, we're not
> > even close to the ceiling of 33, where it becomes more of a grey area).
> 
> We had a discussion about this a few weeks ago:
> 
> 	https://lkml.org/lkml/2015/6/25/666
> 
> The argument is that the CPU is so good at refilling the TLB that it
> rarely waits on it, so the "cost" can be very very low.

Interesting thread. I can see from Ingo's benchmarks that invlpg is much
more expensive than the cr3 write but I can't really comment on the
refill cost (it may be small with page table caching in L1/L2). The
problem with small/targeted benchmarks is that you don't see the overall
impact.
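
For reference, the x86 behaviour quoted above can be sketched roughly as
follows. This is a simplified user-space model, not the kernel code: the
helper name wants_full_flush and the constants are assumptions for
illustration only, with the ceiling of 33 and the 2MB huge page size
taken from this thread.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12                  /* assumed: 4KB base pages */
#define HPAGE_SIZE (1UL << 21)         /* assumed: 2MB huge pages */

/* Ceiling value quoted in this thread for x86. */
static unsigned long tlb_single_page_flush_ceiling = 33;

/*
 * Hypothetical helper: returns true when a range flush would fall back
 * to a full TLB flush because the range covers more base pages than
 * the ceiling allows.
 */
static bool wants_full_flush(unsigned long start, unsigned long end)
{
	unsigned long base_pages = (end - start) >> PAGE_SHIFT;

	return base_pages > tlb_single_page_flush_ceiling;
}
```

In this model an HPAGE_SIZE range counts as 512 base pages, well above
the ceiling, so it always takes the full-flush path, while a single 4KB
page does not.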

On ARM, most recent CPUs can cache intermediate page table levels in the
TLB (usually as VA->pte translation). ARM64 introduces a new TLB
flushing instruction that only touches the last level (pte, huge pmd).
In theory this should be cheaper overall since the CPU doesn't need to
refill intermediate levels. In practice, it's probably lost in the
noise.

Anyway, if you want to keep the option of a full TLB flush for x86 on
huge pages, I'm happy to repost a v2 with a separate
flush_tlb_pmd_huge_page that arch code can define as it sees fit.
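
A minimal sketch of what such an arch-overridable hook could look like,
written as a user-space mock so it compiles on its own. Only the name
flush_tlb_pmd_huge_page comes from this mail; its one-argument form,
the stub flush functions, the counters and invalidate_huge_pmd are all
assumptions made for illustration, not the actual v2 patch.

```c
#include <assert.h>

#define HPAGE_PMD_SIZE (1UL << 21)     /* assumed: 2MB huge pages */

/* Counters standing in for the real TLB maintenance operations. */
static int single_flushes;
static int full_flushes;

/* Hypothetical stub: invalidate the TLB entry for one address. */
static inline void flush_one_address(unsigned long addr)
{
	(void)addr;
	single_flushes++;
}

/* Hypothetical stub: invalidate the whole TLB. */
static inline void flush_everything(void)
{
	full_flushes++;
}

/*
 * An architecture that prefers a full flush for huge pages would
 * define the hook before this point, for example:
 *   #define flush_tlb_pmd_huge_page(addr) flush_everything()
 * Otherwise the generic default below flushes a single address.
 */
#ifndef flush_tlb_pmd_huge_page
#define flush_tlb_pmd_huge_page(addr) flush_one_address(addr)
#endif

/* Hypothetical caller: invalidate the TLB entry of one huge pmd. */
static void invalidate_huge_pmd(unsigned long addr)
{
	assert((addr & (HPAGE_PMD_SIZE - 1)) == 0);  /* huge page aligned */
	flush_tlb_pmd_huge_page(addr);
}
```

With the default definition, one call to invalidate_huge_pmd() performs
a single-address flush and no full flush; the choice stays entirely in
the arch header.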

-- 
Catalin
