Date:   Wed, 11 Nov 2020 22:33:44 +0000
From:   Matthew Wilcox <willy@...radead.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     "Liang, Kan" <kan.liang@...ux.intel.com>,
        Will Deacon <will@...nel.org>,
        Michael Ellerman <mpe@...erman.id.au>, mingo@...hat.com,
        acme@...nel.org, linux-kernel@...r.kernel.org,
        mark.rutland@....com, alexander.shishkin@...ux.intel.com,
        jolsa@...hat.com, eranian@...gle.com, ak@...ux.intel.com,
        dave.hansen@...el.com, kirill.shutemov@...ux.intel.com,
        benh@...nel.crashing.org, paulus@...ba.org,
        David Miller <davem@...emloft.net>, vbabka@...e.cz
Subject: Re: [PATCH V9 1/4] perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE

On Wed, Nov 11, 2020 at 09:00:00PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2020 at 06:26:20PM +0000, Matthew Wilcox wrote:
> > On Wed, Nov 11, 2020 at 06:22:53PM +0100, Peter Zijlstra wrote:
> > > On Wed, Nov 11, 2020 at 04:38:48PM +0000, Matthew Wilcox wrote:
> > > > 	if (pud_leaf(pud))
> > > > 		return PUD_SIZE;
> > > 
> > > But that doesn't handle non-pagetable aligned hugetlb sizes. Granted,
> > > that's unlikely at the PUD level, but why be inconsistent..
> > > 
> > > So we really want:
> > > 
> > > 	if (p*d_leaf(p*d)) {
> > > 		if (!'special') {
> > > 			page = p*d_page(p*d);
> > > 			if (PageHuge(page))
> > > 				return page_size(compound_head(page));
> > > 		}
> > > 		return P*D_SIZE;
> > > 	}
> > 
> > Still doesn't work because pages can be mapped at funny offsets.
> 
> Wait, what?! Is there hardware that has unaligned TLB page-sizes?

No, you can force a 2MB page to be mapped at an address which isn't
2MB aligned.

> Can you start a 64K page at an 8k offset? I don't think I've ever seen
> that. Still even with that, how would the above go wrong there? It would
> find the compound page covering @addr, PageHuge() (and possibly some
> additional arch-specific condition) returns true and we get the compound
> size to find the hardware page size used.

On any architecture I can think of, that 2MB page will be mapped with 4kB
TLB entries.
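
A minimal userspace sketch of why that happens, assuming an x86-64-style
layout with only 4kB and 2MB entry sizes. The helper name
largest_entry_size() and the SZ_* constants are made up for illustration;
this is not kernel code. A 2MB entry is usable only when the virtual and
physical addresses are both 2MB aligned, so a 2MB compound page mapped at
an unaligned virtual address falls back to 4kB entries:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants, assuming 4kB base pages and 2MB PMD-level entries. */
#define SZ_4K (UINT64_C(1) << 12)
#define SZ_2M (UINT64_C(1) << 21)

/*
 * Hypothetical helper: return the largest entry size that could map
 * 'len' bytes of physically contiguous memory at 'paddr' to the virtual
 * address 'vaddr'. Both addresses must be at least 4kB aligned.
 */
static uint64_t largest_entry_size(uint64_t vaddr, uint64_t paddr, uint64_t len)
{
	assert(((vaddr | paddr) & (SZ_4K - 1)) == 0);

	/* A 2MB entry needs 2MB alignment on both sides of the mapping. */
	if (len >= SZ_2M && ((vaddr | paddr) & (SZ_2M - 1)) == 0)
		return SZ_2M;

	return SZ_4K;
}
```

With a 2MB-aligned physical page at 0x200000, mapping it at virtual
0x40000000 yields SZ_2M, while mapping it at 0x40001000 yields SZ_4K even
though the underlying compound page is still 2MB.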

> > What we really want is for a weak definition of
> > 
> > unsigned long tlb_size(struct mm_struct *mm, unsigned long addr)
> > {
> > 	if (p*d_leaf(p*d))
> > 		return p*d_size(p*d);
> > }
> > 
> > then ARM can look at its special bit in the page table to determine
> > whether this is a singleton or part of a brace of pages.
> 
> That's basically what we provide, but really the only thing that's
> missing from this generic page walker is the ability to detect if a
> !PageHuge compound page is actually still a hardware page.
> 
> > > Now, when you add !PMD THP sizes (presumably for architectures that have
> > > 'funny' sizes, otherwise what's the point), then you get to add '||
> > 
> > This is the problem with all the huge page support in Linux today.
> > It's written by people who work for hardware companies who think only
> > about exploiting the hardware features they sell.  You all ignore the
> > very real software overheads of trying to manage millions of pages.
> > I see a 6% reduction in kernel overhead when running kernbench using
> > THPs that may go as large as 256kB.  On x86.  Intel x86, at that.
> 
> That's a really nice improvement. However then this code doesn't care
> about it. Please make it possible to distinguish between THP on hardware
> pages vs software pages.

That can and should be done just by looking at the page table entries.
There's no need to convert it into a struct page.  The CPU obviously
decides what TLB entry size to use based solely on the page tables,
so we can too.
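
A toy model of that idea, as a sketch only: the struct, the function name
and the ENT_* macros below are hypothetical and are not the kernel's page
table API. It assumes x86-64-style entries in which bit 0 means "present"
and bit 7 (PS) marks a leaf at the PUD or PMD level, and it derives the
mapping size from the entries alone, without ever touching a struct page:

```c
#include <stdint.h>

/* Assumed x86-64-style bits: bit 0 = present, bit 7 = PS (leaf at PUD/PMD). */
#define ENT_PRESENT (UINT64_C(1) << 0)
#define ENT_LEAF    (UINT64_C(1) << 7)

/* Hypothetical snapshot of the entries visited while walking one address. */
struct walk_entries {
	uint64_t pud;
	uint64_t pmd;
	uint64_t pte;
};

/*
 * Return the size of the mapping covering the walked address, decided
 * purely from the page table entries, or 0 if nothing is mapped there.
 */
static uint64_t entry_size_from_walk(const struct walk_entries *w)
{
	if (!(w->pud & ENT_PRESENT))
		return 0;
	if (w->pud & ENT_LEAF)
		return UINT64_C(1) << 30;	/* 1GB leaf at the PUD level */

	if (!(w->pmd & ENT_PRESENT))
		return 0;
	if (w->pmd & ENT_LEAF)
		return UINT64_C(1) << 21;	/* 2MB leaf at the PMD level */

	if (!(w->pte & ENT_PRESENT))
		return 0;
	return UINT64_C(1) << 12;		/* 4kB base page */
}
```

An architecture with a contiguous hint in its entries could extend the
same walk with one more bit test at the relevant level; that part is left
out here because its details are architecture specific.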
