Message-ID: <20201116163213.GG29991@casper.infradead.org>
Date: Mon, 16 Nov 2020 16:32:13 +0000
From: Matthew Wilcox <willy@...radead.org>
To: Dave Hansen <dave.hansen@...el.com>
Cc: "Kirill A. Shutemov" <kirill@...temov.name>,
Peter Zijlstra <peterz@...radead.org>,
kan.liang@...ux.intel.com, mingo@...nel.org, acme@...nel.org,
mark.rutland@....com, alexander.shishkin@...ux.intel.com,
jolsa@...hat.com, eranian@...gle.com, christophe.leroy@...roup.eu,
npiggin@...il.com, linuxppc-dev@...ts.ozlabs.org,
mpe@...erman.id.au, will@...nel.org, aneesh.kumar@...ux.ibm.com,
sparclinux@...r.kernel.org, davem@...emloft.net,
catalin.marinas@....com, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, ak@...ux.intel.com,
kirill.shutemov@...ux.intel.com
Subject: Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE

On Mon, Nov 16, 2020 at 08:28:23AM -0800, Dave Hansen wrote:
> On 11/16/20 7:54 AM, Matthew Wilcox wrote:
> > It gets even more complicated with CPUs with multiple levels of TLB
> > which support different TLB entry sizes. My CPU reports:
> >
> > TLB info
> > Instruction TLB: 2M/4M pages, fully associative, 8 entries
> > Instruction TLB: 4K pages, 8-way associative, 64 entries
> > Data TLB: 1GB pages, 4-way set associative, 4 entries
> > Data TLB: 4KB pages, 4-way associative, 64 entries
> > Shared L2 TLB: 4KB/2MB pages, 6-way associative, 1536 entries
>
> It's even "worse" on recent AMD systems. Those will coalesce multiple
> adjacent PTEs into a single TLB entry. I think Alphas did something
> like this back in the day with an opt-in.

I debated mentioning that ;-) We can detect in software whether that's
_possible_, but we can't detect whether the CPU has actually *done* it.
I heard it sometimes takes several faults on the 4kB entries for the CPU
to decide that it's beneficial to use a 32kB TLB entry. But this is all
rumour.

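To make the "possible" half concrete, here is a rough sketch of the kind of check software could do on the page tables. Everything in it is an assumption for illustration: the function name could_coalesce is hypothetical, and so are the conditions it tests (eight virtually and physically contiguous 4kB pages, 32kB-aligned on both sides, identical flags). It is not a description of any CPU's documented behaviour, and a True result says nothing about whether the hardware actually built a 32kB TLB entry.

```python
# Sketch only: models ASSUMED eligibility rules for PTE coalescing.
# The rules below (8 pages, aligned, contiguous, same flags) are an
# illustrative assumption, not taken from any vendor documentation.

COALESCE_PAGES = 8  # 8 x 4kB = 32kB, the entry size mentioned above


def could_coalesce(ptes, first_vpn):
    """Return True if the PTEs look eligible for a single 32kB TLB entry.

    ptes      -- list of (pfn, flags) tuples for consecutive virtual pages
    first_vpn -- virtual page number of the first PTE in the list
    """
    if len(ptes) != COALESCE_PAGES:
        return False

    # The virtual range must start on a 32kB boundary.
    if first_vpn % COALESCE_PAGES != 0:
        return False

    base_pfn, base_flags = ptes[0]

    # The physical range must start on a 32kB boundary too.
    if base_pfn % COALESCE_PAGES != 0:
        return False

    # Every page must be physically contiguous and share the same flags.
    return all(
        pfn == base_pfn + i and flags == base_flags
        for i, (pfn, flags) in enumerate(ptes)
    )
```

Even if this returns True for a range, the sample could still have been served from a plain 4kB TLB entry; that part is invisible from the page tables.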
> Anyway, the changelog should probably replace:
>
> > This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate TLB
> > page sizes.
>
> with something more like:
>
> This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate page
> table mapping sizes.
>
> That's really the best we can do from software without digging into
> microarchitecture-specific events.

I mean, this is perf. Digging into microarch-specific events is what it
does ;-)
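
For reference, the "page table mapping sizes" in the suggested wording are just a function of the page table level at which the walk ends. A minimal sketch, assuming x86-64 with 4kB base pages and 9 bits of index per level; the helper names mapping_size and human are made up for this example and are not kernel or perf APIs:

```python
# Sketch only: assumes x86-64 paging with 4kB base pages.
# Other architectures and base page sizes use different shifts.

PAGE_SHIFT = 12        # 4kB base page
BITS_PER_LEVEL = 9     # 512 entries per page table page

# Level 0 is a leaf PTE; levels 1 and 2 are huge mappings.
LEVEL_NAMES = ("PTE", "PMD", "PUD")


def mapping_size(level):
    """Size in bytes of a leaf mapping found at the given level."""
    if not 0 <= level < len(LEVEL_NAMES):
        raise ValueError("unsupported page table level: %r" % (level,))
    return 1 << (PAGE_SHIFT + BITS_PER_LEVEL * level)


def human(size):
    """Format a byte count as a whole number of kB, MB or GB."""
    for unit, shift in (("GB", 30), ("MB", 20), ("kB", 10)):
        if size >= (1 << shift) and size % (1 << shift) == 0:
            return "%d%s" % (size >> shift, unit)
    return "%dB" % size


if __name__ == "__main__":
    for level, name in enumerate(LEVEL_NAMES):
        print(name, human(mapping_size(level)))
```

That gives 4kB, 2MB and 1GB for the three levels. The TLB entry that actually served an access may be a different size again, which is the whole point of the wording change.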