[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <65a04f0d-668b-47b6-a532-d1e11ef4835a@intel.com>
Date: Thu, 19 Dec 2024 13:17:44 -0800
From: "Chen, Zide" <zide.chen@...el.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Arnaldo Carvalho de Melo
<acme@...nel.org>, Namhyung Kim <namhyung@...nel.org>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>, Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>,
Kan Liang <kan.liang@...ux.intel.com>, linux-perf-users@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Matthew Wilcox <willy@...radead.org>, David Hildenbrand <david@...hat.com>,
Yi Lai <yi1.lai@...ux.intel.com>
Subject: Re: [PATCH v3] perf: map pages in advance
On 12/5/2024 12:29 AM, Lorenzo Stoakes wrote:
> We are adjusting struct page to make it smaller, removing unneeded fields
> which correctly belong to struct folio.
>
> Two of those fields are page->index and page->mapping. Perf is currently
> making use of both of these. This is unnecessary. This patch eliminates
> this.
>
> Perf establishes its own internally controlled memory-mapped pages using
> vm_ops hooks. The first page in the mapping is the read/write user control
> page, and the rest of the mapping consists of read-only pages.
>
> The VMA is backed by kernel memory either from the buddy allocator or
> vmalloc depending on configuration. It is intended to be mapped read/write,
> but because it has a page_mkwrite() hook, vma_wants_writenotify() indicates
> that it should be mapped read-only.
>
> When a write fault occurs, the provided page_mkwrite() hook,
> perf_mmap_fault() (doing double duty handing faults as well) uses the
> vmf->pgoff field to determine if this is the first page, allowing for the
> desired read/write first page, read-only rest mapping.
>
> For this to work the implementation has to carefully work around faulting
> logic. When a page is write-faulted, the fault() hook is called first, then
> its page_mkwrite() hook is called (to allow for dirty tracking in file
> systems).
>
> On fault we set the folio's mapping in perf_mmap_fault(), this is because
> when do_page_mkwrite() is subsequently invoked, it treats a missing mapping
> as an indicator that the fault should be retried.
>
> We also set the folio's index so, given the folio is being treated as faux
> user memory, it correctly references its offset within the VMA.
>
> This explains why the mapping and index fields are used - but it's not
> necessary.
>
> We preallocate pages when perf_mmap() is called for the first time via
> rb_alloc(), and further allocate auxiliary pages via rb_aux_alloc() as
> needed if the mapping requires it.
>
> This allocation is done in the f_ops->mmap() hook provided in perf_mmap(),
> and so we can instead simply map all the memory right away here - there's
> no point in handling (read) page faults when we don't demand page nor need
> to be notified about them (perf does not).
>
> This patch therefore changes this logic to map everything when the mmap()
> hook is called, establishing a PFN map. It implements vm_ops->pfn_mkwrite()
> to provide the required read/write vs. read-only behaviour, which does not
> require the previously implemented workarounds.
>
> While it is not ideal to use a VM_PFNMAP here, doing anything else will
> result in the page_mkwrite() hook need to be provided, which requires the
> same page->mapping hack this patch seeks to undo.
>
> It will also result in the pages being treated as folios and placed on the
> rmap, which really does not make sense for these mappings.
>
> Semantically it makes sense to establish this as some kind of special
> mapping, as the pages are managed by perf and are not strictly user pages,
> but currently the only means by which we can do so functionally while
> maintaining the required R/W and R/O behaviour is a PFN map.
>
> There should be no change to actual functionality as a result of this
> change.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
With this patch, it seems perf tool has some problems in capturing the
kernel data with Intel PT.
Running the following commands, the size of perf.data is very small, and
perf script can't find any valid records.
perf record -e intel_pt//u -- /bin/ls
perf script --insn-trace
Powered by blists - more mailing lists