Message-ID: <CA+CK2bAAigxUv=HGpxoV-PruN_AhisKW675SxuG_yVi+vNmfSQ@mail.gmail.com>
Date: Mon, 18 Nov 2024 17:08:42 -0500
From: Pasha Tatashin <pasha.tatashin@...een.com>
To: Roman Gushchin <roman.gushchin@...ux.dev>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-doc@...r.kernel.org, linux-fsdevel@...r.kernel.org,
cgroups@...r.kernel.org, linux-kselftest@...r.kernel.org,
akpm@...ux-foundation.org, corbet@....net, derek.kiernan@....com,
dragan.cvetic@....com, arnd@...db.de, gregkh@...uxfoundation.org,
viro@...iv.linux.org.uk, brauner@...nel.org, jack@...e.cz, tj@...nel.org,
hannes@...xchg.org, mhocko@...nel.org, shakeel.butt@...ux.dev,
muchun.song@...ux.dev, Liam.Howlett@...cle.com, lorenzo.stoakes@...cle.com,
vbabka@...e.cz, jannh@...gle.com, shuah@...nel.org, vegard.nossum@...cle.com,
vattunuru@...vell.com, schalla@...vell.com, david@...hat.com,
willy@...radead.org, osalvador@...e.de, usama.anjum@...labora.com,
andrii@...nel.org, ryan.roberts@....com, peterx@...hat.com, oleg@...hat.com,
tandersen@...flix.com, rientjes@...gle.com, gthelen@...gle.com
Subject: Re: [RFCv1 0/6] Page Detective
On Mon, Nov 18, 2024 at 2:11 PM Roman Gushchin <roman.gushchin@...ux.dev> wrote:
>
> On Sat, Nov 16, 2024 at 05:59:16PM +0000, Pasha Tatashin wrote:
> > Page Detective is a new kernel debugging tool that provides detailed
> > information about the usage and mapping of physical memory pages.
> >
> > It is often known that a particular page is corrupted, but it is hard to
> > extract more information about such a page from a live system. Examples
> > are:
> >
> > - Checksum failure during live migration
> > - Filesystem journal failure
> > - dump_page warnings on the console log
> > - Unexpected segfaults
> >
> > Page Detective helps to extract more information from the kernel, so it
> > can be used by developers to root cause the associated problem.
> >
> > It operates through the Linux debugfs interface, with two files: "virt"
> > and "phys".
> >
> > The "virt" file takes a virtual address and PID and outputs information
> > about the corresponding page.
> >
> > The "phys" file takes a physical address and outputs information about
> > that page.
> >
> > The output is presented via kernel log messages (can be accessed with
> > dmesg), and includes information such as the page's reference count,
> > mapping, flags, and memory cgroup. It also shows whether the page is
> > mapped in the kernel page table, and if so, how many times.
>
> This looks questionable both from the security and convenience points of view.
> Given the request-response nature of the interface, the output can be
> provided using a "normal" seq-based pseudo-file.
We opted to use dmesg for output because it's the standard method for
capturing kernel information and is commonly included in bug reports.
Introducing a new file would require modifying existing data
collection scripts used for reporting, so this approach minimizes
disruption to existing workflows.
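To make that workflow concrete, here is a minimal user-space sketch of
how the interface is meant to be driven. The debugfs path and the
request format used below are assumptions for illustration only; the
actual format is whatever the series defines, and the report itself
ends up in the kernel log:

/*
 * Minimal sketch of driving the Page Detective "virt" interface from
 * user space.  The debugfs path and input format are assumptions for
 * illustration only; debugfs must be mounted and this must run as root.
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	const char *path = "/sys/kernel/debug/page_detective/virt";
	FILE *f;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <virt-addr> <pid>\n", argv[0]);
		return 1;
	}

	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}

	/* Assumed request format: "<virtual address> <pid>" on one line. */
	fprintf(f, "%s %s\n", argv[1], argv[2]);
	fclose(f);

	/* The report is emitted to the kernel log; read it with dmesg. */
	printf("request submitted, see dmesg for the output\n");
	return 0;
}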
> But I have a more generic question:
> doesn't it make sense to implement it as a set of drgn scripts instead
> of kernel code? This provides more flexibility, is safer (even if it's buggy,
> you won't crash the host) and should be at least in theory equally
> powerful.
Regarding your suggestion, our plan is to perform reverse lookups in
all page tables: kernel, user, IOMMU, and KVM. Currently, we only
traverse the kernel and user page tables, but we intend to extend this
functionality to the IOMMU and KVM tables in future updates. I am not
sure whether drgn can provide this level of detail within a reasonable
amount of time.
This approach will be helpful for debugging memory corruption
scenarios. Often, external mechanisms detect corruption but require
kernel-level information for root cause analysis. In our experience,
invalid mappings persist in page tables for a period after corruption,
providing a window to identify other users of the corrupted page via
timely reverse lookup.
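For the user page table case, the reverse lookup is conceptually a walk
over a task's page tables comparing each present PTE against the PFN of
interest. Below is a rough sketch of that technique using the kernel's
pagewalk API; it is only an illustration, not the code from this series
(pd_find_pfn() and the other pd_* names are made up, and huge mappings
are ignored for brevity):

/*
 * Sketch of a reverse lookup in a user address space: walk the page
 * tables of @mm and report every virtual address whose PTE maps the
 * physical frame we are interested in.
 */
#include <linux/mm.h>
#include <linux/pagewalk.h>

struct pd_pfn_ctx {
	unsigned long target_pfn;
	unsigned long matches;
};

static int pd_pte_entry(pte_t *pte, unsigned long addr,
			unsigned long next, struct mm_walk *walk)
{
	struct pd_pfn_ctx *ctx = walk->private;
	pte_t val = ptep_get(pte);

	if (pte_present(val) && pte_pfn(val) == ctx->target_pfn) {
		pr_info("pfn %#lx mapped at %#lx in mm %p\n",
			ctx->target_pfn, addr, walk->mm);
		ctx->matches++;
	}
	return 0;
}

static const struct mm_walk_ops pd_walk_ops = {
	.pte_entry = pd_pte_entry,
};

/* Returns the number of user mappings of @pfn found in @mm. */
static unsigned long pd_find_pfn(struct mm_struct *mm, unsigned long pfn)
{
	struct pd_pfn_ctx ctx = { .target_pfn = pfn };

	mmap_read_lock(mm);
	walk_page_range(mm, 0, TASK_SIZE, &pd_walk_ops, &ctx);
	mmap_read_unlock(mm);

	return ctx.matches;
}

The IOMMU and KVM lookups would follow the same idea against their
respective page table formats.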
Additionally, using crash/drgn is not feasible for us at this time: it
requires keeping external tools on our hosts, and each script needs
approval and a security review before it can be deployed in our fleet.
Thanks,
Pasha