Message-ID: <973db2d5-4be2-492f-844c-a0f46c9e588f@linux.dev>
Date: Thu, 24 Apr 2025 11:24:33 +0800
From: Ye Liu <ye.liu@...ux.dev>
To: SeongJae Park <sj@...nel.org>
Cc: akpm@...ux-foundation.org, linux-debuggers@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
linux-toolchains@...r.kernel.org, osandov@...ndov.com, paulmck@...nel.org,
sweettea-kernel@...miny.me, liuye@...inos.cn
Subject: Re: [PATCH v3] tools/mm: Add script to display page state for a given
PID and VADDR
On 2025/4/24 10:51, SeongJae Park wrote:
> On Thu, 24 Apr 2025 09:55:22 +0800 Ye Liu <ye.liu@...ux.dev> wrote:
>
>> On 2025/4/24 06:00, SeongJae Park wrote:
>>> On Wed, 23 Apr 2025 09:48:50 +0800 Ye Liu <ye.liu@...ux.dev> wrote:
>>>
>>>> From: Ye Liu <liuye@...inos.cn>
>>>>
>>>> Introduces a new drgn script, `show_page_info.py`, which allows users
>>>> to analyze the state of a page given a process ID (PID) and a virtual
>>>> address (VADDR). This can help kernel developers or debuggers easily
>>>> inspect page-related information in a live kernel or vmcore.
>>>>
>>>> The script extracts information such as the page flags, mapping, and
>>>> other metadata relevant to diagnosing memory issues.
>>>>
>>>> Output example:
>>>> sudo ./show_page_info.py 1 0x7f43df5acf00
>>>> PID: 1 Comm: systemd mm: 0xffff8881273bbc40
>>>> Raw: 0017ffffc000416c ffffea00043a4508 ffffea0004381e08 ffff88810f086a70
>>>> Raw: 0000000000000000 ffff888120c9b0c0 0000002500000007 ffff88812642c000
>>>> User Virtual Address: 0x7f43df5acf00
>>>> Page Address: 0xffffea00049a0b00
>>>> Page Flags: PG_referenced|PG_uptodate|PG_lru|PG_head|PG_active|
>>>> PG_private|PG_reported
>>>> Page Size: 16384
>>> Should this be called folio size? Or could this simply be removed, since
>>> Compound Order is given below?
>>
>> Page size refers to the base page size, which equals PAGESIZE.
> Shouldn't 'prog["PAGE_SIZE"]' be used for what you are describing? This tool
> uses drgn.helpers.linux.page_size()[1] to print this, though.
>
> +def show_page_state(page, addr, mm, pid, task):
> +    """Display detailed information about a page."""
> +    print(f'PID: {pid} Comm: {task.comm.string_().decode()} mm: {hex(mm)}')
> +    print(format_page_data(prog.read(page.value_(), 64)))
> +    fields = {
> +        "User Virtual Address": hex(addr),
> +        "Page Address": hex(page.value_()),
> +        "Page Flags": decode_page_flags(page),
> +        "Page Size": page_size(page).value_(),
>
> [1] https://drgn.readthedocs.io/en/stable/helpers.html#drgn.helpers.linux.mm.page_size
>
>> Folio size can be calculated using the Compound Order, but of course,
>> it can also be shown directly as a result.
>>
>>>> Page PFN: 0x12682c
>>>> Page Physical: 0x12682c000
>>>> Page Virtual: 0xffff88812682c000
>>>> Page Refcount: 37
>>>> Page Mapcount: 7
>>>> Page Index: 0x0
>>>> Page Memcg Data: 0xffff88812642c000
>>>> Memcg Name: init.scope
>>>> Memcg Path: /sys/fs/cgroup/memory/init.scope
>>>> Page Mapping: 0xffff88810f086a70
>>>> Page Anon/File: File
>>>> Page VMA: 0xffff88810e4af3b8
>>>> VMA Start: 0x7f43df5ac000
>>>> VMA End: 0x7f43df5b0000
>>>> This page is part of a compound page.
>>>> This page is the head page of a compound page.
>>>> Head Page: 0xffffea00049a0b00
>>>> Compound Order: 2
>>>> Number of Pages: 4
>>>>
>>>> Signed-off-by: Ye Liu <liuye@...inos.cn>
>>>>
>>>> Changes in v3:
>>>> - Adjust display style.
>>>> - Link to v2: https://lore.kernel.org/all/20250421080748.114750-1-ye.liu@linux.dev/
>>>>
>>>> Changes in v2:
>>>> - Move the show_page_info.py file to tools/mm.
>>>> - Link to v1: https://lore.kernel.org/all/20250415075024.248232-1-ye.liu@linux.dev/
>>>> ---
>>>> MAINTAINERS | 5 ++
>>>> tools/mm/show_page_info.py | 120 +++++++++++++++++++++++++++++++++++++
>>>> 2 files changed, 125 insertions(+)
>>>> create mode 100755 tools/mm/show_page_info.py
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index 17ed0b5ffdd2..85686a30dc72 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -18351,6 +18351,11 @@ F: Documentation/mm/page_table_check.rst
>>>> F: include/linux/page_table_check.h
>>>> F: mm/page_table_check.c
>>>>
>>>> +PAGE STATE DEBUG SCRIPT
>>>> +M: Ye Liu <liuye@...inos.cn>
>>>> +S: Maintained
>>>> +F: tools/mm/show_page_info.py
>>>> +
>>>> PANASONIC LAPTOP ACPI EXTRAS DRIVER
>>>> M: Kenneth Chan <kenneth.t.chan@...il.com>
>>>> L: platform-driver-x86@...r.kernel.org
>>>> diff --git a/tools/mm/show_page_info.py b/tools/mm/show_page_info.py
>>>> new file mode 100755
>>>> index 000000000000..8622c5499dfe
>>>> --- /dev/null
>>>> +++ b/tools/mm/show_page_info.py
>>> [...]
>>>> +def main():
>>>> +    """Main function to parse arguments and display page state."""
>>>> +    parser = argparse.ArgumentParser(description=DESC, formatter_class=argparse.RawTextHelpFormatter)
>>>> +    parser.add_argument('pid', metavar='PID', type=int, help='Target process ID (PID)')
>>>> +    parser.add_argument('vaddr', metavar='VADDR', type=str, help='Target virtual address in hexadecimal format (e.g., 0x7fff1234abcd)')
>>>> +    args = parser.parse_args()
>>>> +
>>>> +    try:
>>>> +        vaddr = int(args.vaddr, 16)
>>>> +    except ValueError:
>>>> +        print(f"Error: Invalid virtual address format: {args.vaddr}")
>>>> +        return
>>>> +
>>>> +    task = find_task(args.pid)
>>>> +    mm = task.mm
>>>> +    page = follow_page(mm, vaddr)
>>> I tried this script on my test machine and got the below error:
>>>
>>> $ cat ./a.c
>>> #include <stdio.h>
>>>
>>> int main(void)
>>> {
>>>     int foo;
>>>     printf("hello\n");
>>>     printf("%x\n", &foo);
>> To avoid address truncation, you can use the %p format specifier
>> instead of %x or %lx when printing a pointer (memory address).
> Ah, you're correct, thank you. After fixing my test, the error I reported
> before has disappeared. But I think the follow_page() error handling should
> be updated to catch the exception and provide a better error message.
You're absolutely right, I initially overlooked the error handling in
follow_page(). I’ll update it to catch exceptions and provide clearer,
more user-friendly error messages.
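A minimal sketch of such handling, as an illustration rather than the actual patch. `PageLookupError` and `resolve_page` are hypothetical names. The lookup callable and the exception types are passed in as parameters so the snippet runs without drgn; with drgn they would presumably be `follow_page` and `drgn.FaultError`, which is an assumption here.

```python
class PageLookupError(Exception):
    """Hypothetical error type: a virtual address could not be resolved."""


def resolve_page(follow, mm, vaddr, fault_types):
    """Call follow(mm, vaddr), translating low-level faults into one clear error.

    follow      -- lookup callable; in the script this would be follow_page
    fault_types -- tuple of exception types to translate; assumed to be
                   (drgn.FaultError,) when used with drgn
    """
    try:
        return follow(mm, vaddr)
    except fault_types as exc:
        raise PageLookupError(
            f"cannot resolve virtual address {vaddr:#x}: {exc}"
        ) from exc


# Stand-in for follow_page so the sketch runs without drgn or a kernel.
def fake_follow(mm, vaddr):
    raise LookupError("address is not mapped")


try:
    resolve_page(fake_follow, None, 0xAF925B6C, (LookupError,))
except PageLookupError as err:
    message = f"Error: {err}"
    print(message)
```

In main(), the caller would print the message and return instead of letting the traceback reach the user.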
> Also, I'm getting the new error below:
>
> $ sudo ./tools/mm/show_page_info.py 47657 0x7fffaf925b6c
> PID: 47657 Comm: a.out mm: 0xffff959c8a022100
> Raw: 0017ffffc0020828 ffffea6b0c201408 ffffea6b0fc65648 ffff959d32bec9c1
> Raw: 00000007fffffffc 0000000000000000 0000000100000000 ffff959cba058000
> Traceback (most recent call last):
> File "/usr/local/bin/drgn", line 33, in <module>
> sys.exit(load_entry_point('drgn==0.0.30+82.ge2b60e4b', 'console_scripts', 'drgn')())
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/usr/local/lib/python3.11/dist-packages/drgn-0.0.30+82.ge2b60e4b-py3.11-linux-x86_64.egg/drgn/cli.py", line 461, in _main
> runpy.run_path(script, init_globals={"prog": prog}, run_name="__main__")
> File "<frozen runpy>", line 291, in run_path
> File "<frozen runpy>", line 98, in _run_module_code
> File "<frozen runpy>", line 88, in _run_code
> File "./tools/mm/show_page_info.py", line 120, in <module>
> main()
> File "./tools/mm/show_page_info.py", line 115, in main
> show_page_state(page, vaddr, mm, args.pid, task)
> File "./tools/mm/show_page_info.py", line 63, in show_page_state
> memcg_name, memcg_path = get_memcg_info(page)
> ^^^^^^^^^^^^^^^^^^^^
> File "./tools/mm/show_page_info.py", line 43, in get_memcg_info
> return cgroup_name(cgrp).decode(), f"/sys/fs/cgroup/memory{cgroup_path(cgrp).decode()}"
> ^^^^^^^^^^^^^^^^^
> File "/usr/local/lib/python3.11/dist-packages/drgn-0.0.30+82.ge2b60e4b-py3.11-linux-x86_64.egg/drgn/helpers/linux/cgroup.py", line 71, in cgroup_name
> return kernfs_name(cgrp.kn)
> ^^^^^^^^^^^^^^^^^^^^
> File "/usr/local/lib/python3.11/dist-packages/drgn-0.0.30+82.ge2b60e4b-py3.11-linux-x86_64.egg/drgn/helpers/linux/kernfs.py", line 32, in kernfs_name
> return kn.name.string_() if kn.parent else b"/"
> ^^^^^^^^^
> AttributeError: 'struct kernfs_node' has no member 'parent'. Did you mean: '__parent'?
>
> This seems to be not entirely this script's fault, but rather due to the
> recent 'struct kernfs_node' change or my old version of drgn. Either way, I
> think it would be better to show users a clearer error message. I'm also
> curious whether you have a plan for finding and fixing, or avoiding, this
> kind of breakage in the future.
As for the new error you encountered, it does appear to stem from a recent
change to the struct kernfs_node, or possibly from a mismatch between
kernel headers and the drgn version. While it may not be a bug in the
script itself, I agree that the script should handle such scenarios more
gracefully.
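One possible shape for that, sketched under assumptions: each optional field is read through a small wrapper, so a structure mismatch degrades to a placeholder and a warning instead of a traceback. `safe_field` is a hypothetical helper, and the failing lookup is a stand-in that mimics the AttributeError from the report.

```python
import sys


def safe_field(label, getter, default="<unavailable>"):
    """Return getter(), or default if the lookup does not fit this kernel."""
    try:
        return getter()
    except (AttributeError, LookupError) as exc:
        # AttributeError is what the traceback above shows for a missing
        # struct member; LookupError is included as an assumption for
        # missing types or symbols.
        print(f"Warning: {label} unavailable: {exc}", file=sys.stderr)
        return default


# Stand-in for a memcg lookup failing the way the report describes.
def broken_memcg_name():
    raise AttributeError("'struct kernfs_node' has no member 'parent'")


fields = {
    "Page Refcount": safe_field("Page Refcount", lambda: 37),
    "Memcg Name": safe_field("Memcg Name", broken_memcg_name),
}
for name, value in fields.items():
    print(f"{name}: {value}")
```

The rest of the page state would still be printed, and the warning tells the user which field was skipped and why.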

Regarding compatibility across versions, this is definitely something worth
paying attention to. Currently, the script is developed against the latest
drgn and kernel versions. If kernel structures change in the future, I plan
to patch the script accordingly to maintain compatibility.

That said, I’m very open to suggestions. Do you have any ideas on how we
could better detect or guard against these kinds of breakages proactively?

Thanks again for raising this!

Thanks,
Ye Liu