Message-ID: <a9a816d8-ce2f-4c89-a798-ef565febb906@efficios.com>
Date: Wed, 2 Apr 2025 11:01:46 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Mike Rapoport <rppt@...nel.org>, Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
Masami Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vincent Donnefort <vdonnefort@...gle.com>, Vlastimil Babka <vbabka@...e.cz>,
Jann Horn <jannh@...gle.com>
Subject: Re: [PATCH v5 1/4] tracing: Enforce the persistent ring buffer to be
page aligned
On 2025-04-02 05:21, Mike Rapoport wrote:
> On Tue, Apr 01, 2025 at 06:58:12PM -0400, Steven Rostedt wrote:
>> From: Steven Rostedt <rostedt@...dmis.org>
>>
>> Enforce that the address and the size of the memory used by the persistent
>> ring buffer is page aligned. Also update the documentation to reflect this
>> requirement.
I've been loosely following this thread, and I'm confused about one
thing.
AFAIU the goal is to have the ftrace persistent ring buffer written to
through a memory range mapped by vmap_page_range(), and userspace maps
the buffer with its own virtual mappings.
With respect to architectures with aliasing dcache, is the plan:
A) To make sure all persistent ring buffer mappings are aligned on
SHMLBA:
Quoting "Documentation/core-api/cachetlb.rst":

    Is your port susceptible to virtual aliasing in its D-cache?
    Well, if your D-cache is virtually indexed, is larger in size than
    PAGE_SIZE, and does not prevent multiple cache lines for the same
    physical address from existing at once, you have this problem.

    If your D-cache has this problem, first define asm/shmparam.h SHMLBA
    properly, it should essentially be the size of your virtually
    addressed D-cache (or if the size is variable, the largest possible
    size). This setting will force the SYSv IPC layer to only allow user
    processes to mmap shared memory at address which are a multiple of
    this value.

or
B) To flush both the kernel and userspace mappings when a ring buffer
page is handed over from writer to reader?
I've seen both approaches being discussed in the recent threads, with
some participants recommending approach (A), but then the code
revisions that follow take approach (B).
AFAIU, if we are aiming for approach (A), then I'm missing where
vmap_page_range() guarantees that the _kernel_ virtual mapping is
SHMLBA aligned. AFAIU, only user mappings are aligned on SHMLBA.
And if we are aiming for approach (A), then the explicit flushing
is not needed when handing over pages from writer to reader.
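To make the condition behind approach (A) concrete, here is a minimal
userspace sketch (not kernel code) of the check I have in mind. The
EXAMPLE_SHMLBA value and both helper names are hypothetical and only
there for illustration; a real port takes SHMLBA from asm/shmparam.h.
The idea is that two virtual mappings of the same physical page avoid
D-cache aliasing when they agree on all address bits below SHMLBA,
which holds if both mapping bases are SHMLBA aligned and the page sits
at the same offset in each mapping.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical value for illustration only: pretend the virtually
 * indexed D-cache spans 16 KiB. Real architectures define SHMLBA in
 * asm/shmparam.h.
 */
#define EXAMPLE_SHMLBA 0x4000UL

/* True if addr is a multiple of EXAMPLE_SHMLBA. */
static bool shmlba_aligned(uintptr_t addr)
{
	return (addr & (EXAMPLE_SHMLBA - 1)) == 0;
}

/*
 * True if both virtual addresses agree on every bit below
 * EXAMPLE_SHMLBA, i.e. they index the same cache colour.
 */
static bool same_cache_colour(uintptr_t va1, uintptr_t va2)
{
	return ((va1 ^ va2) & (EXAMPLE_SHMLBA - 1)) == 0;
}
```

With these assumed values, a kernel address of 0x10000 and a user
address of 0x24000 share a colour, while 0x10000 and 0x21000 do not,
which is the case that would need either realignment or flushing.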
Please let me know if I'm missing something,
Thanks,
Mathieu
>>
>> Link: https://lore.kernel.org/all/CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com/
>>
>> Suggested-by: Linus Torvalds <torvalds@...ux-foundation.org>
>> Signed-off-by: Steven Rostedt (Google) <rostedt@...dmis.org>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 2 ++
>> Documentation/trace/debugging.rst | 2 ++
>> kernel/trace/trace.c | 12 ++++++++++++
>> 3 files changed, 16 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 3435a062a208..f904fd8481bd 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -7266,6 +7266,8 @@
>> This is just one of many ways that can clear memory. Make sure your system
>> keeps the content of memory across reboots before relying on this option.
>>
>> + NB: Both the mapped address and size must be page aligned for the architecture.
>> +
>> See also Documentation/trace/debugging.rst
>>
>>
>> diff --git a/Documentation/trace/debugging.rst b/Documentation/trace/debugging.rst
>> index 54fb16239d70..d54bc500af80 100644
>> --- a/Documentation/trace/debugging.rst
>> +++ b/Documentation/trace/debugging.rst
>> @@ -136,6 +136,8 @@ kernel, so only the same kernel is guaranteed to work if the mapping is
>> preserved. Switching to a different kernel version may find a different
>> layout and mark the buffer as invalid.
>>
>> +NB: Both the mapped address and size must be page aligned for the architecture.
>> +
>> Using trace_printk() in the boot instance
>> -----------------------------------------
>> By default, the content of trace_printk() goes into the top level tracing
>> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
>> index de6d7f0e6206..de9c237e5826 100644
>> --- a/kernel/trace/trace.c
>> +++ b/kernel/trace/trace.c
>> @@ -10788,6 +10788,18 @@ __init static void enable_instances(void)
>> }
>>
>> if (start) {
>> + /* Start and size must be page aligned */
>> + if (start & ~PAGE_MASK) {
>> + pr_warn("Tracing: mapping start addr %lx is not page aligned\n",
>> + (unsigned long)start);
>> + continue;
>> + }
>> + if (size & ~PAGE_MASK) {
>> + pr_warn("Tracing: mapping size %lx is not page aligned\n",
>> + (unsigned long)size);
>> + continue;
>> + }
>
> Better use %pa for printing physical address as on 32-bit systems
> phys_addr_t may be unsigned long long:
>
> pr_warn("Tracing: mapping size %pa is not page aligned\n", &size);
>
>> +
>> addr = map_pages(start, size);
>> if (addr) {
>> pr_info("Tracing: mapped boot instance %s at physical memory %pa of size 0x%lx\n",
>> --
>> 2.47.2
>>
>>
>
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com