Message-ID: <CAHk-=wiDQpOeXi_GjKB7Mrh93Zbd__4k+FF_vJd+-prbaacEug@mail.gmail.com>
Date: Mon, 31 Mar 2025 17:11:46 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
Masami Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Andrew Morton <akpm@...ux-foundation.org>,
Vincent Donnefort <vdonnefort@...gle.com>, Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>,
Kees Cook <kees@...nel.org>, Tony Luck <tony.luck@...el.com>,
"Guilherme G. Piccoli" <gpiccoli@...lia.com>, linux-hardening@...r.kernel.org,
Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH v2 1/2] tracing: ring-buffer: Have the ring buffer code do
the vmap of physical memory
On Mon, 31 Mar 2025 at 16:41, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> Hmm, so if we need to map this memory to user space memory, then I can't
> use the method from this patch series, if I have to avoid struct page.
>
> Should I then be using vm_iomap_memory() passing in the physical address?
I actually think that would be the best option in general - it works
*regardless* of the source of the pages (ie it works for pages that
don't have 'struct page' backing, but it works for regular RAM too).
So it avoids any question of how the page was allocated, and it also
avoids the page reference counting overhead.
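Just as a sketch of what that ends up looking like in the mmap handler -
the function and variable names here are made up, not anything that's
actually in the ring buffer code:

    static int rb_user_mmap(struct file *file, struct vm_area_struct *vma)
    {
        /* buf_phys/buf_size: physical base and length backing the buffer */
        return vm_iomap_memory(vma, buf_phys, buf_size);
    }

vm_iomap_memory() does the size/offset sanity checks against the vma and
then just remaps the pfns, so no struct page and no refcounting anywhere.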
I thought you did that already for the user mappings - don't you use
remap_pfn_range()?
That's basically the equivalent of vmap_page_range() - you're mapping
a whole range based on physical addresses, not mapping individual
pages.
But I didn't go look, this is from dim and possibly confused memories
from past patches.
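If so, it's roughly this pattern - again only a sketch, with a made-up
'buf_phys' base, mapping the whole range in one call:

    err = remap_pfn_range(vma, vma->vm_start,
                          buf_phys >> PAGE_SHIFT,
                          vma->vm_end - vma->vm_start,
                          vma->vm_page_prot);

which, like vm_iomap_memory(), works purely on physical addresses and
never looks at a struct page.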
> As for architectures that do not have user/kernel data cache coherency, how
> does one flush the page when there's an update on the kernel side so that
> the user side doesn't see stale data?
So if you don't treat this as some kind of 'page' or 'folio' thing,
then the proper function is actually flush_cache_range().
I actually suspect that if you treat things just as an arbitrary range
of memory, it might simplify things in general.
For example, the whole flush_cache_page() thing obviously just flushes
one page. So then you have to artificially iterate over pages rather
than just use the natural range.
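So the flush after a kernel-side update would just be one call over
whatever range got written - sketch only, with a made-up offset and
length into the mapping:

    /* flush the user-visible alias of the range that was just written */
    flush_cache_range(vma, vma->vm_start + off,
                      vma->vm_start + off + len);

rather than a loop doing flush_cache_page() once per page.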
HOWEVER.
At this point I have to also admit that you will likely find various
holes in various architecture implementations.
Why?
Because sane architectures don't care (there's basically no testing of
any of this on x86, because x86 is typically always cache coherent
outside of some GPU oddities that are handled by the DRM layer
explicitly, so all of these functions are just no-ops on x86).
And so almost none of this gets any testing in practice. A missed
cache flush doesn't matter on x86 or arm64, and seldom matters
anywhere else either.
We do have "flush_cache_range()" calls in the generic MM code, and so
it *should* all work, but honestly, I'd expect there to be bugs in
this area.
Of course, I would expect the same to be true of the page/folio cases,
so I don't think using flush_cache_range() should be any worse, but I
*could* imagine that it's bad in a different way ;)
Linus