Message-ID: <CAG48ez1mZ4nq-_DXHqiHe8_tSX37DdcngnULqXQ71fFt0oQPyA@mail.gmail.com>
Date: Tue, 1 Apr 2025 03:28:20 +0200
From: Jann Horn <jannh@...gle.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, linux-kernel@...r.kernel.org, 
	linux-trace-kernel@...r.kernel.org, Masami Hiramatsu <mhiramat@...nel.org>, 
	Mark Rutland <mark.rutland@....com>, Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, 
	Andrew Morton <akpm@...ux-foundation.org>, Vincent Donnefort <vdonnefort@...gle.com>, 
	Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>, Kees Cook <kees@...nel.org>, 
	Tony Luck <tony.luck@...el.com>, "Guilherme G. Piccoli" <gpiccoli@...lia.com>, 
	linux-hardening@...r.kernel.org, Matthew Wilcox <willy@...radead.org>
Subject: Re: [PATCH v2 1/2] tracing: ring-buffer: Have the ring buffer code do
 the vmap of physical memory

On Tue, Apr 1, 2025 at 3:01 AM Steven Rostedt <rostedt@...dmis.org> wrote:
> On Tue, 1 Apr 2025 02:09:10 +0200
> Jann Horn <jannh@...gle.com> wrote:
>
> > On Tue, Apr 1, 2025 at 1:41 AM Steven Rostedt <rostedt@...dmis.org> wrote:
> > > On Mon, 31 Mar 2025 14:42:38 -0700
> > > Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> > >
> > > > .. and *after* you've given it back to the memory allocator, and it
> > > > gets allocated using the page allocators, at that point go ahead and
> > > > use 'struct page' as much as you want.
> > > >
> > > > Before that, don't. Even if it might work. Because you didn't allocate
> > > > it as a struct page, and for all you know it might be treated as a
> > > > different hotplug memory zone or whatever when given back.
> > >
> > > Hmm, so if we need to map this memory to user space memory, then I can't
> > > use the method from this patch series, if I have to avoid struct page.
> > >
> > > Should I then be using vm_iomap_memory() passing in the physical address?
> >
> > For mapping random physical memory ranges into userspace, we have
> > helpers like remap_pfn_range() (the easy option, for use in an mmap
> > handler, in case you want to map one contiguous physical region into
> > userspace) and vmf_insert_pfn() (for use in a page fault handler, in
> > case you want to map random physical pages into userspace on demand).
>
> Note, I believe that Linus brought up the issue that because this physical
> memory is not currently part of the memory allocator (it's not aware of it
> yet), getting a struct page or a "pfn" for it may not be reliable.

PFN mappings are specifically designed to work with memory that does
not have "struct page":

#define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
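
For reference, a rough sketch of how the two helpers might be used for a
reserved physical region with no struct pages (my_buf_phys / my_buf_size
and the function names are invented for illustration, not taken from the
ring-buffer patches):

```c
/* Hypothetical driver sketch: exposing a reserved physical region
 * (no struct pages) to userspace. my_buf_phys/my_buf_size are
 * invented names for the region's base address and length. */

/* Option 1: map the whole contiguous region up front in ->mmap */
static int my_buf_mmap(struct file *file, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > my_buf_size)
		return -EINVAL;

	/* remap_pfn_range() marks the VMA VM_IO | VM_PFNMAP, so the
	 * mm layer never looks for struct pages in this range. */
	return remap_pfn_range(vma, vma->vm_start,
			       my_buf_phys >> PAGE_SHIFT,
			       size, vma->vm_page_prot);
}

/* Option 2: populate the mapping lazily from a fault handler
 * (the VMA must already be marked VM_PFNMAP for this). */
static vm_fault_t my_buf_fault(struct vm_fault *vmf)
{
	unsigned long pfn = (my_buf_phys >> PAGE_SHIFT) + vmf->pgoff;

	return vmf_insert_pfn(vmf->vma, vmf->address, pfn);
}
```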

> > > As for architectures that do not have user/kernel data cache coherency, how
> > > does one flush the page when there's an update on the kernel side so that
> > > the user side doesn't see stale data?
> >
> > flush_kernel_vmap_range() (and invalidate_kernel_vmap_range() for the
> > other direction) might be what you want... I found those by going
> > backwards from an arch-specific cache-flushing implementation.
> >
> > > As the code currently uses flush_dcache_folio(), I'm guessing there's an
> > > easy way to create a folio that points to physical memory that's not part
> > > of the memory allocator?
> >
> > Creating your own folio structs sounds like a bad idea; folio structs
> > are supposed to be in specific kernel memory regions. For example,
> > conversions from folio* to physical address can involve pointer
> > arithmetic on the folio*, or they can involve reading members of the
> > pointed-to folio.
>
> Linus already mentioned flush_cache_range() which looks to be the thing to
> use.

It looks like flush_kernel_vmap_range() is used for flushing dcache
for the kernel mapping, while flush_cache_range() is for flushing
dcache/icache for the userspace mapping?

For example, on 32-bit arm, you might go down these paths, ending up
in arch-specific functions that make it clear whether they're for the
user side or the kernel side:

flush_cache_range -> __cpuc_flush_user_range

flush_kernel_vmap_range -> __cpuc_flush_dcache_area -> cpu_cache.flush_kern_dcache_area

I think you probably need flushes on both sides, since you might have
to first flush out the dirty cacheline you wrote through the kernel
mapping, then discard the stale clean cacheline for the user mapping,
or something like that? (Unless these VIVT cache architectures provide
stronger guarantees on cache state than I thought.) But when you're
adding data to the tracing buffers, I guess maybe you only want to
flush the kernel mapping from the kernel, and leave flushing of the
user mapping to userspace? I think if you're running in some random
kernel context, you probably can't even reliably flush the right
userspace context - see how for example vivt_flush_cache_range() does
nothing if the MM being flushed is not running on the current CPU.
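
To make the two directions concrete, here is a rough sketch of where the
kernel-side flushes might sit (function names invented; whether the
user-side invalidate belongs in the kernel or in userspace is exactly
the open question above):

```c
/* Hypothetical sketch: a buffer written through a vmap'd kernel alias
 * and read by userspace through its own (possibly non-coherent)
 * mapping. kaddr/len refer to the kernel alias of the region. */

static void my_buf_commit(void *kaddr, size_t len)
{
	/* Write side: push dirty cachelines for the kernel alias out
	 * to memory so an aliasing user mapping can see the data. */
	flush_kernel_vmap_range(kaddr, len);
}

static void my_buf_before_read(void *kaddr, size_t len)
{
	/* Read side: discard stale (clean) cachelines for the kernel
	 * alias before reading data userspace may have written. */
	invalidate_kernel_vmap_range(kaddr, len);
}
```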
