Message-ID: <20250331133906.48e115f5@gandalf.local.home>
Date: Mon, 31 Mar 2025 13:39:06 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, Masami
 Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
 Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Andrew Morton
 <akpm@...ux-foundation.org>, Vincent Donnefort <vdonnefort@...gle.com>,
 Vlastimil Babka <vbabka@...e.cz>, Mike Rapoport <rppt@...nel.org>, Kees
 Cook <kees@...nel.org>, Tony Luck <tony.luck@...el.com>, "Guilherme G.
 Piccoli" <gpiccoli@...lia.com>, linux-hardening@...r.kernel.org
Subject: Re: [PATCH v2 1/2] tracing: ring-buffer: Have the ring buffer code
 do the vmap of physical memory

On Mon, 31 Mar 2025 09:55:28 -0700
Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> > +       /* Make sure the mappings are page aligned */
> > +       *start = ALIGN(*start, PAGE_SIZE);  
> 
> The above is *completely* unacceptable.
> 
> There is no way in hell that ALIGN() can ever be right.

I only did this to be robust in case the address passed in was not
page aligned. In all my use cases, it is.

> 
> You don't even fix up the low bits of the returned virtual address, so
> you literally return the virtual address of something that doesn't
> match what was passed in.
> 
> So if you pass it a starting area that isn't page-aligned, it now
> randomly gives you complete crap back, and includes some random
> unrelated part in the mapping.
> 
> So no. That needs to be a
> 
>         if (*start & ~PAGE_MASK)
>                 return NULL;

No problem, will update. As I said, I only added that to avoid mapping
something that wasn't part of what was passed in. But returning an
error if it's not page aligned works for me too.
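
For the record, since PAGE_MASK keeps the page-aligned bits, the check
wants the complemented mask; the kernel's offset_in_page() helper is
the idiomatic spelling of exactly that (a minimal sketch):

        /* offset_in_page(x) is (x & ~PAGE_MASK): nonzero if unaligned */
        if (offset_in_page(*start))
                return NULL;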

> 
> or whatever. Because just randomly corrupting the base address by
> ignoring the low bits is not ok.
> 
> > +       /* The size must fit full pages */
> > +       page_count = size >> PAGE_SHIFT;  
> 
> This part works, simply because truncating the size is fine. It won't
> all get mapped, but that's the caller's problem; at least the code
> isn't returning random crap that has random other data in it.
> 
> That said, I don't see the point. If you want to virtually map
> physical pages, they need to be full pages, otherwise the end result
> gets randomly truncated. So I think that while this is much better
> than the "return random crap that doesn't make any sense", it should
> be the same rule: just don't allow mapping partial pages.
> 
> So make it be
> 
>         if (size & ~PAGE_MASK)
>                 return NULL;
> 
> instead, and just enforce the fact that allocations have to be sanely
> aligned for vmap.

Again, I'm happy to error out on non-alignment. I'll just update the
documentation to say it must be page aligned. Currently it shows an
example that uses 4096 for the alignment, but that should be changed
to explicitly say the address must be page aligned, as some
architectures (e.g. ppc) have 64K pages.
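
Concretely, assuming reserve_mem's size:align:label format (the align
field takes the usual K/M suffixes), the documented example might
change along these lines, with "trace" as a placeholder label:

        # 4096 happens to be enough on x86-64 (4K pages):
        reserve_mem=20M:4096:trace

        # ppc64 with 64K pages needs the alignment bumped:
        reserve_mem=20M:64K:trace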

> 
> Anyway, that takes care of the horrific interface. However, there's
> another issue:
> 
> > +       pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL);  
> 
> you create this pointless array of pages. Why? It's a physically
> contiguous area.
> 
> You do that just because you want to use vmap() to map that contiguous
> area one page at a time.
> 
> But this is NOT a new thing. It's exactly what every single PCI device
> with a random physical memory region BAR needs to do. And no, they
> don't create arrays of 'struct page *', because they use memory that
> doesn't even have page backing.
> 
> So we actually have interfaces to do linear virtual mappings of
> physical pages that *predate* vmap(), and do the right thing without
> any of these games.

[ Added the pstore folks ]

OK, so I did copy this from fs/pstore/ram_core.c, as this code does
basically the same thing pstore does. And it looks like pstore should
be updated too.
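
For context, the pattern in question, roughly as it appears in
fs/pstore/ram_core.c's persistent_ram_vmap() (simplified from memory
here), builds a struct page array just to feed vmap():

        struct page **pages;
        phys_addr_t page_start;
        unsigned int page_count, i;
        void *vaddr;
        pgprot_t prot = PAGE_KERNEL;    /* prot selection elided */

        page_start = start - offset_in_page(start);
        page_count = DIV_ROUND_UP(size + offset_in_page(start),
                                  PAGE_SIZE);

        pages = kmalloc_array(page_count, sizeof(struct page *),
                              GFP_KERNEL);
        if (!pages)
                return NULL;

        /* One pfn_to_page() per page of the contiguous range */
        for (i = 0; i < page_count; i++) {
                phys_addr_t addr = page_start + i * PAGE_SIZE;
                pages[i] = pfn_to_page(addr >> PAGE_SHIFT);
        }
        vaddr = vmap(pages, page_count, VM_MAP, prot);
        kfree(pages);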

> 
> Yes, the legacy versions of these interfaces are all for IO memory,
> but we do have things like vmap_page_range(), which should
> JustWork(tm).
> 
> Yeah, you'll need to do something like
> 
>         struct vm_struct *area;
>         unsigned long vmap_start, vmap_end;
>         int ret;
> 
>         area = get_vm_area(size, VM_IOREMAP);
>         if (!area)
>                 return NULL;
> 
>         vmap_start = (unsigned long) area->addr;
>         vmap_end = vmap_start + size;
> 
>         ret = vmap_page_range(vmap_start, vmap_end,
>                 *start, pgprot_nx(PAGE_KERNEL));
> 
>         if (ret < 0) {
>                 free_vm_area(area);
>                 return NULL;
>         }
> 
> and the above is *entirely* untested and maybe there's something wrong
> there, but the concept should work, and when you don't do it a page at
> a time, you not only don't need the kmalloc_array(), it should even do
> things like be able to use large page mappings if the alignment and
> size work out.
> 
> That said, the old code is *really* broken to begin with. I don't
> understand why you want to vmap() a contiguous physical range. Either
> it's real pages to begin with, and you can just use "page_address()"
> to get a virtual address, or it's *not* real pages, and doing
> "pfn_to_page()" is actively wrong, because it creates a fake 'struct
> page *' pointer that isn't valid.
> 
> Is this all just for some disgusting HIGHMEM use (in which case you
> need the virtual mapping because of HIGHMEM)? Is there any reason to
> support HIGHMEM in this area at all?
> 
> So I'm not sure why this code does all this horror in the first place.
> Either it's all just confused code that just didn't know what it was
> doing and just happened to work (very possible..) or there is
> something odd going on.

[ Adding Mike Rapoport ]

This is due to what the "reserve_mem" kernel command line option gives
back. It reserves physical memory that is not part of the kernel's
memory allocator (it also works with memmap, e.g.
memmap=12M$0x284500000, which likewise hands back physical addresses
that are not part of the memory allocator).

It's up to the user to map that memory into virtual memory. The same
is true for ramoops in pstore.
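
Folding your suggestions together, here is a rough, untested sketch of
what the v3 helper could look like (the function name is a placeholder
and the pgprot choice just follows your example):

        /* Map a page-aligned, physically contiguous range */
        static void *map_phys_range(phys_addr_t start, size_t size)
        {
                struct vm_struct *area;
                unsigned long vmap_start, vmap_end;

                /* Error out instead of fixing up unaligned input */
                if ((start | size) & ~PAGE_MASK)
                        return NULL;

                area = get_vm_area(size, VM_IOREMAP);
                if (!area)
                        return NULL;

                vmap_start = (unsigned long) area->addr;
                vmap_end = vmap_start + size;

                /* Linear map of the whole range; no struct page array */
                if (vmap_page_range(vmap_start, vmap_end, start,
                                    pgprot_nx(PAGE_KERNEL)) < 0) {
                        free_vm_area(area);
                        return NULL;
                }

                return area->addr;
        }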

-- Steve
