Message-ID: <CAHk-=whUOfVucfJRt7E0AH+GV41ELmS4wJqxHDnui6Giddfkzw@mail.gmail.com>
Date: Mon, 31 Mar 2025 09:55:28 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org, 
	Masami Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>, 
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, Andrew Morton <akpm@...ux-foundation.org>, 
	Vincent Donnefort <vdonnefort@...gle.com>, Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH v2 1/2] tracing: ring-buffer: Have the ring buffer code do the vmap of physical memory

On Mon, 31 Mar 2025 at 07:34, Steven Rostedt <rostedt@...dmis.org> wrote:
>
> Instead, rename ring_buffer_alloc_range() to ring_buffer_alloc_physical()
> where contiguous physical memory is passed to the ring buffer code,

The concept looks like an improvement, but:

> +static unsigned long map_pages(unsigned long *start, unsigned long *end)
> +{
> +       struct page **pages;
> +       phys_addr_t page_start;
> +       unsigned long size;
> +       unsigned long page_count;
> +       unsigned long i;
> +       void *vaddr;
> +
> +       /* Make sure the mappings are page aligned */
> +       *start = ALIGN(*start, PAGE_SIZE);

The above is *completely* unacceptable.

There is no way in hell that ALIGN() can ever be right.

You don't even fix up the low bits of the returned virtual address, so
you literally return the virtual address of something that doesn't
match what was passed in.

So if you pass it a starting area that isn't page-aligned, it now
randomly gives you complete crap back, and includes some random
unrelated part in the mapping.
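
To make that concrete (4 KiB pages, hypothetical numbers):

        *start = 0x10080;
        *start = ALIGN(*start, PAGE_SIZE);      /* rounds *up* to 0x11000 */

The first 0xf80 bytes of what the caller handed in silently fall
out of the mapping, and the virtual address that comes back
corresponds to 0x11000, not to the data at 0x10080.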

So no. That needs to be a

        if (*start & ~PAGE_MASK)
                return NULL;

or whatever. Because just randomly corrupting the base address by
ignoring the low bits is not ok.

> +       /* The size must fit full pages */
> +       page_count = size >> PAGE_SHIFT;

This part works, simply because truncating the size is fine. It won't
all get mapped, but that's the caller's problem; at least the code
isn't returning random crap that has random other data in it.
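
Concretely (again 4 KiB pages, made-up numbers):

        size = 0x1800;                     /* 6 KiB */
        page_count = size >> PAGE_SHIFT;   /* == 1: the last 2 KiB just don't get mapped */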

That said, I don't see the point. If you want to virtually map
physical pages, they need to be full pages, otherwise the end result
gets randomly truncated. So I think that while this is much better
than the "return random crap that doesn't make any sense", it should
be the same rule: just don't allow mapping partial pages.

So make it be

        if (size & ~PAGE_MASK)
                return NULL;

instead, and just enforce the fact that allocations have to be sanely
aligned for vmap.
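
(If you want the checks to read more obviously, the same thing can be
spelled with the helpers we already have; just a sketch, same semantics:

        /* reject misaligned starts and partial-page sizes up front */
        if (!PAGE_ALIGNED(*start) || !PAGE_ALIGNED(size))
                return NULL;

either spelling is fine, as long as nothing silently rounds.)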

Anyway, that takes care of the horrific interface. However, there's
another issue:

> +       pages = kmalloc_array(page_count, sizeof(struct page *), GFP_KERNEL);

you create this pointless array of pages. Why? It's a physically
contiguous area.

You do that just because you want to use vmap() to map that contiguous
area one page at a time.

But this is NOT a new thing. It's exactly what every single PCI device
with a random physical memory region BAR needs to do. And no, they
don't create arrays of 'struct page *', because they use memory that
doesn't even have page backing.

So we actually have interfaces to do linear virtual mappings of
physical pages that *predate* vmap(), and do the right thing without
any of these games.

Yes, the legacy versions of those interfaces are all for IO memory, but
we do have things like vmap_page_range(), which should JustWork(tm).

Yeah, you'll need to do something like

        struct vm_struct *area;
        unsigned long vmap_start, vmap_end;
        int ret;

        /* reserve a chunk of vmalloc address space */
        area = get_vm_area(size, VM_IOREMAP);
        if (!area)
                return NULL;

        vmap_start = (unsigned long) area->addr;
        vmap_end = vmap_start + size;

        /* map the whole physical range in one go, not page by page */
        ret = vmap_page_range(vmap_start, vmap_end,
                *start, pgprot_nx(PAGE_KERNEL));

        if (ret < 0) {
                free_vm_area(area);
                return NULL;
        }

and the above is *entirely* untested, and maybe there's something wrong
there, but the concept should work. And when you don't do it a page at
a time, you not only don't need the kmalloc_array(), it should even be
able to do things like use large page mappings if the alignment and
size work out.

That said, the old code is *really* broken to begin with. I don't
understand why you want to vmap() a contiguous physical range. Either
it's real pages to begin with, and you can just use "page_address()"
to get a virtual address, or it's *not* real pages, and doing
"pfn_to_page()" is actively wrong, because it creates a fake 'struct
page *' pointer that isn't valid.
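
And note that in the "real pages" case there is nothing to map at all:
assuming the range is ordinary lowmem RAM, something like

        /* already covered by the kernel linear map, no vmap() needed */
        void *vaddr = phys_to_virt(*start);

is the whole thing.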

Is this all just for some disgusting HIGHMEM use (in which case you
need the virtual mapping because of HIGHMEM)? Is there any reason to
support HIGHMEM in this area at all?

So I'm not sure why this code does all this horror in the first place.
Either it's all just confused code that didn't know what it was doing
and happened to work (very possible..), or there is something odd
going on.

          Linus
