[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1498b22-0bc1-489a-8c2c-35aa48c7fe7d@redhat.com>
Date: Tue, 4 Feb 2025 12:59:35 +0100
From: David Hildenbrand <david@...hat.com>
To: Asahi Lina <lina@...hilina.net>, Zi Yan <ziy@...dia.com>
Cc: Miguel Ojeda <ojeda@...nel.org>, Alex Gaynor <alex.gaynor@...il.com>,
Boqun Feng <boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>,
Benno Lossin <benno.lossin@...ton.me>,
Andreas Hindborg <a.hindborg@...nel.org>, Alice Ryhl <aliceryhl@...gle.com>,
Trevor Gross <tmgross@...ch.edu>, Jann Horn <jannh@...gle.com>,
Matthew Wilcox <willy@...radead.org>, Paolo Bonzini <pbonzini@...hat.com>,
Danilo Krummrich <dakr@...nel.org>, Wedson Almeida Filho
<wedsonaf@...il.com>, Valentin Obst <kernel@...entinobst.de>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
airlied@...hat.com, Abdiel Janulgue <abdiel.janulgue@...il.com>,
rust-for-linux@...r.kernel.org, linux-kernel@...r.kernel.org,
asahi@...ts.linux.dev, Oscar Salvador <osalvador@...e.de>,
Muchun Song <muchun.song@...ux.dev>
Subject: Re: [PATCH 0/6] rust: page: Support borrowing `struct page` and
physaddr conversion
>>> Add DavidH and OscarS for memory hot-remove questions.
>>>
>>> IIUC, struct page could be freed if a chunk of memory is hot-removed.
>>
>> Right, but only after there are no users anymore (IOW, memory was freed
>> back to the buddy). PFN walkers might still stumble over them, but I
>> would not expect (or recommend) rust to do that.
>
> The physaddr to page function does look up pages by pfn, but it's
> intended to be used by drivers that know what they're doing. There are
> two variants of the API, one that is completely unchecked (a fast path
> for cases where the driver definitely allocated these pages itself, for
> example just grabbing the `struct page` back from a decoded PTE it
> wrote), and one that has this check:
>
> pfn_valid(pfn) && page_is_ram(pfn)
>
> Which is intended as a safety net to allow drivers to look up
> firmware-reserved pages too, and fail gracefully if the kernel doesn't
> know about them (because they weren't declared in the
> bootloader/firmware memory map correctly) or doesn't have them mapped in
> the direct map (because they were declared no-map).
> > Is there anything else that can reasonably be done here to make the API
> safer to call on an arbitrary pfn?
In PFN walkers we use pfn_to_online_page() to make sure that (1) the
memmap really exists; and (2) that it has meaning (e.g., was actually
initialized).
It can still race with memory offlining, and it refuses ZONE_DEVICE
pages. For the latter, we have a different way to check validity. See
memory_failure() that first calls pfn_to_online_page() to then check
get_dev_pagemap().
>
> If the answer is "no" then that's fine. It's still an unsafe function
> and we need to document in the safety section that it should only be
> used for memory that is either known to be allocated and pinned and will
> not be freed while the `struct page` is borrowed, or memory that is
> reserved and not owned by the buddy allocator, so in practice correct
> use would not be racy with memory hot-remove anyway.
>
> This is already the case for the drm/asahi use case, where the pfns
> looked up will only ever be one of:
>
> - GEM objects that are mapped to the GPU and whose physical pages are
> therefore pinned (and the VM is locked while this happens so the objects
> cannot become unpinned out from under the running code),
How exactly are these pages pinned/obtained?
> - Raw pages allocated from the page allocator for use as GPU page tables,
That makes sense.
> - System memory that is marked reserved by the firmware/bootloader,
E.g., in arch/x86/mm/ioremap.c:__ioremap_check_ram() we refuse anything
that has a valid memmap and is *not* marked as PageReserved, to prevent
remapping arbitrary *real* RAM.
Is that case here similar?
> - (Potentially) invalid PFNs that aren't part of the System RAM region
> at all and don't have a struct page to begin with, which we check for,
> so the API returns an error. This would only happen if the bootloader
> didn't declare some used firmware ranges at all, so Linux doesn't know
> about them.
>
>>
>>>
>>> Another case struct page can be freed is when hugetlb vmemmap
>>> optimization
>>> is used. Muchun (cc'd) is the maintainer of hugetlbfs.
>>
>> Here, the "struct page" remains valid though; it can still be accessed,
>> although we disallow writes (which would be wrong).
>>
>> If you only allocate a page and free it later, there is no need to worry
>> about either on the rust side.
>
> This is what the safe API does. (Also the unsafe physaddr APIs if all
> you ever do is convert an allocated page to a physaddr and back, which
> is the only thing the GPU page table code does during normal use. The
> walking leaf PFNs story is only for GPU device coredumps when the
> firmware crashes.)
I would hope that we can lock down this interface as much as possible.
Ideally, we would never go from pfn->page, unless
(a) we remember somehow that we came from page->pfn. E.g., we allocated
these pages or someone else provided us with these pages. The memmap
cannot go away. I know it's hard.
(b) the pages are flagged as being special, similar to
__ioremap_check_ram().
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists