Message-ID: <4cb93166-29fd-4aea-965b-5dfb62d4dc8c@redhat.com>
Date: Thu, 20 Feb 2025 11:59:11 +0100
From: David Hildenbrand <david@...hat.com>
To: Luiz Capitulino <luizcap@...hat.com>, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, yuzhao@...gle.com, pasha.tatashin@...een.com
Cc: akpm@...ux-foundation.org, hannes@...xchg.org, muchun.song@...ux.dev
Subject: Re: [PATCH 1/4] mm: page_ext: add an iteration API for page
extensions
On 19.02.25 03:17, Luiz Capitulino wrote:
> The page extension implementation assumes that all page extensions of
> a given page order are stored in the same memory section. The function
> page_ext_next() relies on this assumption by adding an offset to the
> current object to return the next adjacent page extension.
>
> This behavior works as expected for flatmem but fails for sparsemem when
> using 1G pages. The commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP for
> gigantic folios") exposes this issue, making it possible for a crash when
> using page_owner or page_table_check page extensions.
>
> The problem is that for 1G pages, the page extensions may span memory
> section boundaries and be stored in different memory sections. This issue
> was not visible before commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP
> for gigantic folios") because alloc_contig_pages() never passed more than
> MAX_PAGE_ORDER to post_alloc_hook(). However, the series introducing
> mentioned commit changed this behavior allowing the full 1G page order
> to be passed.
>
> Reproducer:
>
> 1. Build the kernel with CONFIG_SPARSEMEM=y and table extensions
> support
> 2. Pass 'default_hugepagesz=1 page_owner=on' in the kernel command-line
> 3. Reserve one 1G page at run-time, this should crash (backtrace below)
>
> To address this issue, this commit introduces a new API for iterating
> through page extensions. The main iteration loops are for_each_page_ext()
> and for_each_page_ext_order(). Both must be called with the RCU read
> lock taken. Here's an usage example:
>
> """
> struct page_ext_iter iter;
> struct page_ext *page_ext;
>
> ...
>
> rcu_read_lock();
> for_each_page_ext_order(page, order, page_ext, iter) {
> struct my_page_ext *obj = get_my_page_ext_obj(page_ext);
> ...
> }
> rcu_read_unlock();
> """
>
[...]
> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page);
> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter);
> +
> +/**
> + * page_ext_iter_get() - Get current page extension
> + * @iter: page extension iterator.
> + *
> + * Return: NULL if no page_ext exists for this iterator.
> + */
> +static inline struct page_ext *page_ext_iter_get(const struct page_ext_iter *iter)
> +{
> + return iter->page_ext;
> +}
> +
> +/**
> + * for_each_page_ext(): iterate through page_ext objects.
> + * @__page: the page we're interested in
> + * @__pgcount: how many pages to iterate through
> + * @__page_ext: struct page_ext pointer where the current page_ext
> + * object is returned
> + * @__iter: struct page_ext_iter object (defined in the stack)
> + *
> + * IMPORTANT: must be called with RCU read lock taken.
> + */
> +#define for_each_page_ext(__page, __pgcount, __page_ext, __iter) \
> + __page_ext = page_ext_iter_begin(&__iter, __page); \
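Doing stuff outside of the loop is dangerous. Assume some caller does

	if (xxx)
		for_each_page_ext()

Then only the page_ext_iter_begin() assignment is guarded by the if,
and the for() loop itself runs unconditionally. Just move that inside
the for():

	for (__page_ext = page_ext_iter_begin(&__iter, __page), __iter.index = 0; ...)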
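To make the hazard concrete, here is a minimal userspace sketch, not kernel code. The names iter_begin(), FOR_EACH_UNSAFE and FOR_EACH_SAFE are made up for illustration and only mimic the shape of the macro under review.

```c
/* Hypothetical stand-in for page_ext_iter_begin(): returns the start index. */
static int iter_begin(void)
{
	return 0;
}

/* Unsafe shape: the first statement sits outside the for(). */
#define FOR_EACH_UNSAFE(i, n) \
	i = iter_begin(); \
	for (; i < (n); i++)

/* Safe shape: the initialization lives in the for() init clause. */
#define FOR_EACH_SAFE(i, n) \
	for (i = iter_begin(); i < (n); i++)

/* Counts loop iterations when the macro is used as the body of an if. */
int visits_unsafe(int cond)
{
	int i = 0, visits = 0;

	/* Expands to: if (cond) i = iter_begin(); for (...) visits++; */
	if (cond)
		FOR_EACH_UNSAFE(i, 3)
			visits++;
	return visits;
}

int visits_safe(int cond)
{
	int i = 0, visits = 0;

	/* Expands to a single for statement, fully guarded by the if. */
	if (cond)
		FOR_EACH_SAFE(i, 3)
			visits++;
	return visits;
}
```

With cond == 0, visits_unsafe() still runs the loop body three times, while visits_safe() skips it entirely.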
> + for (__iter.index = 0; \
> + __page_ext && __iter.index < __pgcount; \
> + __page_ext = page_ext_iter_next(&__iter), \
> + __iter.index++)
Hm, if we now have an index, why not turn iter.pfn -> iter.start_pfn,
and only adjust the index in page_ext_iter_next()?

Then you can set the index to 0 in page_ext_iter_begin() and have here

	for (__page_ext = page_ext_iter_begin(&__iter, __page);
	     __page_ext && __iter.index < __pgcount;
	     __page_ext = page_ext_iter_next(&__iter))

A page_ext_iter_reset() could then simply reset the index to 0 and
look up page_ext(start_pfn + index) == page_ext(start_pfn).
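A rough userspace sketch of the start_pfn + index idea follows. It is an illustration under assumptions, not the kernel implementation: the toy section layout, lookup_page_ext_pfn() and count_page_exts() are hypothetical, and the iterator takes a pfn directly instead of a struct page.

```c
#include <stddef.h>

#define PAGES_PER_SECTION 4UL	/* toy value, not the kernel's */
#define NR_SECTIONS 3UL

struct page_ext {
	unsigned long flags;
};

/* One array per section, padded so that sections are not adjacent. */
struct toy_section {
	struct page_ext ext[PAGES_PER_SECTION];
	char pad[64];
};

static struct toy_section toy_sections[NR_SECTIONS];

/* Hypothetical lookup by pfn; returns NULL past the last section. */
static struct page_ext *lookup_page_ext_pfn(unsigned long pfn)
{
	unsigned long sec = pfn / PAGES_PER_SECTION;

	if (sec >= NR_SECTIONS)
		return NULL;
	return &toy_sections[sec].ext[pfn % PAGES_PER_SECTION];
}

struct page_ext_iter {
	unsigned long index;		/* only field that changes per step */
	unsigned long start_pfn;	/* fixed for the whole iteration */
	struct page_ext *page_ext;
};

static struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter,
					    unsigned long start_pfn)
{
	iter->index = 0;
	iter->start_pfn = start_pfn;
	iter->page_ext = lookup_page_ext_pfn(start_pfn);
	return iter->page_ext;
}

static struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
{
	iter->index++;
	iter->page_ext = lookup_page_ext_pfn(iter->start_pfn + iter->index);
	return iter->page_ext;
}

/* Everything lives inside the for(), so the macro is a single statement. */
#define for_each_page_ext(__pfn, __pgcount, __page_ext, __iter)	\
	for (__page_ext = page_ext_iter_begin(&__iter, __pfn);		\
	     __page_ext && __iter.index < (__pgcount);			\
	     __page_ext = page_ext_iter_next(&__iter))

/* Usage: count how many page_ext objects the loop visits. */
unsigned long count_page_exts(unsigned long start_pfn, unsigned long pgcount)
{
	struct page_ext_iter iter;
	struct page_ext *page_ext;
	unsigned long n = 0;

	for_each_page_ext(start_pfn, pgcount, page_ext, iter)
		n++;
	return n;
}
```

This sketch always takes the lookup path in page_ext_iter_next(), so it says nothing about the fast path. It only shows that the loop bookkeeping reduces to a single index once start_pfn is fixed.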
> +
> +/**
> + * for_each_page_ext_order(): iterate through page_ext objects
> + * for a given page order
> + * @__page: the page we're interested in
> + * @__order: page order to iterate through
> + * @__page_ext: struct page_ext pointer where the current page_ext
> + * object is returned
> + * @__iter: struct page_ext_iter object (defined in the stack)
> + *
> + * IMPORTANT: must be called with RCU read lock taken.
> + */
> +#define for_each_page_ext_order(__page, __order, __page_ext, __iter) \
> + for_each_page_ext(__page, (1UL << __order), __page_ext, __iter)
> +
> #else /* !CONFIG_PAGE_EXTENSION */
> struct page_ext;
>
> diff --git a/mm/page_ext.c b/mm/page_ext.c
> index 641d93f6af4c1..508deb04d5ead 100644
> --- a/mm/page_ext.c
> +++ b/mm/page_ext.c
> @@ -549,3 +549,44 @@ void page_ext_put(struct page_ext *page_ext)
>
> rcu_read_unlock();
> }
> +
> +/**
> + * page_ext_iter_begin() - Prepare for iterating through page extensions.
> + * @iter: page extension iterator.
> + * @page: The page we're interested in.
> + *
> + * Must be called with RCU read lock taken.
> + *
> + * Return: NULL if no page_ext exists for this page.
> + */
> +struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter, struct page *page)
> +{
> + iter->pfn = page_to_pfn(page);
> + iter->page_ext = lookup_page_ext(page);
> +
> + return iter->page_ext;
> +}
> +
> +/**
> + * page_ext_iter_next() - Get next page extension
> + * @iter: page extension iterator.
> + *
> + * Must be called with RCU read lock taken.
> + *
> + * Return: NULL if no next page_ext exists.
> + */
> +struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
> +{
> + if (WARN_ON_ONCE(!iter->page_ext))
> + return NULL;
> +
> + iter->pfn++;
> +
> + if (page_ext_iter_next_fast_possible(iter->pfn)) {
> + iter->page_ext = page_ext_next(iter->page_ext);
> + } else {
> + iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
> + }
> +
> + return iter->page_ext;
> +}
We now always have an out-of-line function call when calling into
page_ext_iter_next(), even on the fast path. Could we move that to the
header as a static inline and rather expose lookup_page_ext()?
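As a sketch of what that could look like, here is a userspace toy, not the kernel code. The body of page_ext_iter_next_fast_possible() is not shown in the quoted hunk, so the modulo check below is an assumption, as are the toy section layout, lookup_page_ext_pfn(), the slow_lookups counter, and the plain pointer increment standing in for page_ext_next().

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGES_PER_SECTION 4UL	/* toy value, not the kernel's */
#define NR_SECTIONS 2UL

struct page_ext {
	unsigned long flags;
};

/* One array per section, padded so that sections are not adjacent. */
struct toy_section {
	struct page_ext ext[PAGES_PER_SECTION];
	char pad[32];
};

static struct toy_section toy_sections[NR_SECTIONS];

/* Counts how often the out-of-line lookup is taken. */
unsigned long slow_lookups;

/* Stand-in for the exposed, out-of-line lookup function. */
struct page_ext *lookup_page_ext_pfn(unsigned long pfn)
{
	unsigned long sec = pfn / PAGES_PER_SECTION;

	slow_lookups++;
	if (sec >= NR_SECTIONS)
		return NULL;
	return &toy_sections[sec].ext[pfn % PAGES_PER_SECTION];
}

struct page_ext_iter {
	unsigned long pfn;
	struct page_ext *page_ext;
};

/* Assumed check: the fast path is fine unless we enter a new section. */
static inline bool page_ext_iter_next_fast_possible(unsigned long next_pfn)
{
	return next_pfn % PAGES_PER_SECTION != 0;
}

/* Header-style inline: no function call unless a section is crossed. */
static inline struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
{
	if (!iter->page_ext)
		return NULL;

	iter->pfn++;
	if (page_ext_iter_next_fast_possible(iter->pfn))
		iter->page_ext = iter->page_ext + 1;
	else
		iter->page_ext = lookup_page_ext_pfn(iter->pfn);

	return iter->page_ext;
}

/* Usage: walk pgcount pages and report how many slow lookups happened. */
unsigned long walk_and_count_lookups(unsigned long start_pfn,
				     unsigned long pgcount)
{
	struct page_ext_iter iter = { .pfn = start_pfn };
	unsigned long i;

	slow_lookups = 0;
	iter.page_ext = lookup_page_ext_pfn(start_pfn);
	for (i = 1; i < pgcount && iter.page_ext; i++)
		page_ext_iter_next(&iter);
	return slow_lookups;
}
```

In this toy, walking eight pages from pfn 0 takes the out-of-line lookup only twice: once for the first page and once when crossing into the second section.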
--
Cheers,
David / dhildenb