Message-ID: <aK2QZnzS1ErHK5tP@raptor>
Date: Tue, 26 Aug 2025 11:45:58 +0100
From: Alexandru Elisei <alexandru.elisei@....com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, Alexander Potapenko <glider@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Brendan Jackman <jackmanb@...gle.com>,
Christoph Lameter <cl@...two.org>, Dennis Zhou <dennis@...nel.org>,
Dmitry Vyukov <dvyukov@...gle.com>, dri-devel@...ts.freedesktop.org,
intel-gfx@...ts.freedesktop.org, iommu@...ts.linux.dev,
io-uring@...r.kernel.org, Jason Gunthorpe <jgg@...dia.com>,
Jens Axboe <axboe@...nel.dk>, Johannes Weiner <hannes@...xchg.org>,
John Hubbard <jhubbard@...dia.com>, kasan-dev@...glegroups.com,
kvm@...r.kernel.org, "Liam R. Howlett" <Liam.Howlett@...cle.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-arm-kernel@...s.com, linux-arm-kernel@...ts.infradead.org,
linux-crypto@...r.kernel.org, linux-ide@...r.kernel.org,
linux-kselftest@...r.kernel.org, linux-mips@...r.kernel.org,
linux-mmc@...r.kernel.org, linux-mm@...ck.org,
linux-riscv@...ts.infradead.org, linux-s390@...r.kernel.org,
linux-scsi@...r.kernel.org,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Marco Elver <elver@...gle.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Michal Hocko <mhocko@...e.com>, Mike Rapoport <rppt@...nel.org>,
Muchun Song <muchun.song@...ux.dev>, netdev@...r.kernel.org,
Oscar Salvador <osalvador@...e.de>, Peter Xu <peterx@...hat.com>,
Robin Murphy <robin.murphy@....com>,
Suren Baghdasaryan <surenb@...gle.com>, Tejun Heo <tj@...nel.org>,
virtualization@...ts.linux.dev, Vlastimil Babka <vbabka@...e.cz>,
wireguard@...ts.zx2c4.com, x86@...nel.org, Zi Yan <ziy@...dia.com>
Subject: Re: [PATCH RFC 21/35] mm/cma: refuse handing out non-contiguous page
ranges

Hi David,

On Thu, Aug 21, 2025 at 10:06:47PM +0200, David Hildenbrand wrote:
> Let's disallow handing out PFN ranges with non-contiguous pages, so we
> can remove the nth-page usage in __cma_alloc(), and so any callers don't
> have to worry about that either when wanting to blindly iterate pages.
>
> This is really only a problem in configs with SPARSEMEM but without
> SPARSEMEM_VMEMMAP, and only when we would cross memory sections in some
> cases.
>
> Will this cause harm? Probably not, because it's mostly 32bit that does
> not support SPARSEMEM_VMEMMAP. If this ever becomes a problem we could
> look into allocating the memmap for the memory sections spanned by a
> single CMA region in one go from memblock.
>
> Signed-off-by: David Hildenbrand <david@...hat.com>
> ---
>  include/linux/mm.h |  6 ++++++
>  mm/cma.c           | 36 +++++++++++++++++++++++-------------
>  mm/util.c          | 33 +++++++++++++++++++++++++++++++++
>  3 files changed, 62 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ef360b72cb05c..f59ad1f9fc792 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -209,9 +209,15 @@ extern unsigned long sysctl_user_reserve_kbytes;
>  extern unsigned long sysctl_admin_reserve_kbytes;
> 
>  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +bool page_range_contiguous(const struct page *page, unsigned long nr_pages);
>  #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
>  #else
>  #define nth_page(page,n) ((page) + (n))
> +static inline bool page_range_contiguous(const struct page *page,
> +					  unsigned long nr_pages)
> +{
> +	return true;
> +}
>  #endif
> 
>  /* to align the pointer to the (next) page boundary */
> diff --git a/mm/cma.c b/mm/cma.c
> index 2ffa4befb99ab..1119fa2830008 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -780,10 +780,8 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>  			   unsigned long count, unsigned int align,
>  			   struct page **pagep, gfp_t gfp)
>  {
> -	unsigned long mask, offset;
> -	unsigned long pfn = -1;
> -	unsigned long start = 0;
>  	unsigned long bitmap_maxno, bitmap_no, bitmap_count;
> +	unsigned long start, pfn, mask, offset;
>  	int ret = -EBUSY;
>  	struct page *page = NULL;
> 
> @@ -795,7 +793,7 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>  	if (bitmap_count > bitmap_maxno)
>  		goto out;
> 
> -	for (;;) {
> +	for (start = 0; ; start = bitmap_no + mask + 1) {
>  		spin_lock_irq(&cma->lock);
>  		/*
>  		 * If the request is larger than the available number
> @@ -812,6 +810,22 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>  			spin_unlock_irq(&cma->lock);
>  			break;
>  		}
> +
> +		pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
> +		page = pfn_to_page(pfn);
> +
> +		/*
> +		 * Do not hand out page ranges that are not contiguous, so
> +		 * callers can just iterate the pages without having to worry
> +		 * about these corner cases.
> +		 */
> +		if (!page_range_contiguous(page, count)) {
> +			spin_unlock_irq(&cma->lock);
> +			pr_warn_ratelimited("%s: %s: skipping incompatible area [0x%lx-0x%lx]",
> +					    __func__, cma->name, pfn, pfn + count - 1);
> +			continue;
> +		}
> +
>  		bitmap_set(cmr->bitmap, bitmap_no, bitmap_count);
>  		cma->available_count -= count;
>  		/*
> @@ -821,29 +835,25 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>  		 */
>  		spin_unlock_irq(&cma->lock);
> 
> -		pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
>  		mutex_lock(&cma->alloc_mutex);
>  		ret = alloc_contig_range(pfn, pfn + count, ACR_FLAGS_CMA, gfp);
>  		mutex_unlock(&cma->alloc_mutex);
> -		if (ret == 0) {
> -			page = pfn_to_page(pfn);
> +		if (!ret)
>  			break;
> -		}
> 
>  		cma_clear_bitmap(cma, cmr, pfn, count);
>  		if (ret != -EBUSY)
>  			break;
> 
>  		pr_debug("%s(): memory range at pfn 0x%lx %p is busy, retrying\n",
> -			 __func__, pfn, pfn_to_page(pfn));
> +			 __func__, pfn, page);
> 
>  		trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),

Nitpick: I think you already have the page here.
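
Something like this, presumably (just my sketch on top of this patch,
untested):

	trace_cma_alloc_busy_retry(cma->name, pfn, page, count, align);
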
>  					   count, align);
> -		/* try again with a bit different memory target */
> -		start = bitmap_no + mask + 1;
>  	}
>  out:
> -	*pagep = page;
> +	if (!ret)
> +		*pagep = page;
>  	return ret;
>  }
>
> @@ -882,7 +892,7 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
>  	 */
>  	if (page) {
>  		for (i = 0; i < count; i++)
> -			page_kasan_tag_reset(nth_page(page, i));
> +			page_kasan_tag_reset(page + i);

Had a look at it, not very familiar with CMA, but the changes look equivalent
to what was there before. Not sure that's worth a Reviewed-by tag, but here it
is in case you want to add it:

Reviewed-by: Alexandru Elisei <alexandru.elisei@....com>

Just so I can better understand the problem being fixed: I guess you can get
two consecutive pfns with non-consecutive struct pages when the two pfns fall
in adjacent memory sections, because the memmap is allocated separately for
each section, is that correct?
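
If so, just to check my understanding, the failure mode would look something
like the sketch below (my own illustration, not from the patch): with
SPARSEMEM but without SPARSEMEM_VMEMMAP the memmap is a separate allocation
per section, so plain pointer arithmetic on struct page can break once a pfn
range crosses a section boundary:

	/*
	 * pfn and pfn + 1 are consecutive, but if they fall in different
	 * memory sections their struct pages come from different memmap
	 * allocations and need not be adjacent in virtual memory.
	 */
	struct page *page = pfn_to_page(pfn);

	if (page + 1 != pfn_to_page(pfn + 1))
		pr_info("memmap not contiguous across this section boundary\n");

and that is exactly the kind of range the new page_range_contiguous() check
now refuses to hand out from cma_range_alloc().
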
Thanks,
Alex

>  	}
> 
>  	if (ret && !(gfp & __GFP_NOWARN)) {
> diff --git a/mm/util.c b/mm/util.c
> index d235b74f7aff7..0bf349b19b652 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -1280,4 +1280,37 @@ unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
>  {
>  	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr, 0);
>  }
> +
> +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
> +/**
> + * page_range_contiguous - test whether the page range is contiguous
> + * @page: the start of the page range.
> + * @nr_pages: the number of pages in the range.
> + *
> + * Test whether the page range is contiguous, such that they can be iterated
> + * naively, corresponding to iterating a contiguous PFN range.
> + *
> + * This function should primarily only be used for debug checks, or when
> + * working with page ranges that are not naturally contiguous (e.g., pages
> + * within a folio are).
> + *
> + * Returns true if contiguous, otherwise false.
> + */
> +bool page_range_contiguous(const struct page *page, unsigned long nr_pages)
> +{
> +	const unsigned long start_pfn = page_to_pfn(page);
> +	const unsigned long end_pfn = start_pfn + nr_pages;
> +	unsigned long pfn;
> +
> +	/*
> +	 * The memmap is allocated per memory section. We need to check
> +	 * each involved memory section once.
> +	 */
> +	for (pfn = ALIGN(start_pfn, PAGES_PER_SECTION);
> +	     pfn < end_pfn; pfn += PAGES_PER_SECTION)
> +		if (unlikely(page + (pfn - start_pfn) != pfn_to_page(pfn)))
> +			return false;
> +	return true;
> +}
> +#endif
>  #endif /* CONFIG_MMU */
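
One more note, mostly to check that I read the loop in page_range_contiguous()
above correctly: with, say, PAGES_PER_SECTION == 0x8000 and a pfn range
[0x7ff0, 0x8010), only pfn 0x8000 gets checked:

	pfn = ALIGN(0x7ff0, 0x8000) = 0x8000   -> checked (the boundary crossed)
	pfn += PAGES_PER_SECTION    = 0x10000  -> >= end_pfn, loop stops

so each section boundary the range crosses is checked exactly once, and a
range that stays inside a single section (with a non-aligned start pfn) skips
the loop entirely. Looks correct to me.
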
> --
> 2.50.1
>
>