Date:   Tue, 25 Jan 2022 17:33:49 +0100
From:   David Hildenbrand <david@...hat.com>
To:     Dong Aisheng <aisheng.dong@....com>, linux-mm@...ck.org
Cc:     linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        dongas86@...il.com, jason.hui.liu@....com, leoyang.li@....com,
        abel.vesa@....com, shawnguo@...nel.org, linux-imx@....com,
        akpm@...ux-foundation.org, m.szyprowski@...sung.com,
        lecopzer.chen@...iatek.com, vbabka@...e.cz, stable@...r.kernel.org,
        shijie.qin@....com
Subject: Re: [PATCH v2 2/2] mm: cma: try next MAX_ORDER_NR_PAGES during retry

On 12.01.22 14:15, Dong Aisheng wrote:
> On an ARMv7 platform with a 32M pageblock (MAX_ORDER 14), we observed a

Did you actually intend to talk about pageblocks here (and below)?

I assume you need to be clearer here that you are talking about the maximum
allocation granularity, which is usually bigger than the actual pageblock size.

> huge number of repeated CMA allocation retries (1k+) during boot
> when allocating one page for each of 3 MMC instance probes.
> 
> This is caused by the fact that CMA supports concurrent allocation since
> commit a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock").
> The pageblock or (MAX_ORDER - 1) block from which we are trying to
> allocate memory may have already been acquired and isolated by others.
> Current cma_alloc() will then retry the next area with a step of
> bitmap_no + mask + 1, which is very likely within the same isolated range
> and will fail again. So when the pageblock or MAX_ORDER block is big
> (e.g. 8192 pages), retrying in such small steps becomes meaningless
> because it is bound to fail a huge number of times while the range stays
> isolated by others, especially when allocating only one or two pages.
> 
> Instead of looping within the same pageblock and wasting a lot of CPU
> cycles, especially on systems with big pageblocks (e.g. 16M or 32M),
> we try the next MAX_ORDER_NR_PAGES boundary directly.
> 
> Doing it this way greatly mitigates the situation.
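
For concreteness (not part of the patch): a standalone sketch of the
stepping arithmetic with made-up values, taking bitmap_no = 5, mask = 0
for a single-page allocation, and MAX_ORDER 14 with order_per_bit 0, so
the step is MAX_ORDER_NR_PAGES = 8192:

	#include <stdio.h>

	/* same rounding as the kernel's ALIGN() for a power-of-two 'a' */
	#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

	int main(void)
	{
		unsigned long bitmap_no = 5, mask = 0;
		unsigned long step = 8192; /* MAX_ORDER_NR_PAGES >> order_per_bit */

		/* old: advance by one unit; still inside the isolated block */
		printf("old start: %lu\n", bitmap_no + mask + 1);          /* 6 */

		/* new: jump to the next MAX_ORDER-aligned boundary */
		printf("new start: %lu\n",
		       ALIGN(bitmap_no + mask + 1, step));                 /* 8192 */
		return 0;
	}

The old retry lands on bit 6, almost certainly still inside the same
isolated 8192-page range; the new one skips the whole range in one step.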
> 
> Below is the original error log during booting:
> [    2.004804] cma: cma_alloc(cma (ptrval), count 1, align 0)
> [    2.010318] cma: cma_alloc(cma (ptrval), count 1, align 0)
> [    2.010776] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010785] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010793] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010800] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010807] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> [    2.010814] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
> .... (+1K retries)
> 
> After the fix, the 1200+ retries are reduced to 0.
> Another test running 8 VPU decoders in parallel shows 1500+ retries
> dropping to ~145.
> 
> IOW, this patch can improve CMA allocation speed a lot when there is
> enough CMA memory, by reducing retries significantly.
> 
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Marek Szyprowski <m.szyprowski@...sung.com>
> Cc: Lecopzer Chen <lecopzer.chen@...iatek.com>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Vlastimil Babka <vbabka@...e.cz>
> CC: stable@...r.kernel.org # 5.11+
> Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
> Signed-off-by: Dong Aisheng <aisheng.dong@....com>
> ---
> v1->v2:
>  * change to align with MAX_ORDER_NR_PAGES instead of pageblock_nr_pages
> ---
>  mm/cma.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/cma.c b/mm/cma.c
> index 1c13a729d274..1251f65e2364 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -500,7 +500,9 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
>  		trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
>  					   count, align);
>  		/* try again with a bit different memory target */
> -		start = bitmap_no + mask + 1;
> +		start = ALIGN(bitmap_no + mask + 1,
> +			      MAX_ORDER_NR_PAGES >> cma->order_per_bit);

Mind giving the reader a hint in the code why we went for
MAX_ORDER_NR_PAGES?
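
Something like the following would do (my wording, feel free to adjust):

	/*
	 * Try again with a bit different memory target. The allocation
	 * most likely failed because the surrounding MAX_ORDER-aligned
	 * block was concurrently isolated by another allocation, so any
	 * retry within that block is bound to fail as well; skip ahead
	 * to the next MAX_ORDER boundary instead.
	 */
	start = ALIGN(bitmap_no + mask + 1,
		      MAX_ORDER_NR_PAGES >> cma->order_per_bit);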

What would happen if the CMA granularity is bigger than
MAX_ORDER_NR_PAGES? I'd assume no harm done, as we'd try aligning to 0.
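
To spell out the "aligning to 0" case, a standalone sketch of the kernel's
ALIGN() rounding, assuming an order_per_bit large enough that the shifted
step becomes 0:

	#include <stdio.h>

	/* the kernel's ALIGN(x, a) reduces to ((x) + (a) - 1) & ~((a) - 1) */
	#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

	int main(void)
	{
		unsigned long next = 6; /* hypothetical bitmap_no + mask + 1 */

		/* With MAX_ORDER_NR_PAGES >> cma->order_per_bit == 0,
		 * ALIGN(next, 0) == (next - 1) & 0 == 0, so the bitmap
		 * scan would simply restart from the beginning. */
		printf("%lu\n", ALIGN(next, 0)); /* prints 0 */
		return 0;
	}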

-- 
Thanks,

David / dhildenb
