Message-Id: <20211215080242.3034856-3-aisheng.dong@nxp.com>
Date:   Wed, 15 Dec 2021 16:02:42 +0800
From:   Dong Aisheng <aisheng.dong@....com>
To:     linux-mm@...ck.org
Cc:     linux-kernel@...r.kernel.org, dongas86@...il.com,
        linux-arm-kernel@...ts.infradead.org, jason.hui.liu@....com,
        leoyang.li@....com, abel.vesa@....com, shawnguo@...nel.org,
        linux-imx@....com, akpm@...ux-foundation.org,
        m.szyprowski@...sung.com, lecopzer.chen@...iatek.com,
        david@...hat.com, vbabka@...e.cz, stable@...r.kernel.org,
        shijie.qin@....com, Dong Aisheng <aisheng.dong@....com>
Subject: [PATCH 2/2] mm: cma: try next pageblock during retry

On an ARMv7 platform with 32M pageblocks (MAX_ORDER 14), we observed a
huge number of CMA allocation retries (1k+) during boot when
allocating one page for each of the 3 mmc instances being probed.

This happens because CMA has supported concurrent allocation since
commit a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock").
The pageblock we try to allocate from may already have been
acquired and isolated by another allocator, in which case cma_alloc()
retries with the next area of the same size at bitmap_no + mask + 1.
However, the pageblock order can be large and pageblock_nr_pages huge
(e.g. 8192), so retrying in such small steps is pointless: the next
attempt is likely to fail again because it still falls within the same
pageblock.

Instead of looping within the same pageblock and wasting CPU
cycles, especially on systems with big pageblocks (e.g. 16M or 32M),
try the next pageblock directly.

This greatly mitigates the problem.
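
For illustration only, here is a minimal userspace sketch of the two
retry-step computations. It is not kernel code: align_up() stands in
for the kernel's ALIGN(), and the helper names next_start_old(),
next_start_new() and retries_to_leave_block() are hypothetical. It
assumes count = 1 and align = 0 (so mask = 0) and a pageblock that is
entirely busy, and counts how many retries it takes for the search
start to move past that pageblock.

```c
#include <assert.h>

/* Stand-in for the kernel's ALIGN(): round x up to a multiple of a,
 * where a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Old behaviour: step just past the range that was found busy. */
static unsigned long next_start_old(unsigned long bitmap_no,
				    unsigned long mask)
{
	return bitmap_no + mask + 1;
}

/* New behaviour: round the next start up to a pageblock boundary,
 * expressed in bitmap bits (pages >> order_per_bit). */
static unsigned long next_start_new(unsigned long bitmap_no,
				    unsigned long mask,
				    unsigned long pageblock_nr_pages,
				    unsigned int order_per_bit)
{
	return align_up(bitmap_no + mask + 1,
			pageblock_nr_pages >> order_per_bit);
}

/* Hypothetical helper: number of retries until the search start
 * leaves a fully busy pageblock that begins at bit 0 (mask = 0). */
static unsigned long retries_to_leave_block(int use_new,
					    unsigned long pageblock_nr_pages,
					    unsigned int order_per_bit)
{
	unsigned long bits_per_block = pageblock_nr_pages >> order_per_bit;
	unsigned long start = 0, retries = 0;

	while (start < bits_per_block) {
		start = use_new ?
			next_start_new(start, 0, pageblock_nr_pages,
				       order_per_bit) :
			next_start_old(start, 0);
		retries++;
	}
	return retries;
}
```

Under these assumptions, with pageblock_nr_pages = 8192 and
order_per_bit = 0, the old step needs 8192 retries to leave the busy
pageblock, while the aligned step needs 1.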

Below is the original log during boot, before the fix:
[    2.004804] cma: cma_alloc(cma (ptrval), count 1, align 0)
[    2.010318] cma: cma_alloc(cma (ptrval), count 1, align 0)
[    2.010776] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010785] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010793] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010800] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010807] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
[    2.010814] cma: cma_alloc(): memory range at (ptrval) is busy, retrying
.... (+1K retries)

With the fix, the 1200+ retries are reduced to 0.
Another test running 8 VPU decoders in parallel shows the retries
dropping from 1500+ to ~145.

IOW, this patch can considerably speed up CMA allocation when there is
enough CMA memory, by significantly reducing retries.

Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: Marek Szyprowski <m.szyprowski@...sung.com>
Cc: Lecopzer Chen <lecopzer.chen@...iatek.com>
Cc: David Hildenbrand <david@...hat.com>
Cc: Vlastimil Babka <vbabka@...e.cz>
CC: stable@...r.kernel.org # 5.11+
Fixes: a4efc174b382 ("mm/cma.c: remove redundant cma_mutex lock")
Signed-off-by: Dong Aisheng <aisheng.dong@....com>
---
 mm/cma.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/cma.c b/mm/cma.c
index 1c13a729d274..108a1ceacbe7 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -500,7 +500,9 @@ struct page *cma_alloc(struct cma *cma, unsigned long count,
 		trace_cma_alloc_busy_retry(cma->name, pfn, pfn_to_page(pfn),
 					   count, align);
 		/* try again with a bit different memory target */
-		start = bitmap_no + mask + 1;
+		start = ALIGN(bitmap_no + mask + 1,
+			      pageblock_nr_pages >> cma->order_per_bit);
+
 	}
 
 	trace_cma_alloc_finish(cma->name, pfn, page, count, align);
-- 
2.25.1
