linux-kernel - Re: [PATCH 1/4] mm: introduce cma_alloc

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20201125201258.GB1484898@google.com>
Date:   Wed, 25 Nov 2020 12:12:58 -0800
From:   Minchan Kim <minchan@...nel.org>
To:     David Hildenbrand <david@...hat.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>, hyesoo.yu@...sung.com,
        willy@...radead.org, iamjoonsoo.kim@....com, vbabka@...e.cz,
        surenb@...gle.com, pullip.cho@...sung.com, joaodias@...gle.com,
        hridya@...gle.com, sumit.semwal@...aro.org, john.stultz@...aro.org,
        Brian.Starkey@....com, linux-media@...r.kernel.org,
        devicetree@...r.kernel.org, robh@...nel.org,
        christian.koenig@....com, linaro-mm-sig@...ts.linaro.org
Subject: Re: [PATCH 1/4] mm: introduce cma_alloc_bulk API

On Mon, Nov 23, 2020 at 03:15:37PM +0100, David Hildenbrand wrote:
> On 17.11.20 19:19, Minchan Kim wrote:
> > There is a need for special HW to require bulk allocation of
> > high-order pages. For example, 4800 * order-4 pages, which
> > would be minimum, sometimes, it requires more.
> > 
> > To meet the requirement, a option reserves 300M CMA area and
> > requests the whole 300M contiguous memory. However, it doesn't
> > work if even one of those pages in the range is long-term pinned
> > directly or indirectly. The other option is to ask higher-order
> > size (e.g., 2M) than requested order(64K) repeatedly until driver
> > could gather necessary amount of memory. Basically, this approach
> > makes the allocation very slow due to cma_alloc's function
> > slowness and it could be stuck on one of the pageblocks if it
> > encounters unmigratable page.
> > 
> > To solve the issue, this patch introduces cma_alloc_bulk.
> > 
> > 	int cma_alloc_bulk(struct cma *cma, unsigned int align,
> > 		gfp_t gfp_mask, unsigned int order, size_t nr_requests,
> > 		struct page **page_array, size_t *nr_allocated);
> > 
> > Most parameters are same with cma_alloc but it additionally passes
> > vector array to store allocated memory. What's different with cma_alloc
> > is it will skip pageblocks without waiting/stopping if it has unmovable
> > page so that API continues to scan other pageblocks to find requested
> > order page.
> > 
> > cma_alloc_bulk is best effort approach in that it skips some pageblocks
> > if they have unmovable pages unlike cma_alloc. It doesn't need to be
> > perfect from the beginning at the cost of performance. Thus, the API
> > takes gfp_t to support __GFP_NORETRY which is propagated into
> > alloc_contig_page to avoid significat overhead functions to inrecase
> > CMA allocation success ratio(e.g., migration retrial, PCP, LRU draining
> > per pageblock) at the cost of less allocation success ratio.
> > If the caller couldn't allocate enough pages with __GFP_NORETRY, they
> > could call it without __GFP_NORETRY to increase success ratio this time
> > if they are okay to expense the overhead for the success ratio.
> 
> I'm not a friend of connecting __GFP_NORETRY  to PCP and LRU draining.

I was also not a fan of the gfp flags in the cma funcions since it could
cause misunderstanding easily but saw taling about brining back gfp_t
into cma_alloc. Since It seems to be dropped, let's use other term.

diff --git a/mm/cma.c b/mm/cma.c
index 7c11ec2dc04c..806280050307 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -505,7 +505,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
  *
  * @cma:       contiguous memory region for which the allocation is performed.
  * @align:     requested alignment of pages (in PAGE_SIZE order).
- * @gfp_mask:  memory allocation flags
+ * @fast:      will skip costly opeartions if it's true.
  * @order:     requested page order
  * @nr_requests: the number of 2^order pages requested to be allocated as input,
  * @page_array:        page_array pointer to store allocated pages (must have space
@@ -513,10 +513,10 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
  * @nr_allocated: the number of 2^order pages allocated as output
  *
  * This function tries to allocate up to @nr_requests @order pages on specific
- * contiguous memory area. If @gfp_mask has __GFP_NORETRY, it will avoid costly
- * functions to increase allocation success ratio so it will be fast but might
- * return less than requested number of pages. User could retry with
- * !__GFP_NORETRY if it is needed.
+ * contiguous memory area. If @fast has true, it will avoid costly functions
+ * to increase allocation success ratio so it will be faster but might return
+ * less than requested number of pages. User could retry it with true if it is
+ * needed.
  *
  * Return: it will return 0 only if all pages requested by @nr_requestsed are
  * allocated. Otherwise, it returns negative error code.
@@ -525,7 +525,7 @@ struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align,
  * how many @order pages are allocated and free those pages when they are not
  * needed.
  */
-int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
+int cma_alloc_bulk(struct cma *cma, unsigned int align, bool fast,
                        unsigned int order, size_t nr_requests,
                        struct page **page_array, size_t *nr_allocated)
 {
@@ -538,8 +538,8 @@ int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
        unsigned long start = 0;
        unsigned long bitmap_maxno, bitmap_no, bitmap_count;
        struct page *page = NULL;
-       gfp_t gfp = GFP_KERNEL|__GFP_NOWARN|gfp_mask;
-
+       enum alloc_contig_mode mode = fast ? ALLOC_CONTIG_FAST :
+                                               ALLOC_CONTIG_NORMAL;
        *nr_allocated = 0;
        if (!cma || !cma->count || !cma->bitmap || !page_array)
                return -EINVAL;
@@ -576,7 +576,8 @@ int cma_alloc_bulk(struct cma *cma, unsigned int align, gfp_t gfp_mask,
                mutex_unlock(&cma->lock);

                pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
-               ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA, gfp);
+               ret = alloc_contig_range(pfn, pfn + nr_pages, MIGRATE_CMA,
+                                               GFP_KERNEL|__GFP_NOWARN, mode);
                if (ret) {
                        cma_clear_bitmap(cma, pfn, nr_pages);
                        if (ret != -EBUSY)



> Also, gfp flags apply mostly to compaction (e.g., how to allocate free
> pages for migration), so this seems a little wrong.
> 
> Can we instead introduce
> 
> enum alloc_contig_mode {
> 	/*
> 	 * Normal mode:
> 	 *
> 	 * Retry page migration 5 times, ... TBD
> 	 *
> 	 */
> 	ALLOC_CONTIG_NORMAL = 0,
> 	/*
> 	 * Fast mode: e.g., used for bulk allocations.
>          *
> 	 * Don't retry page migration if it fails, don't drain PCP
>          * lists, don't drain LRU.
> 	 */
> 	ALLOC_CONTIG_FAST,
> };

Yeah, the mode is better. Let's have it as preparation patch.