linux-kernel - Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20a21d17-77d8-4120-8643-c575304c39f2@126.com>
Date: Sun, 9 Feb 2025 18:49:15 +0800
From: Ge Yang <yangge1116@....com>
To: Barry Song <21cnbao@...il.com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, stable@...r.kernel.org, david@...hat.com,
 baolin.wang@...ux.alibaba.com, aisheng.dong@....com, liuzixing@...on.cn
Subject: Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory
 allocation for the CMA



在 2025/2/9 5:34, Barry Song 写道:
> On Sat, Feb 8, 2025 at 9:50 PM Ge Yang <yangge1116@....com> wrote:
>>
>>
>>
>> 在 2025/1/28 17:58, Barry Song 写道:
>>> On Sat, Jan 25, 2025 at 12:21 AM <yangge1116@....com> wrote:
>>>>
>>>> From: yangge <yangge1116@....com>
>>>>
>>>> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
>>>> simply reverts to the original method of using the cma_mutex to ensure
>>>> that alloc_contig_range() runs sequentially. This change was made to avoid
>>>> concurrency allocation failures. However, it can negatively impact
>>>> performance when concurrent allocation of CMA memory is required.
>>>
>>> Do we have some data?
>> Yes, I will add it in the next version, thanks.
>>>
>>>>
>>>> To address this issue, we could introduce an API for concurrency settings,
>>>> allowing users to decide whether their CMA can perform concurrent memory
>>>> allocations or not.
>>>
>>> Who is the intended user of cma_set_concurrency?
>> We have some drivers that use cma_set_concurrency(), but they have not
>> yet been merged into the mainline. The cma_alloc_mem() function in the
>> mainline also supports concurrent allocation of CMA memory. By applying
>> this patch, we can also achieve significant performance improvements in
>> certain scenarios. I will provide performance data in the next version.
>> I also feel it is somewhat
>>> unsafe since cma->concurr_alloc is not protected by any locks.
>> Ok, thanks.
>>>
>>> Will a user setting cma->concurr_alloc = 1 encounter the original issue that
>>> commit 60a60e32cf91 was attempting to fix?
>>>
>> Yes, if a user encounters the issue described in commit 60a60e32cf91,
>> they will not be able to set cma->concurr_alloc to 1.
> 
> A user who hasn't encountered a problem yet doesn't mean they won't
> encounter it; it most likely just means the testing time hasn't been long
> enough.
> 
> Is it possible to implement a per-CMA lock or range lock that simultaneously
> improves performance and prevents the original issue that commit
> 60a60e32cf91 aimed to fix?
> 
Using per-CMA locks can improve performance and prevent the original 
issue. I am currently preparing the patch. Thanks.
> I strongly believe that cma->concurr_alloc is not the right approach. Let's
> not waste our time on this kind of hack or workaround.  Instead, we should
> find a proper fix that remains transparent to users.
> 
>>>>
>>>> Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
>>>> Signed-off-by: yangge <yangge1116@....com>
>>>> Cc: <stable@...r.kernel.org>
>>>> ---
>>>>    include/linux/cma.h |  2 ++
>>>>    mm/cma.c            | 22 ++++++++++++++++++++--
>>>>    mm/cma.h            |  1 +
>>>>    3 files changed, 23 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/cma.h b/include/linux/cma.h
>>>> index d15b64f..2384624 100644
>>>> --- a/include/linux/cma.h
>>>> +++ b/include/linux/cma.h
>>>> @@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
>>>>
>>>>    extern void cma_reserve_pages_on_error(struct cma *cma);
>>>>
>>>> +extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
>>>> +
>>>>    #ifdef CONFIG_CMA
>>>>    struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
>>>>    bool cma_free_folio(struct cma *cma, const struct folio *folio);
>>>> diff --git a/mm/cma.c b/mm/cma.c
>>>> index de5bc0c..49a7186 100644
>>>> --- a/mm/cma.c
>>>> +++ b/mm/cma.c
>>>> @@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
>>>>                   spin_unlock_irq(&cma->lock);
>>>>
>>>>                   pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
>>>> -               mutex_lock(&cma_mutex);
>>>> +
>>>> +               /*
>>>> +                * If the user sets the concurr_alloc of CMA to true, concurrent
>>>> +                * memory allocation is allowed. If the user sets it to false or
>>>> +                * does not set it, concurrent memory allocation is not allowed.
>>>> +                */
>>>> +               if (!cma->concurr_alloc)
>>>> +                       mutex_lock(&cma_mutex);
>>>>                   ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
>>>> -               mutex_unlock(&cma_mutex);
>>>> +               if (!cma->concurr_alloc)
>>>> +                       mutex_unlock(&cma_mutex);
>>>>                   if (ret == 0) {
>>>>                           page = pfn_to_page(pfn);
>>>>                           break;
>>>> @@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
>>>>
>>>>           return 0;
>>>>    }
>>>> +
>>>> +bool cma_set_concurrency(struct cma *cma, bool concurrency)
>>>> +{
>>>> +       if (!cma)
>>>> +               return false;
>>>> +
>>>> +       cma->concurr_alloc = concurrency;
>>>> +
>>>> +       return true;
>>>> +}
>>>> diff --git a/mm/cma.h b/mm/cma.h
>>>> index 8485ef8..30f489d 100644
>>>> --- a/mm/cma.h
>>>> +++ b/mm/cma.h
>>>> @@ -16,6 +16,7 @@ struct cma {
>>>>           unsigned long   *bitmap;
>>>>           unsigned int order_per_bit; /* Order of pages represented by one bit */
>>>>           spinlock_t      lock;
>>>> +       bool concurr_alloc;
>>>>    #ifdef CONFIG_CMA_DEBUGFS
>>>>           struct hlist_head mem_head;
>>>>           spinlock_t mem_head_lock;
>>>> --
>>>> 2.7.4
>>>>
>>>>
>>>
> 
> Thanks
> Barry