Message-ID: <70ad632d-88f3-db6d-3493-b09451150298@collabora.com>
Date: Wed, 16 Sep 2020 22:59:48 +0100
From: Guillaume Tucker <guillaume.tucker@...labora.com>
To: Mike Kravetz <mike.kravetz@...cle.com>,
"Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>,
"linux-mm@...ck.org" <linux-mm@...ck.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-mips@...r.kernel.org" <linux-mips@...r.kernel.org>
Cc: Aslan Bakirov <aslan@...com>, Joonsoo Kim <js1304@...il.com>,
Rik van Riel <riel@...riel.com>,
Michal Hocko <mhocko@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Will Deacon <will@...nel.org>, Roman Gushchin <guro@...com>,
Mike Rapoport <rppt@...nel.org>, kernelci-results@...ups.io
Subject: Re: [PATCH] cma: make number of CMA areas dynamic, remove
CONFIG_CMA_AREAS
On 16/09/2020 17:30, Mike Kravetz wrote:
> On 9/16/20 2:14 AM, Song Bao Hua (Barry Song) wrote:
>>>> -----Original Message-----
>>>> From: Mike Kravetz [mailto:mike.kravetz@...cle.com]
>>>> Sent: Wednesday, September 16, 2020 8:57 AM
>>>> To: linux-mm@...ck.org; linux-kernel@...r.kernel.org;
>>>> linux-arm-kernel@...ts.infradead.org; linux-mips@...r.kernel.org
>>>> Cc: Roman Gushchin <guro@...com>; Song Bao Hua (Barry Song)
>>>> <song.bao.hua@...ilicon.com>; Mike Rapoport <rppt@...nel.org>; Joonsoo
>>>> Kim <js1304@...il.com>; Rik van Riel <riel@...riel.com>; Aslan Bakirov
>>>> <aslan@...com>; Michal Hocko <mhocko@...nel.org>; Andrew Morton
>>>> <akpm@...ux-foundation.org>; Mike Kravetz <mike.kravetz@...cle.com>
>>>> Subject: [PATCH] cma: make number of CMA areas dynamic, remove
>>>> CONFIG_CMA_AREAS
>>>>
>>>> The number of distinct CMA areas is limited by the constant
>>>> CONFIG_CMA_AREAS. In most environments, this was set to a default
>>>> value of 7. Not too long ago, support was added to allocate hugetlb
>>>> gigantic pages from CMA. More recent changes to make dma_alloc_coherent
>>>> NUMA-aware on arm64 added more potential users of CMA areas. Along
>>>> with the dma_alloc_coherent changes, the default value of CMA_AREAS
>>>> was bumped up to 19 if NUMA is enabled.
>>>>
>>>> It seems that the number of CMA users is likely to grow. Instead of
>>>> using a static array for cma areas, use a simple linked list. These
>>>> areas are used before normal memory allocators, so use the memblock
>>>> allocator.
>>>>
>>>> Acked-by: Roman Gushchin <guro@...com>
>>>> Signed-off-by: Mike Kravetz <mike.kravetz@...cle.com>
>>>> ---
>>>> rfc->v1
>>>> - Made minor changes suggested by Song Bao Hua (Barry Song)
>>>> - Removed check for late calls to cma_init_reserved_mem that was part
>>>> of RFC.
>>>> - Added ACK from Roman Gushchin
>>>> - Still in need of arm testing
>>>
>>> Unfortunately, the test result on my arm64 board is negative: Linux can't
>>> boot after applying this patch.
>>>
>>> I guess we have to hold this patch for a while until this is fixed. BTW,
>>> Mike, do you have a qemu-based arm64 NUMA system to debug? It is very easy
>>> to reproduce; we don't need to use hugetlb_cma or pernuma_cma. Just the
>>> default cma will make the boot hang.
>>
>> Hi Mike,
>> I spent some time on debugging the boot issue and sent a patch here:
>> https://lore.kernel.org/linux-mm/20200916085933.25220-1-song.bao.hua@hisilicon.com/
>> All details and the kernel oops can be found there.
>> Please feel free to merge my patch into your v2 if you want. We probably also need an ack from
>> the arm maintainers.
>>
>> Also, +Will,
>>
>> Hi Will, the whole story is that Mike tried to remove the cma array sized by CONFIG_CMA_AREAS
>> and switch to memblock_alloc() to allocate each cma area, so that the number of cma areas
>> could be dynamic. It turns out this causes a kernel panic on arm64 during boot, as the
>> address returned by memblock_alloc() is invalid on arm64 until paging_init() is done.
>>
>
> Thank you!
>
> Based on your analysis, I am concerned that other architectures may also
> have issues.
>
> Andrew,
> I suggest we remove this patch from your tree. I will audit all architectures
> which enable CMA and look for similar issues there. I will then merge Barry's
> patch into a v2 along with any other arch-specific changes.
FYI, this was also bisected on kernelci.org[1] and it landed on
this commit: c999bd436fe9 ("mm/cma: make number of CMA areas
dynamic, remove CONFIG_CMA_AREAS"). Only arm and arm64 seem to
be affected, and not in all builds:
https://kernelci.org/test/job/next/branch/master/kernel/next-20200916/plan/baseline/
The list of failures above might help someone debug the issue
with a platform they have at hand.
Guillaume
[1] https://groups.io/g/kernelci-results-staging/message/2027