Message-ID: <44d50df7-9940-6a37-179f-ee2aa6cf34b9@oracle.com>
Date: Tue, 23 Feb 2021 17:29:48 -0800
From: Mike Kravetz <mike.kravetz@...cle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Gerald Schaefer <gerald.schaefer@...ux.ibm.com>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Michal Hocko <mhocko@...e.com>,
Heiko Carstens <hca@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>
Subject: Re: [RFC] linux-next panic in hugepage_subpool_put_pages()
On 2/23/21 3:58 PM, Andrew Morton wrote:
> On Tue, 23 Feb 2021 10:06:12 -0800 Mike Kravetz <mike.kravetz@...cle.com> wrote:
>
>> On 2/23/21 6:57 AM, Gerald Schaefer wrote:
>>> Hi,
>>>
>>> LTP triggered a panic on s390 in hugepage_subpool_put_pages() with
>>> linux-next 5.12.0-20210222, see below.
>>>
>>> It crashes on the spin_lock(&spool->lock) at the beginning, because the
>>> passed-in *spool points to 0000004e00000000, which is not addressable
>>> memory. It rather looks like some flags and not a proper address. I suspect
>>> some relation to the recent rework in that area, e.g. commit f1280272ae4d
>>> ("hugetlb: use page.private for hugetlb specific page flags").
>>>
>>> __free_huge_page() calls hugepage_subpool_put_pages() and takes *spool from
>>> hugetlb_page_subpool(page), which was changed by that commit to use
>>> page[1]->private now.
>>>
>>
>> Thanks Gerald,
>>
>> Yes, I believe f1280272ae4d is the root cause of this issue. In that
>> commit, the subpool pointer was moved from page->private of the head
>> page to page->private of the first subpage. The page allocator will
>> initialize (zero) the private field of the head page, but not that of
>> subpages. So, that bad subpool pointer is likely an old page->private
>> value for the page.
>>
>> That strange call path from set_max_huge_pages to __free_huge_page is
>> actually how the code puts newly allocated pages on its internal free
>> list.
>>
>> I will do a bit more verification and put together a patch (it should
>> be simple).
>
> There's also Michal's documentation request:
> https://lkml.kernel.org/r/20210127102645.GH827@dhcp22.suse.cz
>
Thanks Andrew, I forgot about that.
It looks like the patch that added the synchronization documentation requested
by Michal may not have been picked up.
https://lore.kernel.org/linux-mm/9183032f-3d77-d9e3-9cc8-fbaf3e892022@oracle.com/
If you still need to add that patch, I could redo it and add the page[1]->private
documentation requested here. Just let me know which is easiest for
you.
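
For anyone following along, the failure mode described in the quoted text can
be modelled in user space. This is only a rough sketch under stated
assumptions, not kernel code and not the eventual patch: struct model_page,
model_alloc_compound(), model_page_subpool() and model_clear_subpool() are
hypothetical names invented for illustration, and the field is called
private_data here to stand in for page->private.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page; only the field of interest. */
struct model_page {
	uint64_t private_data;	/* models page->private */
};

#define MODEL_NR_PAGES 4	/* assumed size of the model compound page */

/*
 * Models the allocator behaviour described in the thread: every page
 * may carry a stale private value from a previous user, and only the
 * head page (index 0) is zeroed on allocation.
 */
static void model_alloc_compound(struct model_page *pages, size_t nr,
				 uint64_t stale)
{
	assert(nr >= 2);	/* need a head page and a first subpage */
	for (size_t i = 0; i < nr; i++)
		pages[i].private_data = stale;
	pages[0].private_data = 0;
}

/* Models reading the subpool pointer from page[1]->private. */
static uint64_t model_page_subpool(const struct model_page *head)
{
	return head[1].private_data;
}

/* One possible remedy in this model: clear the field explicitly. */
static void model_clear_subpool(struct model_page *head)
{
	head[1].private_data = 0;
}
```

In this model, calling model_page_subpool() right after
model_alloc_compound() returns the stale value instead of 0, which matches the
bogus pointer Gerald saw. Calling model_clear_subpool() first makes it return
0. Whether the real fix belongs at that point in the hugetlb code is exactly
what I still need to verify.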
--
Mike Kravetz