lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <3FFB0B7F-058D-4DD6-9F09-D18C8EA4344C@nvidia.com>
Date:   Wed, 30 Mar 2022 18:24:26 -0400
From:   Zi Yan <ziy@...dia.com>
To:     Vlastimil Babka <vbabka@...e.cz>
Cc:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Mel Gorman <mgorman@...hsingularity.net>,
        David Hildenbrand <david@...hat.com>,
        Mike Rapoport <rppt@...ux.ibm.com>,
        Oscar Salvador <osalvador@...e.de>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux-MM <linux-mm@...ck.org>
Subject: Re: [BUG] Crash on x86_32 for: mm: page_alloc: avoid merging non-fallbackable pageblocks with others

On 30 Mar 2022, at 18:15, Vlastimil Babka wrote:

> On 3/31/22 00:03, Zi Yan wrote:
>> On 30 Mar 2022, at 17:43, Vlastimil Babka wrote:
>>
>>> On 3/30/22 22:05, Linus Torvalds wrote:
>>>> On Wed, Mar 30, 2022 at 12:42 PM Steven Rostedt <rostedt@...dmis.org> wrote:
>>>>>
>>>>> I started testing new patches and it crashed when doing the x86-32 test on
>>>>> boot up.
>>>>>
>>>>> Initializing HighMem for node 0 (000375fe:0021ee00)
>>>>> BUG: kernel NULL pointer dereference, address: 00000878
>>>>> #PF: supervisor read access in kernel mode
>>>>> #PF: error_code(0x0000) - not-present page
>>>>> *pdpt = 0000000000000000 *pde = f0000000f000eef3
>>>>> Oops: 0000 [#1] PREEMPT SMP PTI
>>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-test+ #469
>>>>> Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
>>>>> EIP: get_pfnblock_flags_mask+0x2c/0x36
>>>>> Code: 6d ea ff 55 89 e5 56 89 ce 53 8b 18 89 d8 c1 eb 1e e8 f7 fb ff ff 69 db c0 02 00 00 89 c1 89 c2 c1 ea 05 8b 83 7c d7 79 c1 5b <8b> 04 90 d3 e8 21 f0 5e 5d c3 55 89 e5 57 56 89 d6 53 89 c3 64 a1
>>>>
>>>> The whole function is in that Code: thing, and it decodes to:
>>>>
>>>>    0: 55                    push   %ebp
>>>>    1: 89 e5                mov    %esp,%ebp
>>>>    3: 56                    push   %esi
>>>>    4: 89 ce                mov    %ecx,%esi
>>>>    6: 53                    push   %ebx
>>>>    7: 8b 18                mov    (%eax),%ebx
>>>>    9: 89 d8                mov    %ebx,%eax
>>>>    b: c1 eb 1e              shr    $0x1e,%ebx
>>>>    e: e8 f7 fb ff ff        call   0xfffffc0a
>>>>   13: 69 db c0 02 00 00    imul   $0x2c0,%ebx,%ebx
>>>>   19: 89 c1                mov    %eax,%ecx
>>>>   1b: 89 c2                mov    %eax,%edx
>>>>   1d: c1 ea 05              shr    $0x5,%edx
>>>>   20: 8b 83 7c d7 79 c1    mov    -0x3e862884(%ebx),%eax
>>>>   26: 5b                    pop    %ebx
>>>>   27:* 8b 04 90              mov    (%eax,%edx,4),%eax <-- trapping instruction
>>>>   2a: d3 e8                shr    %cl,%eax
>>>>   2c: 21 f0                and    %esi,%eax
>>>>   2e: 5e                    pop    %esi
>>>>   2f: 5d                    pop    %ebp
>>>>   30: c3                    ret
>>>>
>>>> with '%eax' being NULL, and %edx being 0x21e.
>>>>
>>>> (The call seems to be to 'pfn_to_bitidx().isra.0' if my compiler does
>>>> similar code generation, so it's out-of-lined part of pfn_to_bitidx()
>>>> despite being marked inline)
>>>>
>>>> So that oops is that
>>>>
>>>>         word = bitmap[word_bitidx];
>>>>
>>>> line, with 'bitmap' being NULL (and %edx contains 'word_bitidx').
>>>>
>>>> Looking around, your 'config-bad' doesn't even have
>>>> CONFIG_MEMORY_ISOLATION enabled, and so I suspect the culprit is this
>>>> part of the change:
>>>>
>>>> -               if (unlikely(has_isolate_pageblock(zone))) {
>>>>
>>>> which used to always be false for that config, and now the code is
>>>> suddenly enabled.
>>>
>>> If CONFIG_MEMORY_ISOLATION was enabled then the zone layout would be the
>>> same, so I think it's not simply that. I think it's the timing -
>>> has_isolate_pageblock(zone) would only be possible to become true later
>>> in runtime when some isolation is ongoing, but here it seems we are
>>> still in the early boot. Probably at a boundary of highmem with another
>>> zone that doesn't have the pageblock bitmap yet initialized? While later
>>> it would have, and all would be fine.
>>
>> It seems so based on the boot log from Steven.
>>
>>>
>>> As Zi Yan said, the usual merging code will, through page_is_buddy()
>>> find safely enough the buddy is not applicable, so I agree with his
>>> patch direction. Seems this also shows the code tried to become too
>>> smart and for the next merge window we should try just move the
>>> migratetype checks into the main while loop (under something like "if
>>> (order >= max_order)") and simplify the function a lot, hopefully with
>>> negligible perf impact.
>>
>> Something like this?
>
> Exactly (with some nits).

Thanks for the comment. Should I send the patch below with nits fix to
supersede the patch I just sent?

>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 6c6af8658775..568ecaf5700d 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1053,7 +1053,6 @@ static inline void __free_one_page(struct page *page,
>>                 int migratetype, fpi_t fpi_flags)
>>  {
>>         struct capture_control *capc = task_capc(zone);
>> -       unsigned int max_order = pageblock_order;
>>         unsigned long buddy_pfn;
>>         unsigned long combined_pfn;
>>         struct page *buddy;
>> @@ -1069,8 +1068,7 @@ static inline void __free_one_page(struct page *page,
>>         VM_BUG_ON_PAGE(pfn & ((1 << order) - 1), page);
>>         VM_BUG_ON_PAGE(bad_range(zone, page), page);
>>
>> -continue_merging:
>> -       while (order < max_order) {
>> +       while (order < MAX_ORDER - 1) {
>>                 if (compaction_capture(capc, page, order, migratetype)) {
>>                         __mod_zone_freepage_state(zone, -(1 << order),
>>                                                                 migratetype);
>> @@ -1081,6 +1079,27 @@ static inline void __free_one_page(struct page *page,
>>
>>                 if (!page_is_buddy(page, buddy, order))
>>                         goto done_merging;
>> +
>> +               if (order >= pageblock_order) {
>
> Would use unlikely() here.
>
>> +                       /* If we are here, it means order is >= pageblock_order.
>
> Can drop this like as it became quite obvious now :)
>
>> +                        * We want to prevent merge between freepages on pageblock
>> +                        * without fallbacks and normal pageblock. Without this,
>> +                        * pageblock isolation could cause incorrect freepage or CMA
>> +                        * accounting or HIGHATOMIC accounting.
>> +                        *
>> +                        * We don't want to hit this code for the more frequent
>> +                        * low-order merging.
>
> The last two lines should be also dropped now.
>
>> +                        */
>> +                       int buddy_mt;
>> +
>> +                       buddy_mt = get_pageblock_migratetype(buddy);
>
> Can be now single line.
>
>> +
>> +                       if (migratetype != buddy_mt
>> +                                       && (!migratetype_is_mergeable(migratetype) ||
>> +                                               !migratetype_is_mergeable(buddy_mt)))
>> +                               goto done_merging;
>> +               }
>> +
>>                 /*
>>                  * Our buddy is free or it is CONFIG_DEBUG_PAGEALLOC guard page,
>>                  * merge with it and move up one order.
>> @@ -1094,32 +1113,6 @@ static inline void __free_one_page(struct page *page,
>>                 pfn = combined_pfn;
>>                 order++;
>>         }
>> -       if (order < MAX_ORDER - 1) {
>> -               /* If we are here, it means order is >= pageblock_order.
>> -                * We want to prevent merge between freepages on pageblock
>> -                * without fallbacks and normal pageblock. Without this,
>> -                * pageblock isolation could cause incorrect freepage or CMA
>> -                * accounting or HIGHATOMIC accounting.
>> -                *
>> -                * We don't want to hit this code for the more frequent
>> -                * low-order merging.
>> -                */
>> -               int buddy_mt;
>> -
>> -               buddy_pfn = __find_buddy_pfn(pfn, order);
>> -               buddy = page + (buddy_pfn - pfn);
>> -
>> -               if (!page_is_buddy(page, buddy, order))
>> -                       goto done_merging;
>> -               buddy_mt = get_pageblock_migratetype(buddy);
>> -
>> -               if (migratetype != buddy_mt
>> -                               && (!migratetype_is_mergeable(migratetype) ||
>> -                                       !migratetype_is_mergeable(buddy_mt)))
>> -                       goto done_merging;
>> -               max_order = order + 1;
>> -               goto continue_merging;
>> -       }
>>
>>  done_merging:
>>         set_buddy_order(page, order);
>>
>> --
>> Best Regards,
>> Yan, Zi


--
Best Regards,
Yan, Zi

Download attachment "signature.asc" of type "application/pgp-signature" (855 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ