linux-kernel - Re: [PATCH] mm/debug_vm_pgtable: Fix corrupted PG_arch_1 by set_pmd


Message-ID: <8e7d7ea3-8412-4c6c-0489-5c9f795a6f35@redhat.com>
Date:   Tue, 6 Jul 2021 15:09:00 +1000
From:   Gavin Shan <gshan@...hat.com>
To:     Anshuman Khandual <anshuman.khandual@....com>, linux-mm@...ck.org
Cc:     linux-kernel@...r.kernel.org, catalin.marinas@....com,
        will@...nel.org, akpm@...ux-foundation.org, shan.gavin@...il.com
Subject: Re: [PATCH] mm/debug_vm_pgtable: Fix corrupted PG_arch_1 by
 set_pmd_at()

Hi Anshuman,

On 7/5/21 1:59 PM, Anshuman Khandual wrote:
> On 7/2/21 4:02 PM, Gavin Shan wrote:
>> There are two addresses selected: random virtual address and physical
>> address corresponding to kernel symbol @start_kernel. During the PMD
>> tests in pmd_advanced_tests(), the physical address is aligned down
>> to the starting address of the huge page, whose size is 512MB on ARM64
>> when we have 64KB base page size. After that, set_pmd_at() is called
>> to populate the PMD entry. PG_arch_1, PG_dcache_clean on ARM64, is
>> set to the page flags. Unfortunately, the page, corresponding to the
>> starting address of the huge page could be owned by buddy. It means
>> PG_arch_1 can be unconditionally set to page owned by buddy.
>>
>> Afterwards, the page with PG_arch_1 set is fetched from buddy's free
>> area list, but fails the checking. It leads to the following warning
>> on ARM64:
>>
>>     BUG: Bad page state in process memhog  pfn:08000
>>     page:0000000015c0a628 refcount:0 mapcount:0 \
>>          mapping:0000000000000000 index:0x1 pfn:0x8000
>>     flags: 0x7ffff8000000800(arch_1|node=0|zone=0|lastcpupid=0xfffff)
>>     raw: 07ffff8000000800 dead000000000100 dead000000000122 0000000000000000
>>     raw: 0000000000000001 0000000000000000 00000000ffffffff 0000000000000000
>>     page dumped because: PAGE_FLAGS_CHECK_AT_PREP flag(s) set
> 
> Does this problem happen right after the boot ? OR you ran some tests
> and workloads to trigger this ? IIRC never seen this before on arm64.
> Does this happen on other archs too ?
> 

The page flag (PG_arch_1) is corrupted during boot on ARM64 when the
64KB base page size is selected, but the page check only fails later,
when the page is pulled from buddy's free area list by "memhog".
I don't think other platforms have the same issue.
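
For reference, the arithmetic behind the report can be sketched in plain C.
This is a userspace model, not kernel code: the MODEL_* names and helpers
are hypothetical, the PMD shift of 29 assumes 64KB base pages (512MB per
PMD, as in the patch description), the PG_arch_1 bit position (11) is read
off the "flags:" line of the dump, and the input address is an arbitrary
example chosen to fall inside the reported huge page.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64KB base pages, so one PMD entry maps 512MB (2^29). */
#define MODEL_PAGE_SHIFT 16
#define MODEL_PMD_SHIFT  29
#define MODEL_PMD_SIZE   (UINT64_C(1) << MODEL_PMD_SHIFT)
#define MODEL_PMD_MASK   (~(MODEL_PMD_SIZE - 1))

/* Assumption: bit 11, taken from "flags: 0x7ffff8000000800(arch_1|...)". */
#define MODEL_PG_ARCH_1  (UINT64_C(1) << 11)

/* Align a physical address down to the start of its huge page. */
static uint64_t model_pmd_align_down(uint64_t paddr)
{
    return paddr & MODEL_PMD_MASK;
}

/* Convert a physical address to a page frame number. */
static uint64_t model_paddr_to_pfn(uint64_t paddr)
{
    return paddr >> MODEL_PAGE_SHIFT;
}

/*
 * Simplified stand-in for the allocator's prep-time flag check: only the
 * arch_1 bit is examined here, the real check covers more flags.
 */
static int model_bad_at_prep(uint64_t flags)
{
    return (flags & MODEL_PG_ARCH_1) != 0;
}

/* Walk through the sequence: align down, set the flag, fail the check. */
static int model_demo(void)
{
    uint64_t paddr = UINT64_C(0x81230000);   /* hypothetical input address */
    uint64_t head = model_pmd_align_down(paddr);
    uint64_t flags = 0;                      /* head page still owned by buddy */

    flags |= MODEL_PG_ARCH_1;                /* side effect of set_pmd_at() */

    assert(head == UINT64_C(0x80000000));
    assert(model_paddr_to_pfn(head) == 0x8000);
    assert(model_bad_at_prep(flags));
    return 0;
}
```

Calling model_demo() from a main() runs the three checks. The head page of
the huge page ends up at pfn 0x8000, the pfn shown in the dump, and the raw
flags value from the dump also trips model_bad_at_prep().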

>>
>> This fixes the issue by calling flush_dcache_page() after each call
>> to set_{pud, pmd, pte}_at() because PG_arch_1 isn't needed in any case.
> 
> This (arm64 specific solution) might cause some side effects on other
> platforms ? The solution here needs to be generic enough. I will take
> a look into this patch but probably later this week or next week.
> 

Apart from the overhead of flushing the dcache introduced by
flush_dcache_page(), I don't think there is any side effect. By the way,
I'm working on a series to fix this issue and another one. I will post the
series for review pretty soon, and it's going to fix the following issues:

(1) The current code is loosely organized. All information is maintained
     in variables in debug_vm_pgtable(), and those variables are passed to the
     test functions. This makes the code hard to maintain in the long term, so
     I will introduce a dedicated data struct (struct vm_pgtable_debug) as a
     placeholder for the various information.

(2) With the data struct, I'm able to allocate the page to be used by
     set_{pud, pmd, pte}_at(), because the target page is accessed on ARM64:
     the PG_arch_1 flag is set on the page and the corresponding iCache is
     flushed if execution permission is given. There are two issues if the
     page used by set_{pud, pmd, pte}_at() wasn't allocated from buddy:
     (a) the PG_arch_1 flag corruption that this patch tries to fix; (b) a
     kernel crash because of an invalid page fault on accessing the target
     page. The page isn't mapped if CONFIG_DEBUG_PAGEALLOC is enabled,
     because it is unmapped when it is freed to buddy during boot:

     start_kernel
     mm_init
     mem_init
     memblock_free_all
     free_low_memory_core_early
     __free_memory_core
     __free_pages_memory
     memblock_free_pages
     __free_pages_core
     __free_pages_ok
     free_pages_prepare
     debug_pagealloc_unmap_pages           # The page is unmapped here
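
Issue (b) can be illustrated with a small userspace model. It is only a
sketch under stated assumptions: struct model_page, the model_* helpers and
the four-page pool are hypothetical, and the real mapping state lives in the
kernel page tables, not in a boolean. The model only captures the rule that,
with the debug option on, a page released to buddy is unmapped and becomes
accessible again once it is allocated.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MODEL_NR_PAGES = 4 };

struct model_page {
    bool in_buddy;  /* page sits on the allocator's free list */
    bool mapped;    /* page is present in the kernel mapping */
};

static struct model_page model_pages[MODEL_NR_PAGES];

/* Free path: with the debug option on, the freed page is unmapped. */
static void model_free_page(struct model_page *p, bool debug_pagealloc)
{
    p->in_buddy = true;
    if (debug_pagealloc)
        p->mapped = false;
}

/* Boot: every page starts mapped and is then released to buddy. */
static void model_boot(bool debug_pagealloc)
{
    for (size_t i = 0; i < MODEL_NR_PAGES; i++) {
        model_pages[i].in_buddy = false;
        model_pages[i].mapped = true;
        model_free_page(&model_pages[i], debug_pagealloc);
    }
}

/* Allocation takes the first free page and makes it accessible again. */
static struct model_page *model_alloc_page(void)
{
    for (size_t i = 0; i < MODEL_NR_PAGES; i++) {
        if (model_pages[i].in_buddy) {
            model_pages[i].in_buddy = false;
            model_pages[i].mapped = true;
            return &model_pages[i];
        }
    }
    return NULL;
}

/* Returns 0 if the access is safe, -1 where a real kernel would fault. */
static int model_touch_page(const struct model_page *p)
{
    return p->mapped ? 0 : -1;
}
```

In this model, touching an arbitrary page after model_boot(true) returns -1,
while touching a page obtained from model_alloc_page() returns 0, which is
the reason for allocating the page used by the tests.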

Thanks,
Gavin