Date:   Wed, 20 Oct 2021 11:05:11 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Jianyong Wu <Jianyong.Wu@....com>,
        "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
        "mhiramat@...nel.org" <mhiramat@...nel.org>,
        "peterz@...radead.org" <peterz@...radead.org>
Cc:     "rostedt@...dmis.org" <rostedt@...dmis.org>,
        "vbabka@...e.cz" <vbabka@...e.cz>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Anshuman Khandual <Anshuman.Khandual@....com>,
        Justin He <Justin.He@....com>, nd <nd@....com>
Subject: Re: [PATCH v1] init: avoid race condition of update page table in
 kernel init

>> But why does it matter on arm64? Can you describe how the exact race
>> triggers?
> 
> I don't know much about how x86 handles the memory map. Let me show you how the race happens on arm64.
> When the virtio-mem workqueue is triggered, arch_memory_add will be called, where the related page tables are created. The call chain is arch_memory_add->__create_pgd_mapping->alloc_init_pud. As the memory add may take several seconds, it may run concurrently with mark_rodata_ro, in which page tables are also created.

> The race can occur in alloc_init_pud. See below:
> /***************************************************************************/
> Virtio-mem workqueue thread                        mark_rodata_ro thread
> {
> ...
>     pudp = pud_set_fixmap_offset(p4dp, addr); // set fixmap
>     do {
>         pud_t old_pud = READ_ONCE(*pudp);
> ...
>     } while (pudp++, addr = next, addr != end);
>                                                    pudp = pud_set_fixmap_offset; // set fixmap
>     pud_clear_fixmap(); // clear fixmap
> }                                                  do {
>                                                        pud_t old_pud = READ_ONCE(*pudp); // CRASH
> 

I still don't quite understand how that race can even exist. I assume
it's due to the weird semantics of the "fixmap". (whatever that is :) )
I don't see anything similar happen on other archs, especially x86-64
and s390x, which I'm familiar with.

s390x, similarly to x86-64, uses a vmem_mutex to serialize add/remove
in the direct map and a cpa_mutex to serialize attribute changes (and
splitting of large mappings).

The right fix should be to teach the arm64 mmu code that direct mapping
updates might be concurrent, and that two instances might try messing
with the fixmap concurrently.


On a similar topic: I think we might want to reclaim completely empty
page tables when unplugging memory; I suspect that we also have to mess
with the fixmap then, when removing page tables. But I feel like the
whole fixmap machinery is still a big black box for me.

-- 
Thanks,

David / dhildenb
