Message-ID: <AM9PR08MB7276F15F55066A0C264A4B9CF4BE9@AM9PR08MB7276.eurprd08.prod.outlook.com>
Date: Wed, 20 Oct 2021 11:54:36 +0000
From: Jianyong Wu <Jianyong.Wu@....com>
To: David Hildenbrand <david@...hat.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"mhiramat@...nel.org" <mhiramat@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
Anshuman Khandual <Anshuman.Khandual@....com>
CC: "rostedt@...dmis.org" <rostedt@...dmis.org>,
"vbabka@...e.cz" <vbabka@...e.cz>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Justin He <Justin.He@....com>, nd <nd@....com>
Subject: RE: [PATCH v1] init: avoid race condition of update page table in
kernel init
> -----Original Message-----
> From: David Hildenbrand <david@...hat.com>
> Sent: Wednesday, October 20, 2021 5:05 PM
> To: Jianyong Wu <Jianyong.Wu@....com>; akpm@...ux-foundation.org;
> mhiramat@...nel.org; peterz@...radead.org
> Cc: rostedt@...dmis.org; vbabka@...e.cz; linux-kernel@...r.kernel.org;
> Anshuman Khandual <Anshuman.Khandual@....com>; Justin He
> <Justin.He@....com>; nd <nd@....com>
> Subject: Re: [PATCH v1] init: avoid race condition of update page table in
> kernel init
>
> >> But why does it matter on arm64? Can you describe how the exact race
> >> triggers?
> >
> > I don't know much about how x86 handles the memory map. Let me show
> > you how the race happens on arm64.
> > When the virtio-mem workqueue is triggered, arch_add_memory() is
> > called, where the related page tables are created. The call chain is
> > arch_add_memory -> __create_pgd_mapping -> alloc_init_pud. As the
> > memory add may take several seconds, it can run concurrently with
> > mark_rodata_ro, which also creates page tables.
>
> > The race can occur in alloc_init_pud. See below:
> >
> > /*************************************************************************/
> > Virtio-mem workqueue thread                  mark_rodata_ro thread
> > {
> > ...
> > pudp = pud_set_fixmap_offset(p4dp, addr); // set fixmap
> > do {
> >     pud_t old_pud = READ_ONCE(*pudp);
> >     ...
> > } while (pudp++, addr = next, addr != end);
> >                                              pudp = pud_set_fixmap_offset; // set fixmap
> > pud_clear_fixmap(); // clear fixmap
> >                                              do {
> > }
> >                                                  pud_t old_pud = READ_ONCE(*pudp); // CRASH
> >
>
> I still don't quite understand how that race can even exist. I assume it's due
> to the weird semantics of the "fixmap". (whatever that is :) ) I don't see
> anything similar happen on other archs, especially x86-64 and s390x, which
> I'm familiar with.
>
> s390x, similarly to x86-64, uses a vmem_mutex to serialize add/remove
> in the direct map and a cpa_mutex to serialize attribute changes (and
> splitting of large mappings).
>
Yeah, I see that there is a spin lock taken when updating page tables on x86.
OK, let me poke Anshuman about this.
If this issue doesn't affect all arches, should we fix it only on arm64? @Anshuman Khandual
Thanks
Jianyong
> The right approach should be to teach the arm64 mmu code that direct
> mapping updates might be concurrent, and that two instances might try
> messing with the fixmap concurrently.
>
>
> On a similar topic: I think we might want to reclaim completely empty
> page tables when unplugging memory; I suspect that we also have to mess
> with the fixmap then, when removing page tables. But I feel like the
> whole fixmap machinery is still a big black box for me.
>
> --
> Thanks,
>
> David / dhildenb