Message-ID: <AM9PR08MB7276F15F55066A0C264A4B9CF4BE9@AM9PR08MB7276.eurprd08.prod.outlook.com>
Date: Wed, 20 Oct 2021 11:54:36 +0000
From: Jianyong Wu <Jianyong.Wu@....com>
To: David Hildenbrand <david@...hat.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>,
"mhiramat@...nel.org" <mhiramat@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
Anshuman Khandual <Anshuman.Khandual@....com>
CC: "rostedt@...dmis.org" <rostedt@...dmis.org>,
"vbabka@...e.cz" <vbabka@...e.cz>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Justin He <Justin.He@....com>, nd <nd@....com>
Subject: RE: [PATCH v1] init: avoid race condition of update page table in
kernel init
> -----Original Message-----
> From: David Hildenbrand <david@...hat.com>
> Sent: Wednesday, October 20, 2021 5:05 PM
> To: Jianyong Wu <Jianyong.Wu@....com>; akpm@...ux-foundation.org;
> mhiramat@...nel.org; peterz@...radead.org
> Cc: rostedt@...dmis.org; vbabka@...e.cz; linux-kernel@...r.kernel.org;
> Anshuman Khandual <Anshuman.Khandual@....com>; Justin He
> <Justin.He@....com>; nd <nd@....com>
> Subject: Re: [PATCH v1] init: avoid race condition of update page table in
> kernel init
>
> >> But why does it matter on arm64? Can you describe how the exact race
> >> triggers?
> >
> > I don't know much about how x86 handles the memory map. Let me show
> > you how the race happens on arm64.
> > When the virtio-mem workqueue is triggered, arch_add_memory() is
> > called, where the related page tables are created. The call chain is
> > arch_add_memory -> __create_pgd_mapping -> alloc_init_pud. As the
> > memory add may take several seconds, it can run concurrently with
> > mark_rodata_ro, which also creates page tables.
>
> > The race can occur in alloc_init_pud. See below:
> >
> > /*************************************************************************/
> > Virtio-mem workqueue thread                  mark_rodata_ro thread
> > {
> > ...
> > pudp = pud_set_fixmap_offset(p4dp, addr); // set fixmap
> > do {
> >     pud_t old_pud = READ_ONCE(*pudp);
> >     ...
> > } while (pudp++, addr = next, addr != end);
> >                                              pudp = pud_set_fixmap_offset; // set fixmap
> > pud_clear_fixmap(); // clear fixmap
> >                                              do {
> > }
> >                                                  pud_t old_pud = READ_ONCE(*pudp); // CRASH
> >
>
> I still don't quite understand how that race can even exist. I assume it's due
> to the weird semantics of the "fixmap". (whatever that is :) ) I don't see
> anything similar happen on other archs, especially x86-64 and s390x, which
> I'm familiar with.
>
> s390x, similarly to x86-64, uses a vmem_mutex to serialize add/remove
> in the direct map and a cpa_mutex to serialize attribute changes (and
> splitting of large mappings).
>
Yeah, I see that there is a spin lock taken when updating page tables on x86.
OK, let me poke Anshuman about this.
If this issue doesn't affect all arches, should we fix it only on arm64? @Anshuman Khandual
Thanks
Jianyong
> The right approach should be to teach the arm64 mmu code that direct
> mapping updates might be concurrent, and that two instances might try
> messing with the fixmap concurrently.
>
>
> On a similar topic: I think we might want to reclaim completely empty
> page tables when unplugging memory; I suspect that we also have to mess
> with the fixmap then, when removing page tables. But I feel like the
> whole fixmap machinery is still a big black box for me.
>
> --
> Thanks,
>
> David / dhildenb