lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Date: Thu, 27 Jan 2022 06:24:50 +0000 From: Jianyong Wu <Jianyong.Wu@....com> To: David Hildenbrand <david@...hat.com>, Ard Biesheuvel <ardb@...nel.org>, Catalin Marinas <Catalin.Marinas@....com> CC: Justin He <Justin.He@....com>, "will@...nel.org" <will@...nel.org>, Anshuman Khandual <Anshuman.Khandual@....com>, "akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "quic_qiancai@...cinc.com" <quic_qiancai@...cinc.com>, "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>, "gshan@...hat.com" <gshan@...hat.com>, nd <nd@....com> Subject: RE: [PATCH v3] arm64/mm: avoid fixmap race condition when create pud mapping Hi David, > >>>>>> Thanks for tracking that down. > >>>>>> > >>>>>> Note that clearing the BSS twice is not the root problem here. > >>>>>> The root problem is that we set global state while the kernel > >>>>>> runs at the default link time address, and then refer to it again > >>>>>> after the entire kernel has been shifted in the kernel VA space. > >>>>>> Such global state could consist of mutable pointers to statically > >>>>>> allocated data (which would be reset to their default values > >>>>>> after the relocation code > >>> runs again), or global pointer variables in BSS. > >>>>>> In either case, relying on such a global variable after the > >>>>>> second relocation performed by KASLR would be risky, and so we > >>>>>> should avoid manipulating global state at all if it might involve > >>>>>> pointer to statically allocated data structures. > >>>>>> > >>>>>>> In other ways, if we invoke mutex_lock/unlock in such a early > >>>>>>> booting > >>> stage. > >>>>>>> It might be unsafe because lockdep inserts lock_acquire/release > >>>>>>> as the complex hooks. > >>>>>>> > >>>>>>> In summary, would it better if Jianyong splits these early boot > >>>>>>> and late boot case? e.g. introduce a nolock version for > >>>>>> create_mapping_noalloc(). > >>>>>>> > >>>>>>> What do you think of it? > >>>>>>> > >>>>>> > >>>>>> The pre-KASLR case definitely doesn't need a lock. But given that > >>>>>> create_mapping_noalloc() is only used to map the FDT, which > >>>>>> happens very early one way or the other, wouldn't it be better to > >>>>>> move the lock/unlock into other callers of > >>>>>> __create_pgd_mapping()? (and make sure no other users of the > >>>>>> fixmap slots exist) > >>>>> > >>>>> There are server callers of __create_pgd_mapping. I think some of > >>>>> them > >>> need no fixmap lock as they are called so early. I figure out all of them > here: > >>>>> create_mapping_noalloc: no lock > >>>>> create_pgd_mapping: no lock > >>>>> __map_memblock: no lock > >>>>> map_kernel_segment: no lock > >>>>> map_entry_trampoline: no lock > >>>>> update_mapping_prot: need lock > >>>>> arch_add_memory: need lock > >>>>> > >>>>> WDYT? > >>>>> > >>>> > >>>> That seems reasonable, but it needs to be documented clearly in the > code. > >>>> > >>> > >>> Just a random thought, could we rely on system_state to do the > >>> locking conditionally? > >> > >> I can't see the point. At the early stages of kernel boot, we definitely > need no lock. Also, I think we should keep it simple. > >> > > > > Is e.g., > > > > if (system_state < SYSTEM_RUNNING) > > /* lock */ > > > > if (system_state < SYSTEM_RUNNING) > > /* unlock */ > > of course, inverting the conditions ;) Yes, system_state can roughly separate these callers of __create_pgd_mapping. When system_state > SYSTEM_BOOTING we can add the lock. Thus, I have the following change: static DEFINE_SPINLOCK(swapper_pgdir_lock); +static DEFINE_MUTEX(fixmap_lock); void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd) { @@ -329,6 +330,8 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end, } BUG_ON(p4d_bad(p4d)); + if (system_state > SYSTEM_BOOTING) + mutex_lock(&fixmap_lock); pudp = pud_set_fixmap_offset(p4dp, addr); do { pud_t old_pud = READ_ONCE(*pudp); @@ -359,6 +362,8 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end, } while (pudp++, addr = next, addr != end); pud_clear_fixmap(); + if (system_state > SYSTEM_BOOTING) + mutex_unlock(&fixmap_lock); } It seems work and somehow simper. But I don't know if it is reasonable to do this. So, any idea? @Ard Biesheuvel @Catalin Marinas Thanks Jianyong > > > -- > Thanks, > > David / dhildenb
Powered by blists - more mailing lists