Message-ID: <20200317210154.GA19752@willie-the-truck>
Date:   Tue, 17 Mar 2020 21:01:54 +0000
From:   Will Deacon <will@...nel.org>
To:     Mark Brown <broonie@...nel.org>
Cc:     Mark Rutland <mark.rutland@....com>,
        Hongbo Yao <yaohongbo@...wei.com>,
        linux-kernel@...r.kernel.org, catalin.marinas@....com,
        linux-arm-kernel@...ts.infradead.org
Subject: Re: [RFC PATCH] arm64: fix the missing ktpi= cmdline check in
 arm64_kernel_unmapped_at_el0()

On Tue, Mar 17, 2020 at 04:36:38PM +0000, Mark Brown wrote:
> On Tue, Mar 17, 2020 at 03:18:14PM +0000, Will Deacon wrote:
> > On Tue, Mar 17, 2020 at 01:57:19PM +0000, Mark Brown wrote:
> > > On Tue, Mar 17, 2020 at 12:43:24PM +0000, Will Deacon wrote:
> > > > On Tue, Mar 17, 2020 at 12:10:51PM +0000, Mark Rutland wrote:
> > > > > On Tue, Mar 17, 2020 at 07:47:08PM +0800, Hongbo Yao wrote:
> 
> > > > > > -	return arm64_use_ng_mappings;
> > > > > > +	return arm64_use_ng_mappings &&
> > > > > > +		cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
> 
> > > > This probably isn't the right fix, since this will mean that early mappings
> > > > will be global and we'll have to go through the painful page-table rewrite
> > > > logic when the cap gets enabled for KASLR-enabled kernels.
> 
> > > Aren't we looking for a rewrite from non-global to global here (disable
> > > KPTI where we would otherwise have it), which we don't currently have
> > > code for?
> 
> > What I mean is that cpus_have_const_cap() will be false initially, so we'll
> > put down global mappings early on because PTE_MAYBE_NG will be 0, which
> > means that we'll have to invoke the rewriting code if we then realise we
> > want non-global mappings after the caps are finalised.
> 
> Ah, I see - a different case to the one originally reported but also an
> issue.
> 
> > > That is probably a good idea but I think that runs too late to affect
> > > the early mappings, they're done based on kaslr_requires_kpti() well
> > > before we start secondaries.  My first pass not having paged everything
> > > back in yet is that there needs to be command line parsing in
> > > kaslr_requires_kpti() but as things stand the command line isn't
> > > actually ready then...
> 
> > Yeah, and I think you probably run into chicken and egg problems mapping
> 
> The whole area is just a mess.
> 
> > the thing. With the change above, it's true that /some/ mappings will
> > still be nG if you pass kpti=off, but I was hoping that didn't really matter
> > :)
> 
> > What was the behaviour prior to your patch? If it used to work without
> > any nG mappings, then I suppose we should try to restore that behaviour.
> 
> I'd need to go back and retest to confirm but it looks like we always had
> the issue that we'd install some nG mappings early even with KPTI
> disabled on the command line so your change is just restoring the
> previous behaviour and we're no worse than we were before.

Urgh, this code brings back really bad memories :( :( :(

I just ran 5.4 and it looks like we leave everything non-global with KASLR,
even when "kpti=off". Great -- that means we're ok with my patch! Well, we
would be except that when we finalise the linear mapping we'll end up trying
to transition the old non-global entry to global, which is a break-before-make
violation (we panic early in __create_pgd_mapping()).
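
To make the break-before-make concern concrete, here is a minimal userspace
sketch of the kind of check that rejects a live nG -> global transition. The
names ng_change_is_safe() and checked_update() are hypothetical and only
loosely model the kernel's safety check; the bit positions are the arm64
stage-1 descriptor valid bit (bit 0) and nG bit (bit 11). Treat it as an
illustration under those assumptions, not as the code in mmu.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID (UINT64_C(1) << 0)   /* descriptor is live */
#define PTE_NG    (UINT64_C(1) << 11)  /* not-global: TLB entry is ASID-tagged */

/*
 * Hypothetical helper: may old_pte be overwritten with new_pte in place,
 * without first invalidating the entry and flushing the TLB?
 */
static bool ng_change_is_safe(uint64_t old_pte, uint64_t new_pte)
{
	/* Creating or tearing down a mapping never needs break-before-make. */
	if (!(old_pte & PTE_VALID) || !(new_pte & PTE_VALID))
		return true;

	/* Clearing nG on a live entry (non-global -> global) is unsafe. */
	if (old_pte & ~new_pte & PTE_NG)
		return false;

	return true;
}

/* Stand-in for the kernel's BUG_ON(): abort on an unsafe in-place update. */
static void checked_update(uint64_t *ptep, uint64_t new_pte)
{
	assert(ng_change_is_safe(*ptep, new_pte));
	*ptep = new_pte;
}
```

With this model, remapping an early nG entry as global while it is still
valid trips the assertion, which corresponds to the early panic described
above.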

Staring more at the code, I think we're conflating the global/non-global
mappings with whether or not the kpti trampoline is active and it looks like
this might lead to other issues in mainline right now -- for example, I
don't think we clear TPIDRRO_EL0 properly for native tasks because the
arm64_kernel_unmapped_at_el0() check in tls_thread_switch() will defer the
zeroing to the trampoline code, but that might not even run!
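
A small userspace model of the two decisions being conflated may help. The
variables below stand in for arm64_use_ng_mappings and the
ARM64_UNMAP_KERNEL_AT_EL0 capability; must_zero_tpidrro_in_switch() and
check_invariant() are hypothetical helpers, and the invariant (an active
trampoline implies nG mappings) is my assumption rather than something stated
in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_NG (UINT64_C(1) << 11)  /* arm64 stage-1 nG bit */

/* Decided early, before the command line is parsed (e.g. because of KASLR). */
static bool use_ng_mappings;
/* Finalised later with the CPU capabilities; this is what kpti=off affects. */
static bool kpti_cap;

/* Page-table attribute choice: follows the early decision only. */
static uint64_t pte_maybe_ng(void)
{
	return use_ng_mappings ? PTE_NG : 0;
}

/* Trampoline state: follows the finalised capability only. */
static bool kernel_unmapped_at_el0(void)
{
	return kpti_cap;
}

/*
 * Hypothetical model of the context-switch decision: if the trampoline is
 * not active, nothing else will zero TPIDRRO_EL0, so the switch path must.
 */
static bool must_zero_tpidrro_in_switch(void)
{
	return !kernel_unmapped_at_el0();
}

/* Assumed invariant: an active trampoline requires non-global mappings. */
static void check_invariant(void)
{
	assert(!kpti_cap || use_ng_mappings);
}
```

In the KASLR plus "kpti=off" case, use_ng_mappings is true and kpti_cap is
false: mappings stay nG, yet the context switch still zeroes the register
itself because the trampoline is known to be inactive.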

So I've hacked the following, which appears to work but damn I'd like
somebody else to look at this. I also have a nagging feeling that you
implemented it like this at some point, but we tried to consolidate things
during review.

Thoughts?

Will

--->8

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index e4d862420bb4..d79ce6df9e12 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -29,11 +29,9 @@ typedef struct {
  */
 #define ASID(mm)	((mm)->context.id.counter & 0xffff)
 
-extern bool arm64_use_ng_mappings;
-
 static inline bool arm64_kernel_unmapped_at_el0(void)
 {
-	return arm64_use_ng_mappings;
+	return cpus_have_const_cap(ARM64_UNMAP_KERNEL_AT_EL0);
 }
 
 typedef void (*bp_hardening_cb_t)(void);
diff --git a/arch/arm64/include/asm/pgtable-prot.h b/arch/arm64/include/asm/pgtable-prot.h
index 6f87839f0249..1305e28225fc 100644
--- a/arch/arm64/include/asm/pgtable-prot.h
+++ b/arch/arm64/include/asm/pgtable-prot.h
@@ -23,11 +23,13 @@
 
 #include <asm/pgtable-types.h>
 
+extern bool arm64_use_ng_mappings;
+
 #define _PROT_DEFAULT		(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
 #define _PROT_SECT_DEFAULT	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)
 
-#define PTE_MAYBE_NG		(arm64_kernel_unmapped_at_el0() ? PTE_NG : 0)
-#define PMD_MAYBE_NG		(arm64_kernel_unmapped_at_el0() ? PMD_SECT_NG : 0)
+#define PTE_MAYBE_NG		(arm64_use_ng_mappings ? PTE_NG : 0)
+#define PMD_MAYBE_NG		(arm64_use_ng_mappings ? PMD_SECT_NG : 0)
 
 #define PROT_DEFAULT		(_PROT_DEFAULT | PTE_MAYBE_NG)
 #define PROT_SECT_DEFAULT	(_PROT_SECT_DEFAULT | PMD_MAYBE_NG)
