linux-kernel - Re: [PATCH v4] arm64: mte: move register initialization to C

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAMn1gO7vf7EKvdo-w_ez5aBDWaNRm=X_qRfReVtUkA5LOF127A@mail.gmail.com>
Date:   Tue, 13 Sep 2022 13:32:07 -0700
From:   Peter Collingbourne <pcc@...gle.com>
To:     Catalin Marinas <catalin.marinas@....com>
Cc:     Evgenii Stepanov <eugenis@...gle.com>,
        Kenny Root <kroot@...gle.com>, Marc Zyngier <maz@...nel.org>,
        Will Deacon <will@...nel.org>,
        Vincenzo Frascino <vincenzo.frascino@....com>,
        Andrey Konovalov <andreyknvl@...il.com>,
        Mark Brown <broonie@...nel.org>,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        LKML <linux-kernel@...r.kernel.org>,
        kernel test robot <lkp@...el.com>
Subject: Re: [PATCH v4] arm64: mte: move register initialization to C

On Mon, Sep 12, 2022 at 1:04 PM Catalin Marinas <catalin.marinas@....com> wrote:
>
> On Fri, Sep 09, 2022 at 05:54:13PM -0700, Peter Collingbourne wrote:
> > On Tue, Sep 6, 2022 at 5:36 PM Evgenii Stepanov <eugenis@...gle.com> wrote:
> > > If FEAT_MTE2 is disabled via the arm64.nomte command line argument on a
> > > CPU that claims to support FEAT_MTE2, the kernel will use Tagged Normal
> > > in the MAIR. If we interpret arm64.nomte to mean that the CPU does not
> > > in fact implement FEAT_MTE2, setting the system register like this may
> > > lead to UNSPECIFIED behavior. Fix it by arranging for MAIR to be set
> > > in the C function cpu_enable_mte which is called based on the sanitized
> > > version of the system register.
> > >
> > > There is no need for the rest of the MTE-related system register
> > > initialization to happen from assembly, with the exception of TCR_EL1,
> > > which must be set to include at least TBI1 because the secondary CPUs
> > > access KASan-allocated data structures early. Therefore, make the TCR_EL1
> > > initialization unconditional and move the rest of the initialization to
> > > cpu_enable_mte so that we no longer have a dependency on the unsanitized
> > > ID register value.
> >
> > Moving the register initialization to C also fixes a bug where the
> > kernel's zeroing of TFSR_EL1 has no practical effect when the kernel
> > is started in VHE mode because the register is currently being zeroed
> > prior to the kernel enabling the redirect of TFSR_EL2 to TFSR_EL1 when
> > it enables VHE. As a result, without this patch it is possible to get
> > a spurious KASAN error report if TFSR_EL2 is non-zero out of reset.
>
> Oh, I think this is a side-effect of the nVHE patches. We added MTE in
> 5.10 and __cpu_setup() was called at EL2 if the kernel was entered at
> EL2 - 3b714d24ef17 ("arm64: mte: CPU feature detection and initial
> sysreg configuration"). When nVHE turned up in 5.12, this was changed to
> to run __cpu_setup at EL1 and this only initialises TFSR_EL1.
> __finalise_el2 should have transferred TFSR_EL12.
>
> I don't think there other registers we missed in __cpu_setup() but I
> haven't looked in detail.
>
> So for this, we either move the reg initialisation to C or we fix
> __finalise_el2. I'm tempted to go with the former as long as the kernel
> doesn't read that register up to that point and complain of a spurious
> asynchronous fault.

Yes, I don't think we can generally expect the kernel to read the
register up to that point. The register is generally only read in
response to an exception, and the early kernel initialization code
runs with interrupts disabled and only enables them well after the
call to smp_prepare_boot_cpu() for the primary CPU (see the calls to
local_irq_disable() and local_irq_enable() in start_kernel()) and
check_local_cpu_capabilities() for the secondaries (see the call to
local_daif_restore() in secondary_start_kernel()), so unless something
has gone seriously wrong that would cause us to get a synchronous
exception that early (in which case the spurious async fault report is
the least of our worries) I don't think we should expect to see one of
these reports as a result of moving the initialization code to C.

Peter