[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CA+CK2bAkb7zg6ne=PzA7UrQF49J2Sa7rmyWM3Bqugfe00-36ng@mail.gmail.com>
Date: Wed, 20 Nov 2019 11:55:50 -0500
From: Pavel Tatashin <pasha.tatashin@...een.com>
To: Mark Rutland <mark.rutland@....com>
Cc: James Morris <jmorris@...ei.org>, Sasha Levin <sashal@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
Catalin Marinas <catalin.marinas@....com>, will@...nel.org,
steve.capper@....com,
Linux ARM <linux-arm-kernel@...ts.infradead.org>,
Marc Zyngier <marc.zyngier@....com>,
James Morse <james.morse@....com>,
Vladimir Murzin <vladimir.murzin@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
allison@...utok.net, info@...ux.net, alexios.zavras@...el.com
Subject: Re: [PATCH] arm64: kernel: memory corruptions due non-disabled PAN
On Wed, Nov 20, 2019 at 11:43 AM Mark Rutland <mark.rutland@....com> wrote:
>
> Hi Pavel,
>
> On Tue, Nov 19, 2019 at 05:10:06PM -0500, Pavel Tatashin wrote:
> > Userland access functions (__arch_clear_user, __arch_copy_from_user,
> > __arch_copy_in_user, __arch_copy_to_user), enable and disable PAN
> > for the duration of copy. However, when copy fails for some reason,
> > i.e. access violation the code is transferred to fixedup section,
> > where we do not disable PAN.
>
> Thanks for reporting this. This is a very nasty bug.
Indeed, it was biting us randomly, and it took me awhile to understand
the root cause.
>
> > The bug is a security violation as the access to userland is still
> > open when it should be disabled, but it also causes memory corruptions
> > when software emulated PAN is used: CONFIG_ARM64_SW_TTBR0_PAN=y.
>
> I see that with CONFIG_ARM64_SW_TTBR0_PAN=y, this means that we may
> leave the stale TTBR0 value installed across a context-switch (and have
> reproduced that locally), but I'm having some difficulty reproducing the
> corruption that you see.
I will send the full test shortly. Note, I was never able to reproduce
it in QEMU, only on real hardware. Also, for some unknown reason after
kexec I could not reproduce it only during first boot, so it is
somewhat fragile, but I am sure it can be reproduced in other cases as
well, it is just my reproducer is not tunes for that.
>
> > I was able to reproduce memory corruption problem on Broadcom's SoC
> > ARMv8-A like this:
> >
> > Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
> > stack is accessed and copied.
>
> IIUC this tickles the issue by performing a faulting uaccess in IRQ
> context. On the path to returnign from the exception, it directly calls
> into the scheduler as part of el1_preempt, erroneously passing the TTBR0
> value to the next task. Note that a preemption would remove the stale
> TTBR0 value as part of kernel entry.
Correct.
>
> It looks like if we're in this state, and return from an exception taken
> from EL1 with SW PAN enabled, we'll also leave the stale TTBR0 value
> installed. If PAN was disabled (e.g. in the middle of a uaccess region),
> then we'll restore the correct TTBR0.
>
> > The test program performed the following on every CPU and forking many
> > processes:
> >
> > unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
> > MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> > map[0] = getpid();
> > sched_yield();
> > if (map[0] != getpid()) {
> > fprintf(stderr, "Corruption detected!");
> > }
> > munmap(map, PAGE_SIZE);
>
> Can you provide the whole test, please? And precisely how you're
> launching it?
I will shortly.
>
> I've tried turning the above into a main() function, and spawning a
> number of instances in parallel while perf is running, but I haven't
> been able to reproduce the issue locally, and I'm concerned that I'm
> missing something.
>
> > From time to time I was getting map[0] to contain pid for a different
> > process.
>
> How often is "from time to time"? How many processes are you running,
> across how many CPUs?
Less than a second on 8 CPU SoC it takes for a process to get access
to address space of another process.
>
> >
> > Fixes: 338d4f49d6f7114 ("arm64: kernel: Add support for Privileged...")
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@...een.com>
> > ---
> > arch/arm64/lib/clear_user.S | 1 +
> > arch/arm64/lib/copy_from_user.S | 1 +
> > arch/arm64/lib/copy_in_user.S | 1 +
> > arch/arm64/lib/copy_to_user.S | 1 +
> > 4 files changed, 4 insertions(+)
>
> FWIW, the diff below looks correct to me, but we might want to fold this
> into the C wrappers, so that this is consistent with the other uaccess
> cases (and done in one place in the code).
I agree, and I actually have a patch for that, but I wanted my fix to
be included into 5.4 if possible. This is why I sent it out. I will
send out a C wrapper patch soon, but that one won't need to be
backported to stable.
Pasha
Powered by blists - more mailing lists