lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 20 Nov 2019 11:55:50 -0500
From:   Pavel Tatashin <pasha.tatashin@...een.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     James Morris <jmorris@...ei.org>, Sasha Levin <sashal@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Catalin Marinas <catalin.marinas@....com>, will@...nel.org,
        steve.capper@....com,
        Linux ARM <linux-arm-kernel@...ts.infradead.org>,
        Marc Zyngier <marc.zyngier@....com>,
        James Morse <james.morse@....com>,
        Vladimir Murzin <vladimir.murzin@....com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        allison@...utok.net, info@...ux.net, alexios.zavras@...el.com
Subject: Re: [PATCH] arm64: kernel: memory corruptions due non-disabled PAN

On Wed, Nov 20, 2019 at 11:43 AM Mark Rutland <mark.rutland@....com> wrote:
>
> Hi Pavel,
>
> On Tue, Nov 19, 2019 at 05:10:06PM -0500, Pavel Tatashin wrote:
> > Userland access functions (__arch_clear_user, __arch_copy_from_user,
> > __arch_copy_in_user, __arch_copy_to_user), enable and disable PAN
> > for the duration of copy. However, when copy fails for some reason,
> > i.e. access violation the code is transferred to fixedup section,
> > where we do not disable PAN.
>
> Thanks for reporting this. This is a very nasty bug.

Indeed, it was biting us randomly, and it took me awhile to understand
the root cause.

>
> > The bug is a security violation as the access to userland is still
> > open when it should be disabled, but it also causes memory corruptions
> > when software emulated PAN is used: CONFIG_ARM64_SW_TTBR0_PAN=y.
>
> I see that with CONFIG_ARM64_SW_TTBR0_PAN=y, this means that we may
> leave the stale TTBR0 value installed across a context-switch (and have
> reproduced that locally), but I'm having some difficulty reproducing the
> corruption that you see.

I will send the full test shortly. Note, I was never able to reproduce
it in QEMU, only on real hardware. Also, for some unknown reason after
kexec I could not reproduce it only during first boot, so it is
somewhat fragile, but I am sure it can be reproduced in other cases as
well, it is just my reproducer is not tunes for that.

>
> > I was able to reproduce memory corruption problem on Broadcom's SoC
> > ARMv8-A like this:
> >
> > Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's
> > stack is accessed and copied.
>
> IIUC this tickles the issue by performing a faulting uaccess in IRQ
> context. On the path to returnign from the exception, it directly calls
> into the scheduler as part of el1_preempt, erroneously passing the TTBR0
> value to the next task. Note that a preemption would remove the stale
> TTBR0 value as part of kernel entry.

Correct.

>
> It looks like if we're in this state, and return from an exception taken
> from EL1 with SW PAN enabled, we'll also leave the stale TTBR0 value
> installed. If PAN was disabled (e.g. in the middle of a uaccess region),
> then we'll restore the correct TTBR0.
>
> > The test program performed the following on every CPU and forking many
> > processes:
> >
> >       unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE,
> >                                 MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> >       map[0] = getpid();
> >       sched_yield();
> >       if (map[0] != getpid()) {
> >               fprintf(stderr, "Corruption detected!");
> >       }
> >       munmap(map, PAGE_SIZE);
>
> Can you provide the whole test, please? And precisely how you're
> launching it?

I will shortly.

>
> I've tried turning the above into a main() function, and spawning a
> number of instances in parallel while perf is running, but I haven't
> been able to reproduce the issue locally, and I'm concerned that I'm
> missing something.
>
> > From time to time I was getting map[0] to contain pid for a different
> > process.
>
> How often is "from time to time"? How many processes are you running,
> across how many CPUs?

Less than a second on 8 CPU SoC it takes for a process to get access
to address space of another process.

>
> >
> > Fixes: 338d4f49d6f7114 ("arm64: kernel: Add support for Privileged...")
> >
> > Signed-off-by: Pavel Tatashin <pasha.tatashin@...een.com>
> > ---
> >  arch/arm64/lib/clear_user.S     | 1 +
> >  arch/arm64/lib/copy_from_user.S | 1 +
> >  arch/arm64/lib/copy_in_user.S   | 1 +
> >  arch/arm64/lib/copy_to_user.S   | 1 +
> >  4 files changed, 4 insertions(+)
>
> FWIW, the diff below looks correct to me, but we might want to fold this
> into the C wrappers, so that this is consistent with the other uaccess
> cases (and done in one place in the code).

I agree, and I actually have a patch for that, but I wanted my fix to
be included into 5.4 if possible. This is why I sent it out. I will
send out a C wrapper patch soon, but that one won't need to be
backported to stable.

Pasha

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ