Date:   Mon, 1 Feb 2021 14:39:46 +0000
From:   Giancarlo Ferrari <giancarlo.ferrari89@...il.com>
To:     Mark Rutland <mark.rutland@....com>
Cc:     linux-arm-kernel@...ts.infradead.org, linux@...linux.org.uk,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        rppt@...nel.org, penberg@...nel.org, geert@...ux-m68k.org,
        giancarlo.ferrari@...ia.com
Subject: Re: [PATCH] ARM: kexec: Fix panic after TLB are invalidated

Hi,

On Mon, Feb 01, 2021 at 12:47:20PM +0000, Mark Rutland wrote:
> On Mon, Feb 01, 2021 at 12:44:56AM +0000, Giancarlo Ferrari wrote:
> > machine_kexec() needs to set rw permission in text and rodata sections
> > to assign some variables (e.g. kexec_start_address). To do that at
> > the end (after flushing pmd in memory, etc.) it needs to invalidate
> > TLB [section] entries.
> 
> It'd be worth noting explicitly that set_kernel_text_rw() alters
> current->active_mm...
> 
> > If during the TLB invalidation an interrupt occurs, which might cause
> > a context switch, there is the risk to inject invalid TLBs, with ro
> > permissions.
> 
> ... which is why if there's a context switch things can go wrong, since
> active_mm isn't stable, and so it's possible that set_kernel_text_rw()
> updates multiple tables, none of which might be the active table at the
> point we try to make an access.
> 

Maybe the behaviour causing the issue is not completely clear to me, and I do
apologize for that (moreover, I don't have enough debug capabilities).
However, current->active_mm is switched on context switches, correct?
So, in principle, the invalidation, if interrupted, is carried on from where
it left off.

I thought the issue was that the page table entry for the section at
0x8010_0000 is global, and thus not tagged with an ASID (Address Space ID).
Since each process has its own copy of that entry, the scheduled process
might bring a spurious entry (with ro permission) into the TLB.

If the entry is not global, it is tagged with the ASID, and the issue cannot happen.

Please note that this behaviour was tested on an ARMv7 board.

> It would be nice to spell that out rather than saying "invalid TLBs".
> 
> We could disable preemption to prevent that, which is possibly better
> than disabling interrupts.
> 
> Overall, it would be much better to avoid having to mess with the kernel
> page tables. So rather than going:
> 
> 1. mark kernel RW
> 2. alter variables in reloc code
> 3. copy reloc code into buffer
> 4. branch to buffer
> 
> ... we should be able to go:
> 
> 1. copy reloc code into buffer
> 2. alter variables in copy of reloc code
> 3. branch to buffer
> 
> ... which would avoid this class of problem too.
> 
> Thanks,
> Mark.

Thanks,


GF
