Message-ID: <ZxejvAmccYMTa4P1@J2N7QTR9R3>
Date: Tue, 22 Oct 2024 14:08:12 +0100
From: Mark Rutland <mark.rutland@....com>
To: Jinjie Ruan <ruanjinjie@...wei.com>
Cc: catalin.marinas@....com, will@...nel.org, oleg@...hat.com,
	tglx@...utronix.de, peterz@...radead.org, luto@...nel.org,
	kees@...nel.org, wad@...omium.org, rostedt@...dmis.org,
	arnd@...db.de, ardb@...nel.org, broonie@...nel.org,
	rick.p.edgecombe@...el.com, leobras@...hat.com,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org
Subject: Re: [PATCH v3 0/3] arm64: entry: Convert to generic entry

On Tue, Oct 22, 2024 at 08:07:54PM +0800, Jinjie Ruan wrote:
> On 2024/10/17 23:25, Mark Rutland wrote:
> > There's also some indirection that I don't think is necessary *and*
> > hides important ordering concerns and results in mistakes. In
> > particular, note that before this series, enter_from_kernel_mode() calls
> > the (instrumentable) MTE checks *after* all the necessary lockdep+RCU
> > management is performed by __enter_from_kernel_mode():
> > 
> > 	static void noinstr enter_from_kernel_mode(struct pt_regs *regs)
> > 	{
> > 	        __enter_from_kernel_mode(regs);
> > 		mte_check_tfsr_entry();
> > 		mte_disable_tco_entry(current);
> > 	}
> > 
> > ... whereas after this series is applied, those MTE checks are placed in
> > arch_enter_from_kernel_mode(), which irqentry_enter() calls *before* the
> > necessary lockdep+RCU management. That is broken.
> > 
> > It would be better to keep that explicit in the arm64 entry code with
> > arm64-specific wrappers, e.g.
> > 
> > 	static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
> > 	{
> > 		irqentry_state_t state = irqentry_enter(regs);
> > 		mte_check_tfsr_entry();
> > 		mte_disable_tco_entry(current);
> > 
> > 		return state;
> > 	}
> 
> Hi Mark, it seems that there is a problem with
> arm64_preempt_schedule_irq() when wrapping irqentry_exit() with
> exit_to_kernel_mode().
> 
> arm64_preempt_schedule_irq() handles PREEMPT_DYNAMIC and IRQ
> preemption, which corresponds to raw_irqentry_exit_cond_resched() in
> the generic code, called by irqentry_exit().
> 
> Currently only __el1_irq() calls arm64_preempt_schedule_irq(), but
> once we switch every exit_to_kernel_mode() call to an arm64-specific
> wrapper around irqentry_exit(), not only __el1_irq() but also
> el1_abort(), el1_pc(), el1_undef() etc. will try to reschedule via
> logic similar to arm64_preempt_schedule_irq().

Yes, the generic entry code will preempt any context from which an
interrupt *could* have been taken.

I'm not sure it actually matters either way; I believe the generic
entry code was written this way because that's what x86 did, and
checking whether interrupts were enabled in the interrupted context
is cheap.
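
For reference, the relevant part of the generic irqentry_exit() in
kernel/entry/common.c looks roughly like this (paraphrased, with
unrelated details elided):

| noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
| {
| 	if (user_mode(regs)) {
| 		irqentry_exit_to_user_mode(regs);
| 	} else if (!regs_irqs_disabled(regs)) {
| 		/* IRQs were enabled in the interrupted context */
| 		if (state.exit_rcu) {
| 			/*
| 			 * RCU wasn't watching on entry; undo the entry
| 			 * work and return without preempting.
| 			 */
| 			...
| 			return;
| 		}
| 		...
| 		if (IS_ENABLED(CONFIG_PREEMPTION))
| 			irqentry_exit_cond_resched();
| 		...
| 	} else {
| 		...
| 	}
| }

... i.e. it only attempts to preempt when the interrupted context had
IRQs enabled and RCU was watching on entry.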

I'd suggest you first write a patch to align arm64's entry code with the
generic code, by removing the call to arm64_preempt_schedule_irq() from
__el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
__exit_to_kernel_mode(), e.g.

| static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
| {
| 	...
| 	if (interrupts_enabled(regs)) {
| 		...
| 		if (regs->exit_rcu) {
| 			...
| 		}
| 		...
| 		arm64_preempt_schedule_irq();
| 		...
| 	} else {
| 		...
| 	}
| }
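
For comparison, the call being removed currently sits in __el1_irq(),
which looks roughly like this today (paraphrased from
arch/arm64/kernel/entry-common.c):

| static void noinstr __el1_irq(struct pt_regs *regs,
| 			      void (*handler)(struct pt_regs *))
| {
| 	enter_from_kernel_mode(regs);
|
| 	irq_enter_rcu();
| 	do_interrupt_handler(regs, handler);
| 	irq_exit_rcu();
|
| 	/* this is the call that moves into __exit_to_kernel_mode() */
| 	arm64_preempt_schedule_irq();
|
| 	exit_to_kernel_mode(regs);
| }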

That way the behaviour and structure will be more aligned with the
generic code, and with that as an independent patch it will be easier to
review/test/bisect/etc.

This change will have at least two key impacts:

(1) We'll preempt even without taking a "real" interrupt. That
    shouldn't result in preemption that wasn't possible before, 
    but it does change the probability of preempting at certain points,
    and might have a performance impact, so probably warrants a
    benchmark.

(2) We will not preempt when taking interrupts from a region of kernel
    code where IRQs are enabled but RCU is not watching, matching the
    behaviour of the generic entry code.

    This has the potential to introduce livelock if we can ever have a
    screaming interrupt in such a region, so we'll need to go figure out
    whether that's actually a problem.

    Having this as a separate patch will make it easier to test/bisect
    for that specifically.

Mark.
