Message-ID: <ZyD2pk285YeVmZTm@J2N7QTR9R3.cambridge.arm.com>
Date: Tue, 29 Oct 2024 14:52:22 +0000
From: Mark Rutland <mark.rutland@....com>
To: Jinjie Ruan <ruanjinjie@...wei.com>
Cc: oleg@...hat.com, linux@...linux.org.uk, will@...nel.org,
	catalin.marinas@....com, sstabellini@...nel.org, maz@...nel.org,
	tglx@...utronix.de, peterz@...radead.org, luto@...nel.org,
	kees@...nel.org, wad@...omium.org, akpm@...ux-foundation.org,
	samitolvanen@...gle.com, arnd@...db.de, ojeda@...nel.org,
	rppt@...nel.org, hca@...ux.ibm.com, aliceryhl@...gle.com,
	samuel.holland@...ive.com, paulmck@...nel.org, aquini@...hat.com,
	petr.pavlu@...e.com, viro@...iv.linux.org.uk,
	rmk+kernel@...linux.org.uk, ardb@...nel.org,
	wangkefeng.wang@...wei.com, surenb@...gle.com,
	linus.walleij@...aro.org, yangyj.ee@...il.com, broonie@...nel.org,
	mbenes@...e.cz, puranjay@...nel.org, pcc@...gle.com,
	guohanjun@...wei.com, sudeep.holla@....com,
	Jonathan.Cameron@...wei.com, prarit@...hat.com, liuwei09@...tc.cn,
	dwmw@...zon.co.uk, oliver.upton@...ux.dev,
	kristina.martsenko@....com, ptosi@...gle.com, frederic@...nel.org,
	vschneid@...hat.com, thiago.bauermann@...aro.org,
	joey.gouly@....com, liuyuntao12@...wei.com, leobras@...hat.com,
	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	xen-devel@...ts.xenproject.org
Subject: Re: [PATCH -next v4 06/19] arm64: entry: Move
 arm64_preempt_schedule_irq() into exit_to_kernel_mode()

On Fri, Oct 25, 2024 at 06:06:47PM +0800, Jinjie Ruan wrote:
> Move arm64_preempt_schedule_irq() into exit_to_kernel_mode(), so not
> only __el1_irq() but also every time when kernel mode irq return,
> there is a chance to reschedule.

We use exit_to_kernel_mode() for every non-NMI exception return to the
kernel, not just IRQ returns.

> As Mark pointed out, this change will have the following key impact:
> 
>     "We'll preempt even without taking a "real" interrupt. That
>     shouldn't result in preemption that wasn't possible before,
>     but it does change the probability of preempting at certain points,
>     and might have a performance impact, so probably warrants a
>     benchmark."

For anyone following along at home, I said that at:

  https://lore.kernel.org/linux-arm-kernel/ZxejvAmccYMTa4P1@J2N7QTR9R3/

... and there I specifically said:

> I'd suggest you first write a patch to align arm64's entry code with the
> generic code, by removing the call to arm64_preempt_schedule_irq() from
> __el1_irq(), and adding a call to arm64_preempt_schedule_irq() in
> __exit_to_kernel_mode(), e.g.
> 
> | static __always_inline void __exit_to_kernel_mode(struct pt_regs *regs)
> | {
> | 	...
> | 	if (interrupts_enabled(regs)) {
> | 		...
> | 		if (regs->exit_rcu) {
> | 			...
> | 		}
> | 		...
> | 		arm64_preempt_schedule_irq();
> | 		...
> | 	} else {
> | 		...
> | 	}
> | }

[...]

> +#ifdef CONFIG_PREEMPT_DYNAMIC
> +DEFINE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched);
> +#define need_irq_preemption() \
> +	(static_branch_unlikely(&sk_dynamic_irqentry_exit_cond_resched))
> +#else
> +#define need_irq_preemption()	(IS_ENABLED(CONFIG_PREEMPTION))
> +#endif
> +
> +static void __sched arm64_preempt_schedule_irq(void)
> +{
> +	if (!need_irq_preemption())
> +		return;
> +
> +	/*
> +	 * Note: thread_info::preempt_count includes both thread_info::count
> +	 * and thread_info::need_resched, and is not equivalent to
> +	 * preempt_count().
> +	 */
> +	if (READ_ONCE(current_thread_info()->preempt_count) != 0)
> +		return;
> +
> +	/*
> +	 * DAIF.DA are cleared at the start of IRQ/FIQ handling, and when GIC
> +	 * priority masking is used the GIC irqchip driver will clear DAIF.IF
> +	 * using gic_arch_enable_irqs() for normal IRQs. If anything is set in
> +	 * DAIF we must have handled an NMI, so skip preemption.
> +	 */
> +	if (system_uses_irq_prio_masking() && read_sysreg(daif))
> +		return;
> +
> +	/*
> +	 * Preempting a task from an IRQ means we leave copies of PSTATE
> +	 * on the stack. cpufeature's enable calls may modify PSTATE, but
> +	 * resuming one of these preempted tasks would undo those changes.
> +	 *
> +	 * Only allow a task to be preempted once cpufeatures have been
> +	 * enabled.
> +	 */
> +	if (system_capabilities_finalized())
> +		preempt_schedule_irq();
> +}
> +
>  /*
>   * Handle IRQ/context state management when exiting to kernel mode.
>   * After this function returns it is not safe to call regular kernel code,
> @@ -72,6 +114,8 @@ static noinstr irqentry_state_t enter_from_kernel_mode(struct pt_regs *regs)
>  static void noinstr exit_to_kernel_mode(struct pt_regs *regs,
>  					irqentry_state_t state)
>  {
> +	arm64_preempt_schedule_irq();

This is broken; exit_to_kernel_mode() is called for any non-NMI
exception return to the kernel, and this doesn't check that interrupts
were enabled in the context the exception was taken from.

This will preempt in cases where we should not, e.g. if we WARN() in a
section with IRQs disabled.
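
To make the difference concrete, here is a minimal user-space sketch, not
kernel code. Everything prefixed fake_ and both exit_*() helpers are
hypothetical stand-ins invented for illustration; the model only captures
whether IRQs were enabled in the interrupted context and whether a
preemption attempt happens on the way out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for struct pt_regs: records only whether IRQs
 * were enabled in the context the exception was taken from.
 */
struct fake_regs {
	bool irqs_enabled;
};

/* Hypothetical stand-in for the per-task state consulted before preempting. */
struct fake_task {
	unsigned long preempt_count;	/* non-zero: preemption not allowed */
	int times_preempted;		/* how often we would have rescheduled */
};

/* Models interrupts_enabled(regs) for the toy register state above. */
static bool fake_interrupts_enabled(const struct fake_regs *regs)
{
	assert(regs != NULL);
	return regs->irqs_enabled;
}

/* Models the preemption attempt: it only counts, it never schedules. */
static void fake_preempt_schedule_irq(struct fake_task *t)
{
	if (t->preempt_count != 0)
		return;
	t->times_preempted++;
}

/* Shape of the hunk as posted: attempt preemption on every exception return. */
static void exit_unguarded(const struct fake_regs *regs, struct fake_task *t)
{
	(void)regs;	/* the interrupted context is never consulted */
	fake_preempt_schedule_irq(t);
}

/*
 * Shape of the earlier suggestion: attempt preemption only when the
 * interrupted context had interrupts enabled.
 */
static void exit_guarded(const struct fake_regs *regs, struct fake_task *t)
{
	if (fake_interrupts_enabled(regs))
		fake_preempt_schedule_irq(t);
}
```

With irqs_enabled == false (the WARN()-with-IRQs-disabled case),
exit_unguarded() still bumps times_preempted while exit_guarded() leaves
it untouched.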

Mark.
