linux-kernel - Re: [PATCH] timers: Annotate possible non critical data race of next

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZtTo0wB_Jccoi0oM@pavilion.home>
Date: Mon, 2 Sep 2024 00:21:07 +0200
From: Frederic Weisbecker <frederic@...nel.org>
To: Anna-Maria Behnsen <anna-maria@...utronix.de>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	syzkaller-bugs@...glegroups.com
Subject: Re: [PATCH] timers: Annotate possible non critical data race of
 next_expiry

Le Thu, Aug 29, 2024 at 05:43:05PM +0200, Anna-Maria Behnsen a écrit :
> Global timers could be expired remotely when the target CPU is idle. After
> a remote timer expiry, the remote timer_base->next_expiry value is updated
> while holding the timer_base->lock. When the formerly idle CPU becomes
> active at the same time and checks whether timers need to expire, this
> check is done lockless as it is on the local CPU. This could lead to a data
> race, which was reported by sysbot:
> 
>   https://lore.kernel.org/r/000000000000916e55061f969e14@google.com
> 
> When the value is read lockless but changed by the remote CPU, only two non
> critical scenarios could happen:
> 
> 1) The already update value is read -> everything is perfect
> 
> 2) The old value is read -> a superfluous timer soft interrupt is raised
> 
> The same situation could happen when enqueueing a new first pinned timer by
> a remote CPU also with non critical scenarios:
> 
> 1) The already update value is read -> everything is perfect
> 
> 2) The old value is read -> when the CPU is idle, an IPI is executed
> nevertheless and when the CPU isn't idle, the updated value will be visible
> on the next tick and the timer might be late one jiffie.
> 
> As this is very unlikely to happen, the overhead of doing the check under
> the lock is a way more effort, than a superfluous timer soft interrupt or a
> possible 1 jiffie delay of the timer.
> 
> Document and annotate this non critical behavior in the code by using
> READ/WRITE_ONCE() pair when accessing timer_base->next_expiry.
> 
> Reported-by: syzbot+bf285fcc0a048e028118@...kaller.appspotmail.com
> Signed-off-by: Anna-Maria Behnsen <anna-maria@...utronix.de>
> Closes: https://lore.kernel.org/lkml/000000000000916e55061f969e14@google.com

Reviewed-by: Frederic Weisbecker <frederic@...nel.org>

Just a few nits:

> ---
>  kernel/time/timer.c | 41 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 36 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 18aa759c3cae..71b96a9bf6e8 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -672,7 +672,7 @@ static void enqueue_timer(struct timer_base *base, struct timer_list *timer,
>  		 * Set the next expiry time and kick the CPU so it
>  		 * can reevaluate the wheel:
>  		 */
> -		base->next_expiry = bucket_expiry;
> +		WRITE_ONCE(base->next_expiry, bucket_expiry);
>  		base->timers_pending = true;
>  		base->next_expiry_recalc = false;
>  		trigger_dyntick_cpu(base, timer);
> @@ -1964,7 +1964,7 @@ static void next_expiry_recalc(struct timer_base *base)
>  		clk += adj;
>  	}
>  
> -	base->next_expiry = next;
> +	WRITE_ONCE(base->next_expiry, next);
>  	base->next_expiry_recalc = false;
>  	base->timers_pending = !(next == base->clk + NEXT_TIMER_MAX_DELTA);
>  }
> @@ -2018,7 +2018,7 @@ static unsigned long next_timer_interrupt(struct timer_base *base,
>  	 * easy comparable to find out which base holds the first pending timer.
>  	 */
>  	if (!base->timers_pending)
> -		base->next_expiry = basej + NEXT_TIMER_MAX_DELTA;
> +		WRITE_ONCE(base->next_expiry, basej + NEXT_TIMER_MAX_DELTA);
>  
>  	return base->next_expiry;
>  }
> @@ -2462,8 +2462,39 @@ static void run_local_timers(void)
>  	hrtimer_run_queues();
>  
>  	for (int i = 0; i < NR_BASES; i++, base++) {
> -		/* Raise the softirq only if required. */
> -		if (time_after_eq(jiffies, base->next_expiry) ||
> +		/*
> +		 * Raise the softirq only if required.
> +		 *
> +		 * timer_base::next_expiry can be written by a remote CPU while
> +		 * holding the lock. If this write happens at the same time than
> +		 * the lockless local read, sanity checker could complain about
> +		 * data corruption.
> +		 *
> +		 * There are two possible situations where
> +		 * timer_base::next_expiry is written by a remote CPU:
> +		 *
> +		 * 1. Remote CPU expires global timers of this CPU and updates
> +		 * timer_base::next_expiry of BASE_LOCAL afterwards in

BASE_GLOBAL ?

> +		 * next_timer_interrupt() or timer_recalc_next_expiry(). The
> +		 * worst outcome is a superfluous raise of the timer softirq
> +		 * when the not yet updated value is read.
> +		 *
> +		 * 2. A new first pinned timer is enqueued by a remote CPU and
> +		 * therefore timer_base::next_expiry of BASE_GLOBAL is

BASE_LOCAL ?

Thanks.

> +		 * updated. When this update is missed, this isn't a problem, as
> +		 * an IPI is executed nevertheless when the CPU was idle
> +		 * before. When the CPU wasn't idle but the update is missed,
> +		 * then the timer would expire one jiffie late - bad luck.
> +		 *
> +		 * Those unlikely corner cases where the worst outcome is only a
> +		 * one jiffie delay or a superfluous raise of the softirq are
> +		 * not that expensive as doing the check always while holding
> +		 * the lock.
> +		 *
> +		 * Possible remote writers are using WRITE_ONCE(). Local reader
> +		 * uses therefore READ_ONCE().
> +		 */
> +		if (time_after_eq(jiffies, READ_ONCE(base->next_expiry)) ||
>  		    (i == BASE_DEF && tmigr_requires_handle_remote())) {
>  			raise_softirq(TIMER_SOFTIRQ);
>  			return;
> -- 
> 2.39.2
>