[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210621133757.GS4397@paulmck-ThinkPad-P17-Gen-1>
Date: Mon, 21 Jun 2021 06:37:57 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Marco Elver <elver@...gle.com>, linux-kernel@...r.kernel.org
Subject: Re: RCU vs data_race()
On Mon, Jun 21, 2021 at 09:28:02AM +0200, Peter Zijlstra wrote:
> On Sun, Jun 20, 2021 at 02:01:27PM -0700, Paul E. McKenney wrote:
> > On Sun, Jun 20, 2021 at 09:14:28PM +0200, Peter Zijlstra wrote:
>
> > > I don't buy that argument. pr_err() (or worse) is not supposed to
> > > happen, ever. If it does, *that* is a far worse condition that any data
> > > race possibly found by kcsan.
> > >
> > > So the only way the pr_err() expression itself can lead to kcsan
> > > determining a data-race, if something far worse triggered the pr_err()
> > > itself.
> >
> > Earlier, you said pr_warn(). Above, I said pr_*(). Now you say
> > pr_err(). But OK...
>
> Same, thing.. also Sundays aren't great for details it seems :-)
I know that feeling! ;-)
> > Let's take for example the pr_err() in __call_rcu(), that is, the
> > double-free diagnostic. A KCSAN warning on the unmarked load from
> > head->func could give valuable information on the whereabouts of the
> > other code interfering with the callback. Blanket disabling of KCSAN
> > across all pr_err() calls (let alone all pr_*() calls) would be the
> > opposite of helpful.
>
> I'm confused. That pr_err() should never happen in a correct program. If
> it happens, fix it and any data race as a consequence of that pr_err()
> no longer exists either.
>
> I fundementally don't see the relevance of a possible data race from a
> statement that should never happen in a correct program to begin with.
>
> Why do you think otherwise?
Because detection of that data race can provide valuable debugging help.
> > > You've lost me on the schedule thing, what?
> >
> > The definition of schedule_timeout_interruptible() is in part of the
> > kernel that uses much looser KCSAN checking. Thus there are some
> > KCSAN warnings from RCU involving schedule_timeout_interruptible().
> > But at least some of these warnings are for conflicting writes, which
> > many parts of the kernel suppress KCSAN warnings for.
>
> You've lost me again.. schedule_timeout_interruptible() doesn't do
> writes to rcu state, does it? So how can there be problems?
Because the static inline functions end up in RCU's tranlation units,
and they do write to other state. See for example the line marked with
the asterisk at the end of this email, which is apparently between
__run_timers() and rcu_torture_reader(). But gdb says that this is
really between this statement in __run_timers():
base->running_timer = NULL;
And this statement in try_to_del_timer_sync():
if (base->running_timer != timer)
Because of inline functions, running KCSAN on RCU can detect races in
other subsystems. In this case, I could argue for a READ_ONCE() on the
"if" condition, but last I knew, the rules for timers are that C-language
writes do not participate in data races. So maybe I should add that
READ_ONCE(), but running KCSAN on rcutorture would still complain.
Here is the stack leading to try_to_del_timer_sync():
read to 0xffff95bbdf49c280 of 8 bytes by task 167 on cpu 0:
try_to_del_timer_sync+0x96/0x1b0
del_timer_sync+0xa8/0xe0
schedule_timeout+0xc9/0x140
schedule_timeout_interruptible+0x28/0x30
stutter_wait+0x18a/0x220
rcu_torture_reader+0xfc/0x330
kthread+0x1fa/0x210
ret_from_fork+0x22/0x30
So one option would be to add the READ_ONCE(), but put an RCU wrapper
around schedule_timeout_interruptible() to allow rcutorture to either
show all data races or only those directly related to RCU. The reason for
wanting it to show only those directly related to RCU is that this allows
me to much more easily spot a new RCU-related data race. The reason
for wanting it to (only sometimes!) show all data races is to help track
down any non-RCU bugs that might be afficting my rcutorture runs.
Make sense?
Thanx, Paul
------------------------------------------------------------------------
129 BUG: KCSAN: data-race in prandom_bytes / prandom_u32
123 BUG: KCSAN: data-race in __hrtimer_run_queues / hrtimer_active
56 BUG: KCSAN: data-race in mutex_spin_on_owner+0x13d/0x260
51 BUG: KCSAN: data-race in prandom_u32 / prandom_u32
41 BUG: KCSAN: data-race in __run_timers / try_to_del_timer_sync
37 BUG: KCSAN: data-race in tick_sched_timer / tick_sched_timer
34 BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_next_event
26 BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_nohz_idle_stop_tick
* 18 BUG: KCSAN: data-race in __run_timers / rcu_torture_reader
16 BUG: KCSAN: data-race in __hrtimer_run_queues / hrtimer_try_to_cancel
16 BUG: KCSAN: data-race in tick_nohz_idle_stop_tick / tick_sched_timer
6 BUG: KCSAN: data-race in cpumask_next+0x20/0x50
6 BUG: KCSAN: data-race in __mod_timer / __run_timers
6 BUG: KCSAN: data-race in run_timer_softirq / try_to_del_timer_sync
6 BUG: KCSAN: data-race in rwsem_spin_on_owner+0x144/0x230
6 BUG: KCSAN: data-race in tick_nohz_next_event / tick_sched_timer
5 BUG: KCSAN: data-race in cpumask_next_and+0x25/0x60
4 BUG: KCSAN: data-race in _find_next_bit+0x3f/0x120
4 BUG: KCSAN: data-race in ps2_do_sendbyte / ps2_handle_ack
4 BUG: KCSAN: data-race in rcu_preempt_deferred_qs_irqrestore / try_to_take_rt_mutex
4 BUG: KCSAN: data-race in rcu_torture_reader / run_timer_softirq
3 BUG: KCSAN: data-race in cpumask_next_and+0x27/0x70
3 BUG: KCSAN: data-race in cpumask_next_and+0x30/0x60
3 BUG: KCSAN: data-race in cpumask_next_wrap+0x4b/0xa0
3 BUG: KCSAN: data-race in do_raw_write_lock / do_raw_write_unlock
3 BUG: KCSAN: data-race in process_one_work / queue_work_on
3 BUG: KCSAN: data-race in timer_clear_idle / trigger_dyntick_cpu
2 BUG: KCSAN: data-race in do_raw_write_lock / do_raw_write_lock
2 BUG: KCSAN: data-race in ktime_get_real_fast_ns+0x7d/0x130
2 BUG: KCSAN: data-race in ktime_get_real_fast_ns+0xb5/0x130
2 BUG: KCSAN: data-race in __mod_timer / run_timer_softirq
2 BUG: KCSAN: data-race in queue_work_on / worker_thread
1 BUG: KCSAN: data-race in bringup_cpu / cpuhp_should_run
1 BUG: KCSAN: data-race in can_stop_idle_tick / tick_sched_timer
1 BUG: KCSAN: data-race in internal_add_timer / update_process_times
1 BUG: KCSAN: data-race in rcu_nocb_cb_kthread / rcu_nocb_gp_kthread
1 BUG: KCSAN: data-race in rcu_preempt_deferred_qs_irqrestore / rt_mutex_unlock
1 BUG: KCSAN: data-race in __ww_mutex_lock+0x1a3/0x1610
Powered by blists - more mailing lists