Message-ID: <20250304035732.GA128190@joelnvbox>
Date: Mon, 3 Mar 2025 22:57:32 -0500
From: Joel Fernandes <joelagnelf@...dia.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Strforexc yn <strforexc@...il.com>,
Lai Jiangshan <jiangshanlai@...il.com>,
"Paul E. McKenney" <paulmck@...nel.org>,
Josh Triplett <josh@...htriplett.org>,
Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
rcu@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: KASAN: global-out-of-bounds Read in srcu_gp_start_if_needed
On Mon, Mar 03, 2025 at 11:47:11AM -0500, Steven Rostedt wrote:
[...]
> > [ 92.322347][ T28] register_lock_class+0xb2/0xfc0
> > [ 92.322366][ T28] ? __lock_acquire+0xb97/0x16a0
> > [ 92.322386][ T28] ? __pfx_register_lock_class+0x10/0x10
> > [ 92.322407][ T28] ? do_perf_trace_lock.isra.0+0x10b/0x570
> > [ 92.322427][ T28] __lock_acquire+0xc3/0x16a0
> > [ 92.322446][ T28] ? __pfx___lock_release+0x10/0x10
> > [ 92.322466][ T28] ? rcu_is_watching+0x12/0xd0
> > [ 92.322486][ T28] lock_acquire+0x181/0x3a0
> > [ 92.322505][ T28] ? srcu_gp_start_if_needed+0x1a9/0x5f0
> > [ 92.322522][ T28] ? __pfx_lock_acquire+0x10/0x10
> > [ 92.322541][ T28] ? debug_object_active_state+0x2f1/0x3f0
> > [ 92.322557][ T28] ? do_raw_spin_trylock+0xb4/0x190
> > [ 92.322570][ T28] ? __pfx_do_raw_spin_trylock+0x10/0x10
> > [ 92.322583][ T28] ? __kmalloc_cache_noprof+0x1b9/0x450
> > [ 92.322604][ T28] _raw_spin_trylock+0x76/0xa0
> > [ 92.322619][ T28] ? srcu_gp_start_if_needed+0x1a9/0x5f0
> > [ 92.322636][ T28] srcu_gp_start_if_needed+0x1a9/0x5f0
>
> The lock taken is from the passed in rcu_pending pointer.
>
> > [ 92.322655][ T28] rcu_pending_enqueue+0x686/0xd30
> > [ 92.322676][ T28] ? __pfx_rcu_pending_enqueue+0x10/0x10
> > [ 92.322693][ T28] ? trace_lock_release+0x11a/0x180
> > [ 92.322708][ T28] ? bkey_cached_free+0xa3/0x170
> > [ 92.322725][ T28] ? lock_release+0x13/0x180
> > [ 92.322744][ T28] ? bkey_cached_free+0xa3/0x170
> > [ 92.322760][ T28] bkey_cached_free+0xfd/0x170
>
> Which has:
>
> static void bkey_cached_free(struct btree_key_cache *bc,
> 			     struct bkey_cached *ck)
> {
> 	kfree(ck->k);
> 	ck->k = NULL;
> 	ck->u64s = 0;
>
> 	six_unlock_write(&ck->c.lock);
> 	six_unlock_intent(&ck->c.lock);
>
> 	bool pcpu_readers = ck->c.lock.readers != NULL;
> 	rcu_pending_enqueue(&bc->pending[pcpu_readers], &ck->rcu);
> 	this_cpu_inc(*bc->nr_pending);
> }
>
> So if that bc->pending[pcpu_readers] gets corrupted in any way, that could trigger this.

True. Another possible trigger is corruption of the per-cpu global data
section, because the crash happens in this trylock, per the above stack:

  srcu_gp_start_if_needed ->
    spin_lock_irqsave_sdp_contention(sdp) ->
      spin_trylock(sdp->lock)

where sdp is ssp->sda and is allocated from per-cpu storage.
So corruption of the per-cpu global data section can also trigger this, even
if the rcu_pending pointer is intact.
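
To illustrate the point, here is a rough userspace model (not kernel code).
Every name in it (model_ssp, model_sdp, model_percpu_area, LOCK_MAGIC and the
helpers) is made up for the sketch, and the "magic" check is only a stand-in
for whatever lock debugging would trip over. It shows that a stray write into
the per-cpu backing store makes the trylock path see a bad lock even though
the pointer handed in is unchanged:

```c
/* Userspace model only. All names are hypothetical, not kernel APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS    4
#define LOCK_MAGIC 0xdead4eadu

/* Stand-in for a spinlock that carries a debug magic value. */
struct model_lock {
	uint32_t magic;
	bool held;
};

/* Stand-in for the per-cpu SRCU data that embeds the lock. */
struct model_sdp {
	struct model_lock lock;
};

/* Stand-in for the per-cpu global data section. */
static struct model_sdp model_percpu_area[NR_CPUS];

/* Stand-in for the SRCU state that points at its per-cpu data. */
struct model_ssp {
	struct model_sdp *sda;
};

static void model_init(struct model_ssp *ssp)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		model_percpu_area[cpu].lock.magic = LOCK_MAGIC;
		model_percpu_area[cpu].lock.held = false;
	}
	ssp->sda = model_percpu_area;
}

/* Returns 1 if acquired, 0 if contended, -1 if the lock looks corrupted. */
static int model_trylock(struct model_ssp *ssp, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	struct model_lock *l = &ssp->sda[cpu].lock;

	if (l->magic != LOCK_MAGIC)
		return -1;
	if (l->held)
		return 0;
	l->held = true;
	return 1;
}

/* Simulate a stray write that lands in one CPU's slot of the per-cpu area. */
static void model_corrupt_percpu(int cpu)
{
	model_percpu_area[cpu].lock.magic = 0x5a5a5a5au;
}
```

After model_init(), calling model_corrupt_percpu(2) leaves ssp->sda untouched,
yet model_trylock(&ssp, 2) reports a corrupted lock while the other CPUs'
slots still behave normally.
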
thanks,
- Joel