Message-ID: <20250304101150.0eb83618@gandalf.local.home>
Date: Tue, 4 Mar 2025 10:11:50 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: Strforexc yn <strforexc@...il.com>, Lai Jiangshan
<jiangshanlai@...il.com>, "Paul E. McKenney" <paulmck@...nel.org>, Josh
Triplett <josh@...htriplett.org>, Mathieu Desnoyers
<mathieu.desnoyers@...icios.com>, rcu@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: KASAN: global-out-of-bounds Read in srcu_gp_start_if_needed
On Mon, 3 Mar 2025 22:57:32 -0500
Joel Fernandes <joelagnelf@...dia.com> wrote:
> >
> > The lock taken is from the passed in rcu_pending pointer.
> >
> > > [ 92.322655][ T28] rcu_pending_enqueue+0x686/0xd30
> > > [ 92.322676][ T28] ? __pfx_rcu_pending_enqueue+0x10/0x10
> > > [ 92.322693][ T28] ? trace_lock_release+0x11a/0x180
> > > [ 92.322708][ T28] ? bkey_cached_free+0xa3/0x170
> > > [ 92.322725][ T28] ? lock_release+0x13/0x180
> > > [ 92.322744][ T28] ? bkey_cached_free+0xa3/0x170
> > > [ 92.322760][ T28] bkey_cached_free+0xfd/0x170
> >
> > Which has:
> >
> > static void bkey_cached_free(struct btree_key_cache *bc,
> >                              struct bkey_cached *ck)
> > {
> > 	kfree(ck->k);
> > 	ck->k = NULL;
> > 	ck->u64s = 0;
> >
> > 	six_unlock_write(&ck->c.lock);
> > 	six_unlock_intent(&ck->c.lock);
> >
> > 	bool pcpu_readers = ck->c.lock.readers != NULL;
> > 	rcu_pending_enqueue(&bc->pending[pcpu_readers], &ck->rcu);
> > 	this_cpu_inc(*bc->nr_pending);
> > }
> >
> > So if that bc->pending[pcpu_readers] gets corrupted in any way, that could trigger this.
>
> True, another thing that could corrupt it is if the per-cpu global data
> section is corrupted, because the crash is happening in this trylock per the
> above stack:
>
> srcu_gp_start_if_needed ->
> spin_lock_irqsave_sdp_contention(sdp) ->
> spin_trylock(sdp->lock)
>
> where sdp is ssp->sda and is allocated from per-cpu storage.
>
> So corruption of the per-cpu global data section can also trigger this, even
> if the rcu_pending pointer is intact.

If there was corruption of the per-cpu section, you would think it would
have a bigger impact than just this location, as most of the kernel relies
on the per-cpu section.

But it could be corruption of the specific per-cpu variable that was used,
caused by the code that uses it.

That code is quite complex, and I usually try to rule out the code that is
used in one location as being the issue before looking at something like
per-cpu or RCU code, which is used all over the place. If that was buggy,
it would likely blow up elsewhere outside of bcachefs.

But who knows, perhaps bcachefs triggered a corner case?
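
To make the two theories concrete, here is a rough userspace sketch (not
kernel code) of the pointer chain that leads from bc->pending[pcpu_readers]
to the lock that the trylock touches. All model_* names and NR_CPUS_MODEL
are made up for illustration, the per-cpu area is modeled as a plain array
rather than real per-cpu storage, and the layout assumes that rcu_pending
holds a pointer to the srcu_struct. It only shows that a bad pending entry
and bad per-cpu data both end up as the same symptom: a lock address
outside the expected per-cpu region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4	/* hypothetical CPU count for this model */

/* Stand-in for the per-cpu srcu_data; only the lock matters here. */
struct model_srcu_data {
	int lock;
};

/* Stand-in for srcu_struct: sda points at the per-cpu array. */
struct model_srcu {
	struct model_srcu_data *sda;
};

/* Stand-in for rcu_pending (assumed to carry the srcu pointer). */
struct model_rcu_pending {
	struct model_srcu *srcu;
};

/* Stand-in for btree_key_cache, indexed by the pcpu_readers bool. */
struct model_key_cache {
	struct model_rcu_pending pending[2];
};

/* The modeled per-cpu region that every valid lock must live in. */
static struct model_srcu_data model_percpu[NR_CPUS_MODEL];

/* Follow the same chain of pointers the enqueue path relies on. */
static int *model_lock_addr(struct model_key_cache *bc, bool pcpu_readers,
			    unsigned int cpu)
{
	struct model_rcu_pending *p = &bc->pending[pcpu_readers];

	assert(cpu < NR_CPUS_MODEL);
	return &p->srcu->sda[cpu].lock;
}

/* True if the lock address lies inside the modeled per-cpu region. */
static bool model_lock_addr_ok(const int *lock)
{
	uintptr_t a = (uintptr_t)lock;
	uintptr_t lo = (uintptr_t)&model_percpu[0];
	uintptr_t hi = (uintptr_t)&model_percpu[NR_CPUS_MODEL];

	return a >= lo && a < hi;
}
```

In this model, overwriting either pending[i].srcu or the sda pointer makes
model_lock_addr_ok() fail in exactly the same way, so the address alone
does not tell you which link in the chain went bad.
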
-- Steve