linux-kernel - Re: KASAN: global-out-of-bounds Read in srcu_gp_start_if_needed

Message-ID: <20250304101150.0eb83618@gandalf.local.home>
Date: Tue, 4 Mar 2025 10:11:50 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Joel Fernandes <joelagnelf@...dia.com>
Cc: Strforexc yn <strforexc@...il.com>, Lai Jiangshan
 <jiangshanlai@...il.com>, "Paul E. McKenney" <paulmck@...nel.org>, Josh
 Triplett <josh@...htriplett.org>, Mathieu Desnoyers
 <mathieu.desnoyers@...icios.com>, rcu@...r.kernel.org,
 linux-kernel@...r.kernel.org
Subject: Re: KASAN: global-out-of-bounds Read in srcu_gp_start_if_needed

On Mon, 3 Mar 2025 22:57:32 -0500
Joel Fernandes <joelagnelf@...dia.com> wrote:

> > 
> > The lock taken is from the passed in rcu_pending pointer.
> >   
> > > [   92.322655][   T28]  rcu_pending_enqueue+0x686/0xd30
> > > [   92.322676][   T28]  ? __pfx_rcu_pending_enqueue+0x10/0x10
> > > [   92.322693][   T28]  ? trace_lock_release+0x11a/0x180
> > > [   92.322708][   T28]  ? bkey_cached_free+0xa3/0x170
> > > [   92.322725][   T28]  ? lock_release+0x13/0x180
> > > [   92.322744][   T28]  ? bkey_cached_free+0xa3/0x170
> > > [   92.322760][   T28]  bkey_cached_free+0xfd/0x170  
> > 
> > Which has:
> > 
> > static void bkey_cached_free(struct btree_key_cache *bc,
> >                              struct bkey_cached *ck)
> > {
> >         kfree(ck->k);
> >         ck->k           = NULL;
> >         ck->u64s        = 0;
> >                 
> >         six_unlock_write(&ck->c.lock);
> >         six_unlock_intent(&ck->c.lock);
> > 
> >         bool pcpu_readers = ck->c.lock.readers != NULL;
> >         rcu_pending_enqueue(&bc->pending[pcpu_readers], &ck->rcu);
> >         this_cpu_inc(*bc->nr_pending);
> > }
> > 
> > So if that bc->pending[pcpu_readers] gets corrupted in any way, that could trigger this.
> 
> True, another thing that could corrupt it is if the per-cpu global data
> section is corrupted, because the crash is happening in this trylock per the
> above stack:
> 
>  srcu_gp_start_if_needed ->
> 	spin_lock_irqsave_sdp_contention(sdp) ->
> 		spin_trylock(sdp->lock)
> 
> 	where sdp is ssp->sda and is allocated from per-cpu storage.
> 
> So corruption of the per-cpu global data section can also trigger this, even
> if the rcu_pending pointer is intact.

If the per-cpu section were corrupted, you would expect it to have a bigger
impact than just this location, as most of the kernel relies on the per-cpu
section.

But it could be corruption of the specific per-cpu variable that was used,
caused by the code that uses it.
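
To illustrate the distinction, here is a minimal userspace sketch. Every
name in it (fake_sdp, fake_srcu, fake_pending, fake_cache, percpu_area) is a
hypothetical stand-in, not the real kernel structure, and the bounds check is
only a rough model of what KASAN reports. It assumes each pending[] slot
carries its own pointer into a shared per-CPU area, and shows that clobbering
one user's pointer moves the trylock address out of bounds while the shared
area and the other user stay intact:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical stand-ins; none of these are the real kernel structures. */
struct fake_sdp {
	int lock;			/* models sdp->lock */
};

struct fake_srcu {
	uintptr_t sda;			/* models ssp->sda, base of the per-CPU data */
};

struct fake_pending {
	struct fake_srcu srcu;		/* state owned by one user */
};

struct fake_cache {
	struct fake_pending pending[2];	/* indexed by pcpu_readers */
};

/* Models the shared per-CPU area that a valid sda must point into. */
static struct fake_sdp percpu_area[NR_CPUS];

static void fake_cache_init(struct fake_cache *bc)
{
	for (int i = 0; i < 2; i++)
		bc->pending[i].srcu.sda = (uintptr_t)percpu_area;
}

/* Address of the lock that the trylock would touch for this CPU. */
static uintptr_t lock_addr(const struct fake_cache *bc, bool pcpu_readers,
			   int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return bc->pending[pcpu_readers].srcu.sda +
	       (uintptr_t)cpu * sizeof(struct fake_sdp) +
	       offsetof(struct fake_sdp, lock);
}

/* Rough model of the KASAN check: does the access stay inside the area? */
static bool lock_addr_in_bounds(uintptr_t addr)
{
	uintptr_t start = (uintptr_t)percpu_area;

	return addr >= start &&
	       addr - start + sizeof(int) <= sizeof(percpu_area);
}
```

After fake_cache_init(), every lock_addr() result passes
lock_addr_in_bounds(). Adding sizeof(percpu_area) to pending[1].srcu.sda
makes only the pcpu_readers == true path fail the check, which is the kind
of localized damage I mean, as opposed to the whole per-cpu section going bad.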

That code is quite complex. I usually try to rule out code that is used in
only one location before looking at something like the per-cpu or RCU code,
which is used all over the place. If that were buggy, it would likely blow
up elsewhere, outside of bcachefs.

But who knows, perhaps bcachefs triggered a corner case?

-- Steve