[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b79eb142-67b2-48f0-9ad9-f9b634491e09@paulmck-laptop>
Date: Tue, 5 Sep 2023 06:56:22 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: David Laight <David.Laight@...lab.com>,
Denis Arefev <arefev@...mel.ru>,
Lai Jiangshan <jiangshanlai@...il.com>,
Josh Triplett <josh@...htriplett.org>,
Steven Rostedt <rostedt@...dmis.org>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"lvc-project@...uxtesting.org" <lvc-project@...uxtesting.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"trufanov@...mel.ru" <trufanov@...mel.ru>,
"vfh@...mel.ru" <vfh@...mel.ru>
Subject: Re: [PATCH v2] The value may overflow
On Tue, Sep 05, 2023 at 09:34:45AM -0400, Mathieu Desnoyers wrote:
> On 9/5/23 09:26, Paul E. McKenney wrote:
> > On Tue, Sep 05, 2023 at 08:26:51AM -0400, Mathieu Desnoyers wrote:
> > > On 9/5/23 05:31, David Laight wrote:
> > > > From: Mathieu Desnoyers
> > > > > Sent: 04 September 2023 11:24
> > > > >
> > > > > On 9/4/23 05:42, Denis Arefev wrote:
> > > > > > The value of an arithmetic expression 1 << (cpu - sdp->mynode->grplo)
> > > > > > is subject to overflow due to a failure to cast operands to a larger
> > > > > > data type before performing arithmetic
> > > > >
> > > > > The patch title should identify more precisely its context, e.g.:
> > > > >
> > > > > "srcu: Fix srcu_struct node grpmask overflow on 64-bit systems"
> > > > >
> > > > > Also, as I stated in my reply to the previous version, the patch commit
> > > > > message should describe the impact of the bug it fixes.
> > > >
> > > > And is the analysis complete?
> > > > Is 1UL right for 32bit archs??
> > > > Is 64 bits even enough??
> > >
> > > I understand from include/linux/rcu_node_tree.h and kernel/rcu/Kconfig
> > > RCU_FANOUT and RCU_FANOUT_LEAF ranges that a 32-bit integer is sufficient to
> > > hold the mask on 32-bit architectures, and a 64-bit integer is enough on
> > > 64-bit architectures given the current implementation.
> > >
> > > At least this appears to be the intent. I did not do a thorough analysis of
> > > the various parameter limits.
> >
> > Mathieu has it right.
> >
> > 32-bit kernels are unaffected by this bug.
> >
> > RCU_FANOUT_LEAF defaults to 16, which means that a 64-bit kernel would
> > need more than 32 leaf rcu_node structures for the parent rcu_node
> > structure to need more than 32 bit to track its children. This means
> > that more than 32*16=512 CPUs are required for this bug to affect 64-bit
> > systems. And there really are systems this big, so I am surprised that
> > this has not shown up long ago. But it would not be the first time that
> > objective reality surprised me. ;-)
>
> So with a 64-bit kernel, RCU_FANOUT_LEAF=16, if we have exactly 32 leaf
> rcu_node structures (exactly 512 CPUs), we end up in the situation where the
> type promotion from signed integer (32-bit) to unsigned long will carry the
> sign, and thus create a mask of 0xffffffff80000000.
If there is someplace that does a shift of 1UL, yes, this would be a
problem. I don't see one, but I might have missed something. If all
the shifts are from 1, then the top 33 bits would be always set and
cleared as a unit, as if they were one bit.
> So if this weird mask is indeed an issue we should state that configurations
> _starting from 512 CPUs_ are affected, not just those with more than 512
> CPUs.
That would instead be more than 512-16=496 CPUs, correct? 496 CPUs would
only require a 31-bit shift, which should be OK, but 497 would require
a 32-bit shift, which would result in sign extension. If it turns out
that sign extension is OK, then we should get in trouble at 513 CPUs,
which would result in a 33-bit shift (and is that even defined in C?).
But this can be tested. Let's try setting RCU_FANOUT_LEAF to 2. Then we
"only" need 64 (maybe 62) CPUs to trigger the bug. And I happen to
have access to a system with 80 CPUs. (Those with smaller systems can
try to trigger this bug by changing the current "1 <<" to (say) "4 <<",
which should trigger it on 16-CPU systems.)
I just fired off an 80-CPU test with CONFIG_RCU_FANOUT_LEAF=2 and will
let you know how it goes.
Thanx, Paul
> Thanks,
>
> Mathieu
>
>
> >
> > Thanx, Paul
> >
> > > Thanks,
> > >
> > > Mathieu
> > >
> > > >
> > > > David
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Mathieu
> > > > >
> > > > >
> > > > > >
> > > > > > Found by Linux Verification Center (linuxtesting.org) with SVACE.
> > > > > >
> > > > > > Signed-off-by: Denis Arefev <arefev@...mel.ru>
> > > > > > ---
> > > > > > v2: Added fixes to the srcu_schedule_cbs_snp function as suggested by
> > > > > > Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
> > > > > > kernel/rcu/srcutree.c | 4 ++--
> > > > > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> > > > > > index 20d7a238d675..6c18e6005ae1 100644
> > > > > > --- a/kernel/rcu/srcutree.c
> > > > > > +++ b/kernel/rcu/srcutree.c
> > > > > > @@ -223,7 +223,7 @@ static bool init_srcu_struct_nodes(struct srcu_struct *ssp, gfp_t gfp_flags)
> > > > > > snp->grplo = cpu;
> > > > > > snp->grphi = cpu;
> > > > > > }
> > > > > > - sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
> > > > > > + sdp->grpmask = 1UL << (cpu - sdp->mynode->grplo);
> > > > > > }
> > > > > > smp_store_release(&ssp->srcu_sup->srcu_size_state, SRCU_SIZE_WAIT_BARRIER);
> > > > > > return true;
> > > > > > @@ -833,7 +833,7 @@ static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp
> > > > > > int cpu;
> > > > > >
> > > > > > for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
> > > > > > - if (!(mask & (1 << (cpu - snp->grplo))))
> > > > > > + if (!(mask & (1UL << (cpu - snp->grplo))))
> > > > > > continue;
> > > > > > srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
> > > > > > }
> > > > >
> > > > > --
> > > > > Mathieu Desnoyers
> > > > > EfficiOS Inc.
> > > > > https://www.efficios.com
> > > >
> > > > -
> > > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > > Registration No: 1397386 (Wales)
> > >
> > > --
> > > Mathieu Desnoyers
> > > EfficiOS Inc.
> > > https://www.efficios.com
> > >
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
Powered by blists - more mailing lists