[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <abe14f37-ec05-48c3-87fc-9294848d221f@paulmck-laptop>
Date: Tue, 5 Sep 2023 09:40:51 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
Cc: David Laight <David.Laight@...lab.com>,
Denis Arefev <arefev@...mel.ru>,
Lai Jiangshan <jiangshanlai@...il.com>,
Josh Triplett <josh@...htriplett.org>,
Steven Rostedt <rostedt@...dmis.org>,
"rcu@...r.kernel.org" <rcu@...r.kernel.org>,
"lvc-project@...uxtesting.org" <lvc-project@...uxtesting.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"trufanov@...mel.ru" <trufanov@...mel.ru>,
"vfh@...mel.ru" <vfh@...mel.ru>, ldufour@...ux.ibm.com
Subject: Re: [PATCH v2] The value may overflow
On Tue, Sep 05, 2023 at 10:34:43AM -0400, Mathieu Desnoyers wrote:
> On 9/5/23 10:15, David Laight wrote:
> > ...
> > > That would instead be more than 512-16=496 CPUs, correct? 496 CPUs would
> > > only require a 31-bit shift, which should be OK, but 497 would require
> > > a 32-bit shift, which would result in sign extension. If it turns out
> > > that sign extension is OK, then we should get in trouble at 513 CPUs,
> > > which would result in a 33-bit shift (and is that even defined in C?).
> >
> > Not quite right :-)
> >
> > (1 << 31) is int and negative, that gets sign extended before
> > being converted to 'unsigned long' - so has the top 33 bits set.
Good point, thank you for the correction.
> > (1 << 32) is undefined, the current x86 cpu ignore the high
> > shift bits so it is (1 << 0).
Which would cause it to attempt to invoke SRCU callbacks on the
lowest-numbered CPU associated with that srcu_node structure.
> Yes, I was about to reply the same thing. A shift of 31 is buggy,
> because shifting 1 << 31 raises the sign bit, which sets the top 33
> bits when cast to unsigned long. A shift of 1 << 32 is undefined,
> with for instance x86 choosing to ignore the top bit.
>
> But in order to have a 1 << 31 shift from this expression:
>
> sdp->grpmask = 1 << (cpu - sdp->mynode->grplo);
>
> I *think* we need the group to have 32 cpus or more (indexed within
> the group from grplo to grplo + 31 (both inclusive)).
>
> So as soon as we have one group with 32 cpus, the problem should show
> up. With FANOUT_LEAF=16, we can have 15 groups of 31 cpus and 1
> group of 32 cpus, for:
>
> 15*31 + 32 = 497 cpus.
>
> AFAIU, this would be the minimum value for smp_processor_id()+1 which
> triggers this issue.
By default, there are 16 CPUs per leaf srcu_node structure. Each leaf
srcu_node structure takes up one bit in the parent srcu_node structures'
srcu_data_have_cbs[] array. Up to 30 bits is OK, 31 bits is questionable,
and 32 bits and larger erroneous.
This is the situation as I see it (assuming dense CPU numbering):
# Leaf srcu_node Largest
structures #CPUs CPU # Status
0-30 0-480 479 Good
31 481-496 495 Questionable
32- 497- 496+ Bad.
Tree SRCU differs from Tree RCU in its operation, so this bug should
not hold up SRCU grace periods. It might instead cause SRCU callbacks
to be ignored (which would admittedly look quite similar to the user).
My attempts to cheat the numbering system ran up against the limited
height of the srcu_node tree.
But there is still the question of why this has not been seen in the
wild, given that there really are systems with more than 479 CPUs
out there. One possibility is the call to srcu_schedule_cbs_sdp()
from srcu_funnel_gp_start(). But it does not seem likely that this
would happen all that often, as it requires back-to-back grace periods
and then some.
Maybe people with large systems boot with srcutree.convert_to_big=0?
Adding Laurent for his thoughts.
Aha!
Here is what makes it work, given David's description of 1<<32:
static void srcu_schedule_cbs_snp(struct srcu_struct *ssp, struct srcu_node *snp,
unsigned long mask, unsigned long delay)
{
int cpu;
for (cpu = snp->grplo; cpu <= snp->grphi; cpu++) {
if (!(mask & (1 << (cpu - snp->grplo))))
continue;
srcu_schedule_cbs_sdp(per_cpu_ptr(ssp->sda, cpu), delay);
}
}
As long as at least one bit is set in the result of 1<<N, and as long
as the compiler always does the same thing for a given N, then this loop
will make the right thing happen.
But even that is not relied upon, because the calling code looks like
this:
spin_lock_irq_rcu_node(snp);
cbs = false;
last_lvl = snp >= sup->level[rcu_num_lvls - 1];
if (last_lvl)
cbs = ss_state < SRCU_SIZE_BIG || snp->srcu_have_cbs[idx] == gpseq;
snp->srcu_have_cbs[idx] = gpseq;
rcu_seq_set_state(&snp->srcu_have_cbs[idx], 1);
sgsne = snp->srcu_gp_seq_needed_exp;
if (srcu_invl_snp_seq(sgsne) || ULONG_CMP_LT(sgsne, gpseq))
WRITE_ONCE(snp->srcu_gp_seq_needed_exp, gpseq);
if (ss_state < SRCU_SIZE_BIG)
mask = ~0;
else
mask = snp->srcu_data_have_cbs[idx];
snp->srcu_data_have_cbs[idx] = 0;
spin_unlock_irq_rcu_node(snp);
if (cbs)
srcu_schedule_cbs_snp(ssp, snp, mask, cbdelay);
So that last_lvl check means that the srcu_schedule_cbs_snp() function
is invoked only for leaf srcu_node structures. Which by default limit
the shift to 16.
So this bug appears to have no effect for default RCU setups, even on very
large 64-bit systems, which is consistent with field experience. Even if
this is the case, it still should be fixed, to avoid confusion if nothing
else. Or just in case someone decides to set CONFIG_RCU_FANOUT_LEAF=32.
Which actually happened the other day due to someone trusting ChatGPT's
opinion about RCU Kconfig options...
Thanx, Paul
> Thanks,
>
> Mathieu
>
> >
> > If the mask is being used to optimise a search the code might
> > just happen to work!
> >
> > David
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> >
>
> --
> Mathieu Desnoyers
> EfficiOS Inc.
> https://www.efficios.com
>
Powered by blists - more mailing lists