[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Z4disaeF2T1buHWb@vaxr-BM6660-BM6360>
Date: Wed, 15 Jan 2025 15:24:33 +0800
From: I Hsin Cheng <richard120310@...il.com>
To: Yury Norov <yury.norov@...il.com>
Cc: mark.rutland@....com, linux@...musvillemoes.dk, jserv@...s.ncku.edu.tw,
linux-kernel@...r.kernel.org, visitorckw@...il.com
Subject: Re: [RFC PATCH] cpumask: Implement "random" version of
cpumask_any_but()
On Tue, Jan 14, 2025 at 10:43:56AM -0500, Yury Norov wrote:
> > > > > @@ -401,12 +401,18 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta
> > > > > static __always_inline
> > > > > unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> > > > > {
> > > > > - unsigned int i;
> > > > > + unsigned int i, n, weight;
> > > > >
> > > > > cpumask_check(cpu);
> > > > > - for_each_cpu(i, mask)
> > > > > - if (i != cpu)
> > > > > - break;
> > > > > + weight = cpumask_weight(mask);
> > > > > + n = get_random_u32() % weight;
> > > > > +
> > > > > + /* If we accidentally pick "n" equal to "cpu",
> > > > > + * then simply choose "n + 1"th cpu.
> > > > > + */
> > > > > + if (n == cpu)
> > > > > + n = (n + 1) % weight;
> > > > > + i = cpumask_nth(n, mask);
> > >
> > > This is an entirely broken thing, and it works only because your CPU mask
> > > is dense. Imagine cpumask: 0111 1111. Your new cpumask_any_but(mask, 5)
> > > will return 5 exactly, if the get_random_u32() draws 4.
> > >
> >
> > Oh I'm sorry for this part, "cpumask_nth()" return value should be
> > checked instead of checking "get_random_u32()". May I ask why does it
> > only work for dense cpumask?
>
> Because for dense mask cpumask_nth(n) == n, and your
> n = get_random_u32() % weight
> is the same as
> n = cpumask_nth(get_random_u32() % weight)
>
> > I mean clearly original method will be
> > faster when cpumask isn't dense.
> >
> > > It looks broken even for a dense mask. By probability, your code returns:
> > >
> > > P(0-4,7) == 1/8,
> > > P(5) == 0,
> > > P(6) == 1/4.
> > >
> > > Assuming you are trying to implement a random uniform distribution drawing,
> > > the correct probabilities should look like:
> > >
> > > P(0-4,6-7) == 1/7,
> > > P(5) == 0,
> > >
> > > Thanks,
> > > Yury
> > >
> >
> > I did noticed this part, I was thinking that we do not actually need to
> > make the prbability to be uniform distribution, just want to make it
> > scatters more then picking certain number.
>
> On cpumask level you can't speculate whether users need true randomness
> or not. That's why we pay so much attention to proper function naming.
>
> See:
> - cpumask_any() - 'any' requirement is much simpler than random;
> - cpumask_any_distribute() - not random at all, just a better (good
> enough) approximation;
> - cpumask_random() - a true random drawing. Must pass Chi^2,
> Kolmogorov-Smirnov and other fancy statistical tests. As you can see,
> doesn't exist.
>
> If in your function you use expensive true-random get_random_u32(),
> one can expect that the output will not be worse than the original
> uniform distribution, by the statistical properties means.
>
I see, thanks for the explanation, it really helps me learn alot!
> > Thanks for your detailed review and explanation, also thanks to Mark and
> > Kuan-Wei !
> >
> > I now realize "random" here is more of a convention for the caller to
> > states that it doesn't matter which cpu it gets, maybe we should
> > rephrase the comment to make it less confusing? Because I think "random"
> > itself does stands for a particular meaning.
>
> Feel free to send a patch.
No problem, I'll send it ASAP.
Regards,
I Hsin Cheng.
Powered by blists - more mailing lists