linux-kernel - Re: [RFC PATCH] cpumask: Implement "random" version of cpumask_any

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Z4aGPI0yka6qL-dF@thinkpad>
Date: Tue, 14 Jan 2025 10:43:56 -0500
From: Yury Norov <yury.norov@...il.com>
To: I Hsin Cheng <richard120310@...il.com>
Cc: mark.rutland@....com, linux@...musvillemoes.dk, jserv@...s.ncku.edu.tw,
	linux-kernel@...r.kernel.org, visitorckw@...il.com
Subject: Re: [RFC PATCH] cpumask: Implement "random" version of
 cpumask_any_but()

> > > > @@ -401,12 +401,18 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta
> > > >  static __always_inline
> > > >  unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu)
> > > >  {
> > > > -	unsigned int i;
> > > > +	unsigned int i, n, weight;
> > > >  
> > > >  	cpumask_check(cpu);
> > > > -	for_each_cpu(i, mask)
> > > > -		if (i != cpu)
> > > > -			break;
> > > > +	weight = cpumask_weight(mask);
> > > > +	n = get_random_u32() % weight;
> > > > +
> > > > +	/* If we accidentally pick "n" equal to "cpu",
> > > > +	 * then simply choose "n + 1"th cpu.
> > > > +	 */
> > > > +	if (n == cpu)
> > > > +		n = (n + 1) % weight;
> > > > +	i = cpumask_nth(n, mask);
> > 
> > This is an entirely broken thing, and it works only because your CPU mask
> > is dense. Imagine cpumask: 0111 1111. Your new cpumask_any_but(mask, 5)
> > will return 5 exactly, if the get_random_u32() draws 4.
> > 
> 
> Oh I'm sorry for this part, "cpumask_nth()" return value should be
> checked instead of checking "get_random_u32()". May I ask why does it
> only work for dense cpumask?

Because for dense mask cpumask_nth(n) == n, and your
 	n = get_random_u32() % weight
is the same as 
 	n = cpumask_nth(get_random_u32() % weight)

> I mean clearly original method will be
> faster when cpumask isn't dense.
> 
> > It looks broken even for a dense mask. By probability, your code returns:
> > 
> > P(0-4,7) == 1/8,
> > P(5) == 0,
> > P(6) == 1/4.
> > 
> > Assuming you are trying to implement a random uniform distribution drawing,
> > the correct probabilities should look like:
> > 
> > P(0-4,6-7) == 1/7,
> > P(5) == 0,
> > 
> > Thanks,
> > Yury
> > 
> 
> I did noticed this part, I was thinking that we do not actually need to
> make the prbability to be uniform distribution, just want to make it
> scatters more then picking certain number.

On cpumask level you can't speculate whether users need true randomness
or not. That's why we pay so much attention to proper function naming.

See: 
 - cpumask_any() - 'any' requirement is much simpler than random;
 - cpumask_any_distribute() - not random at all, just a better (good
   enough) approximation;
 - cpumask_random() - a true random drawing. Must pass Chi^2, 
   Kolmogorov-Smirnov and other fancy statistical tests. As you can see,
   doesn't exist.

If in your function you use expensive true-random get_random_u32(),
one can expect that the output will not be worse than the original
uniform distribution, by the statistical properties means.

> Thanks for your detailed review and explanation, also thanks to Mark and
> Kuan-Wei !
> 
> I now realize "random" here is more of a convention for the caller to
> states that it doesn't matter which cpu it gets, maybe we should
> rephrase the comment to make it less confusing? Because I think "random"
> itself does stands for a particular meaning.

Feel free to send a patch.

Thanks,
Yury