Message-Id: <202003281643.02SGhAEl008453@sdf.org>
Date: Fri, 20 Dec 2019 22:30:32 -0500
From: George Spelvin <lkml@....org>
To: linux-kernel@...r.kernel.org, lkml@....org
Cc: Stephen Hemminger <stephen@...workplumber.org>,
Hannes Frederic Sowa <hannes@...essinduktion.org>
Subject: [RFC PATCH v1 10/50] <linux/random.h> Make prandom_u32_max() exact
for constant ranges
This is a cute hack I'm not sure should go into the kernel.
Comments very much requested!
Getting rid of the last tiny bit of non-uniformity is quite cheap
*if* the range is known at compile time: a compare with an immediate
and a (rarely taken) conditional branch to retry.
So for hack value, implement this for compile-time constant ranges.
For variable ranges, it involves a division, so just live with the
slight non-uniformity.
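Concretely: for range = 10, the rejection limit is 2^32 mod 10 = 6, so
the retry branch fires with probability 6/2^32, about 1.4e-9.  Even for
range = 1000000 it's only 967296/2^32, roughly 0.02%.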
The style questions are:
- Is a more accurate result *some* of the time actually worth anything?
- Is the benefit enough to justify the ugly inconsistency?
Signed-off-by: George Spelvin <lkml@....org>
Cc: Stephen Hemminger <stephen@...workplumber.org>
Cc: Hannes Frederic Sowa <hannes@...essinduktion.org>
---
include/linux/random.h | 63 ++++++++++++++++++++++++++++++------------
1 file changed, 46 insertions(+), 17 deletions(-)
diff --git a/include/linux/random.h b/include/linux/random.h
index 109772175c833..82dd0d613e75c 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -9,6 +9,7 @@
#include <linux/list.h>
#include <linux/once.h>
+#include <linux/math64.h>
#include <uapi/linux/random.h>
@@ -147,9 +148,11 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
* a property that is provided by prandom_u32().)
*
* It is possible to extend this to avoid all bias and return perfectly
- * uniform pseudorandom numbers by discarding and regenerating sometimes,
- * but if your needs are that stringent, you might want to use a stronger
- * generator (like get_random_u32()).
+ * uniform pseudorandom numbers by discarding and regenerating sometimes.
+ * That's easy to do for constant ranges, so this code does it in that
+ * case, but settles for the approximation for variable ranges. If you
+ * need exact uniformity, you might also want to use a stronger generator
+ * (like get_random_u32()).
*
* Ref:
* Fast Random Integer Generation in an Interval
@@ -160,21 +163,47 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state);
*/
static inline u32 prandom_u32_max(u32 range)
{
- /*
- * If the range is a compile-time constant power of 2, then use
- * a simple shift. This is mathematically equivalent to the
- * multiplication, but GCC 8.3 doesn't optimize that perfectly.
- *
- * We could do an AND with a mask, but
- * 1) The shift is the same speed on a decent CPU,
- * 2) It's generally smaller code (smaller immediate), and
- * 3) Many PRNGs have trouble with their low-order bits;
- * using the msbits is generaly preferred.
- */
- if (__builtin_constant_p(range) && (range & (range - 1)) == 0)
- return prandom_u32() / (u32)(0x100000000 / range);
- else
+ if (!__builtin_constant_p(range)) {
+ /*
+ * If range is a variable, prioritize speed over
+ * perfect uniformity.
+ */
return reciprocal_scale(prandom_u32(), range);
+ } else if (range & (range - 1)) {
+ /*
+ * If the range is a compile-time constant, then achieving
+ * perfect uniformity is one compare with immediate and
+ * one unlikely branch, so go ahead and do it.
+ *
+ * Some 32-bit processors require an additional
+ * instruction to get the low half of a 64-bit product.
+ * PowerPC and NIOS have separate "multiply high" and
+ * "multiply low" instructions. MIPS32 and ARC need to
+ * move the result from a special-purpose register to
+ * a GPR.
+ */
+ const u32 lim = -range % range;
+ u64 prod;
+
+ do {
+ prod = mul_u32_u32(prandom_u32(), range);
+ } while (unlikely((u32)prod < lim));
+ return prod >> 32;
+ } else {
+ /*
+ * If the range is a compile-time constant power of 2,
+ * then use a simple shift. This is mathematically
+ * equivalent to the multiplication, but GCC 8.3 doesn't
+ * optimize that perfectly.
+ *
+ * We could do an AND with a mask, but
+ * 1) The shift is the same speed on a decent CPU,
+ * 2) It's generally smaller code (smaller immediate), and
+ * 3) Many PRNGs have trouble with their low-order bits;
+ * using the msbits is generally preferred.
+ */
+ return prandom_u32() / (u32)(0x100000000 / range);
+ }
}
/*
--
2.26.0
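For anyone who wants to play with the constant-range path outside the
kernel, here's a minimal userspace sketch of the same rejection step.
rng() is a hypothetical stand-in for prandom_u32() (any 32-bit
generator will do), and a plain 64-bit multiply stands in for
mul_u32_u32():

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for prandom_u32(); quality doesn't matter here. */
    static uint32_t rng(void)
    {
        return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
    }

    /* Rejection version: exactly uniform over [0, range), range != 0. */
    static uint32_t rng_max(uint32_t range)
    {
        const uint32_t lim = -range % range;  /* 2^32 mod range */
        uint64_t prod;

        do {
            prod = (uint64_t)rng() * range;
        } while ((uint32_t)prod < lim);  /* retry the biased low slice */
        return (uint32_t)(prod >> 32);
    }

    int main(void)
    {
        int i;

        for (i = 0; i < 16; i++)
            printf("%u\n", rng_max(10));
        return 0;
    }

Even in the worst case (range just above 2^31), lim/2^32 stays below
1/2, so the expected number of multiplies per call is under two.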