linux-kernel - Re: Updated scalable urandom patchkit

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151012001601.18155.qmail@ns.horizon.com>
Date:	11 Oct 2015 20:16:01 -0400
From:	"George Spelvin" <linux@...izon.com>
To:	linux@...izon.com, tytso@....edu
Cc:	ahferroin7@...il.com, andi@...stfloor.org, jepler@...ythonic.net,
	linux-kernel@...r.kernel.org, linux@...musvillemoes.dk
Subject: Re: Updated scalable urandom patchkit

TedTs'o wrote:
> Yep, good catch; I need to subtract that off.

I'm not thrilled with incrementing the pointer from i to len, but mixing
at positions i+k to i+k+len.  The whole LFSR scheme relies on a regular
pass structure.

How about this instead: drop the hashed offset, and instead let each
writer do an atomic_add_return on the index, then iterate over the
"reserved" slots.  Concurrent additions will at least do non-overlapping
writes until the numer equals the pool size.

(Admittedly, if the pool is 32 words and we're adding back 20 bytes
per reader, overlap is hard to avoid.  Could we do seeding a byte at a
time with input rotate, but add-back 32 bits at a time for simplicity
and speed?)

The other one would be to have separate indexes for add-back and seed
addition, with the latter protected by a lock.

>> (There's a similar, lesser problem with input_rotate.)

> I didn't bother messing with input_rotate at all, and I don't think
> that will be a problem.

It was more that it's going to do strange non-deterministic things.

Personally, I hate the input_rotate.  It's not that it's harmful, just
that it doesn't do much good compared to the cost; for the number of cycles
and context space required there are more efficient mixing operations.

But if you want, you can easily compute it on demand.
If you avoid reducing r->add_words mod poolwords, then
	input_rotate = 7*(r->add_words + r->add_words/poolwords)

>> The add-back is not critical, and races between two writers don't really
>> do any harm.  But seed entropy is valuable.

> That's true, but I've been going back and forth about how much it's
> worth to fix this.  After all, the the non-blocking pool is guaranteed
> to have cryptographic randomness, and so it's a question of how much
> effort we want to avoid the possibility of losing some seed entropy.

I worry that with only 32 words of pool, on a large (1K CPUs) machine
that's execv()ing and geerating AT_RANDOM vectors a lot, the traffic
could be pretty heavy, and the odds of stomping a byte of seed material
cound get substantial.

Or a small machine with a couple of concurrent /dev/urandom abusers.
Remember, it's globally readable, so it has to be resistance to malicious
abuse.

> One approach might be to use take a reader lock when mixing back
> entropy, but take an exclusive lock in the case when seeding the
> non-random pool.

Even a reader lock is at least one atomic operation.

> Another approach is to have an alternate mechanism for the
> non-blocking pool which uses the LOCK prefix when XOR'ing into the
> pool.  This would mean that we would have to drop the twisted LFSR and
> replace it with something simpler but so long as the mixing algothm
> eventually involves all of the bits in the pool, that will probably be
> OK.
>
> Of course, using the LOCK prefix is CPU architecture specific, and in
> particular, there is no equivalent on the ARM architecture.

You can add rather than XOR, and we have atomic add primitives.

(You could even, if you wanted to preserve the period proof of the
mixing function, compute the value which, when added to the original
word, would make the XOR change, and then atomically add that.)

> The final thing we could do is to just throw in a smp_mb() before the
> write into the pool, and just call it day, and accept that while we
> might lose a small amount of entropy due to a race, it will only a
> byte's worth each time we lose a race (instead of the whole 32 byte
> cache line).

If you're willing to accept "probably" solutions, just add the seed
material twice.  If one byte is lost, the other copy will probably
survive.

Or, if you want to be cleverer, any form of error-correcting code
(duplication is just a simple for of it) will add enough redundant
information that a few erasures won't lose entropy.

But the atomic add seems safer.  I don't have a good feel for the fraction
of lost entropy a deliberate attack on a large machine could cause and
/dev/urandom is not an area where I'm comfortable hoping.

> None of the alternatives are all that satisfying, and they will all be
> a bit more expensive than the first patch.  The question is how to
> weigh these problems against just simply using a separate pool for
> each NUMA node, and how much scalability are we really need for
> "realistic" workloads?

There are several possible solutions that don't need separate pools
(including separate add-back pools, with a shared seeded pool that
is never touched by add-back), so I don't think it's necessary to
give up yet.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/