[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YzjAfdip8giWBF4+@zx2c4.com>
Date: Sun, 2 Oct 2022 00:34:37 +0200
From: "Jason A. Donenfeld" <Jason@...c4.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Christophe Leroy <christophe.leroy@...roup.eu>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
David Dworken <ddworken@...gle.com>,
Willem de Bruijn <willemb@...gle.com>,
"David S. Miller" <davem@...emloft.net>, bigeasy@...utronix.de
Subject: Re: 126 ms irqsoff Latency - Possibly due to commit 190cc82489f4
("tcp: change source port randomizarion at connect() time")
Hi again,
Actually, ignore everything I said before. I looked more closely at the
trace, and this seems like a bogus report. Let me explain:
The part of the trace that concerns my last email is tiny:
CORSurv-352 0d..1. 36us : get_random_bytes <-__inet_hash_connect
CORSurv-352 0d..1. 45us : _get_random_bytes.part.0 <-__inet_hash_connect
CORSurv-352 0d..1. 55us : crng_make_state <-_get_random_bytes.part.0
CORSurv-352 0d..1. 65us+: ktime_get_seconds <-crng_make_state
CORSurv-352 0d..1. 77us+: crng_fast_key_erasure <-crng_make_state
CORSurv-352 0d..1. 89us+: chacha_block_generic <-crng_fast_key_erasure
CORSurv-352 0d..1. 101us+: chacha_permute <-chacha_block_generic
After those lines, crng_make_state() returns back into
_get_random_bytes(), where _get_random_bytes() proceeds to call chacha20
totally unlocked, having released all interrupts:
CORSurv-352 0d..1. 129us : chacha_block_generic <-_get_random_bytes.part.0
CORSurv-352 0d..1. 139us+: chacha_permute <-chacha_block_generic
...
CORSurv-352 0d..1. 126275us : chacha_block_generic <-_get_random_bytes.part.0
CORSurv-352 0d..1. 126285us+: chacha_permute <-chacha_block_generic
I guess it's generating a lot of blocks, and this is a slow board?
Either way, no interrupts are held here, and no locks either.
But then let's zoom out to see if we can figure out what is disabling
IRQs. This time, pasting from the top and the bottom of the stack trace,
rather than from the middle:
CORSurv-352 0d.... 4us : _raw_spin_lock_irqsave
...
CORSurv-352 0d..1. 126309us : _raw_spin_unlock_irqrestore <-__do_once_done
CORSurv-352 0d..1. 126318us+: do_raw_spin_unlock <-_raw_spin_unlock_irqrestore
CORSurv-352 0d..1. 126330us+: _raw_spin_unlock_irqrestore
CORSurv-352 0d..1. 126346us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore
Oh, hello hello __do_once_done(). Let's have a look at you:
bool __do_once_start(bool *done, unsigned long *flags)
__acquires(once_lock)
{
spin_lock_irqsave(&once_lock, *flags);
if (*done) {
spin_unlock_irqrestore(&once_lock, *flags);
/* Keep sparse happy by restoring an even lock count on
* this lock. In case we return here, we don't call into
* __do_once_done but return early in the DO_ONCE() macro.
*/
__acquire(once_lock);
return false;
}
return true;
}
EXPORT_SYMBOL(__do_once_start);
void __do_once_done(bool *done, struct static_key_true *once_key,
unsigned long *flags, struct module *mod)
__releases(once_lock)
{
*done = true;
spin_unlock_irqrestore(&once_lock, *flags);
once_disable_jump(once_key, mod);
}
EXPORT_SYMBOL(__do_once_done);
Well then! It looks like DO_ONCE() takes an irqsave spinlock. So, as far
as get_random_bytes() is concerned, interrupts are not being held
abnormally long. This is something having to do with the code that's
calling into it. So... doesn't seem like an RNG issue?
Jason
Powered by blists - more mailing lists