[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CANn89iKTzqBsphc1=6fDbTH=gmA2nVfZTGObCFETfpg03gqb7A@mail.gmail.com>
Date: Sat, 1 Oct 2022 15:37:09 -0700
From: Eric Dumazet <edumazet@...gle.com>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: Christophe Leroy <christophe.leroy@...roup.eu>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
David Dworken <ddworken@...gle.com>,
Willem de Bruijn <willemb@...gle.com>,
"David S. Miller" <davem@...emloft.net>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: 126 ms irqsoff Latency - Possibly due to commit 190cc82489f4
("tcp: change source port randomizarion at connect() time")
On Sat, Oct 1, 2022 at 3:34 PM Jason A. Donenfeld <Jason@...c4.com> wrote:
>
> Hi again,
>
> Actually, ignore everything I said before. I looked more closely at the
> trace, and this seems like a bogus report. Let me explain:
>
> The part of the trace that concerns my last email is tiny:
>
> CORSurv-352 0d..1. 36us : get_random_bytes <-__inet_hash_connect
> CORSurv-352 0d..1. 45us : _get_random_bytes.part.0 <-__inet_hash_connect
> CORSurv-352 0d..1. 55us : crng_make_state <-_get_random_bytes.part.0
> CORSurv-352 0d..1. 65us+: ktime_get_seconds <-crng_make_state
> CORSurv-352 0d..1. 77us+: crng_fast_key_erasure <-crng_make_state
> CORSurv-352 0d..1. 89us+: chacha_block_generic <-crng_fast_key_erasure
> CORSurv-352 0d..1. 101us+: chacha_permute <-chacha_block_generic
>
> After those lines, crng_make_state() returns back into
> _get_random_bytes(), where _get_random_bytes() proceeds to call chacha20
> totally unlocked, having released all interrupts:
>
> CORSurv-352 0d..1. 129us : chacha_block_generic <-_get_random_bytes.part.0
> CORSurv-352 0d..1. 139us+: chacha_permute <-chacha_block_generic
> ...
> CORSurv-352 0d..1. 126275us : chacha_block_generic <-_get_random_bytes.part.0
> CORSurv-352 0d..1. 126285us+: chacha_permute <-chacha_block_generic
>
> I guess it's generating a lot of blocks, and this is a slow board?
> Either way, no interrupts are held here, and no locks either.
>
> But then let's zoom out to see if we can figure out what is disabling
> IRQs. This time, pasting from the top and the bottom of the stack trace,
> rather than from the middle:
>
> CORSurv-352 0d.... 4us : _raw_spin_lock_irqsave
> ...
> CORSurv-352 0d..1. 126309us : _raw_spin_unlock_irqrestore <-__do_once_done
> CORSurv-352 0d..1. 126318us+: do_raw_spin_unlock <-_raw_spin_unlock_irqrestore
> CORSurv-352 0d..1. 126330us+: _raw_spin_unlock_irqrestore
> CORSurv-352 0d..1. 126346us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore
>
> Oh, hello hello __do_once_done(). Let's have a look at you:
>
> bool __do_once_start(bool *done, unsigned long *flags)
> __acquires(once_lock)
> {
> spin_lock_irqsave(&once_lock, *flags);
> if (*done) {
> spin_unlock_irqrestore(&once_lock, *flags);
> /* Keep sparse happy by restoring an even lock count on
> * this lock. In case we return here, we don't call into
> * __do_once_done but return early in the DO_ONCE() macro.
> */
> __acquire(once_lock);
> return false;
> }
>
> return true;
> }
> EXPORT_SYMBOL(__do_once_start);
>
> void __do_once_done(bool *done, struct static_key_true *once_key,
> unsigned long *flags, struct module *mod)
> __releases(once_lock)
> {
> *done = true;
> spin_unlock_irqrestore(&once_lock, *flags);
> once_disable_jump(once_key, mod);
> }
> EXPORT_SYMBOL(__do_once_done);
>
> Well then! It looks like DO_ONCE() takes an irqsave spinlock. So, as far
> as get_random_bytes() is concerned, interrupts are not being held
> abnormally long. This is something having to do with the code that's
> calling into it. So... doesn't seem like an RNG issue?
I guess you did not read my prior email sent hours ago :)
https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#m43fa0bac40dfda59c2b72cb3e844b41f3cdb949d
Powered by blists - more mailing lists