lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iKzfzOUPc+g0Brfzyi2efnXE0jLUebBz5fQMWVt9UCtfA@mail.gmail.com>
Date:   Sat, 1 Oct 2022 10:43:34 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Christophe Leroy <christophe.leroy@...roup.eu>,
        "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        David Dworken <ddworken@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        "David S. Miller" <davem@...emloft.net>
Subject: Re: 126 ms irqsoff Latency - Possibly due to commit 190cc82489f4
 ("tcp: change source port randomizarion at connect() time")

On Sat, Oct 1, 2022 at 10:16 AM Christophe Leroy
<christophe.leroy@...roup.eu> wrote:
>
> Hi,
>
> With recent kernels I have a huge irqsoff latency in my boards, shortly
> after startup, from the call to net_get_random_once() in
> __inet_hash_connect().
>
> On a non instrumented kernel, IRQs are disabled during approximately 80
> milliseconds. With the traces in goes to 126 milliseconds.
>
> Was apparently introduced by commit 190cc82489f4 ("tcp: change source
> port randomizarion at connect() time")
>
> Trace below.
>
> Would there be a way to perform the call to get_random_bytes() without
> disabling IRQ ?

This looks a question for drivers/char/random.c maintainer, because we
do not block interrupts at this point in __inet_hash_connect()




>
> Thanks
> Christophe
>
> # tracer: irqsoff
> #
> # irqsoff latency trace v1.1.5 on 6.0.0-rc5-s3k-dev-02351-gebc95f69a7d4
> # --------------------------------------------------------------------
> # latency: 126337 us, #8207/8207, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
> #    -----------------
> #    | task: CORSurv-352 (uid:0 nice:0 policy:0 rt_prio:0)
> #    -----------------
> #  => started at: _raw_spin_lock_irqsave
> #  => ended at:   _raw_spin_unlock_irqrestore
> #
> #
> #                    _------=> CPU#
> #                   / _-----=> irqs-off/BH-disabled
> #                  | / _----=> need-resched
> #                  || / _---=> hardirq/softirq
> #                  ||| / _--=> preempt-depth
> #                  |||| / _-=> migrate-disable
> #                  ||||| /     delay
> #  cmd     pid     |||||| time  |   caller
> #     \   /        ||||||  \    |    /
>   CORSurv-352       0d....    4us : _raw_spin_lock_irqsave
>   CORSurv-352       0d....   13us+: preempt_count_add
> <-_raw_spin_lock_irqsave
>   CORSurv-352       0d..1.   25us+: do_raw_spin_lock
> <-_raw_spin_lock_irqsave
>   CORSurv-352       0d..1.   36us : get_random_bytes <-__inet_hash_connect
>   CORSurv-352       0d..1.   45us : _get_random_bytes.part.0
> <-__inet_hash_connect
>   CORSurv-352       0d..1.   55us : crng_make_state
> <-_get_random_bytes.part.0
>   CORSurv-352       0d..1.   65us+: ktime_get_seconds <-crng_make_state
>   CORSurv-352       0d..1.   77us+: crng_fast_key_erasure <-crng_make_state
>   CORSurv-352       0d..1.   89us+: chacha_block_generic
> <-crng_fast_key_erasure
>   CORSurv-352       0d..1.  101us+: chacha_permute <-chacha_block_generic
>   CORSurv-352       0d..1.  129us : chacha_block_generic
> <-_get_random_bytes.part.0
>   CORSurv-352       0d..1.  139us+: chacha_permute <-chacha_block_generic
>   CORSurv-352       0d..1.  160us : chacha_block_generic
> <-_get_random_bytes.part.0
>   CORSurv-352       0d..1.  170us+: chacha_permute <-chacha_block_generic
>   CORSurv-352       0d..1.  191us : chacha_block_generic
> <-_get_random_bytes.part.0
>   CORSurv-352       0d..1.  200us+: chacha_permute <-chacha_block_generic
>   CORSurv-352       0d..1.  221us : chacha_block_generic
> <-_get_random_bytes.part.0
>   CORSurv-352       0d..1.  231us+: chacha_permute <-chacha_block_generic
>
>         8182 x the above two line
>

It seems hard irqs are blocked for short periods, no worries here.

But perhaps your problem is a lack of cond_resched() in a long loop
(_get_random_bytes() I guess)

Problem is : I do not think _get_random_bytes() can always schedule,
we probably would need to add
extra parameters.

>   CORSurv-352       0d..1. 126275us : chacha_block_generic
> <-_get_random_bytes.part.0
>   CORSurv-352       0d..1. 126285us+: chacha_permute <-chacha_block_generic
>   CORSurv-352       0d..1. 126309us : _raw_spin_unlock_irqrestore
> <-__do_once_done
>   CORSurv-352       0d..1. 126318us+: do_raw_spin_unlock
> <-_raw_spin_unlock_irqrestore
>   CORSurv-352       0d..1. 126330us+: _raw_spin_unlock_irqrestore
>   CORSurv-352       0d..1. 126346us+: trace_hardirqs_on
> <-_raw_spin_unlock_irqrestore
>   CORSurv-352       0d..1. 126387us : <stack trace>
>   => tcp_v4_connect
>   => __inet_stream_connect
>   => inet_stream_connect
>   => __sys_connect
>   => system_call_exception
>   => ret_from_syscall

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ