Message-ID: <CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com>
Date:   Sat, 1 Oct 2022 10:58:46 -0700
From:   Eric Dumazet <edumazet@...gle.com>
To:     Christophe Leroy <christophe.leroy@...roup.eu>,
        "Jason A. Donenfeld" <Jason@...c4.com>
Cc:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        David Dworken <ddworken@...gle.com>,
        Willem de Bruijn <willemb@...gle.com>,
        "David S. Miller" <davem@...emloft.net>
Subject: Re: 126 ms irqsoff Latency - Possibly due to commit 190cc82489f4
 ("tcp: change source port randomizarion at connect() time")

On Sat, Oct 1, 2022 at 10:43 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Sat, Oct 1, 2022 at 10:16 AM Christophe Leroy
> <christophe.leroy@...roup.eu> wrote:
> >
> > Hi,
> >
> > With recent kernels I see a huge irqsoff latency on my boards, shortly
> > after startup, from the call to net_get_random_once() in
> > __inet_hash_connect().
> >
> > On a non instrumented kernel, IRQs are disabled during approximately 80
> > milliseconds. With tracing enabled it goes up to 126 milliseconds.
> >
> > This was apparently introduced by commit 190cc82489f4 ("tcp: change source
> > port randomizarion at connect() time")
> >
> > Trace below.
> >
> > Would there be a way to perform the call to get_random_bytes() without
> > disabling IRQs?
>
> This looks like a question for the drivers/char/random.c maintainer, because
> we do not block interrupts at this point in __inet_hash_connect()

Oh well, this is probably coming from __do_once_start() / __do_once_done()

We need something better for process contexts...
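
One possible shape for that, sketched as a userspace analogue rather than kernel code: the one-time initializer runs under a sleeping lock instead of a spinlock taken with interrupts disabled, so a slow init only blocks other callers of the same "once" site. All names here (once_sleepable, do_once_sleepable, fill_table) are hypothetical, and the pthread mutex merely stands in for a kernel mutex. This is an illustration under those assumptions, not the actual kernel change.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of a "do once" helper for process context. */
struct once_sleepable {
    pthread_mutex_t lock;  /* sleeping lock: waiters block instead of spinning */
    atomic_bool done;      /* set once the initializer has completed */
};

#define ONCE_SLEEPABLE_INIT { PTHREAD_MUTEX_INITIALIZER, false }

/* Returns true only for the caller that actually ran init(). */
static bool do_once_sleepable(struct once_sleepable *once,
                              void (*init)(void *), void *arg)
{
    bool ran = false;

    /* Fast path: already initialized, no lock taken. */
    if (atomic_load_explicit(&once->done, memory_order_acquire))
        return false;

    pthread_mutex_lock(&once->lock);
    /* Re-check under the lock: another thread may have won the race. */
    if (!atomic_load_explicit(&once->done, memory_order_relaxed)) {
        init(arg);  /* may be slow; interrupts are never disabled here */
        atomic_store_explicit(&once->done, true, memory_order_release);
        ran = true;
    }
    pthread_mutex_unlock(&once->lock);
    return ran;
}

/* Stand-in for an expensive one-time fill (size chosen arbitrarily). */
static unsigned char table[4096];
static int fill_calls;

static void fill_table(void *arg)
{
    size_t len = *(size_t *)arg;

    assert(len <= sizeof(table));
    for (size_t i = 0; i < len; i++)
        table[i] = (unsigned char)rand();  /* not a CSPRNG, illustration only */
    fill_calls++;
}
```

A caller would declare `struct once_sleepable once = ONCE_SLEEPABLE_INIT;` and invoke `do_once_sleepable(&once, fill_table, &len)` on every connect-like path; only the first call pays for the fill. The trade-off is that such a helper can only be used where sleeping is allowed.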


>
> >
> > Thanks
> > Christophe
> >
> > # tracer: irqsoff
> > #
> > # irqsoff latency trace v1.1.5 on 6.0.0-rc5-s3k-dev-02351-gebc95f69a7d4
> > # --------------------------------------------------------------------
> > # latency: 126337 us, #8207/8207, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0)
> > #    -----------------
> > #    | task: CORSurv-352 (uid:0 nice:0 policy:0 rt_prio:0)
> > #    -----------------
> > #  => started at: _raw_spin_lock_irqsave
> > #  => ended at:   _raw_spin_unlock_irqrestore
> > #
> > #
> > #                    _------=> CPU#
> > #                   / _-----=> irqs-off/BH-disabled
> > #                  | / _----=> need-resched
> > #                  || / _---=> hardirq/softirq
> > #                  ||| / _--=> preempt-depth
> > #                  |||| / _-=> migrate-disable
> > #                  ||||| /     delay
> > #  cmd     pid     |||||| time  |   caller
> > #     \   /        ||||||  \    |    /
> >   CORSurv-352       0d....    4us : _raw_spin_lock_irqsave
> >   CORSurv-352       0d....   13us+: preempt_count_add <-_raw_spin_lock_irqsave
> >   CORSurv-352       0d..1.   25us+: do_raw_spin_lock <-_raw_spin_lock_irqsave
> >   CORSurv-352       0d..1.   36us : get_random_bytes <-__inet_hash_connect
> >   CORSurv-352       0d..1.   45us : _get_random_bytes.part.0 <-__inet_hash_connect
> >   CORSurv-352       0d..1.   55us : crng_make_state <-_get_random_bytes.part.0
> >   CORSurv-352       0d..1.   65us+: ktime_get_seconds <-crng_make_state
> >   CORSurv-352       0d..1.   77us+: crng_fast_key_erasure <-crng_make_state
> >   CORSurv-352       0d..1.   89us+: chacha_block_generic <-crng_fast_key_erasure
> >   CORSurv-352       0d..1.  101us+: chacha_permute <-chacha_block_generic
> >   CORSurv-352       0d..1.  129us : chacha_block_generic <-_get_random_bytes.part.0
> >   CORSurv-352       0d..1.  139us+: chacha_permute <-chacha_block_generic
> >   CORSurv-352       0d..1.  160us : chacha_block_generic <-_get_random_bytes.part.0
> >   CORSurv-352       0d..1.  170us+: chacha_permute <-chacha_block_generic
> >   CORSurv-352       0d..1.  191us : chacha_block_generic <-_get_random_bytes.part.0
> >   CORSurv-352       0d..1.  200us+: chacha_permute <-chacha_block_generic
> >   CORSurv-352       0d..1.  221us : chacha_block_generic <-_get_random_bytes.part.0
> >   CORSurv-352       0d..1.  231us+: chacha_permute <-chacha_block_generic
> >
> >         8182 x the above two lines
> >
>
> It seems hard irqs are blocked for short periods, no worries here.
>
> But perhaps your problem is a lack of cond_resched() in a long loop
> (_get_random_bytes() I guess)
>
> The problem is: I do not think _get_random_bytes() can always schedule,
> so we would probably need to add extra parameters.
>
> >   CORSurv-352       0d..1. 126275us : chacha_block_generic <-_get_random_bytes.part.0
> >   CORSurv-352       0d..1. 126285us+: chacha_permute <-chacha_block_generic
> >   CORSurv-352       0d..1. 126309us : _raw_spin_unlock_irqrestore <-__do_once_done
> >   CORSurv-352       0d..1. 126318us+: do_raw_spin_unlock <-_raw_spin_unlock_irqrestore
> >   CORSurv-352       0d..1. 126330us+: _raw_spin_unlock_irqrestore
> >   CORSurv-352       0d..1. 126346us+: trace_hardirqs_on <-_raw_spin_unlock_irqrestore
> >   CORSurv-352       0d..1. 126387us : <stack trace>
> >   => tcp_v4_connect
> >   => __inet_stream_connect
> >   => inet_stream_connect
> >   => __sys_connect
> >   => system_call_exception
> >   => ret_from_syscall
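
As a rough cross-check of where the 126 ms goes, the numbers in the quoted trace can be combined in two ways: the trace header reports 8207 entries, about two per ChaCha block, and consecutive chacha_block_generic timestamps are about 30 us apart. The sketch below only redoes that arithmetic. The helper names are hypothetical, the values are copied from the trace, and the per-block cost includes tracing overhead, so it says nothing precise about an uninstrumented kernel.

```c
#include <stddef.h>

/* Timestamps (us) of four consecutive chacha_block_generic entries
 * called from _get_random_bytes, copied from the trace. */
static const double block_ts_us[] = { 129.0, 160.0, 191.0, 221.0 };

/* Timestamp (us) of the last chacha_block_generic entry in the trace. */
static const double last_block_ts_us = 126275.0;

/* Number of entries reported in the trace header (#8207/8207). */
static const int trace_entries = 8207;

/* Average spacing between consecutive blocks, with tracing enabled. */
static double avg_block_us(void)
{
    size_t n = sizeof(block_ts_us) / sizeof(block_ts_us[0]);

    return (block_ts_us[n - 1] - block_ts_us[0]) / (double)(n - 1);
}

/* Estimate 1: two trace entries (generic + permute) per block. */
static int blocks_from_entries(void)
{
    return trace_entries / 2;
}

/* Estimate 2: time spent in the loop divided by the per-block spacing. */
static double blocks_from_time(void)
{
    return (last_block_ts_us - block_ts_us[0]) / avg_block_us() + 1.0;
}
```

Both estimates land at roughly 4100 blocks generated while the lock is held with interrupts off, which is consistent with the latency coming from the size of the one-time random fill rather than from any single slow call.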
