Date:   Mon, 20 Mar 2023 11:46:48 -0700
From:   Jakub Kicinski <kuba@...nel.org>
To:     Jesper Dangaard Brouer <jbrouer@...hat.com>
Cc:     Jason Xing <kerneljasonxing@...il.com>, brouer@...hat.com,
        davem@...emloft.net, edumazet@...gle.com, pabeni@...hat.com,
        ast@...nel.org, daniel@...earbox.net, hawk@...nel.org,
        john.fastabend@...il.com, stephen@...workplumber.org,
        simon.horman@...igine.com, sinquersw@...il.com,
        bpf@...r.kernel.org, netdev@...r.kernel.org,
        Jason Xing <kernelxing@...cent.com>
Subject: Re: [PATCH v4 net-next 2/2] net: introduce budget_squeeze to help
 us tune rx behavior

On Mon, 20 Mar 2023 14:30:27 +0100 Jesper Dangaard Brouer wrote:
> >> So if you want to monitor a meaningful event in your fleet, I think
> >> a better event to monitor is the number of times ksoftirqd was woken
> >> up and latency of it getting onto the CPU.  
> > 
> > It's a good point. Thanks for your advice.  
> 
> I'm willing to help you out writing a BPF-based tool that can help you
> identify the issue Jakub describes above: high latency from when
> softIRQ is raised until softIRQ processing runs on the CPU.
> 
> I have this bpftrace script[1] available that does just that:
> 
>   [1] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/softirq_net_latency.bt
> 
> Perhaps you can take the latency histograms and then plot a heatmap[2]
> in your monitoring platform.
> 
>   [2] https://www.brendangregg.com/heatmaps.html

FWIW we have this little kludge of code in prod kernels:

https://github.com/kuba-moo/linux/commit/e09006bc08847a218276486817a84e38e82841a6

it tries to measure the latency from xmit to NAPI reaping completions.
So it covers both NIC IRQs being busted and the noise introduced by
the scheduler. Not great, those should really be separate.
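
To illustrate how such xmit-to-completion latencies could feed the
histograms mentioned above, here is a minimal userspace sketch. It is
not the code from the linked commit. It assumes each sample is a
hypothetical (xmit timestamp, NAPI reap timestamp) pair in
microseconds, and it buckets the difference into power-of-two bins,
similar in spirit to what bpftrace's hist() produces. The function
names and the sample data are made up for illustration.

```python
from collections import Counter


def log2_bucket(latency_us: int) -> int:
    """Return the lower bound of the power-of-two bucket holding latency_us.

    Zero and negative latencies all land in bucket 0.
    """
    if latency_us <= 0:
        return 0
    return 1 << (latency_us.bit_length() - 1)


def completion_latency_hist(samples):
    """Build a log2 histogram from (xmit_ts_us, reap_ts_us) pairs.

    The pair layout is an assumption for this sketch, not a kernel format.
    Returns a dict mapping bucket lower bound to sample count, sorted by bucket.
    """
    hist = Counter()
    for xmit_ts, reap_ts in samples:
        hist[log2_bucket(reap_ts - xmit_ts)] += 1
    return dict(sorted(hist.items()))


if __name__ == "__main__":
    # Made-up samples: (time of xmit, time NAPI reaped the completion).
    samples = [(0, 3), (10, 15), (20, 120), (30, 1100)]
    for lo, count in completion_latency_hist(samples).items():
        hi = lo * 2 if lo else 1
        print(f"[{lo}, {hi}) us: {count}")
```

One such histogram per time interval gives one column of a latency
heatmap. As with the kludge above, a single xmit-to-reap number cannot
tell IRQ delay apart from scheduler delay; separating them would need
an extra timestamp at IRQ time.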
