Message-ID: <9ba345b74c19fec2c19f05d2887c6315b0855a75.camel@trillion01.com>
Date: Thu, 01 Aug 2024 19:05:34 -0400
From: Olivier Langlois <olivier@...llion01.com>
To: Pavel Begunkov <asml.silence@...il.com>, io-uring@...r.kernel.org
Cc: netdev@...r.kernel.org
Subject: Re: io_uring NAPI busy poll RCU is causing 50 context
 switches/second to my sqpoll thread

On Wed, 2024-07-31 at 02:00 +0100, Pavel Begunkov wrote:
> 
> I forgot to add, ~50 switches/second for relatively brief RCU
> handling
> is not much, not enough to take 50% of a CPU. I wonder if sqpoll was
> still running but napi busy polling time got accounted to softirq
> because of disabled bh and you didn't include it, hence asking CPU
> stats. Do you see any latency problems for that configuration?
> 
Pavel,

I am not sure if I will ever discover what this 50% CPU usage drop was
exactly.

When I tested
https://lore.kernel.org/io-uring/382791dc97d208d88ee31e5ebb5b661a0453fb79.1722374371.git.olivier@trillion01.com/T/#u

from this custom setup:
https://github.com/axboe/liburing/issues/1190#issuecomment-2258632731

the iou-sqp task CPU usage went back to 100%...

My busy_poll config numbers were also inadequate.

I went from:
echo 1000 > /sys/class/net/enp39s0/napi_defer_hard_irqs
echo 500 > /sys/class/net/enp39s0/gro_flush_timeout

to:
echo 5000 > /sys/class/net/enp39s0/napi_defer_hard_irqs
# gro_flush_timeout unit is nanoseconds
echo 100000 > /sys/class/net/enp39s0/gro_flush_timeout
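As a rough sanity check on the two settings above, here is a minimal
sketch of the arithmetic. It assumes the usual reading of these knobs:
each empty NAPI poll re-arms a timer of gro_flush_timeout nanoseconds,
for up to napi_defer_hard_irqs consecutive empty polls, before hardware
IRQs are re-enabled. The helper name defer_window_ns is hypothetical,
and the product is only an approximate upper bound for an idle queue,
not a measured value.

```shell
#!/bin/sh
# Hypothetical helper (not a kernel or iproute2 tool): approximate upper
# bound, in nanoseconds, on how long hardware IRQs can stay deferred
# while the queue is idle.
#   $1 = napi_defer_hard_irqs (count of consecutive empty polls)
#   $2 = gro_flush_timeout    (nanoseconds per timer re-arm)
defer_window_ns() {
    defer_count=$1
    flush_timeout_ns=$2
    echo $((defer_count * flush_timeout_ns))
}

# Old settings: 1000 polls x 500 ns
echo "old: $(defer_window_ns 1000 500) ns"
# New settings: 5000 polls x 100000 ns
echo "new: $(defer_window_ns 5000 100000) ns"
```

Under that assumption the window goes from about 500 usec to about
500 msec, which would be consistent with hardware IRQs (and the
resulting softirq work) rarely being re-armed while busy polling is
active.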

ksoftirqd has stopped being awakened to service NET softirqs, but I
would guess that this might not be the cause either.

I have no more latency issues. After a lot of effort during the last 7
days, my system latency has improved by a good 10usec on average over
what it was last week...

but knowing that it can be even better is stopping me from letting
go...

the sporadic CPU1 interrupt can introduce a 27usec delay, and the
difference between a win and a loss is what is at stake...
https://lore.kernel.org/rcu/367dc07b740637f2ce0298c8f19f8aec0bdec123.camel@trillion01.com/T/#m5abf9aa02ec7648c615885a6f8ebdebc57935c35

I want to get rid of that interrupt so badly that it is going to provide
great satisfaction when I have finally found the cause...

