Date:	Mon, 22 Sep 2008 18:02:40 +0200
From:	guillaume ranquet <guillaume.ranquet@...-organisation.net>
To:	linux-kernel@...r.kernel.org
Subject: [sh4][2.6.17] latency peaks with unix sockets on heavy loads


I'm experiencing small glitches when I try to send/receive data
over local (AF_UNIX) sockets under heavy load.
 Under normal load it behaves normally, but as the load increases I
get latency peaks once every second.
 An image is worth a thousand words:
 http://img255.imageshack.us/img255/3700/capplottsy7.th.png
 X: time elapsed since the beginning of execution
 Y: call latency
 red: under heavy load
 green: no load at all
 Those 200ms peaks really disturb me, as I'm using the sockets for
RPC calls, and 200ms (far more as the load increases) is really too
much.
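 For reference, the test loop looks roughly like this (a simplified
sketch, not my exact program; the socketpair() echo child merely
stands in for the real RPC peer):

/* Sketch: time one request/reply round trip over an AF_UNIX
 * stream socketpair, printing the latency in microseconds. */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
	int sv[2], i;
	char buf[64];
	struct timeval t0, t1;

	memset(buf, 0, sizeof(buf));
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return 1;

	if (fork() == 0) {
		/* child: trivial echo server, standing in for the RPC peer */
		while (read(sv[1], buf, sizeof(buf)) > 0)
			write(sv[1], buf, sizeof(buf));
		_exit(0);
	}

	for (i = 0; i < 100000; i++) {
		/* one "RPC call": send a request, wait for the reply */
		gettimeofday(&t0, NULL);
		write(sv[0], buf, sizeof(buf));
		read(sv[0], buf, sizeof(buf));
		gettimeofday(&t1, NULL);
		printf("%d %ld\n", i,
		       (t1.tv_sec - t0.tv_sec) * 1000000L +
		       (t1.tv_usec - t0.tv_usec));
	}
	return 0;
}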
 I've been testing various things (see the code sketch after this list):
 - enabling/disabling kernel preemption: no effect
 - active waiting (doing some CPU-consuming work) between RPC calls:
no effect (if anything, worse)
 - non-blocking sockets: no improvement, and they never return
EWOULDBLOCK
 - setting the scheduling policy to SCHED_FIFO solves the problem:
http://img47.imageshack.us/img47/4449/capplotschedfifoxh5.th.png
 - adding a usleep(0) between calls (still with the SCHED_NORMAL
policy) also removes the peaks; as I understand it, usleep(0) puts
the task to sleep until the next tick and may cause a context switch
if there's another runnable task
 - calling sched_yield() once every 1000 calls also helps greatly
(though some peaks still appear here and there)
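
 In code, the scheduling workarounds boil down to something like this
(sketch only, tried separately in my tests; do_rpc_call() is a
hypothetical stand-in for one send()/recv() round trip, and
SCHED_FIFO needs root):

#include <sched.h>
#include <unistd.h>

/* hypothetical stand-in for one send()/recv() RPC round trip */
static void do_rpc_call(void)
{
}

int main(void)
{
	struct sched_param sp;
	long i;

	/* Workaround 1: switch to SCHED_FIFO (needs root / CAP_SYS_NICE);
	 * this alone removes the peaks entirely. */
	sp.sched_priority = 1;
	sched_setscheduler(0, SCHED_FIFO, &sp);

	for (i = 0; i < 100000; i++) {
		do_rpc_call();

		/* Workaround 2 (staying on SCHED_NORMAL): usleep(0) after
		 * every call sleeps until the next tick and also removes
		 * the peaks. */
		usleep(0);

		/* Workaround 3 (alternative to usleep): sched_yield() once
		 * every 1000 calls helps greatly, though some peaks remain:
		 *
		 *	if (i % 1000 == 0)
		 *		sched_yield();
		 */
	}
	return 0;
}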

 Upgrading to 2.6.23:
 hooray, it solves everything:
http://img371.imageshack.us/img371/7028/capplotkernel2623lldnl6.th.png
 The mean latency is a bit higher, though, adding about 30% overhead
to the test run.
 But my problem is that I can't upgrade my kernel (yet) and need to
find a solution on 2.6.17.
 I couldn't reproduce the 2.6.17 behavior on 2.6.23, no matter the
kernel config.
 What changed between the two kernels that could affect this glitch:
 - the lock classes of the AF_UNIX domain became bh-unsafe :: seems
out of suspicion, since the peaks haven't shown up with AF_INET
sockets
 - the scheduler for SCHED_NORMAL tasks was completely rewritten ::
seems to be responsible for the new (improved?) behavior

 Is this a known bug of the pre-CFS scheduler?
Or am I totally wrong to blame the scheduler?

Is there a solution on 2.6.17 with SCHED_NORMAL?

PS: since I'm not subscribed (my e-mail account can't handle the
traffic), would you please CC me?
