Message-ID: <F1D642E5A91BC94D9504EC83AE19E6B0128B8D@ecqcmtlmail3.quebec.int.ec.gc.ca>
Date:	Wed, 5 Sep 2007 11:48:22 -0400
From:	"Fortier,Vincent [Montreal]" <Vincent.Fortier1@...GC.CA>
To:	"Eric Dumazet" <dada1@...mosbay.com>
Cc:	<linux-kernel@...r.kernel.org>, "Ingo Molnar" <mingo@...e.hu>
Subject: RE: [question] IPC queue filling-up problem?

> -----Original Message-----
> From: linux-kernel-owner@...r.kernel.org 
> [mailto:linux-kernel-owner@...r.kernel.org] On Behalf Of Eric Dumazet
> 
> On Wed, 5 Sep 2007 09:37:50 -0400
> "Fortier,Vincent [Montreal]" <Vincent.Fortier1@...GC.CA> wrote:
> 
> > 
> > Hi all,
> > 
> > We are testing new hardware and planning a switch from our old
> > RedHat 7.3 to Debian Etch 4.0 for our radar forecast analysis
> > systems.  We found out that our main IPC dispatcher software module
> > would use 100% of a CPU all the time and that the IPC queues would
> > fill up quickly on a 2.6 kernel.  We first thought it was a
> > compatibility problem between 2.4 vs. 2.6 IPC calls and our radar
> > analysis software, but after a lot of work we were able to test a
> > 2.4 kernel on that same hardware and got the exact same problem.
> > 
> > So, curiously, on our current systems (see SYSTEM 1 below) our IPC
> > dispatcher module works like a charm and queues stay near 0.  On our
> > test systems, which are far more powerful (see SYSTEM 2), our IPC
> > dispatcher module's queue fills up rapidly (depending on the msgmnb
> > queue size it will simply take a bit longer to fill).
> > 
> > We have tested both our already-compiled binaries from rh73 (built
> > with gcc 2.9) and a version of the modules recompiled on a Debian
> > Sarge system, and got the exact same problem both on Debian Sarge
> > 3.1 (running a 2.4 or 2.6 kernel) and on an Etch 64-bit system
> > (using the 32-bit compat layer) with a 2.6 kernel.  In all cases
> > the queues simply fill up.
> > 
> > After strace'ing the module I noticed that the time needed to handle
> > the signal & IPC calls is much lower on the new system, so I don't
> > see why the dispatcher queue fills up like that?!
> > 
> > Has anyone experienced something similar?  Could this be a kernel
> > issue vs. hardware, or a kernel option?  Might this be related to libc?
> > 
> > Help / Clues very much appreciated.
> 
> Hi Vincent
> 
> top shows that something is eating cpu cycles in User mode on 
> your new platform, while old platform consumes cycles both in 
> User and System land.
> 
> This might be related to some programming error, maybe some 
> spinlock in user mode, bad multi-threading synchronization, 
> or scheduling assumptions that break because of the quad 
> core CPUs of your new machine.

Actually, could this be worth trying (adding Ingo in CC):
http://lkml.org/lkml/2007/9/5/75

> So the thread that is supposed to consume IPC messages is not 
> scheduled in time, because the CPU starves. (Beware: the four 
> cores of each CPU compete for resources.)
> 
> You could issue "ps auxm" to check which threads are spinning 
> in User mode and try to trace them?

Indeed, in this specific test case I had 5 stuck processes, each using
100% of a CPU... although 3 other cores were still available, so there
is (I believe) no reason why it should have starved that much.

Anyhow, that did not happen in any of the other tests I ran during the
past 3 weeks (except this one).

I restarted this test (again using a 2.4.35.1 kernel), made sure no
processes were stuck :), and grabbed the ps aux + ipcs -q info (attached).

Again, ipcs -q shows that the queue is getting full, compared to
SYSTEM 1, which always has a queue of 0.

Note: I also attached a top.txt file showing that the dispatcher uses
100% of a CPU on SYSTEM 2.  This never occurs on SYSTEM 1.

> Eric

PS: thanks for replying.

- vin

> > SYSTEM INFORMATION:
> > 
> > SYSTEM 1:
> > ---------
> > HPDL580 G2
> > Quad Intel Xeon 1.90GHz
> > 4G ram
> > DRBD disks on dual-gigabit adapter
> > OS: RedHat 7.3 / kernel: 2.4.33 / libc: 2.2.5 / gcc 2.96
> > 
> > SYSTEM 2:
> > ---------
> > Dell PE2950
> > Dual Intel Quad-Core 2.66GHz
> > 16G ram
> > local 300G 15000 RPM SCSI.
> > OS1: Debian Etch 4.0 / kernels 2.6.18 -> 2.6.22 / libc 2.3.6 / gcc 4.1.2
> > OS2: Debian Sarge 3.1 / kernels 2.4.35, 2.6.18 -> 2.6.22 / libc 2.3.2 / gcc 3.3.5

View attachment "ipcs-q.txt" of type "text/plain" (2133 bytes)

View attachment "ps.auxm.txt" of type "text/plain" (16811 bytes)

View attachment "top.txt" of type "text/plain" (3551 bytes)
