Message-ID: <F1D642E5A91BC94D9504EC83AE19E6B0128B8D@ecqcmtlmail3.quebec.int.ec.gc.ca>
Date: Wed, 5 Sep 2007 11:48:22 -0400
From: "Fortier,Vincent [Montreal]" <Vincent.Fortier1@...GC.CA>
To: "Eric Dumazet" <dada1@...mosbay.com>
Cc: <linux-kernel@...r.kernel.org>, "Ingo Molnar" <mingo@...e.hu>
Subject: RE: [question] IPC queue filling-up problem?
> -----Original Message-----
> From: linux-kernel-owner@...r.kernel.org
> [mailto:linux-kernel-owner@...r.kernel.org] On Behalf Of Eric Dumazet
>
> On Wed, 5 Sep 2007 09:37:50 -0400
> "Fortier,Vincent [Montreal]" <Vincent.Fortier1@...GC.CA> wrote:
>
> >
> > Hi all,
> >
> > We are testing new hardware and planning a switch from our old Red Hat
> > 7.3 to Debian Etch 4.0 for our radar forecast analysis systems. We
> > found that our main IPC dispatcher software module would use 100%
> > of a CPU all the time and that the IPC queues would fill up quickly on a
> > 2.6 kernel. We first thought it was a compatibility problem between
> > 2.4 and 2.6 IPC calls and our radar analysis software, but after a lot
> > of work we were able to test a 2.4 kernel on that same hardware and
> > got the exact same problem.
> >
> > Curiously, on our current systems (see SYSTEM 1 below) our IPC
> > dispatcher module works like a charm and the queues stay near 0. On our
> > test systems, which are far more powerful (see SYSTEM 2), the
> > IPC dispatcher module's queue fills up rapidly (with a larger msgmnb
> > queue size it simply takes a bit longer to fill).
> >
> > We have tested both our already-compiled binaries from rh73 (built with
> > gcc 2.9) and a version of the modules recompiled on a Debian Sarge system,
> > and got the exact same problem both on Debian Sarge 3.1 (running a
> > 2.4 or 2.6 kernel) and on a 64-bit Etch system (using the 32-bit compat
> > layer) with a 2.6 kernel. In all cases the queues would simply fill up.
> >
> > After strace'ing the module I noticed that the time needed to handle
> > the signal & IPC calls is much lower on the new system, hence I don't
> > see why the dispatcher queue fills up like that?!
> >
> > Has anyone experienced something similar? Could this be a kernel
> > issue, a hardware issue, or a kernel option? Might this be related to libc?
> >
> > Help / Clues very much appreciated.
>
> Hi Vincent
>
> top shows that something is eating cpu cycles in User mode on
> your new platform, while old platform consumes cycles both in
> User and System land.
>
> This might be related to some programming error, maybe some
> spinlock in user mode or bad multi-threading synchronization,
> or scheduling assumptions that break because of the quad-core
> CPUs of your new machine.
Actually, could this be worth trying (adding Ingo in CC):
http://lkml.org/lkml/2007/9/5/75
> So the thread that is supposed to consume IPC messages is not
> scheduled in time, because it is starved of CPU. (Beware: the four
> cores of each CPU compete for resources.)
>
> You could issue "ps auxm" to check which threads are spinning
> in User mode and try to trace them?
Indeed, in this specific test case I had 5 stuck processes, each using
100% of a CPU... although 3 other cores were still available, so there
is (I believe) no reason why it should have starved that much.
Anyhow, that did not happen in any of the other testing I did during the
past 3 weeks (only in this one).
I restarted this test (again using a 2.4.35.1 kernel), made sure no
processes were stuck :), and grabbed the ps aux + ipcs -q info (attached).
Again, ipcs -q shows that the queue is filling up, compared to
SYSTEM 1, which always has a queue of 0.
Note: I also attached a top.txt file showing that the dispatcher uses 100%
of a CPU on SYSTEM 2. This never occurs on SYSTEM 1.
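The queue depth that ipcs -q reports could also be sampled in a loop to
watch how fast the queue fills. Below is a minimal Python sketch of that
idea, not something taken from the dispatcher itself. It assumes a Linux
/proc/sysvipc/msg file whose header row names the cbytes and qnum columns,
and it assumes every queue is limited to the system default in
/proc/sys/kernel/msgmnb (a queue whose limit was changed with msgctl
IPC_SET would need its own value). The names parse_sysvipc_msg and
fill_ratio are hypothetical.

```python
from pathlib import Path


def parse_sysvipc_msg(text):
    """Parse the contents of /proc/sysvipc/msg into one dict per queue.

    Columns are looked up by the names in the header row, so the exact
    column order does not matter. Only cbytes and qnum are converted to
    integers; all other fields are kept as strings.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    queues = []
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        row["cbytes"] = int(row["cbytes"])  # bytes currently queued
        row["qnum"] = int(row["qnum"])      # messages currently queued
        queues.append(row)
    return queues


def fill_ratio(queue, msgmnb):
    """Fraction of the byte limit in use (assumes the limit equals msgmnb)."""
    return queue["cbytes"] / msgmnb


if __name__ == "__main__":
    msg_file = Path("/proc/sysvipc/msg")
    limit_file = Path("/proc/sys/kernel/msgmnb")
    if msg_file.exists() and limit_file.exists():
        msgmnb = int(limit_file.read_text())
        for q in parse_sysvipc_msg(msg_file.read_text()):
            print(q["msqid"], q["qnum"], q["cbytes"],
                  f"{fill_ratio(q, msgmnb):.0%}")
    else:
        print("no SysV message queue information available")
```

Run once per second on both systems, this would show whether the queue on
SYSTEM 2 grows steadily or in bursts, which may help tell a slow consumer
from a consumer that is not being scheduled.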
> Eric
PS, thnx for replying.
- vin
> > SYSTEM INFORMATION:
> >
> > SYSTEM 1:
> > ---------
> > HPDL580 G2
> > Quad Intel Xeon 1.90GHz
> > 4G ram
> > DRBD disks on dual-gigabit adapter
> > OS: RedHat 7.3 / kernel: 2.4.33 / libc: 2.2.5 / gcc 2.96
> >
> > SYSTEM 2:
> > ---------
> > Dell PE2950
> > Dual Intel Quad-Core 2.66GHz
> > 16G ram
> > local 300G 15000 RPM SCSI.
> > OS1: Debian Etch 4.0 / kernels 2.6.18 -> 2.6.22 / libc 2.3.6 / gcc 4.1.2
> > OS2: Debian Sarge 3.1 / kernels 2.4.35, 2.6.18 -> 2.6.22 / libc 2.3.2 / gcc 3.3.5
View attachment "ipcs-q.txt" of type "text/plain" (2133 bytes)
View attachment "ps.auxm.txt" of type "text/plain" (16811 bytes)
View attachment "top.txt" of type "text/plain" (3551 bytes)