[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <cf3edd8d0801151505r54bfd9d0r89f57e3835f295e1@mail.gmail.com>
Date: Tue, 15 Jan 2008 23:05:17 +0000
From: "Colin Fowler" <elethiomel@...il.com>
To: "Ingo Molnar" <mingo@...e.hu>
Cc: linux-kernel@...r.kernel.org,
"Peter Zijlstra" <a.p.zijlstra@...llo.nl>
Subject: Re: Performance loss 2.6.22->22.6.23->2.6.24-rc7 on CPU intensive benchmark on 8 Core Xeon
Hi Ingo,
I'll get the results tomorrow as I'm now out of the office, but
I can perhaps answer some of your queries now.
On Jan 15, 2008 10:06 PM, Ingo Molnar <mingo@...e.hu> wrote:
> hm, the system has considerable idle time left:
>
> r b swpd free buff cache si so bi bo in cs us sy id wa
> 8 0 0 1201920 683840 1039100 0 0 3 2 27 46 1 0 99 0
> 2 0 0 1202168 683840 1039112 0 0 0 0 245 45339 80 2 17 0
> 2 0 0 1202168 683840 1039112 0 0 0 0 263 47349 84 3 14 0
> 2 0 0 1202300 683848 1039112 0 0 0 76 255 47057 84 3 13 0
>
> and context-switches 45K times a second. Do you know what is going on
> there? I thought ray-tracing is something that can be parallelized
> pretty efficiently, without having to contend and schedule too much.
>
This is a RTRT (real-time ray tracing) system and as a result differs
from traditional offline ray-tracers as it is optimised for speed. The
benchmark I ran while these data were collected renders an 80K polygon
scene to a 512x512 buffer at just over 100fps.
The context switches are most likely caused by the pthreads
synchronisation code. There are two mutexs. Each job is a 32x32 tile
and each mutex is therefore unlocked (512/32) * (512/32) * 100 (for
100fps) * 2 =~50k. There's very likely where our context switches are
coming from. Larger tile sizes would of course reduce the locking
overhead, but then the ray-tracer suffers form load imbalance as some
tiles are much quicker to render than others. Empircally we've found
that this tile-size works the best for us.
The CPU idling occurs as the system doesn't yet perform asynchronous
rendering. When all tiles in a current job queue are finished the
current frame is done. At this point all worker threads sleep while
the master thread blits the image to the screen and fills the job
queue for the next frame. The data probably shows that one CPU is kept
maxed and the others reach about 90% most of the time. This is
something on my TODO list to fix along with a myriad of other
optimisations :)
regards,
Colin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists