linux-kernel - Re: Performance loss 2.6.22->22.6.23->2.6.24-rc7 on CPU intensive benchmark on 8 Core Xeon

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Wed, 16 Jan 2008 16:10:29 +0000
From:	"Colin Fowler" <elethiomel@...il.com>
To:	"Ingo Molnar" <mingo@...e.hu>
Cc:	linux-kernel@...r.kernel.org,
	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>
Subject: Re: Performance loss 2.6.22->22.6.23->2.6.24-rc7 on CPU intensive benchmark on 8 Core Xeon

Hi Ingo, I'll need to convince my supervisor first if I can release a
binary. Technically anythin glike this needs to go through our
University's "innovations department" and requires lengthy paperwork
and NDAs :(.

regards,
        Colin

On Jan 16, 2008 3:35 PM, Ingo Molnar <mingo@...e.hu> wrote:
>
> * Colin Fowler <elethiomel@...il.com> wrote:
>
>
> > > and context-switches 45K times a second. Do you know what is going
> > > on there? I thought ray-tracing is something that can be
> > > parallelized pretty efficiently, without having to contend and
> > > schedule too much.
> >
> > This is a RTRT (real-time ray tracing) system and as a result differs
> > from traditional offline ray-tracers as it is optimised for speed. The
> > benchmark I ran while these data were collected renders an 80K polygon
> > scene to a 512x512 buffer at just over 100fps.
> >
> > The context switches are most likely caused by the pthreads
> > synchronisation code. There are two mutexs. Each job is a 32x32 tile
> > and each mutex is therefore unlocked (512/32) * (512/32) * 100 (for
> > 100fps) * 2 =~50k. There's very likely where our context switches are
> > coming from. Larger tile sizes would of course reduce the locking
> > overhead, but then the ray-tracer suffers form load imbalance as some
> > tiles are much quicker to render than others. Empircally we've found
> > that this tile-size works the best for us.
> >
> > The CPU idling occurs as the system doesn't yet perform asynchronous
> > rendering. When all tiles in a current job queue are finished the
> > current frame is done. At this point all worker threads sleep while
> > the master thread blits the image to the screen and fills the job
> > queue for the next frame. The data probably shows that one CPU is kept
> > maxed and the others reach about 90% most of the time. This is
> > something on my TODO list to fix along with a myriad of other
> > optimisations :)
>
> is this something i could run myself and see how it behaves with various
> scheduler settings? (if yes, where can i download it and is there any
> sample scene that would show similar effects.)
>
>         Ingo
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/