linux-kernel - Re: Performance loss 2.6.22->22.6.23->2.6.24-rc7 on CPU intensive benchmark on 8 Core Xeon


Message-ID: <20080116153505.GB18553@elte.hu>
Date:	Wed, 16 Jan 2008 16:35:05 +0100
From:	Ingo Molnar <mingo@...e.hu>
To:	Colin Fowler <elethiomel@...il.com>
Cc:	linux-kernel@...r.kernel.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: Performance loss 2.6.22->22.6.23->2.6.24-rc7 on CPU intensive
	benchmark on 8 Core Xeon


* Colin Fowler <elethiomel@...il.com> wrote:

> > and context-switches 45K times a second. Do you know what is going 
> > on there? I thought ray-tracing is something that can be 
> > parallelized pretty efficiently, without having to contend and 
> > schedule too much.
> 
> This is a RTRT (real-time ray tracing) system and as a result differs 
> from traditional offline ray-tracers as it is optimised for speed. The 
> benchmark I ran while these data were collected renders an 80K polygon 
> scene to a 512x512 buffer at just over 100fps.
> 
> The context switches are most likely caused by the pthreads 
> synchronisation code. There are two mutexes. Each job is a 32x32 tile, 
> so together the mutexes are unlocked (512/32) * (512/32) * 100 (for 
> 100fps) * 2 =~ 50k times a second. That's very likely where our context 
> switches are coming from. Larger tile sizes would of course reduce the 
> locking overhead, but then the ray-tracer suffers from load imbalance, 
> as some tiles are much quicker to render than others. Empirically we've 
> found that this tile size works best for us.
> 
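
The estimate quoted above can be checked with a few lines of arithmetic. This is only a sketch: the function name and its parameters are hypothetical, and it assumes every tile causes exactly one unlock of each of the two mutexes, which is how the quoted text reads.

```python
def mutex_unlocks_per_second(width, height, tile, fps, mutexes_per_tile):
    """Estimate total mutex unlocks per second for a tiled renderer.

    Assumes each tile job unlocks each mutex exactly once (an assumption
    taken from the quoted description, not from the actual code).
    """
    tiles_per_frame = (width // tile) * (height // tile)
    return tiles_per_frame * fps * mutexes_per_tile


# Figures from the mail: 512x512 buffer, 32x32 tiles, ~100fps, two mutexes.
tiles = (512 // 32) * (512 // 32)
rate = mutex_unlocks_per_second(512, 512, 32, 100, 2)
print(tiles, rate)  # 256 51200
```

Under the same assumptions, 64x64 tiles would cut the figure to 12800 unlocks a second, which is the locking-overhead versus load-balance trade-off described in the quoted paragraph.
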
> The CPU idling occurs as the system doesn't yet perform asynchronous 
> rendering. When all tiles in the current job queue are finished, the 
> current frame is done. At this point all worker threads sleep while 
> the master thread blits the image to the screen and fills the job 
> queue for the next frame. The data probably shows that one CPU is kept 
> maxed and the others reach about 90% most of the time. This is 
> something on my TODO list to fix along with a myriad of other 
> optimisations :)
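
The per-frame pattern described in the quoted text (workers drain a tile queue, then the frame is done) can be modelled roughly as below. This is a hypothetical sketch in Python threads, not the poster's pthreads code: the names render_frame, queue_lock and done_lock are invented here, and the roles given to the two mutexes (one guarding the job queue, one guarding a completion counter) are an assumption. It models a single frame, so the workers exit instead of sleeping until the next frame.

```python
import threading


def render_frame(num_tiles, num_workers):
    """Render one frame's worth of tiles; return per-mutex unlock counts."""
    queue_lock = threading.Lock()   # assumed role: guards the job queue
    done_lock = threading.Lock()    # assumed role: guards the completion count
    state = {"next_tile": 0, "done": 0, "queue_unlocks": 0, "done_unlocks": 0}

    def worker():
        while True:
            with queue_lock:
                tile = state["next_tile"]
                if tile >= num_tiles:
                    return              # queue empty: this worker is finished
                state["next_tile"] = tile + 1
                state["queue_unlocks"] += 1
            # A real renderer would trace the rays for `tile` here.
            with done_lock:
                state["done"] += 1
                state["done_unlocks"] += 1

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    # The master waits for all tiles; in the real system it would then blit
    # the image and refill the queue while the workers sit idle.
    for w in workers:
        w.join()
    return state["queue_unlocks"], state["done_unlocks"]


queue_unlocks, done_unlocks = render_frame(num_tiles=256, num_workers=8)
print(queue_unlocks + done_unlocks)  # 512
```

With 256 tiles per frame this gives 512 counted unlocks per frame, or roughly 51k a second at 100fps, in line with the quoted estimate. The count deliberately ignores the one extra acquisition each worker makes when it finds the queue empty.
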

is this something i could run myself and see how it behaves with various 
scheduler settings? (if yes, where can i download it, and is there any 
sample scene that would show similar effects?)

	Ingo