lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1233223979.5294.41.camel@bugs-laptop>
Date:	Thu, 29 Jan 2009 11:12:59 +0100
From:	Thomas Pilarski <thomas.pi@...or.de>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Mike Galbraith <efault@....de>,
	Gregory Haskins <ghaskins@...ell.com>,
	bugme-daemon@...zilla.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [Bugme-new] [Bug 12562] New: High overhead while switching or
 synchronizing threads on different cores

> > There is a regression, because of the improved cpu switching. The
> > problem exists in every kernel. 
> 
> This is a contradiction in terms - twice.
> 
> If it is a regression, then clearly things haven't improved.
> 
> If it is a regression, state clearly when it worked last. If it never
> worked, it cannot be a regression.

There is a improvement in load balancing for single threaded
applications. It's a regression for my problem. But the problem exists
in every kernel I have tested.

> > I takes a lot of time to switch between the threads, when they are
> > executed on different cores.
> > Perhaps of the big buffer size of 512KB?
> 
> Of course, pushing 512kb to another cpu means lots and lots of cache
> misses.

I have tried 2.6.15, 2.6.18 and 2.6.20 too, but same behavior as in
2.6.24.
With Windows I can get 64 message every second with a buffer size of 512
KB. It is reduced to 16 messages with a buffer size of 1MB. But I think
it not really comparable, because there is nearby no cpu consumption
with 512kB. Perhaps random() works different. By increasing the cpu
usage eight times in the producer, I can get 16msg/s and both cores are
used about ~50%. Doing the same with linux I get a throughput of
~2msg/s. 

If it is a caching issue, shouldn't it exists in Windows too?

Using a smaller buffer of 4KB, the test is executed on one core only. 
./schedulerissue 1 4096 8 2000
All threads finished: 2000 messages in 1.631 seconds / 1226.076 msg/s
real	0m1.635s
user	0m1.352s
sys	0m0.052s


But I want to use both cores to increase the performance. Adding a
second producer and a second consumer reduces the performance to 33%.
Both cores are used.
./schedulerissue 2 4096 8 2000
All threads finished: 1999 messages in 4.744 seconds / 421.379 msg/s
real	0m4.748s
user	0m3.280s
sys	0m5.852s

I have added a new version as there was a possible deadlock during
shut-down.

View attachment "ThreadSchedulingIssue.c" of type "text/x-csrc" (9411 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ