Message-ID: <1401081082.5339.41.camel@marge.simpson.net>
Date:	Mon, 26 May 2014 07:11:22 +0200
From:	Mike Galbraith <umgwanakikbuti@...il.com>
To:	Libo Chen <libo.chen@...wei.com>
Cc:	tglx@...utronix.de, mingo@...e.hu,
	LKML <linux-kernel@...r.kernel.org>,
	Greg KH <gregkh@...uxfoundation.org>,
	Li Zefan <lizefan@...wei.com>
Subject: Re: balance storm

On Mon, 2014-05-26 at 11:04 +0800, Libo Chen wrote: 
> Hi,
>     My box has 16 CPUs (E5-2658: 8 cores, 2 threads per core). I ran a test on
> 3.4.24 stable, starting 50 identical processes. Each process runs this sample:
> 
> 	#include <unistd.h>
> 
> 	int main(void)
> 	{
> 		for (;;) {
> 			unsigned int i = 0;
> 
> 			/* a trivial amount of work between sleeps */
> 			while (i < 100)
> 				i++;
> 			usleep(100);
> 		}
> 
> 		return 0;
> 	}
> 
> The result is 15% CPU usage, and perf shows about 700,000 (70w) migrations in 5 seconds.

See e0a79f52 sched: Fix select_idle_sibling() bouncing cow syndrome

That commit will fix expensive as hell bouncing for most real loads, but
it won't fix your test.  Doing nothing but wake, select_idle_sibling()
will be traversing all cores/siblings mightily, taking L2 misses as it
traverses, bouncing wakees that do _nothing_ when an idle CPU is found.

Your synthetic test is the absolute worst case scenario.  There has to
be work between wakeups for select_idle_sibling() to have any chance
whatsoever of turning in a win.  At 0 work, it becomes 100% overhead.

> I guessed that task migration was eating a lot of CPU, so I ran another test, using
> taskset to bind a task to a fixed CPU. The result matched expectations: CPU usage dropped to 5%.
> 
> Other tests:
> - 3.15 upstream has the same problem as 3.4.24.
> - SUSE SP2 has low CPU usage, about 5%.
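
For anyone repeating the pinned run without taskset: a process can bind
itself with sched_setaffinity(2). A minimal sketch, assuming glibc on Linux;
pin_to_cpu() is a name made up for this example.

```c
#define _GNU_SOURCE	/* needed for cpu_set_t macros and sched_setaffinity() */
#include <sched.h>

/*
 * Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on error (errno set by sched_setaffinity).
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}
```

Once pinned, the task has no other CPU to be placed on at wakeup, so it cannot
be migrated at all.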

SLE11-SP2 has a patch which fixes that behavior, but of course at the
expense of other load types.  A trade.  It also throttles nohz, which
can have substantial cost when scheduling cross-CPU.

> So I think 15% CPU usage and this many migrations are too high. How can this be fixed?

You can't for free; low latency wakeup can be worth one hell of a lot.

You could do a decayed hit/miss or such to shut the thing off when the
price is just too high.  Restricting migrations per unit time per task
also helps cut the cost, but hurts tasks that could have gotten to the
CPU quicker, and started your next bit of work.  Anything you do there
is going to be a rob Peter to pay Paul thing.
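
To make the two ideas concrete, here is a rough user-space model of each.
It is not kernel code and nothing here exists in the scheduler: the struct
names, the 1/8 decay, the threshold, and the window parameters are all
assumptions picked for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCORE_MAX	1024u	/* fixed-point "every search was a hit" */
#define SCORE_MIN_OK	256u	/* assumed cutoff: below this, skip the search */

/* Decayed hit/miss score for the idle-sibling search (hypothetical). */
struct idle_search_stats {
	unsigned int avg;	/* moving average in [0, SCORE_MAX] */
};

/* Fold one search outcome into the average; old history decays by 1/8. */
static void record_search(struct idle_search_stats *s, bool hit)
{
	unsigned int sample = hit ? SCORE_MAX : 0;

	s->avg = s->avg - (s->avg >> 3) + (sample >> 3);
}

/* Search only while recent searches have been paying off often enough. */
static bool should_search(const struct idle_search_stats *s)
{
	return s->avg >= SCORE_MIN_OK;
}

/* Per-task cap on migrations per time window (hypothetical). */
struct migration_limiter {
	uint64_t window_start_ns;
	unsigned int count;
};

/* Return true and charge the budget if one more migration fits the window. */
static bool migration_allowed(struct migration_limiter *m, uint64_t now_ns,
			      uint64_t window_ns, unsigned int max_per_window)
{
	if (now_ns - m->window_start_ns >= window_ns) {
		/* window expired: start a new one with a fresh budget */
		m->window_start_ns = now_ns;
		m->count = 0;
	}
	if (m->count >= max_per_window)
		return false;
	m->count++;
	return true;
}
```

In this model a run of misses switches the search off and a hit or two
switches it back on. What counts as a "hit" is left open here, and that is
where the Peter/Paul trade gets decided.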

-Mike

