Date:	Tue, 1 Nov 2011 12:14:30 +0800
From:	Zhu Yanhai <zhu.yanhai@...il.com>
To:	Henrique de Moraes Holschuh <hmh@....eng.br>
Cc:	"Artem S. Tashkinov" <t.artem@...os.com>,
	linux-kernel@...r.kernel.org
Subject: Re: HT (Hyper Threading) aware process scheduling doesn't work as it should

Hi,
I think the imbalance has gotten much better on the mainline kernel
than on OS vendors' kernels, i.e. RHEL6.  Just in case you are
interested, below is a very simple test case I used before against the
NUMA + CFS group scheduling extension. I have tested this on a
dual-socket Xeon E5620 server.

cat bbb.c
/* A pure CPU hog: spins forever, so each instance wants a full cpu. */
int main(void)
{
    while (1)
        ;
    return 0;
}
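It can be built with any C compiler; something as plain as this is
enough (gcc assumed here):

gcc -o bbb bbb.c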



cat run.sh

#!/bin/sh
# Assumes the cpu cgroup controller is mounted at /cgroup, e.g.:
#   mount -t cgroup -o cpu none /cgroup
count=0
pids=""
while [ $count -lt 32 ]
do
	# One group per hog, all with equal weight.
	mkdir /cgroup/$count
	echo 1024 > /cgroup/$count/cpu.shares
	# taskset -c 4,5,6,7,12,13,14,15 ./bbb &
	./bbb &
	pid=$!
	echo $pid > /cgroup/$count/tasks
	pids="$pids $pid"
	count=`expr $count + 1`
done
# \$pid must stay escaped so that it is expanded when show.sh runs,
# not when this script writes show.sh out.
echo "for pid in $pids; do grep sum_exec_runtime /proc/\$pid/sched; done" > show.sh
watch -n1 sh show.sh
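A small cleanup sketch of my own (not part of the original test),
assuming the cpu cgroup is still mounted at /cgroup:

pkill bbb                                     # stop all the hogs
for d in /cgroup/[0-9]*; do rmdir $d; done    # rmdir succeeds once each group is empty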


Since one E5620 with HT enabled has 8 logical cpus, this dual-socket
box has 16 logical cpus in total. The above test script starts 32
processes, so the intuitive guess is that two of them will run on each
logical cpu. However, that is not what happens on the current RHEL6
kernel: top shows that they keep migrating and are often unbalanced,
sometimes worse and sometimes better. If you watch it for a long time,
you may find that at times one process occupies a whole logical cpu
for a moment, while several processes (far more than 2) crowd onto a
single cpu slot.
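To see the placement directly, this one-liner (my addition, not part
of the original test) counts how many bbb processes currently sit on
each logical cpu, using the PSR (processor) column of ps:

ps -eo psr,comm | awk '$2 == "bbb" { n[$1]++ } END { for (c in n) print "cpu", c, n[c] }'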
Also, the 'watch' output shows that sum_exec_runtime is almost the
same for all of them, so it seems the RHEL6 kernel keeps moving one
lucky guy to a free cpu slot, lets it hold that position for a while,
then moves the next lucky guy there and kicks the previous one off to
a crowded slot, which is not a good policy for such totally
independent processes.
And on the mainline kernel (3.0.0+), they run much more balanced than
the above, although I can't identify which commits made the difference.

--
Regards,
Zhu Yanhai


2011/10/31 Henrique de Moraes Holschuh <hmh@....eng.br>:
> On Mon, 31 Oct 2011, Artem S. Tashkinov wrote:
>> > On Oct 31, 2011, Henrique de Moraes Holschuh  wrote:
>> >
>> > On Sun, 30 Oct 2011, Artem S. Tashkinov wrote:
>> > > > Please make sure both are set to 0.  If they were not 0 at the time you
>> > > > ran your tests, please retest and report back.
>> > >
>> > > That's 0 & 0 for me.
>> >
>> > How idle is your system during the test?
>>
>> load average: 0.00, 0.00, 0.00
>
> I believe cpuidle will interfere with the scheduling in that case.  Could
> you run your test with higher loads (start with one, and go up to eight
> tasks that are CPU-hogs, measuring each step)?
>
>> I have to insist that people conduct this test on their own without trusting my
>> words. Probably there's something I overlook or don't fully understand but from
>
> What you should attempt to do is to give us a reproducible test case.  A
> shell script or C/perl/python/whatever program that when run clearly shows
> the problem you're complaining about on your system.  Failing that, a very
> detailed description (read: step by step) of how you're testing things.
>
> I can't see anything wrong in my X5550 workstation (4 cores, 8 threads,
> single processor, i.e. not NUMA) running 3.0.8.
>
>> what I see, there's a serious issue here (at least Microsoft XP and 7 work exactly
>
> So far it looks like that, since your system is almost entirely idle, it
> could be trying to minimize task-run latency by scheduling work to the few
> cores/threads that are not in deep sleep (they take time to wake up, are
> often cache-cold, etc).
>
> Please use tools/power/x86/turbostat to track core usage and idle-states
> instead of top/htop.  That might give you better information, and I
> think you will appreciate getting to know that tool.  Note: turbostat
> reports *averages* for each thread.
>
> --
>  "One disk to rule them all, One disk to find them. One disk to bring
>  them all and in the darkness grind them. In the Land of Redmond
>  where the shadows lie." -- The Silicon Valley Tarot
>  Henrique Holschuh