Message-ID: <BANLkTimE1b8HP-q4jgsv5jPD5S-dRoUi_g@mail.gmail.com>
Date: Mon, 13 Jun 2011 17:00:08 -0700
From: Paul Turner <pjt@...gle.com>
To: Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>
Cc: Vladimir Davydov <vdavydov@...allels.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Bharata B Rao <bharata@...ux.vnet.ibm.com>,
Dhaval Giani <dhaval.giani@...il.com>,
Balbir Singh <balbir@...ux.vnet.ibm.com>,
Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
Srivatsa Vaddagiri <vatsa@...ibm.com>,
Ingo Molnar <mingo@...e.hu>,
Pavel Emelianov <xemul@...allels.com>
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned
Hi Kamalesh.
I tried on Friday and again today to reproduce your results, without
success. Results are attached below. The margin of error is the same
as in the previous (2-level deep) case, ~4%. One minor nit: your
script's input parsing calls shift inside the getopts loop; getopts
does not need this, and it will actually lead to arguments being
dropped.
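To illustrate the nit, here is a minimal sketch of the same option parsing without the per-option shift. The function name parse_opts and the REMAINING variable are made up for this example and are not in your script. getopts tracks its position through OPTIND, so shifting inside the loop moves the arguments out from under it; with "-b 0 -s 1 -p 1" the loop would stop after -b. A single shift after the loop is enough if positional arguments are needed.

```shell
#!/bin/bash
# Hypothetical helper for illustration only; not the script from this thread.
parse_opts() {
    # Reset OPTIND so the function can be called more than once.
    local OPTIND=1 arg
    BANDWIDTH=1
    SUBGROUP=1
    PRO_SHARES=0
    while getopts ":b:s:p:" arg
    do
        case $arg in
            b) BANDWIDTH=$OPTARG ;;   # no shift here: getopts advances OPTIND itself
            s) SUBGROUP=$OPTARG ;;
            p) PRO_SHARES=$OPTARG ;;
            *) echo "Usage: $0 [-b 0|1] [-s 0|1] [-p 0|1]" >&2
               return 1 ;;
        esac
    done
    # One shift after the loop drops all parsed options at once.
    shift $((OPTIND - 1))
    REMAINING="$*"
}

parse_opts -b 0 -s 1 -p 1 extra
echo "$BANDWIDTH $SUBGROUP $PRO_SHARES $REMAINING"   # prints "0 1 1 extra"
```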
Are you testing on top of a clean -tip? Do you have any custom
load-balancer or scheduler settings?
Thanks,
- Paul
Hyper-threaded topology:
unpinned:
Average CPU Idle percentage 38.6333%
Bandwidth shared with remaining non-Idle 61.3667%
pinned:
Average CPU Idle percentage 35.2766%
Bandwidth shared with remaining non-Idle 64.7234%
(The mask in the "unpinned" case is 0-3,6-9,12-15,18-21 which should
mirror your 2 socket 8x2 configuration.)
4-way NUMA topology:
unpinned:
Average CPU Idle percentage 5.26667%
Bandwidth shared with remaining non-Idle 94.73333%
pinned:
Average CPU Idle percentage 0.242424%
Bandwidth shared with remaining non-Idle 99.757576%
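
For reference, the gap between the unpinned and pinned idle figures above can be recomputed with a small helper. This is only a sketch: idle_delta is a hypothetical name, not part of the test script, and it assumes awk is available.

```shell
#!/bin/bash
# Hypothetical helper: print (unpinned idle % - pinned idle %) to 4 decimals.
idle_delta() {
    awk -v u="$1" -v p="$2" 'BEGIN { printf "%.4f\n", u - p }'
}

idle_delta 38.6333 35.2766    # hyper-threaded topology: prints 3.3567
idle_delta 5.26667 0.242424   # 4-way NUMA topology: prints 5.0242
```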
On Fri, Jun 10, 2011 at 11:17 AM, Kamalesh Babulal
<kamalesh@...ux.vnet.ibm.com> wrote:
> * Paul Turner <pjt@...gle.com> [2011-06-08 20:25:00]:
>
>> Hi Kamalesh,
>>
>> I'm unable to reproduce the results you describe. One possibility is
>> load-balancer interaction -- can you describe the topology of the
>> platform you are running this on?
>>
>> On both a straight NUMA topology and a hyper-threaded platform I
>> observe a ~4% delta between the pinned and un-pinned cases.
>>
>> Thanks -- results below,
>>
>> - Paul
>>
>>
> (snip)
>
> Hi Paul,
>
> That box is down. I tried running the test on the 2-socket quad-core with
> HT and was not able to reproduce the issue. CPU idle time reported in
> both the pinned and un-pinned cases was ~0. But if we create a cgroup
> hierarchy of 3 levels above the 5 cgroups, instead of the current hierarchy
> where all 5 cgroups are created directly under /cgroup, idle time is seen
> on the 2-socket quad-core (HT) box.
>
>                 -----------
>                 | cgroups |
>                 -----------
>                      |
>                 -----------
>                 | level 1 |
>                 -----------
>                      |
>                 -----------
>                 | level 2 |
>                 -----------
>                      |
>                 -----------
>                 | level 3 |
>                 -----------
>                /  /  |  \  \
>               /  /   |   \  \
>         cgrp1 cgrp2 cgrp3 cgrp4 cgrp5
>
>
> Un-pinned run
> --------------
>
> Average CPU Idle percentage 24.8333%
> Bandwidth shared with remaining non-Idle 75.1667%
> Bandwidth of Group 1 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
> |...... subgroup 1/1 = 49.9900 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
> |...... subgroup 1/2 = 50.0000 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
>
>
> Bandwidth of Group 2 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
> |...... subgroup 2/1 = 49.9900 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
> |...... subgroup 2/2 = 50.0000 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
>
>
> Bandwidth of Group 3 = 16.6500 i.e = 12.5100% of non-Idle CPU time 75.1667%
> |...... subgroup 3/1 = 25.0000 i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
> |...... subgroup 3/2 = 24.9100 i.e = 3.1100% of 12.5100% Groups non-Idle CPU time
> |...... subgroup 3/3 = 25.0800 i.e = 3.1300% of 12.5100% Groups non-Idle CPU time
> |...... subgroup 3/4 = 24.9900 i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
>
>
> Bandwidth of Group 4 = 29.3600 i.e = 22.0600% of non-Idle CPU time 75.1667%
> |...... subgroup 4/1 = 12.0200 i.e = 2.6500% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/2 = 12.3800 i.e = 2.7300% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/3 = 13.6300 i.e = 3.0000% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/4 = 12.7000 i.e = 2.8000% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/5 = 12.8000 i.e = 2.8200% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/6 = 11.9600 i.e = 2.6300% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/7 = 12.7400 i.e = 2.8100% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/8 = 11.7300 i.e = 2.5800% of 22.0600% Groups non-Idle CPU time
>
>
> Bandwidth of Group 5 = 37.2300 i.e = 27.9800% of non-Idle CPU time 75.1667%
> |...... subgroup 5/1 = 47.7200 i.e = 13.3500% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/2 = 5.2000 i.e = 1.4500% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/3 = 6.3600 i.e = 1.7700% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/4 = 6.3600 i.e = 1.7700% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/5 = 7.9800 i.e = 2.2300% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/6 = 5.1800 i.e = 1.4400% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/7 = 7.4900 i.e = 2.0900% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/8 = 5.9200 i.e = 1.6500% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/9 = 7.7500 i.e = 2.1600% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/10 = 4.8100 i.e = 1.3400% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/11 = 4.9300 i.e = 1.3700% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/12 = 6.8900 i.e = 1.9200% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/13 = 6.0700 i.e = 1.6900% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/14 = 6.5200 i.e = 1.8200% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/15 = 5.9200 i.e = 1.6500% of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/16 = 6.6400 i.e = 1.8500% of 27.9800% Groups non-Idle CPU time
>
> Pinned Run
> ----------
>
> Average CPU Idle percentage 0%
> Bandwidth shared with remaining non-Idle 100%
> Bandwidth of Group 1 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
> |...... subgroup 1/1 = 50.0100 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
> |...... subgroup 1/2 = 49.9800 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
>
>
> Bandwidth of Group 2 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
> |...... subgroup 2/1 = 50.0000 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
> |...... subgroup 2/2 = 49.9900 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
>
>
> Bandwidth of Group 3 = 12.5300 i.e = 12.5300% of non-Idle CPU time 100%
> |...... subgroup 3/1 = 25.0100 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
> |...... subgroup 3/2 = 25.0000 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
> |...... subgroup 3/3 = 24.9900 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
> |...... subgroup 3/4 = 24.9900 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
>
>
> Bandwidth of Group 4 = 25.0200 i.e = 25.0200% of non-Idle CPU time 100%
> |...... subgroup 4/1 = 12.5100 i.e = 3.1300% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/2 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/3 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/4 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/5 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/6 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/7 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/8 = 12.4800 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
>
>
> Bandwidth of Group 5 = 49.8800 i.e = 49.8800% of non-Idle CPU time 100%
> |...... subgroup 5/1 = 49.9600 i.e = 24.9200% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/2 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/3 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/4 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/5 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/6 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/7 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/8 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/9 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/10 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/11 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/12 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/13 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/14 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/15 = 6.2300 i.e = 3.1000% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/16 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
>
> Modified script
> ---------------
>
> #!/bin/bash
>
> NR_TASKS1=2
> NR_TASKS2=2
> NR_TASKS3=4
> NR_TASKS4=8
> NR_TASKS5=16
>
> BANDWIDTH=1
> SUBGROUP=1
> PRO_SHARES=0
> MOUNT_POINT=/cgroups/
> MOUNT=/cgroups/
> LOAD=./while1
> LEVELS=3
>
> usage()
> {
> echo "Usage $0: [-b 0|1] [-s 0|1] [-p 0|1]"
> echo "-b 1|0 set/unset Cgroups bandwidth control (default set)"
> echo "-s Create sub-groups for every task (default creates sub-group)"
> echo "-p create proportional shares based on cpus"
> exit
> }
> while getopts ":b:s:p:" arg
> do
> case $arg in
> b)
> BANDWIDTH=$OPTARG
> shift
> if [ $BANDWIDTH -gt 1 ] || [ $BANDWIDTH -lt 0 ]
> then
> usage
> fi
> ;;
> s)
> SUBGROUP=$OPTARG
> shift
> if [ $SUBGROUP -gt 1 ] || [ $SUBGROUP -lt 0 ]
> then
> usage
> fi
> ;;
> p)
> PRO_SHARES=$OPTARG
> shift
> if [ $PRO_SHARES -gt 1 ] || [ $PRO_SHARES -lt 0 ]
> then
> usage
> fi
> ;;
>
> *)
>
> esac
> done
> if [ ! -d $MOUNT ]
> then
> mkdir -p $MOUNT
> fi
> test()
> {
> echo -n "[ "
> if [ $1 -eq 0 ]
> then
> echo -ne '\E[42;40mOk'
> else
> echo -ne '\E[31;40mFailed'
> tput sgr0
> echo " ]"
> exit
> fi
> tput sgr0
> echo " ]"
> }
> mount_cgrp()
> {
> echo -n "Mounting root cgroup "
> mount -t cgroup -ocpu,cpuset,cpuacct none $MOUNT_POINT &> /dev/null
> test $?
> }
>
> umount_cgrp()
> {
> echo -n "Unmounting root cgroup "
> cd /root/
> umount $MOUNT_POINT
> test $?
> }
>
> create_hierarchy()
> {
> mount_cgrp
> cpuset_mem=`cat $MOUNT/cpuset.mems`
> cpuset_cpu=`cat $MOUNT/cpuset.cpus`
> echo -n "creating hierarchy of levels $LEVELS "
> for (( i=1; i<=$LEVELS; i++ ))
> do
> MOUNT="${MOUNT}/level${i}"
> mkdir $MOUNT
> echo $cpuset_mem > $MOUNT/cpuset.mems
> echo $cpuset_cpu > $MOUNT/cpuset.cpus
> echo "-1" > $MOUNT/cpu.cfs_quota_us
> echo "500000" > $MOUNT/cpu.cfs_period_us
> echo -n " .."
> done
> echo " "
> echo $MOUNT
> echo -n "creating groups/sub-groups ..."
> for (( i=1; i<=5; i++ ))
> do
> mkdir $MOUNT/$i
> echo $cpuset_mem > $MOUNT/$i/cpuset.mems
> echo $cpuset_cpu > $MOUNT/$i/cpuset.cpus
> echo -n ".."
> if [ $SUBGROUP -eq 1 ]
> then
> jj=$(eval echo "\$NR_TASKS$i")
> for (( j=1; j<=$jj; j++ ))
> do
> mkdir -p $MOUNT/$i/$j
> echo $cpuset_mem > $MOUNT/$i/$j/cpuset.mems
> echo $cpuset_cpu > $MOUNT/$i/$j/cpuset.cpus
> echo -n ".."
> done
> fi
> done
> echo "."
> }
>
> cleanup()
> {
> pkill -9 while1 &> /dev/null
> sleep 10
> echo -n "Umount groups/sub-groups .."
> for (( i=1; i<=5; i++ ))
> do
> if [ $SUBGROUP -eq 1 ]
> then
> jj=$(eval echo "\$NR_TASKS$i")
> for (( j=1; j<=$jj; j++ ))
> do
> rmdir $MOUNT/$i/$j
> echo -n ".."
> done
> fi
> rmdir $MOUNT/$i
> echo -n ".."
> done
> cd $MOUNT
> cd ../
> for (( i=$LEVELS; i>=1; i-- ))
> do
> rmdir level$i
> cd ../
> done
> echo " "
> umount_cgrp
> }
>
> load_tasks()
> {
> for (( i=1; i<=5; i++ ))
> do
> jj=$(eval echo "\$NR_TASKS$i")
> shares="1024"
> if [ $PRO_SHARES -eq 1 ]
> then
> eval shares=$(echo "$jj * 1024" | bc)
> fi
> echo $shares > $MOUNT/$i/cpu.shares
> for (( j=1; j<=$jj; j++ ))
> do
> echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
> echo "500000" > $MOUNT/$i/cpu.cfs_period_us
> if [ $SUBGROUP -eq 1 ]
> then
>
> $LOAD &
> echo $! > $MOUNT/$i/$j/tasks
> echo "1024" > $MOUNT/$i/$j/cpu.shares
>
> if [ $BANDWIDTH -eq 1 ]
> then
> echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
> echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
> fi
> else
> $LOAD &
> echo $! > $MOUNT/$i/tasks
> echo $shares > $MOUNT/$i/cpu.shares
>
> if [ $BANDWIDTH -eq 1 ]
> then
> echo "500000" > $MOUNT/$i/cpu.cfs_period_us
> echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
> fi
> fi
> done
> done
> echo "Capturing idle cpu time with vmstat...."
> vmstat 2 100 &> vmstat_log &
> }
>
> pin_tasks()
> {
> cpu=0
> count=1
> for (( i=1; i<=5; i++ ))
> do
> if [ $SUBGROUP -eq 1 ]
> then
> jj=$(eval echo "\$NR_TASKS$i")
> for (( j=1; j<=$jj; j++ ))
> do
> if [ $count -gt 2 ]
> then
> cpu=$((cpu+1))
> count=1
> fi
> echo $cpu > $MOUNT/$i/$j/cpuset.cpus
> count=$((count+1))
> done
> else
> case $i in
> 1)
> echo 0 > $MOUNT/$i/cpuset.cpus;;
> 2)
> echo 1 > $MOUNT/$i/cpuset.cpus;;
> 3)
> echo "2-3" > $MOUNT/$i/cpuset.cpus;;
> 4)
> echo "4-6" > $MOUNT/$i/cpuset.cpus;;
> 5)
> echo "7-15" > $MOUNT/$i/cpuset.cpus;;
> esac
> fi
> done
>
> }
>
> print_results()
> {
> eval gtot=$(cat sched_log|grep -i while|sed 's/R//g'|awk '{gtot+=$7};END{printf "%f", gtot}')
> for (( i=1; i<=5; i++ ))
> do
> eval temp=$(cat sched_log_$i|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
> eval tavg=$(echo "scale=4;(($temp / $gtot) * $1)/100 " | bc)
> eval avg=$(echo "scale=4;($temp / $gtot) * 100" | bc)
> eval pretty_tavg=$( echo "scale=4; $tavg * 100"| bc) # For pretty format
> echo "Bandwidth of Group $i = $avg i.e = $pretty_tavg% of non-Idle CPU time $1%"
> if [ $SUBGROUP -eq 1 ]
> then
> jj=$(eval echo "\$NR_TASKS$i")
> for (( j=1; j<=$jj; j++ ))
> do
> eval tmp=$(cat sched_log_$i-$j|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
> eval stavg=$(echo "scale=4;($tmp / $temp) * 100" | bc)
> eval pretty_stavg=$(echo "scale=4;(($tmp / $temp) * $tavg) * 100" | bc)
> echo -n "|"
> echo -e "...... subgroup $i/$j\t= $stavg\ti.e = $pretty_stavg% of $pretty_tavg% Groups non-Idle CPU time"
> done
> fi
> echo " "
> echo " "
> done
> }
>
> capture_results()
> {
> cat /proc/sched_debug > sched_log
> lev=""
> for (( i=1; i<=$LEVELS; i++ ))
> do
> lev="$lev\/level${i}"
> done
> pkill -9 vmstat
> avg=$(cat vmstat_log |grep -iv "system"|grep -iv "swpd"|awk ' { if ( NR != 1) {id+=$15 }}END{print (id/(NR-1))}')
>
> rem=$(echo "scale=2; 100 - $avg" |bc)
> echo "Average CPU Idle percentage $avg%"
> echo "Bandwidth shared with remaining non-Idle $rem%"
> for (( i=1; i<=5; i++ ))
> do
> cat sched_log |grep -i while1|grep -i "$lev\/$i" > sched_log_$i
> if [ $SUBGROUP -eq 1 ]
> then
> jj=$(eval echo "\$NR_TASKS$i")
> for (( j=1; j<=$jj; j++ ))
> do
> cat sched_log |grep -i while1|grep -i "$lev\/$i\/$j" > sched_log_$i-$j
> done
> fi
> done
> print_results $rem
> }
>
> create_hierarchy
> pin_tasks
>
> load_tasks
> sleep 60
> capture_results
> cleanup
> exit
>
>