Message-ID: <20110610181719.GA30330@linux.vnet.ibm.com>
Date:	Fri, 10 Jun 2011 23:47:20 +0530
From:	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>
To:	Paul Turner <pjt@...gle.com>
Cc:	Vladimir Davydov <vdavydov@...allels.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Bharata B Rao <bharata@...ux.vnet.ibm.com>,
	Dhaval Giani <dhaval.giani@...il.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Pavel Emelianov <xemul@...allels.com>
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs
 unpinned

* Paul Turner <pjt@...gle.com> [2011-06-08 20:25:00]:

> Hi Kamalesh,
> 
> I'm unable to reproduce the results you describe.  One possibility is
> load-balancer interaction -- can you describe the topology of the
> platform you are running this on?
> 
> On both a straight NUMA topology and a hyper-threaded platform I
> observe a ~4% delta between the pinned and un-pinned cases.
> 
> Thanks -- results below,
> 
> - Paul
> 
> 
(snip)

Hi Paul,

That box is down. I tried running the test on a 2-socket quad-core box with
HT and was not able to reproduce the issue there: the CPU idle time reported
in both the pinned and un-pinned cases was ~0%. However, if we create a
cgroup hierarchy of 3 levels above the 5 cgroups, instead of the current
hierarchy where all 5 cgroups are created directly under /cgroup, the idle
time is seen again on the 2-socket quad-core (HT) box (in the un-pinned run
below).

				-----------
				| cgroups |
				-----------
				     |
				-----------
				| level 1 |
				-----------
				     |
				-----------
				| level 2 |
				-----------
				     |
				-----------
				| level 3 |
				-----------
			      /   /   |   \     \
			     /	 /    |    \     \
			cgrp1  cgrp2 cgrp3 cgrp4 cgrp5
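
For reference, with the defaults in the modified script at the end of this
mail (MOUNT_POINT=/cgroups/, LEVELS=3), the layout above is built roughly as
follows (sketch only):

	mount -t cgroup -o cpu,cpuset,cpuacct none /cgroups
	mkdir -p /cgroups/level1/level2/level3
	for i in 1 2 3 4 5; do
		mkdir /cgroups/level1/level2/level3/$i	# cgrp1 .. cgrp5
	done

The per-task sub-groups are then created one level further down,
e.g. .../level3/1/1, .../level3/1/2, and so on.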


Un-pinned run
--------------

Average CPU Idle percentage 24.8333%
Bandwidth shared with remaining non-Idle 75.1667%
Bandwidth of Group 1 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
|...... subgroup 1/1	= 49.9900	i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
|...... subgroup 1/2	= 50.0000	i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
 
 
Bandwidth of Group 2 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
|...... subgroup 2/1	= 49.9900	i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
|...... subgroup 2/2	= 50.0000	i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
 
 
Bandwidth of Group 3 = 16.6500 i.e = 12.5100% of non-Idle CPU time 75.1667%
|...... subgroup 3/1	= 25.0000	i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/2	= 24.9100	i.e = 3.1100% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/3	= 25.0800	i.e = 3.1300% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/4	= 24.9900	i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
 
 
Bandwidth of Group 4 = 29.3600 i.e = 22.0600% of non-Idle CPU time 75.1667%
|...... subgroup 4/1	= 12.0200	i.e = 2.6500% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/2	= 12.3800	i.e = 2.7300% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/3	= 13.6300	i.e = 3.0000% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/4	= 12.7000	i.e = 2.8000% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/5	= 12.8000	i.e = 2.8200% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/6	= 11.9600	i.e = 2.6300% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/7	= 12.7400	i.e = 2.8100% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/8	= 11.7300	i.e = 2.5800% of 22.0600% Groups non-Idle CPU time
 
 
Bandwidth of Group 5 = 37.2300 i.e = 27.9800% of non-Idle CPU time 75.1667%
|...... subgroup 5/1	= 47.7200	i.e = 13.3500%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/2	= 5.2000	i.e = 1.4500% 	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/3	= 6.3600	i.e = 1.7700% 	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/4	= 6.3600	i.e = 1.7700%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/5	= 7.9800	i.e = 2.2300%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/6	= 5.1800	i.e = 1.4400%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/7	= 7.4900	i.e = 2.0900%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/8	= 5.9200	i.e = 1.6500%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/9	= 7.7500	i.e = 2.1600%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/10	= 4.8100	i.e = 1.3400%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/11	= 4.9300	i.e = 1.3700%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/12	= 6.8900	i.e = 1.9200%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/13	= 6.0700	i.e = 1.6900%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/14	= 6.5200	i.e = 1.8200%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/15	= 5.9200	i.e = 1.6500%	of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/16	= 6.6400	i.e = 1.8500% 	of 27.9800% Groups non-Idle CPU time

Pinned Run
----------

Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
Bandwidth of Group 1 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
|...... subgroup 1/1	= 50.0100	i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
|...... subgroup 1/2	= 49.9800	i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
 
 
Bandwidth of Group 2 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
|...... subgroup 2/1	= 50.0000	i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
|...... subgroup 2/2	= 49.9900	i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
 
 
Bandwidth of Group 3 = 12.5300 i.e = 12.5300% of non-Idle CPU time 100%
|...... subgroup 3/1	= 25.0100	i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/2	= 25.0000	i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/3	= 24.9900	i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/4	= 24.9900	i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
 
 
Bandwidth of Group 4 = 25.0200 i.e = 25.0200% of non-Idle CPU time 100%
|...... subgroup 4/1	= 12.5100	i.e = 3.1300% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/2	= 12.5000	i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/3	= 12.5000	i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/4	= 12.5000	i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/5	= 12.4900	i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/6	= 12.4900	i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/7	= 12.4900	i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/8	= 12.4800	i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
 
 
Bandwidth of Group 5 = 49.8800 i.e = 49.8800% of non-Idle CPU time 100%
|...... subgroup 5/1	= 49.9600	i.e = 24.9200% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/2	= 6.2500	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/3	= 6.2500	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/4	= 6.2500	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/5	= 6.2500	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/6	= 6.2500	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/7	= 6.2500	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/8	= 6.2500	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/9	= 6.2400	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/10	= 6.2500	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/11	= 6.2400	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/12	= 6.2400	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/13	= 6.2400	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/14	= 6.2400	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/15	= 6.2300	i.e = 3.1000% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/16	= 6.2400	i.e = 3.1100% of 49.8800% Groups non-Idle CPU time

Modified script
---------------

#!/bin/bash

NR_TASKS1=2
NR_TASKS2=2
NR_TASKS3=4
NR_TASKS4=8
NR_TASKS5=16

BANDWIDTH=1
SUBGROUP=1
PRO_SHARES=0
MOUNT_POINT=/cgroups/
MOUNT=/cgroups/
LOAD=./while1
LEVELS=3

usage()
{
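	# Print usage and exit; all three options take 0/1 values.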
	echo "Usage: $0 [-b 0|1] [-s 0|1] [-p 0|1]"
	echo "-b 1|0 enable/disable cgroup bandwidth control (default: enabled)"
	echo "-s 1|0 create a sub-group per task (default: sub-groups created)"
	echo "-p 1|0 set group cpu.shares proportional to its number of tasks (default: off)"
	exit
}
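
# Parse the -b/-s/-p options; getopts consumes each option's argument itself.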
while getopts ":b:s:p:" arg
do
	case $arg in
	b)
		BANDWIDTH=$OPTARG
		if [ $BANDWIDTH -gt 1 ] || [ $BANDWIDTH -lt 0 ]
		then
			usage
		fi
		;;
	s)
		SUBGROUP=$OPTARG
		if [ $SUBGROUP -gt 1 ] || [ $SUBGROUP -lt 0 ]
		then
			usage
		fi
		;;
	p)
		PRO_SHARES=$OPTARG
		if [ $PRO_SHARES -gt 1 ] || [ $PRO_SHARES -lt 0 ]
		then
			usage
		fi
		;;

	*)
		usage
		;;
	esac
done
if [ ! -d $MOUNT ]
then
	mkdir -p $MOUNT
fi
test()
{
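	# Report [ Ok ] / [ Failed ] for the exit status passed in $1 and
	# abort on failure.  Note: this shadows the shell builtin 'test';
	# the rest of the script only uses the [ ... ] form, so that is safe.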
	echo -n "[ "
	if [ $1 -eq 0 ]
	then
		echo -ne '\E[42;40mOk'
	else
		echo -ne '\E[31;40mFailed'
		tput sgr0
		echo " ]"
		exit
	fi
	tput sgr0
	echo " ]"
}
mount_cgrp()
{
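	# Mount the cpu, cpuset and cpuacct controllers on a single
	# hierarchy at $MOUNT_POINT.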
	echo -n "Mounting root cgroup "
	mount -t cgroup -ocpu,cpuset,cpuacct none $MOUNT_POINT &> /dev/null
	test $?
}

umount_cgrp()
{
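	# Step out of the cgroup tree (cd /root/) before unmounting it.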
	echo -n "Unmounting root cgroup "
	cd /root/
	umount $MOUNT_POINT
	test $?
}

create_hierarchy()
{
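	# Build the nested hierarchy: $LEVELS 'levelN' directories under the
	# mount point (inheriting the root cpuset, bandwidth left unlimited),
	# then the 5 groups -- and, with -s 1, one sub-group per task --
	# under the deepest level.  $MOUNT is advanced to that deepest level.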
	mount_cgrp
	cpuset_mem=`cat $MOUNT/cpuset.mems`
	cpuset_cpu=`cat $MOUNT/cpuset.cpus`
	echo -n "creating hierarchy of $LEVELS levels "
	for (( i=1; i<=$LEVELS; i++ ))
	do
		MOUNT="${MOUNT}/level${i}"
		mkdir $MOUNT
		echo $cpuset_mem > $MOUNT/cpuset.mems
		echo $cpuset_cpu > $MOUNT/cpuset.cpus
		echo "-1" > $MOUNT/cpu.cfs_quota_us
		echo "500000" > $MOUNT/cpu.cfs_period_us
		echo -n " .."
	done
	echo " "
	echo $MOUNT
	echo -n "creating groups/sub-groups ..."
	for (( i=1; i<=5; i++ ))
	do
		mkdir $MOUNT/$i
		echo $cpuset_mem > $MOUNT/$i/cpuset.mems
		echo $cpuset_cpu > $MOUNT/$i/cpuset.cpus
		echo -n ".."
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				mkdir -p $MOUNT/$i/$j
				echo $cpuset_mem > $MOUNT/$i/$j/cpuset.mems
				echo $cpuset_cpu > $MOUNT/$i/$j/cpuset.cpus
				echo -n ".."
			done
		fi
	done
	echo "."
}

cleanup()
{
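	# Kill the while1 loads, remove the sub-group/group directories,
	# remove the levelN directories bottom-up, and unmount the hierarchy.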
	pkill -9 while1 &> /dev/null
	sleep 10
	echo -n "Removing groups/sub-groups .."
	for (( i=1; i<=5; i++ ))
	do
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				rmdir $MOUNT/$i/$j
				echo -n ".."
			done
		fi
		rmdir $MOUNT/$i
		echo -n ".."
	done
	cd $MOUNT
	cd ../
	for (( i=$LEVELS; i>=1; i-- ))
	do
		rmdir level$i
		cd ../	
	done
	echo " "
	umount_cgrp
}

load_tasks()
{
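	# For each of the 5 groups: set cpu.shares (scaled by the group's
	# task count with -p 1), start NR_TASKSn while1 loops, attach each
	# to its sub-group (or to the group itself with -s 0), and, with
	# -b 1, give it a 250ms quota per 500ms period.  Also start vmstat
	# in the background to record idle time.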
	for (( i=1; i<=5; i++ ))
	do
		jj=$(eval echo "\$NR_TASKS$i")
		shares="1024"
		if [ $PRO_SHARES -eq 1 ]
		then
			eval shares=$(echo "$jj * 1024" | bc)
		fi
		echo $shares > $MOUNT/$i/cpu.shares
		for (( j=1; j<=$jj; j++ ))
		do
			echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
			echo "500000" > $MOUNT/$i/cpu.cfs_period_us
			if [ $SUBGROUP -eq 1 ]
			then

				$LOAD &
				echo $! > $MOUNT/$i/$j/tasks
				echo "1024" > $MOUNT/$i/$j/cpu.shares

				if [ $BANDWIDTH -eq 1 ]
				then
					echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
					echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
				fi
			else
				$LOAD & 
				echo $! > $MOUNT/$i/tasks
				echo $shares > $MOUNT/$i/cpu.shares

				if [ $BANDWIDTH -eq 1 ]
				then
					echo "500000" > $MOUNT/$i/cpu.cfs_period_us
					echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
				fi
			fi
		done
	done
	echo "Capturing idle cpu time with vmstat...."
	vmstat 2 100 &> vmstat_log &
}

pin_tasks()
{
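	# With sub-groups, pin two tasks per CPU by stepping cpuset.cpus
	# through the CPUs in task order; without sub-groups, give each
	# group a fixed CPU range sized for a 16-CPU box.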
	cpu=0
	count=1
	for (( i=1; i<=5; i++ ))
	do
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				if [ $count -gt 2 ]
				then
					cpu=$((cpu+1))
					count=1
				fi
				echo $cpu > $MOUNT/$i/$j/cpuset.cpus
				count=$((count+1))
			done
		else
			case $i in
			1)
				echo 0 > $MOUNT/$i/cpuset.cpus;;
			2)
				echo 1 > $MOUNT/$i/cpuset.cpus;;
			3)
				echo "2-3" > $MOUNT/$i/cpuset.cpus;;
			4)
				echo "4-6" > $MOUNT/$i/cpuset.cpus;;
			5)
				echo "7-15" > $MOUNT/$i/cpuset.cpus;;
			esac
		fi
	done
	
}

print_results()
{
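	# $1 is the non-idle CPU percentage.  Sum the runtime column
	# (field 7 of each while1 line from /proc/sched_debug, i.e. the
	# task's accumulated sum-exec runtime) overall and per group and
	# sub-group, and print each group's share scaled to non-idle time.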
	eval gtot=$(cat sched_log|grep -i while|sed 's/R//g'|awk '{gtot+=$7};END{printf "%f", gtot}')
	for (( i=1; i<=5; i++ ))	
	do
		eval temp=$(cat sched_log_$i|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
		eval tavg=$(echo "scale=4;(($temp / $gtot) * $1)/100 " | bc)
		eval avg=$(echo  "scale=4;($temp / $gtot) * 100" | bc)
		eval pretty_tavg=$( echo "scale=4; $tavg * 100"| bc) # For pretty format
		echo "Bandwidth of Group $i = $avg i.e = $pretty_tavg% of non-Idle CPU time $1%"
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				eval tmp=$(cat sched_log_$i-$j|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
				eval stavg=$(echo "scale=4;($tmp / $temp) * 100" | bc)
				eval pretty_stavg=$(echo "scale=4;(($tmp / $temp) * $tavg) * 100" | bc)
				echo -n "|"
				echo -e "...... subgroup $i/$j\t= $stavg\ti.e = $pretty_stavg% of $pretty_tavg% Groups non-Idle CPU time"
			done
		fi
		echo " "
		echo " "
	done
}

capture_results()
{
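	# Snapshot /proc/sched_debug, stop vmstat and average its idle
	# column to get the idle percentage, split the per-group and
	# per-sub-group while1 lines into sched_log_* files, then print
	# the bandwidth breakdown against the remaining non-idle time.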
	cat /proc/sched_debug > sched_log
	lev=""
	for (( i=1; i<=$LEVELS; i++ ))
	do
		lev="$lev\/level${i}"	
	done
	pkill -9 vmstat 
	avg=$(cat vmstat_log |grep -iv "system"|grep -iv "swpd"|awk ' { if ( NR != 1) {id+=$15 }}END{print (id/(NR-1))}')

	rem=$(echo "scale=2; 100 - $avg" |bc)
	echo "Average CPU Idle percentage $avg%"	
	echo "Bandwidth shared with remaining non-Idle $rem%" 
	for (( i=1; i<=5; i++ ))
	do
		cat sched_log |grep -i while1|grep -i "$lev\/$i" > sched_log_$i
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				cat sched_log |grep -i while1|grep -i "$lev\/$i\/$j" > sched_log_$i-$j
			done
		fi
	done
	print_results $rem
}

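# Main flow: build the hierarchy, pin the tasks, start the loads, let them
# run for 60 seconds, capture and print the results, then clean up.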
create_hierarchy
pin_tasks

load_tasks
sleep 60
capture_results
cleanup
exit

