Message-ID: <BANLkTimE1b8HP-q4jgsv5jPD5S-dRoUi_g@mail.gmail.com>
Date:	Mon, 13 Jun 2011 17:00:08 -0700
From:	Paul Turner <pjt@...gle.com>
To:	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>
Cc:	Vladimir Davydov <vdavydov@...allels.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Bharata B Rao <bharata@...ux.vnet.ibm.com>,
	Dhaval Giani <dhaval.giani@...il.com>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Ingo Molnar <mingo@...e.hu>,
	Pavel Emelianov <xemul@...allels.com>
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

Hi Kamalesh.

I tried on both Friday and again today to reproduce your results
without success.  Results are included below.  The margin of error is
the same as in the previous (2-level-deep) case, ~4%.  One minor nit:
in your script's option parsing you're calling shift; with getopts
this is unnecessary and will actually lead to arguments being
dropped.
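
For reference, a minimal shape of that loop without the shifts (just a
sketch against your variable names, not a tested patch) would be:

while getopts ":b:s:p:" arg
do
        case $arg in
        b) BANDWIDTH=$OPTARG ;;
        s) SUBGROUP=$OPTARG ;;
        p) PRO_SHARES=$OPTARG ;;
        *) usage ;;
        esac
done

getopts tracks its own position in OPTIND and never consumes the
positional parameters, so shifting them underneath it makes later
options disappear.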

Are you testing on top of a clean -tip?  Do you have any custom
load-balancer or scheduler settings?

Thanks,

- Paul


Hyper-threaded topology:
unpinned:
Average CPU Idle percentage 38.6333%
Bandwidth shared with remaining non-Idle 61.3667%

pinned:
Average CPU Idle percentage 35.2766%
Bandwidth shared with remaining non-Idle 64.7234%
(The mask in the "unpinned" case is 0-3,6-9,12-15,18-21, which should
mirror your 2-socket 8x2 configuration.)

4-way NUMA topology:
unpinned:
Average CPU Idle percentage 5.26667%
Bandwidth shared with remaining non-Idle 94.73333%

pinned:
Average CPU Idle percentage 0.242424%
Bandwidth shared with remaining non-Idle 99.757576%




On Fri, Jun 10, 2011 at 11:17 AM, Kamalesh Babulal
<kamalesh@...ux.vnet.ibm.com> wrote:
> * Paul Turner <pjt@...gle.com> [2011-06-08 20:25:00]:
>
>> Hi Kamalesh,
>>
>> I'm unable to reproduce the results you describe.  One possibility is
>> load-balancer interaction -- can you describe the topology of the
>> platform you are running this on?
>>
>> On both a straight NUMA topology and a hyper-threaded platform I
>> observe a ~4% delta between the pinned and un-pinned cases.
>>
>> Thanks -- results below,
>>
>> - Paul
>>
>>
> (snip)
>
> Hi Paul,
>
> That box is down. I tried running the test on the 2-socket quad-core with
> HT and was not able to reproduce the issue: CPU idle time reported in both
> the pinned and un-pinned cases was ~0. But if we create a cgroup hierarchy
> of 3 levels above the 5 cgroups, instead of the current hierarchy where all
> 5 cgroups are created directly under /cgroup, the idle time does show up on
> the 2-socket quad-core (HT) box. The layout (with the resulting leaf paths
> sketched after the diagram) is:
>
>                                -----------
>                                | cgroups |
>                                -----------
>                                     |
>                                -----------
>                                | level 1 |
>                                -----------
>                                     |
>                                -----------
>                                | level 2 |
>                                -----------
>                                     |
>                                -----------
>                                | level 3 |
>                                -----------
>                              /   /   |   \     \
>                             /   /    |    \     \
>                        cgrp1  cgrp2 cgrp3 cgrp4 cgrp5
>
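> With the script defaults (MOUNT_POINT=/cgroups/, LEVELS=3), the leaf
> groups end up at paths like the following (a sketch, normalizing the
> trailing slash):
>
>         /cgroups/level1/level2/level3/1/1  ...  /cgroups/level1/level2/level3/1/2
>         /cgroups/level1/level2/level3/5/1  ...  /cgroups/level1/level2/level3/5/16
>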
>
> Un-pinned run
> --------------
>
> Average CPU Idle percentage 24.8333%
> Bandwidth shared with remaining non-Idle 75.1667%
> Bandwidth of Group 1 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
> |...... subgroup 1/1    = 49.9900       i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
> |...... subgroup 1/2    = 50.0000       i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
>
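> (Reading these figures: the "i.e" value is the group's bandwidth scaled
> by the non-idle fraction, e.g. for Group 1, 8.37% * 75.1667% ~= 6.29%;
> each subgroup's "i.e" value is in turn its share of that, e.g.
> 49.99% * 6.29% ~= 3.14%.)
>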
>
> Bandwidth of Group 2 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
> |...... subgroup 2/1    = 49.9900       i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
> |...... subgroup 2/2    = 50.0000       i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
>
>
> Bandwidth of Group 3 = 16.6500 i.e = 12.5100% of non-Idle CPU time 75.1667%
> |...... subgroup 3/1    = 25.0000       i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
> |...... subgroup 3/2    = 24.9100       i.e = 3.1100% of 12.5100% Groups non-Idle CPU time
> |...... subgroup 3/3    = 25.0800       i.e = 3.1300% of 12.5100% Groups non-Idle CPU time
> |...... subgroup 3/4    = 24.9900       i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
>
>
> Bandwidth of Group 4 = 29.3600 i.e = 22.0600% of non-Idle CPU time 75.1667%
> |...... subgroup 4/1    = 12.0200       i.e = 2.6500% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/2    = 12.3800       i.e = 2.7300% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/3    = 13.6300       i.e = 3.0000% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/4    = 12.7000       i.e = 2.8000% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/5    = 12.8000       i.e = 2.8200% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/6    = 11.9600       i.e = 2.6300% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/7    = 12.7400       i.e = 2.8100% of 22.0600% Groups non-Idle CPU time
> |...... subgroup 4/8    = 11.7300       i.e = 2.5800% of 22.0600% Groups non-Idle CPU time
>
>
> Bandwidth of Group 5 = 37.2300 i.e = 27.9800% of non-Idle CPU time 75.1667%
> |...... subgroup 5/1    = 47.7200       i.e = 13.3500%  of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/2    = 5.2000        i.e = 1.4500%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/3    = 6.3600        i.e = 1.7700%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/4    = 6.3600        i.e = 1.7700%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/5    = 7.9800        i.e = 2.2300%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/6    = 5.1800        i.e = 1.4400%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/7    = 7.4900        i.e = 2.0900%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/8    = 5.9200        i.e = 1.6500%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/9    = 7.7500        i.e = 2.1600%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/10   = 4.8100        i.e = 1.3400%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/11   = 4.9300        i.e = 1.3700%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/12   = 6.8900        i.e = 1.9200%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/13   = 6.0700        i.e = 1.6900%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/14   = 6.5200        i.e = 1.8200%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/15   = 5.9200        i.e = 1.6500%   of 27.9800% Groups non-Idle CPU time
> |...... subgroup 5/16   = 6.6400        i.e = 1.8500%   of 27.9800% Groups non-Idle CPU time
>
> Pinned Run
> ----------
>
> Average CPU Idle percentage 0%
> Bandwidth shared with remaining non-Idle 100%
> Bandwidth of Group 1 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
> |...... subgroup 1/1    = 50.0100       i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
> |...... subgroup 1/2    = 49.9800       i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
>
>
> Bandwidth of Group 2 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
> |...... subgroup 2/1    = 50.0000       i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
> |...... subgroup 2/2    = 49.9900       i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
>
>
> Bandwidth of Group 3 = 12.5300 i.e = 12.5300% of non-Idle CPU time 100%
> |...... subgroup 3/1    = 25.0100       i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
> |...... subgroup 3/2    = 25.0000       i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
> |...... subgroup 3/3    = 24.9900       i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
> |...... subgroup 3/4    = 24.9900       i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
>
>
> Bandwidth of Group 4 = 25.0200 i.e = 25.0200% of non-Idle CPU time 100%
> |...... subgroup 4/1    = 12.5100       i.e = 3.1300% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/2    = 12.5000       i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/3    = 12.5000       i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/4    = 12.5000       i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/5    = 12.4900       i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/6    = 12.4900       i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/7    = 12.4900       i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
> |...... subgroup 4/8    = 12.4800       i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
>
>
> Bandwidth of Group 5 = 49.8800 i.e = 49.8800% of non-Idle CPU time 100%
> |...... subgroup 5/1    = 49.9600       i.e = 24.9200% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/2    = 6.2500        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/3    = 6.2500        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/4    = 6.2500        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/5    = 6.2500        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/6    = 6.2500        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/7    = 6.2500        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/8    = 6.2500        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/9    = 6.2400        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/10   = 6.2500        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/11   = 6.2400        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/12   = 6.2400        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/13   = 6.2400        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/14   = 6.2400        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/15   = 6.2300        i.e = 3.1000% of 49.8800% Groups non-Idle CPU time
> |...... subgroup 5/16   = 6.2400        i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
>
> Modified script
> ---------------
>
> #!/bin/bash
>
> NR_TASKS1=2
> NR_TASKS2=2
> NR_TASKS3=4
> NR_TASKS4=8
> NR_TASKS5=16
>
> BANDWIDTH=1
> SUBGROUP=1
> PRO_SHARES=0
> MOUNT_POINT=/cgroups/
> MOUNT=/cgroups/
> LOAD=./while1
> LEVELS=3
>
> usage()
> {
>        echo "Usage $0: [-b 0|1] [-s 0|1] [-p 0|1]"
>        echo "-b 1|0 set/unset  Cgroups bandwidth control (default set)"
>        echo "-s Create sub-groups for every task (default creates sub-group)"
>        echo "-p create propotional shares based on cpus"
>        exit
> }
> while getopts ":b:s:p:" arg
> do
>        case $arg in
>        b)
>                BANDWIDTH=$OPTARG
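>                # note: this shift (and the ones in the s/p branches below)
>                # is the nit mentioned above; getopts needs no shift and
>                # this drops the next argument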
>                shift
>                if [ $BANDWIDTH -gt 1 ] || [ $BANDWIDTH -lt 0 ]
>                then
>                        usage
>                fi
>                ;;
>        s)
>                SUBGROUP=$OPTARG
>                shift
>                if [ $SUBGROUP -gt 1 ] || [ $SUBGROUP -lt 0 ]
>                then
>                        usage
>                fi
>                ;;
>        p)
>                PRO_SHARES=$OPTARG
>                shift
>                if [ $PRO_SHARES -gt 1 ] || [ $PRO_SHARES -lt 0 ]
>                then
>                        usage
>                fi
>                ;;
>
>        *)
>                usage
>                ;;
>        esac
> done
> if [ ! -d $MOUNT ]
> then
>        mkdir -p $MOUNT
> fi
> test()
> {
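>        # print a colored [ Ok ]/[ Failed ] tag for the given exit status
>        # (note: the name shadows the shell builtin `test`)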
>        echo -n "[ "
>        if [ $1 -eq 0 ]
>        then
>                echo -ne '\E[42;40mOk'
>        else
>                echo -ne '\E[31;40mFailed'
>                tput sgr0
>                echo " ]"
>                exit
>        fi
>        tput sgr0
>        echo " ]"
> }
> mount_cgrp()
> {
>        echo -n "Mounting root cgroup "
>        mount -t cgroup -ocpu,cpuset,cpuacct none $MOUNT_POINT &> /dev/null
>        test $?
> }
>
> umount_cgrp()
> {
>        echo -n "Unmounting root cgroup "
>        cd /root/
>        umount $MOUNT_POINT
>        test $?
> }
>
> create_hierarchy()
> {
>        mount_cgrp
>        cpuset_mem=`cat $MOUNT/cpuset.mems`
>        cpuset_cpu=`cat $MOUNT/cpuset.cpus`
>        echo -n "creating hierarchy of levels $LEVELS "
>        for (( i=1; i<=$LEVELS; i++ ))
>        do
>                MOUNT="${MOUNT}/level${i}"
>                mkdir $MOUNT
>                echo $cpuset_mem > $MOUNT/cpuset.mems
>                echo $cpuset_cpu > $MOUNT/cpuset.cpus
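>                # leave the intermediate levels unthrottled (quota -1);
>                # quotas for the leaf sub-groups are set later in load_tasks()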
>                echo "-1" > $MOUNT/cpu.cfs_quota_us
>                echo "500000" > $MOUNT/cpu.cfs_period_us
>                echo -n " .."
>        done
>        echo " "
>        echo $MOUNT
>        echo -n "creating groups/sub-groups ..."
>        for (( i=1; i<=5; i++ ))
>        do
>                mkdir $MOUNT/$i
>                echo $cpuset_mem > $MOUNT/$i/cpuset.mems
>                echo $cpuset_cpu > $MOUNT/$i/cpuset.cpus
>                echo -n ".."
>                if [ $SUBGROUP -eq 1 ]
>                then
>                        jj=$(eval echo "\$NR_TASKS$i")
>                        for (( j=1; j<=$jj; j++ ))
>                        do
>                                mkdir -p $MOUNT/$i/$j
>                                echo $cpuset_mem > $MOUNT/$i/$j/cpuset.mems
>                                echo $cpuset_cpu > $MOUNT/$i/$j/cpuset.cpus
>                                echo -n ".."
>                        done
>                fi
>        done
>        echo "."
> }
>
> cleanup()
> {
>        pkill -9 while1 &> /dev/null
>        sleep 10
>        echo -n "Umount groups/sub-groups .."
>        for (( i=1; i<=5; i++ ))
>        do
>                if [ $SUBGROUP -eq 1 ]
>                then
>                        jj=$(eval echo "\$NR_TASKS$i")
>                        for (( j=1; j<=$jj; j++ ))
>                        do
>                                rmdir $MOUNT/$i/$j
>                                echo -n ".."
>                        done
>                fi
>                rmdir $MOUNT/$i
>                echo -n ".."
>        done
>        cd $MOUNT
>        cd ../
>        for (( i=$LEVELS; i>=1; i-- ))
>        do
>                rmdir level$i
>                cd ../
>        done
>        echo " "
>        umount_cgrp
> }
>
> load_tasks()
> {
>        for (( i=1; i<=5; i++ ))
>        do
>                jj=$(eval echo "\$NR_TASKS$i")
>                shares="1024"
>                if [ $PRO_SHARES -eq 1 ]
>                then
>                        eval shares=$(echo "$jj * 1024" | bc)
>                fi
>                echo $shares > $MOUNT/$i/cpu.shares
>                for (( j=1; j<=$jj; j++ ))
>                do
>                        echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
>                        echo "500000" > $MOUNT/$i/cpu.cfs_period_us
>                        if [ $SUBGROUP -eq 1 ]
>                        then
>
>                                $LOAD &
>                                echo $! > $MOUNT/$i/$j/tasks
>                                echo "1024" > $MOUNT/$i/$j/cpu.shares
>
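>                                # 250ms of quota per 500ms period caps each
>                                # sub-group at half a CPU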
>                                if [ $BANDWIDTH -eq 1 ]
>                                then
>                                        echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
>                                        echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
>                                fi
>                        else
>                                $LOAD &
>                                echo $! > $MOUNT/$i/tasks
>                                echo $shares > $MOUNT/$i/cpu.shares
>
>                                if [ $BANDWIDTH -eq 1 ]
>                                then
>                                        echo "500000" > $MOUNT/$i/cpu.cfs_period_us
>                                        echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
>                                fi
>                        fi
>                done
>        done
>        echo "Capturing idle cpu time with vmstat...."
>        vmstat 2 100 &> vmstat_log &
> }
>
> pin_tasks()
> {
>        cpu=0
>        count=1
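>        # with sub-groups, pack two tasks per CPU (advance cpu after every
>        # second task); otherwise give each group a fixed cpuset range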
>        for (( i=1; i<=5; i++ ))
>        do
>                if [ $SUBGROUP -eq 1 ]
>                then
>                        jj=$(eval echo "\$NR_TASKS$i")
>                        for (( j=1; j<=$jj; j++ ))
>                        do
>                                if [ $count -gt 2 ]
>                                then
>                                        cpu=$((cpu+1))
>                                        count=1
>                                fi
>                                echo $cpu > $MOUNT/$i/$j/cpuset.cpus
>                                count=$((count+1))
>                        done
>                else
>                        case $i in
>                        1)
>                                echo 0 > $MOUNT/$i/cpuset.cpus;;
>                        2)
>                                echo 1 > $MOUNT/$i/cpuset.cpus;;
>                        3)
>                                echo "2-3" > $MOUNT/$i/cpuset.cpus;;
>                        4)
>                                echo "4-6" > $MOUNT/$i/cpuset.cpus;;
>                        5)
>                                echo "7-15" > $MOUNT/$i/cpuset.cpus;;
>                        esac
>                fi
>        done
>
> }
>
> print_results()
> {
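>        # gtot: total runtime of all while1 tasks, summed from column 7 of
>        # the /proc/sched_debug snapshot saved by capture_results()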
>        eval gtot=$(cat sched_log|grep -i while|sed 's/R//g'|awk '{gtot+=$7};END{printf "%f", gtot}')
>        for (( i=1; i<=5; i++ ))
>        do
>                eval temp=$(cat sched_log_$i|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
>                eval tavg=$(echo "scale=4;(($temp / $gtot) * $1)/100 " | bc)
>                eval avg=$(echo  "scale=4;($temp / $gtot) * 100" | bc)
>                eval pretty_tavg=$(echo "scale=4; $tavg * 100" | bc) # For pretty format
>                echo "Bandwidth of Group $i = $avg i.e = $pretty_tavg% of non-Idle CPU time $1%"
>                if [ $SUBGROUP -eq 1 ]
>                then
>                        jj=$(eval echo "\$NR_TASKS$i")
>                        for (( j=1; j<=$jj; j++ ))
>                        do
>                                eval tmp=$(cat sched_log_$i-$j|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
>                                eval stavg=$(echo "scale=4;($tmp / $temp) * 100" | bc)
>                                eval pretty_stavg=$(echo "scale=4;(($tmp / $temp) * $tavg) * 100" | bc)
>                                echo -n "|"
>                                echo -e "...... subgroup $i/$j\t= $stavg\ti.e = $pretty_stavg% of $pretty_tavg% Groups non-Idle CPU time"
>                        done
>                fi
>                echo " "
>                echo " "
>        done
> }
>
> capture_results()
> {
>        cat /proc/sched_debug > sched_log
>        lev=""
>        for (( i=1; i<=$LEVELS; i++ ))
>        do
>                lev="$lev\/level${i}"
>        done
>        pkill -9 vmstat
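>        # average the idle column ($15) over the vmstat samples, skipping
>        # the first sample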
>        avg=$(cat vmstat_log |grep -iv "system"|grep -iv "swpd"|awk ' { if ( NR != 1) {id+=$15 }}END{print (id/(NR-1))}')
>
>        rem=$(echo "scale=2; 100 - $avg" |bc)
>        echo "Average CPU Idle percentage $avg%"
>        echo "Bandwidth shared with remaining non-Idle $rem%"
>        for (( i=1; i<=5; i++ ))
>        do
>                cat sched_log |grep -i while1|grep -i "$lev\/$i" > sched_log_$i
>                if [ $SUBGROUP -eq 1 ]
>                then
>                        jj=$(eval echo "\$NR_TASKS$i")
>                        for (( j=1; j<=$jj; j++ ))
>                        do
>                                cat sched_log |grep -i while1|grep -i "$lev\/$i\/$j" > sched_log_$i-$j
>                        done
>                fi
>        done
>        print_results $rem
> }
>
> create_hierarchy
> pin_tasks
>
> load_tasks
> sleep 60
> capture_results
> cleanup
> exit
>
>