Message-ID: <b6a2d2e20902100646q6c7073cse86b8f5790a120ac@mail.gmail.com>
Date: Tue, 10 Feb 2009 14:46:40 +0000
From: Rolando Martins <rolando.martins@...il.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org
Subject: Re: cgroup, RT reservation per core(s)?
On 2/10/09, Peter Zijlstra <peterz@...radead.org> wrote:
> On Mon, 2009-02-09 at 20:04 +0000, Rolando Martins wrote:
>
> > I should have elaborated this more:
> >
> >                root
> >            ----|----
> >            |       |
> >  (0.5 mem) 0       1 (100% rt, 0.5 mem)
> >                ---------
> >                |   |   |
> >                2   3   4 (33% rt for each group, 33% mem
> >                           per group (0.165))
> > Rol
>
>
>
> Right, I think this can be done.
>
> You would indeed need cpusets and sched-cgroups.
>
> Split the machine in 2 using cpusets.
>
>   ___R___
>  /       \
> A         B
>
> Where R is the root cpuset, and A and B are the siblings.
> Assign A one half the cpus, and B the other half.
> Disable load-balancing on R.
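>
> An untested sketch of that split (assuming a 4-cpu box; /dev/cgroup is
> just an illustrative mount point):
>
>   mount -t cgroup -o cpuset cpuset /dev/cgroup
>   mkdir /dev/cgroup/A /dev/cgroup/B
>   echo 0-1 > /dev/cgroup/A/cpuset.cpus    # half the cpus to A
>   echo 0   > /dev/cgroup/A/cpuset.mems
>   echo 2-3 > /dev/cgroup/B/cpuset.cpus    # the other half to B
>   echo 0   > /dev/cgroup/B/cpuset.mems
>   echo 0   > /dev/cgroup/cpuset.sched_load_balance  # stop balancing in R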
>
> Then using sched cgroups create the hierarchy
>
>   ____1____
>  /    |    \
> 2     3     4
>
> Where 1 can be the root group if you like.
>
> Assign 1 a utilization limit of 100%, and 2,3 and 4 a utilization limit
> of 33% each.
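>
> Untested sketch of the limits (cpu controller on a second hierarchy,
> group 1 being its root; 1s period, 33% ~= 330ms; /dev/cgroup-cpu is an
> illustrative mount point):
>
>   mount -t cgroup -o cpu cpu /dev/cgroup-cpu
>   # group 1 == the root group; its 100% comes from the global knobs
>   echo 1000000 > /proc/sys/kernel/sched_rt_period_us
>   echo 1000000 > /proc/sys/kernel/sched_rt_runtime_us
>   mkdir /dev/cgroup-cpu/2 /dev/cgroup-cpu/3 /dev/cgroup-cpu/4
>   for g in 2 3 4; do
>     echo 1000000 > /dev/cgroup-cpu/$g/cpu.rt_period_us
>     echo  330000 > /dev/cgroup-cpu/$g/cpu.rt_runtime_us  # ~33%
>   done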
>
> Then place the tasks that get 100% cputime on your 2 cpus in cpuset A
> and sched group 1.
>
> Place your other tasks in B,{2-4} respectively.
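>
> Placement would then look something like (pids purely illustrative):
>
>   echo $PID_OF_100PCT_TASK > /dev/cgroup/A/tasks      # cpuset A
>   echo $PID_OF_100PCT_TASK > /dev/cgroup-cpu/tasks    # sched group 1
>   echo $PID_OF_33PCT_TASK  > /dev/cgroup/B/tasks      # cpuset B
>   echo $PID_OF_33PCT_TASK  > /dev/cgroup-cpu/2/tasks  # sched group 2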
>
> The reason this works is that bandwidth distribution is sched-domain
> wide, and by disabling load-balancing on R, you split the scheduling
> domain.
>
> I've never actually tried anything like this, let me know if it
> works ;-)
>
Thanks Peter, it works!
I am thinking about different strategies to use in my RT middleware
project, and I think there is a limitation.
If I wanted to have some RT bandwidth in the B cpuset, I couldn't, because I
assigned A.cpu.rt_runtime_us = root.cpu.rt_runtime_us (and then subdivided
the A cpuset with 2, 3, 4, each one getting A.cpu.rt_runtime_us/3).
This happens because there is a single global /proc/sys/kernel/sched_rt_runtime_us and
/proc/sys/kernel/sched_rt_period_us.
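For reference, the global knobs and their usual defaults (95% of a 1s
period), which apply across all cpus at once:

cat /proc/sys/kernel/sched_rt_runtime_us   # 950000 by default
cat /proc/sys/kernel/sched_rt_period_us    # 1000000 by default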
What do you think about adding a separate tuple (runtime,period) for
each core/cpu?
In this case:
/proc/sys/kernel/sched_rt_runtime_us_0
/proc/sys/kernel/sched_rt_period_us_0
...
/proc/sys/kernel/sched_rt_runtime_us_n (one pair per cpu)
/proc/sys/kernel/sched_rt_period_us_n
Given this, we could do the following:
mkdir /dev/cgroup/A
echo 0-1 > /dev/cgroup/A/cpuset.cpus
echo 0 > /dev/cgroup/A/cpuset.mems
echo 1000000 > /dev/cgroup/A/cpu.rt_period_us
echo 1000000 > /dev/cgroup/A/cpu.rt_runtime_us
This would only succeed if we could allocate
(cpu.rt_runtime_us, cpu.rt_period_us) on both CPU 0 and CPU 1;
otherwise it would fail.
mkdir /dev/cgroup/B
echo 2-3 > /dev/cgroup/B/cpuset.cpus
echo 0 > /dev/cgroup/B/cpuset.mems
echo 1000000 > /dev/cgroup/B/cpu.rt_period_us
echo 800000 > /dev/cgroup/B/cpu.rt_runtime_us
The same here: it would fail if we couldn't allocate 0.8 on both CPU 2 and CPU 3.
Does this make sense? ;)
Rol