lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 20 Apr 2018 14:21:30 +0300
From:   Kirill Tkhai <ktkhai@...tuozzo.com>
To:     Juri Lelli <juri.lelli@...il.com>
Cc:     mingo@...hat.com, peterz@...radead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched/rt: Rework for_each_process_thread() iterations in
 tg_has_rt_tasks()

On 20.04.2018 13:58, Juri Lelli wrote:
> On 20/04/18 12:43, Kirill Tkhai wrote:
>> On 20.04.2018 12:25, Juri Lelli wrote:
> 
> [...]
> 
>>> Isn't this however checking against the current (dynamic) number of
>>> runnable tasks/groups instead of the "static" group membership (which
>>> shouldn't be affected by a task running state)?
>>
>> Ah, you are sure. I forgot that rt_nr_running does not contain sleeping tasks.
>> We need to check something else here. I'll try to find another way.
> 
> n/p. Maybe a per rt_rq flag linked to "static" membership (I didn't
> really thought this through though :).

sched_move_task() does not change any rt_rq's fields on moving a dequeued
task, so we definitely can't use rt_rq to detect all the tasks.

> BTW, since you faced this problem, I guess this is on RT_GROUP_SCHED
> enabled boxes, so I'd have a couple of questions (not strictly related
> to the problem at hand):
> 
>  - do such boxes rely on RT throttling being performed at root level?
>  - is RT_RUNTIME_SHARE enabled?

This is a machine with many fair_sched_class tasks, while there are no
(almost) RT tasks there, and there is no "real" real-time with throttling.
They are not interesting from this point of view. Very small number
of task group may have rt_runtime != 0 there, and where so, rt_runtime value
is small. The problem I'm solving happened, when rt_period of one of such groups
were restored, and it hanged for ages, because of enormous number of child
cgroups and tasks (of fair_sched_class) in system.

RT_RUNTIME_SHARE is not enabled, as it's RH7-based 3.10 kernel.

Really, it will be very difficult to provide real-time on machines with
such the big number of tasks, in case of tasks allocates some resources
(not all the memory are pinned from going to slab, there is some IO, etc),
since there are some tasks-to-solve even in !rt case.

Kirill

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ