Date:	Fri, 11 Dec 2015 20:39:18 +0100
From:	Luca Abeni <luca.abeni@...tn.it>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Juri Lelli <juri.lelli@...il.com>,
	Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: SCHED_RR vs push-pull

Hi Peter,

On Fri, 11 Dec 2015 15:10:28 +0100
Peter Zijlstra <peterz@...radead.org> wrote:
[...]
> Thomas just reported a 'fun' problem with our rt 'load-balancer'.
I suspect the root of the problem is that the RT push/pull logic does
not implement a load balancer, but just makes sure that the M
highest-priority tasks (where M is the number of CPUs) are scheduled
for execution.
The difference from a "real" load balancer shows up when there are
multiple tasks with the same priority :)


> The problem is 2 cpus 4 equal prio RR tasks.
> Suppose an unequal distribution of these tasks among the CPUs; eg 1:3.
> 
> Now one would expect; barring other constraints; that each CPU would
> get 2 of the tasks and they would RR on their prio level.
> 
> This does not happen.
> 
> The push-pull thing only acts when there's idle or new tasks, and in
> the above scenario, the CPU with only the single RR task will happily
> continue running that task, while the other CPU will have to RR
> between the remaining 3.
I might be wrong, but I think this is due to the
	if (lowest_rq->rt.highest_prio.curr <= task->prio) {
check in rt.c::find_lock_lowest_rq().
I suspect that changing "<=" to "<" might fix the issue, at the cost of
generating a lot of useless task migrations.

> Now my initial thoughts were to define a global RR order using a
> virtual timeline and you'll get something like EEVDF on a per RR prio
> level with push-pull state between that.
> 
> Which might be a tad over engineered.
I suspect this issue can be fixed in a simpler way, by changing the
check I mentioned above.

If you want to balance SCHED_RR tasks with the same priority, I think
the "lowest_rq->rt.highest_prio.curr <= task->prio" check should be
extended to allow the migration if:
- the local task has a higher priority than the highest-priority task
  on lowest_rq (this is what is currently done), or
- the local task has the same priority as the highest-priority task on
  lowest_rq, both tasks are SCHED_RR, and the number of tasks at
  task->prio on the local RQ is larger than the number of tasks at
  lowest_rq->rt.highest_prio.curr on lowest_rq plus 1.

I think this could work, but I have only looked at the code, without
any real test. If you provide a simple program implementing a testcase,
I can run some experiments next week.

The alternative (of course I have to mention it :) would be to use
SCHED_DEADLINE instead of SCHED_RR.



				Luca

> 
> Is there a 'sane' solution to this problem? One that still is
> deterministic, because this is after all, RT scheduling.
> 
> Happy thinking ;-)

