linux-kernel - Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b647ffbd0806110305p2be38608iadc1d2e91e3be57a@mail.gmail.com>
Date:	Wed, 11 Jun 2008 12:05:26 +0200
From:	"Dmitry Adamushko" <dmitry.adamushko@...il.com>
To:	"Peter Zijlstra" <a.p.zijlstra@...llo.nl>
Cc:	"Gregory Haskins" <ghaskins@...ell.com>,
	"Steven Rostedt" <rostedt@...dmis.org>,
	"Ingo Molnar" <mingo@...e.hu>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	linux-kernel@...r.kernel.org
Subject: Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"

2008/6/11 Peter Zijlstra <a.p.zijlstra@...llo.nl>:
> On Wed, 2008-06-11 at 00:58 +0200, Dmitry Adamushko wrote:
>> Hi Gregory,
>>
>>
>> regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f
>>
>>
>> I think we can do it simpler. Please take a look at the patch below.
>>
>> Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and twice so on x86_64),
>> let's add "exclusive" (the ones that are bound to this CPU) tasks to the head of the queue
>> and "shared" ones -- to the end.
>>
>> In case of a few newly woken up "exclusive" tasks, they are 'stacked' (not queued as now), meaning that
>> a task {i+1} is being placed in front of the previously woken up task {i}. But I don't think that
>> this behavior may cause any realistic problems.
>
> Doesn't this violate POSIX ?
>

If so, then the idea of "prioritize non-migratable tasks over
migratable ones" violates it, not just an artefact of this particular
implementation.

No matter which implementation is used, we have a situation when a
woken-up single-CPU-bound task (let's call it 'p') can preempt a
current task with effects as follows:

- 'current' is not guaranteed to get another CPU;

- there might have been other pending tasks (of equal prio) on this
queue. As a result, 'p' starts running before them violating currently
used (explicitly requested by POSIX?) round-robin behavior.

Anyway, the original (and my rework) algorithm is somewhat
sub-optimal. It considers only "p->rt.nr_cpus_allowed == 1", while the
ideal algorithm would behave as follows:

1) we get a newly woken-up task on CPU_i;

2) this task have a number (!= 1) of allowed CPUs (which is a sub-set
of all) _but_ all of them are busy at the moment serving high prio
tasks;

3) there are no other pending tasks on the queue (of equal prio) that
are in the same condition as 'p' (that's they can't be migrated to
other CPUs at the moment and to wait for CPU_i is the 'best' effort
for them);

4) 'current' is guaranteed to start running immediatelly on another
CPU (apparently, its affinity allows CPUs that can be used neither by
'p', nor by tasks from (3) ), should we preempt it now.

5) then let 'p' preempt 'current'.

We do cause some 'troubles' (in a way of 'cache' efficienty) for the
'current' but probably we win CPU-time-distribution-wise.

Note, hard real-time tasks would not like to find themselves in the
aforemnetioned situation. But then, one probably should have initially
taken care of assigning different prios or better off, isolating
dedicated CPUs for some tasks wrt how much CPU-time they
consume/latency they expect.

All in all, I think that the initial implementation is sub-optimal in
any case and it's not worth introducing additional overhead (another
priority queue array) to address this issue (which is kind of a corner
case).

We may just consider dropping this idea completely.
(my 0.02$)

-- 
Best regards,
Dmitry Adamushko
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/