linux-kernel - Re: [PATCH 0/3] sched: use EDF to throttle RT task groups v2

Message-ID: <4BA937CE.9060002@sssup.it>
Date:	Tue, 23 Mar 2010 22:51:10 +0100
From:	Tommaso Cucinotta <tommaso.cucinotta@...up.it>
To:	Dhaval Giani <dhaval@...is.sssup.it>
CC:	Peter Zijlstra <peterz@...radead.org>,
	Fabio Checconi <fchecconi@...il.com>,
	Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Paul Turner <pjt@...gle.com>,
	Dario Faggioli <faggioli@...dalf.sssup.it>,
	Michael Trimarchi <michael@...dence.eu.com>,
	Tommaso Cucinotta <t.cucinotta@...up.it>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/3] sched: use EDF to throttle RT task groups v2

Dhaval Giani wrote:
> But I can also see why one would not want a multi-valued interface, esp
> when the idea is just to change the runtimes. (though there is a
> complicated interaction between task_runtime and runtime which I am not
> sure how to avoid).
>
> IOW, this interface sucks :-). We really need something better and
> easier to use. (Sorry for no constructive input)
>   
Hi,

Is it really so bad to think of a well-engineered API for the real-time
scheduling services of the OS, made available to applications by
means of a library, and implemented by whatever means fits best in
the current kernel/user-space interaction model? For example: variants
of the sched_setscheduler() syscall (remember
sched_setscheduler_ex() for SCHED_SPORADIC?), a completely new set of
syscalls, a cgroupfs-based interaction, a set of binary files within the
cgroupfs, a set of ioctl()s over cgroupfs entries (somebody must have
told me this is not possible), or a special device in /dev, /sys, /proc,
/wherever, etc.

For example, on Mac OS X there seems to be this THREAD_TIME_CONSTRAINT_POLICY

  http://developer.apple.com/mac/library/documentation/Darwin/Conceptual/KernelProgramming/scheduler/scheduler.html#//apple_ref/doc/uid/TP30000905-CH211-BABCHEEB

which is claimed to be used by multimedia and interactive system
services, although I don't know how it is implemented at the kernel
level or what it actually provides.

Also, in the context of some research projects, a few APIs have come out 
in the last few years for Linux as well. Now, I don't want to say that 
we must have something as ugly as:

  int frsh_contract_set_resource_and_label
  (frsh_contract_t *contract,
   const frsh_resource_type_t resource_type,
   const frsh_resource_id_t resource_id,
   const char *contract_label);

and as complex and multi-faceted as the entire FRESCOR API

  http://www.frescor.org/
  http://www.frescor.org/index.php?mact=Uploads,cntnt01,getfile,0&cntnt01showtemplate=false&cntnt01upload_id=75&cntnt01returnid=54

which aims to merge into a single framework the management of real-time
computing, networking, storage, or even memory allocation. However, at
least that experience may help in identifying the requirements for a 
well-engineered approach to a real-time interface. I also know it cannot 
be something as naive and simple as the AQuoSA API

  http://aquosa.sourceforge.net/aquosa-docs/aquosa-qosres/html/group__QRES__LIB.html

designed around a single-processor embedded (and academic) context.

I'm really scared that this kind of cgroupfs-based interface fits well
only the requirements of "static partitioning" of the system by
sysadmins, whilst general real-time, interactive and multimedia
applications cannot easily benefit from the potentially available
real-time guarantees. In our research we used to dynamically change the
reserved resources (runtime) of the application every 40ms or so;
others from the same group want some kind of "elastic scheduling",
where the reservation period of control tasks is changed dynamically at
an even higher rate. (I know that these may be pathological and
polarized scenarios of no general interest as well.)
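
To make the "change the runtime every 40ms" case concrete, here is a
minimal sketch of the kind of adjustment step an application might run
once per period. The function name, the percentage margin and the
clamping rule are all assumptions made up for illustration; this is not
the controller we actually used, only the shape of the call that the
interface would have to serve cheaply and often.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the runtime to request for the next period
 * from the CPU time consumed in the last one, plus a safety margin.
 * The result is clamped to [min_runtime_ns, period_ns].
 */
uint64_t next_runtime_ns(uint64_t consumed_ns, uint64_t period_ns,
                         uint64_t min_runtime_ns, unsigned margin_pct)
{
    assert(period_ns > 0 && min_runtime_ns <= period_ns);

    /* consumed time inflated by margin_pct percent */
    uint64_t want = consumed_ns * (100u + margin_pct) / 100u;

    if (want < min_runtime_ns)
        want = min_runtime_ns;
    if (want > period_ns)
        want = period_ns;   /* cannot reserve more than the whole period */
    return want;
}
```

With a 40ms period, a job that consumed 8ms and a 25% margin would ask
for 10ms next time. Whatever the interface ends up being, a step like
this would go through it every period.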

Another example: we can quickly find out that we may need to set more
than 2 parameters atomically. Just as an example, one may have:
- runtime
- period
- a set of flags governing the exact scheduling behavior, for example:
  - whether or not it may take more than the assigned runtime
  - if yes, by what means (SCHED_OTHER when runtime is exhausted a la
    AQuoSA, or low priority a la Sporadic Server, or deadline
    postponement a la Constant Bandwidth Server, or what?)
  - any weight governing a weighted fair partitioning of the excess
    bandwidth?
  - on Mac OS X, they seem to have a flag driving the preemptibility of
    the process
  - whether we want partitioned scheduling or global scheduling?
  - whether we want to allocate on an individual CPU, on all available
    CPUs a la Fabio's scheduler, or on a cpuset?
- low priority?
- signal to be delivered in case of budget overrun?
- something mad about synchronization, such as blocking times? (OK, now
  I'm starting to talk real-time-ish, I'll stop.)
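
Just to show what "more than 2 parameters, set atomically" could look
like, the list above might be packed into a single structure handed
over in one call. Every name below (struct rt_resv_params, the enums,
rt_resv_validate()) is hypothetical and exists nowhere in the kernel;
the consistency checks are only examples of what a single multi-valued
call can verify and separate single-valued writes cannot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: what to do when the assigned runtime is exhausted. */
enum rt_resv_overrun {
    RT_RESV_HARD,           /* never run beyond the assigned runtime */
    RT_RESV_SOFT_OTHER,     /* continue as SCHED_OTHER (AQuoSA style) */
    RT_RESV_SOFT_LOWPRIO,   /* continue at low priority (Sporadic Server style) */
    RT_RESV_SOFT_POSTPONE   /* postpone the deadline (CBS style) */
};

/* Hypothetical: where the reservation is allocated. */
enum rt_resv_placement {
    RT_RESV_ONE_CPU,
    RT_RESV_ALL_CPUS,
    RT_RESV_CPUSET
};

/* Hypothetical parameter block, meant to be set in one atomic call. */
struct rt_resv_params {
    uint64_t runtime_ns;
    uint64_t period_ns;
    enum rt_resv_overrun overrun;
    enum rt_resv_placement placement;
    uint32_t excess_weight;   /* share of excess bandwidth, 0 = none */
    int low_priority;         /* meaningful only with RT_RESV_SOFT_LOWPRIO */
    int overrun_signal;       /* signal number on budget overrun, 0 = none */
};

/* Check the block as a whole; returns 0 or -EINVAL. */
int rt_resv_validate(const struct rt_resv_params *p)
{
    assert(p != NULL);
    if (p->runtime_ns == 0 || p->period_ns == 0)
        return -EINVAL;
    if (p->runtime_ns > p->period_ns)
        return -EINVAL;   /* runtime and period must be checked together */
    if (p->overrun == RT_RESV_HARD && p->excess_weight != 0)
        return -EINVAL;   /* a hard reservation takes no excess bandwidth */
    if (p->overrun_signal < 0)
        return -EINVAL;
    return 0;
}

/*
 * Reserved bandwidth in parts per million. Assumes runtime_ns is below
 * roughly 5 hours so that the product fits in 64 bits.
 */
uint64_t rt_resv_bandwidth_ppm(const struct rt_resv_params *p)
{
    return p->runtime_ns * 1000000ULL / p->period_ns;
}
```

The runtime <= period check is the simplest case in point: it needs both
values at once, which is awkward when each one is written through its
own file.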

And we may need more complex operations than simply reading/writing
runtimes and periods, such as:
- attaching/detaching threads
- monitoring the available instantaneous budget
- setting up hierarchical scheduling (OK, for such things cgroups
  seem just perfect)
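
As a rough illustration of the first two operations, here is a toy
user-space model of a reservation. It is plain bookkeeping in memory:
the struct resv type and the resv_*() functions are invented for this
sketch, nothing here talks to the scheduler, and the fixed-size thread
table is an arbitrary simplification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define RESV_MAX_THREADS 8   /* arbitrary limit for the sketch */

/* Hypothetical reservation object. */
struct resv {
    uint64_t runtime_ns;               /* budget granted every period */
    uint64_t period_ns;
    uint64_t budget_ns;                /* budget left in the current period */
    int      tids[RESV_MAX_THREADS];   /* attached thread ids */
    size_t   nr_tids;
};

void resv_init(struct resv *r, uint64_t runtime_ns, uint64_t period_ns)
{
    assert(r != NULL);
    r->runtime_ns = runtime_ns;
    r->period_ns = period_ns;
    r->budget_ns = runtime_ns;
    r->nr_tids = 0;
}

int resv_attach(struct resv *r, int tid)
{
    for (size_t i = 0; i < r->nr_tids; i++)
        if (r->tids[i] == tid)
            return -EEXIST;
    if (r->nr_tids == RESV_MAX_THREADS)
        return -ENOSPC;
    r->tids[r->nr_tids++] = tid;
    return 0;
}

int resv_detach(struct resv *r, int tid)
{
    for (size_t i = 0; i < r->nr_tids; i++) {
        if (r->tids[i] == tid) {
            /* fill the hole with the last entry */
            r->tids[i] = r->tids[--r->nr_tids];
            return 0;
        }
    }
    return -ESRCH;
}

/* Account ran_ns of execution; returns the overrun, 0 if within budget. */
uint64_t resv_charge(struct resv *r, uint64_t ran_ns)
{
    if (ran_ns <= r->budget_ns) {
        r->budget_ns -= ran_ns;
        return 0;
    }
    uint64_t over = ran_ns - r->budget_ns;
    r->budget_ns = 0;
    return over;
}

/* Start of a new period: refill the budget. */
void resv_replenish(struct resv *r)
{
    r->budget_ns = r->runtime_ns;
}

/* The "instantaneous budget" an application would want to monitor. */
uint64_t resv_budget(const struct resv *r)
{
    return r->budget_ns;
}
```

None of these map naturally onto reading and writing a pair of numeric
files: attach/detach take a thread as argument, and the budget is a
value that changes while you look at it.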

My 2 cents (apologies for the length),

    Tommaso

-- 
Tommaso Cucinotta, Computer Engineering PhD, Researcher
ReTiS Lab, Scuola Superiore Sant'Anna, Pisa, Italy
Tel +39 050 882 024, Fax +39 050 882 003
http://retis.sssup.it/people/tommaso
