[ Impact: implement fork vruntime boosting when forks are performed from an
  interactive or timer wakeup chain. ]

Add two new features, INTERACTIVE_FORK_EXPEDITED and TIMER_FORK_EXPEDITED, to
expedite forks performed from interactive and timer wakeup chains.

TIMER_FORK_EXPEDITED is needed to give the timer_create() POSIX API with
sigev_notify = SIGEV_THREAD lower latencies than it currently has. Yes,
spawning a new thread each time the timer fires is utterly ugly, but this is a
standard API people rely on.

We seem to have two choices here:

1) We push for SIGEV_THREAD deprecation. This is, after all, an utter glibc
   mess, where thread creation and memory allocation failures are not dealt
   with, and where the helper thread waiting for the signal is created the
   first time timer_create() is invoked, and therefore keeps the
   cgroup/scheduler/etc. state of the first caller.

or

2) We try to support this standard behavior at the kernel level, with
   TIMER_FORK_EXPEDITED.

This patch brings the average latency of wakeup-latency.c down from 4000µs to
160µs by making sure the thread spawned when the timer fires is not put at the
end of the current period, but rather gets a vruntime boost.

The fork vruntime boost gained by executing through an interactive or timer
wakeup chain is not transferable to children. This is intended to provide some
degree of safety against timer-based fork bombs.

Disabling START_DEBIT instead of adding these *_FORK_EXPEDITED features does
not give good results under a make -j5 kernel build on a uniprocessor machine:
Xorg interactivity suffers a lot.

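For readers unfamiliar with the API in question, here is a minimal userspace sketch of how the SIGEV_THREAD notification delay can be observed. It is not wakeup-latency.c (which is not part of this mail) and makes no attempt to reproduce the numbers above; the names sigev_thread_latency_ns(), on_expiry() and ts_to_ns() are made up for illustration. It arms a one-shot timer at an absolute CLOCK_MONOTONIC time and reports how long after that time the notification thread actually started running.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper state; not taken from wakeup-latency.c. */
static struct timespec expiry;		/* absolute time the timer should fire */
static atomic_llong latency_ns = -1;	/* written by the notification thread */

static long long ts_to_ns(const struct timespec *ts)
{
	return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Runs in a thread that the C library starts when the timer expires. */
static void on_expiry(union sigval sv)
{
	struct timespec now;

	(void)sv;
	clock_gettime(CLOCK_MONOTONIC, &now);
	atomic_store(&latency_ns, ts_to_ns(&now) - ts_to_ns(&expiry));
}

/*
 * Arm a one-shot SIGEV_THREAD timer delay_ns in the future. Return how late
 * the notification thread ran, in nanoseconds, or -1 on error or timeout.
 */
long long sigev_thread_latency_ns(long long delay_ns)
{
	struct sigevent sev;
	struct itimerspec its;
	struct timespec tick = { 0, 100000 };	/* poll every 100 us */
	timer_t timer;
	int i;

	memset(&sev, 0, sizeof(sev));
	sev.sigev_notify = SIGEV_THREAD;
	sev.sigev_notify_function = on_expiry;
	if (timer_create(CLOCK_MONOTONIC, &sev, &timer) != 0)
		return -1;

	/* Compute the absolute expiry time and normalize tv_nsec. */
	clock_gettime(CLOCK_MONOTONIC, &expiry);
	expiry.tv_sec += delay_ns / 1000000000LL;
	expiry.tv_nsec += delay_ns % 1000000000LL;
	if (expiry.tv_nsec >= 1000000000L) {
		expiry.tv_sec++;
		expiry.tv_nsec -= 1000000000L;
	}

	memset(&its, 0, sizeof(its));	/* it_interval = 0: one-shot */
	its.it_value = expiry;
	atomic_store(&latency_ns, -1);
	if (timer_settime(timer, TIMER_ABSTIME, &its, NULL) != 0) {
		timer_delete(timer);
		return -1;
	}

	/* Wait (up to about 5 s) for the callback to store a result. */
	for (i = 0; i < 50000 && atomic_load(&latency_ns) < 0; i++)
		nanosleep(&tick, NULL);

	timer_delete(timer);
	return atomic_load(&latency_ns);
}
```

Calling sigev_thread_latency_ns(1000000) repeatedly while the machine is busy (for instance during a parallel kernel build) and averaging the results gives a rough view of the delay discussed here. Older glibc versions may need -lrt and -lpthread at link time.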
Signed-off-by: Mathieu Desnoyers
CC: Peter Zijlstra
---
 include/linux/sched.h   |    3 ++-
 kernel/sched.c          |    8 ++++++++
 kernel/sched_fair.c     |    8 ++++++++
 kernel/sched_features.h |   11 +++++++++++
 4 files changed, 29 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng.laptop/include/linux/sched.h
===================================================================
--- linux-2.6-lttng.laptop.orig/include/linux/sched.h
+++ linux-2.6-lttng.laptop/include/linux/sched.h
@@ -1131,7 +1131,8 @@ struct sched_entity {
 	struct list_head	group_node;
 	unsigned int		on_rq:1,
 				interactive:1,
-				timer:1;
+				timer:1,
+				fork_expedited:1;
 
 	u64			exec_start;
 	u64			sum_exec_runtime;
Index: linux-2.6-lttng.laptop/kernel/sched.c
===================================================================
--- linux-2.6-lttng.laptop.orig/kernel/sched.c
+++ linux-2.6-lttng.laptop/kernel/sched.c
@@ -2504,6 +2504,14 @@ void sched_fork(struct task_struct *p, i
 	if (!rt_prio(p->prio))
 		p->sched_class = &fair_sched_class;
 
+	if ((sched_feat(INTERACTIVE_FORK_EXPEDITED)
+	     && (current->sched_wake_interactive || current->se.interactive))
+	    || (sched_feat(TIMER_FORK_EXPEDITED)
+	     && (current->sched_wake_timer || current->se.timer)))
+		p->se.fork_expedited = 1;
+	else
+		p->se.fork_expedited = 0;
+
 	if (p->sched_class->task_fork)
 		p->sched_class->task_fork(p);
 
Index: linux-2.6-lttng.laptop/kernel/sched_fair.c
===================================================================
--- linux-2.6-lttng.laptop.orig/kernel/sched_fair.c
+++ linux-2.6-lttng.laptop/kernel/sched_fair.c
@@ -731,6 +731,14 @@ place_entity(struct cfs_rq *cfs_rq, stru
 	u64 vruntime = cfs_rq->min_vruntime;
 
 	/*
+	 * Expedite forks when requested rather than putting forked thread in a
+	 * delayed slot.
+	 */
+	if ((sched_feat(INTERACTIVE_FORK_EXPEDITED)
+	     || sched_feat(TIMER_FORK_EXPEDITED)) && se->fork_expedited)
+		initial = 0;
+
+	/*
 	 * The 'current' period is already promised to the current tasks,
 	 * however the extra weight of the new task will slow them down a
 	 * little, place the new task so that it fits in the slot that
Index: linux-2.6-lttng.laptop/kernel/sched_features.h
===================================================================
--- linux-2.6-lttng.laptop.orig/kernel/sched_features.h
+++ linux-2.6-lttng.laptop/kernel/sched_features.h
@@ -59,9 +59,20 @@ SCHED_FEAT(DYN_MIN_VRUNTIME, 0)
  */
 SCHED_FEAT(INTERACTIVE, 0)
 /*
+ * Expedite forks performed from a wakeup chain coming from the input subsystem.
+ * Depends on the INTERACTIVE feature for following the wakeup chain across
+ * threads.
+ */
+SCHED_FEAT(INTERACTIVE_FORK_EXPEDITED, 0)
+/*
  * Timer subsystem next buddy affinity. Not transitive across new task wakeups.
  */
 SCHED_FEAT(TIMER, 0)
+/*
+ * Expedite forks performed from a wakeup chain coming from the timer subsystem.
+ * Depends on the TIMER feature for following the wakeup chain across threads.
+ */
+SCHED_FEAT(TIMER_FORK_EXPEDITED, 0)
 
 /*
  * Spin-wait on mutex acquisition when the mutex owner is running on
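The combined effect of the sched_fork() and place_entity() hunks can be sketched as a small userspace model. This is only an illustration: the model_* types and functions are hypothetical and do not exist in the kernel, the placement is reduced to the START_DEBIT step (assumed here to add one vslice to min_vruntime for an initial placement), and any further adjustment place_entity() makes for non-initial placement is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these types exist in the kernel. */
struct model_features {
	bool interactive_fork_expedited;
	bool timer_fork_expedited;
	bool start_debit;
};

struct model_parent {
	bool wake_interactive;	/* stands in for current->sched_wake_interactive */
	bool se_interactive;	/* stands in for current->se.interactive */
	bool wake_timer;	/* stands in for current->sched_wake_timer */
	bool se_timer;		/* stands in for current->se.timer */
};

/* Same boolean condition as the hunk added to sched_fork(). */
bool model_fork_expedited(const struct model_features *f,
			  const struct model_parent *cur)
{
	return (f->interactive_fork_expedited
		&& (cur->wake_interactive || cur->se_interactive))
	    || (f->timer_fork_expedited
		&& (cur->wake_timer || cur->se_timer));
}

/*
 * Simplified initial placement of a forked task. An expedited fork clears
 * "initial", as the place_entity() hunk does, so the START_DEBIT vslice is
 * not added and the task starts at min_vruntime.
 */
uint64_t model_place_forked(const struct model_features *f,
			    uint64_t min_vruntime, uint64_t vslice,
			    bool fork_expedited)
{
	bool initial = true;

	if ((f->interactive_fork_expedited || f->timer_fork_expedited)
	    && fork_expedited)
		initial = false;

	if (initial && f->start_debit)
		return min_vruntime + vslice;	/* delayed slot */
	return min_vruntime;			/* expedited: no debit */
}
```

With TIMER_FORK_EXPEDITED and START_DEBIT enabled in this model, a child forked by a parent that is in a timer wakeup chain is placed at min_vruntime, while a child of an ordinary parent is placed one vslice later.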