Message-ID: <jhjmu4eg6ai.mognet@arm.com>
Date: Sun, 05 Jul 2020 01:21:57 +0100
From: Valentin Schneider <valentin.schneider@....com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
vincent.guittot@...aro.org, mgorman@...e.de,
Oleg Nesterov <oleg@...hat.com>, david@...morbit.com,
pauld@...hat.com, dietmar.eggemann@....com
Subject: Re: [PATCH] sched: Better document ttwu()
On 03/07/20 14:32, Peter Zijlstra wrote:
> Dave hit the problem fixed by commit:
>
> b6e13e85829f ("sched/core: Fix ttwu() race")
>
> and failed to understand much of the code involved. Per his request, a
> few comments to (hopefully) clarify things.
>
> Requested-by: Dave Chinner <david@...morbit.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Small extra comment below, otherwise FWIW:
Reviewed-by: Valentin Schneider <valentin.schneider@....com>
> +++ b/kernel/sched/core.c
> @@ -79,6 +79,100 @@ __read_mostly int scheduler_running;
> */
> int sysctl_sched_rt_runtime = 950000;
>
> +
> +/*
> + * Serialization rules:
> + *
> + * Lock order:
> + *
> + * p->pi_lock
> + * rq->lock
> + * hrtimer_cpu_base->lock (hrtimer_start() for bandwidth controls)
> + *
> + * rq1->lock
> + * rq2->lock where: rq1 < rq2
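(Aside, for anyone following along: the rq1 < rq2 rule is the usual
pointer-ordering trick for taking two runqueue locks at once. From memory,
double_rq_lock() looks roughly like the below; simplified, with the
preemption-config variants elided, so treat it as a sketch rather than the
actual implementation:)

static void double_rq_lock(struct rq *rq1, struct rq *rq2)
{
        if (rq1 == rq2) {
                raw_spin_lock(&rq1->lock);
                __acquire(rq2->lock);   /* same lock; placate sparse */
        } else if (rq1 < rq2) {
                /* Lower-addressed rq first, per the ordering above. */
                raw_spin_lock(&rq1->lock);
                raw_spin_lock_nested(&rq2->lock, SINGLE_DEPTH_NESTING);
        } else {
                raw_spin_lock(&rq2->lock);
                raw_spin_lock_nested(&rq1->lock, SINGLE_DEPTH_NESTING);
        }
}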
> + *
> + * Regular state:
> + *
> + * Normal scheduling state is serialized by rq->lock. __schedule() takes the
> + * local CPU's rq->lock; it optionally removes the task from the runqueue and
> + * always looks at the local rq data structures to find the most eligible task
> + * to run next.
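(To make the rq->lock scope concrete: heavily abridged and from memory, the
shape of __schedule() is something like the sketch below; the real function
does quite a bit more, so this is illustrative only:)

static void __schedule(bool preempt)
{
        struct rq *rq = cpu_rq(smp_processor_id());
        struct task_struct *prev = rq->curr, *next;
        struct rq_flags rf;

        rq_lock(rq, &rf);

        /* Optionally dequeue a task that is going to sleep. */
        if (!preempt && prev->state)
                deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);

        /* Only ever pick from the local runqueue. */
        next = pick_next_task(rq, prev, &rf);

        if (prev != next)
                context_switch(rq, prev, next, &rf); /* releases rq->lock */
        else
                rq_unlock_irq(rq, &rf);
}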
> + *
> + * Task enqueue is also under rq->lock, possibly taken from another CPU.
> + * Wakeups from another LLC domain might use an IPI to transfer the enqueue to
> + * the local CPU to avoid bouncing the runqueue state around [ see
> + * ttwu_queue_wakelist() ]
> + *
> + * Task wakeups, specifically wakeups that involve migration, are horribly
> + * complicated in order to avoid having to take two rq->locks.
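(The enqueue-vs-IPI decision mentioned above has roughly the following
shape; condensed from ttwu_queue()/ttwu_queue_wakelist() as I remember
them, with the sched_feat() and CPU-state checks elided:)

static void ttwu_queue(struct task_struct *p, int cpu, int wake_flags)
{
        struct rq *rq = cpu_rq(cpu);
        struct rq_flags rf;

        /* Different LLC: hand the enqueue over via an IPI. */
        if (!cpus_share_cache(smp_processor_id(), cpu)) {
                __ttwu_queue_wakelist(p, cpu, wake_flags);
                return;
        }

        /* Same LLC: take the (possibly remote) rq->lock and enqueue here. */
        rq_lock(rq, &rf);
        update_rq_clock(rq);
        ttwu_do_activate(rq, p, wake_flags, &rf);
        rq_unlock(rq, &rf);
}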
> + *
> + * Special state:
> + *
> + * System-calls and anything external will use task_rq_lock() which acquires
> + * both p->pi_lock and rq->lock. As a consequence the state they change is
> + * stable while holding either lock:
> + *
> + * - sched_setaffinity()/
> + * set_cpus_allowed_ptr(): p->cpus_ptr, p->nr_cpus_allowed
> + * - set_user_nice(): p->se.load, p->*prio
> + * - __sched_setscheduler(): p->sched_class, p->policy, p->*prio,
> + * p->se.load, p->rt_priority,
> + * p->dl.dl_{runtime, deadline, period, flags, bw, density}
Uclamp stuff can also get updated in __sched_setscheduler(); see
__setscheduler_uclamp(). It's only p->uclamp_req AFAICT, but I don't think
there's harm in just saying p->uclamp*.
> + * - sched_setnuma(): p->numa_preferred_nid
> + * - sched_move_task()/
> + * cpu_cgroup_fork(): p->sched_task_group
> + * - uclamp_update_active(): p->uclamp*
> + *
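Since task_rq_lock() does the heavy lifting for all of the above, its retry
loop might be worth spelling out too; from memory it is roughly
(simplified):

struct rq *task_rq_lock(struct task_struct *p, struct rq_flags *rf)
{
        struct rq *rq;

        for (;;) {
                raw_spin_lock_irqsave(&p->pi_lock, rf->flags);
                rq = task_rq(p);
                raw_spin_lock(&rq->lock);
                /*
                 * If the task moved (or is moving) to another rq while we
                 * were acquiring the locks, drop both and try again.
                 */
                if (likely(rq == task_rq(p) && !task_on_rq_migrating(p)))
                        return rq;
                raw_spin_unlock(&rq->lock);
                raw_spin_unlock_irqrestore(&p->pi_lock, rf->flags);

                while (task_on_rq_migrating(p))
                        cpu_relax();
        }
}

Holding both locks on return is what makes the fields listed above stable:
writers must take both, so a reader holding either one sees a consistent
value.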