linux-kernel - Re: [PATCH] sched: Better document ttwu()

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <jhjmu4eg6ai.mognet@arm.com>
Date:   Sun, 05 Jul 2020 01:21:57 +0100
From:   Valentin Schneider <valentin.schneider@....com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...nel.org>, linux-kernel@...r.kernel.org,
        vincent.guittot@...aro.org, mgorman@...e.de,
        Oleg Nesterov <oleg@...hat.com>, david@...morbit.com,
        pauld@...hat.com, dietmar.eggemann@....com
Subject: Re: [PATCH] sched: Better document ttwu()


On 03/07/20 14:32, Peter Zijlstra wrote:
> Dave hit the problem fixed by commit:
>
>   b6e13e85829f ("sched/core: Fix ttwu() race")
>
> and failed to understand much of the code involved. Per his request a
> few comments to (hopefully) clarify things.
>
> Requested-by: Dave Chinner <david@...morbit.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>

Small extra comment below, otherwise FWIW:

Reviewed-by: Valentin Schneider <valentin.schneider@....com>

> +++ b/kernel/sched/core.c
> @@ -79,6 +79,100 @@ __read_mostly int scheduler_running;
>   */
>  int sysctl_sched_rt_runtime = 950000;
>
> +
> +/*
> + * Serialization rules:
> + *
> + * Lock order:
> + *
> + *   p->pi_lock
> + *     rq->lock
> + *       hrtimer_cpu_base->lock (hrtimer_start() for bandwidth controls)
> + *
> + *  rq1->lock
> + *    rq2->lock  where: rq1 < rq2
> + *
> + * Regular state:
> + *
> + * Normal scheduling state is serialized by rq->lock. __schedule() takes the
> + * local CPU's rq->lock, it optionally removes the task from the runqueue and
> + * always looks at the local rq data structures to find the most elegible task
> + * to run next.
> + *
> + * Task enqueue is also under rq->lock, possibly taken from another CPU.
> + * Wakeups from another LLC domain might use an IPI to transfer the enqueue to
> + * the local CPU to avoid bouncing the runqueue state around [ see
> + * ttwu_queue_wakelist() ]
> + *
> + * Task wakeup, specifically wakeups that involve migration, are horribly
> + * complicated to avoid having to take two rq->locks.
> + *
> + * Special state:
> + *
> + * System-calls and anything external will use task_rq_lock() which acquires
> + * both p->pi_lock and rq->lock. As a consequence the state they change is
> + * stable while holding either lock:
> + *
> + *  - sched_setaffinity()/
> + *    set_cpus_allowed_ptr():	p->cpus_ptr, p->nr_cpus_allowed
> + *  - set_user_nice():		p->se.load, p->*prio
> + *  - __sched_setscheduler():	p->sched_class, p->policy, p->*prio,
> + *				p->se.load, p->rt_priority,
> + *				p->dl.dl_{runtime, deadline, period, flags, bw, density}

Uclamp stuff can also get updated in __sched_setscheduler(); see
__setscheduler_uclamp(). It's only p->uclamp_req AFAICT, but I don't think
there's harm in just saying p->uclamp*.

> + *  - sched_setnuma():		p->numa_preferred_nid
> + *  - sched_move_task()/
> + *    cpu_cgroup_fork():	p->sched_task_group
> + *  - uclamp_update_active()	p->uclamp*
> + *