Message-ID: <20200917142438.GH1362448@hirez.programming.kicks-ass.net>
Date: Thu, 17 Sep 2020 16:24:38 +0200
From: peterz@...radead.org
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
Sebastian Siewior <bigeasy@...utronix.de>,
Qais Yousef <qais.yousef@....com>,
Scott Wood <swood@...hat.com>,
Valentin Schneider <valentin.schneider@....com>,
Ingo Molnar <mingo@...nel.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Vincent Donnefort <vincent.donnefort@....com>
Subject: Re: [patch 09/10] sched/core: Add migrate_disable/enable()
On Thu, Sep 17, 2020 at 11:42:11AM +0200, Thomas Gleixner wrote:
> +static inline void update_nr_migratory(struct task_struct *p, long delta)
> +{
> + if (p->nr_cpus_allowed > 1 && p->sched_class->update_migratory)
> + p->sched_class->update_migratory(p, delta);
> +}
Right, so as you know, I totally hate this thing :-) It adds a second
(and radically different) version of changing affinity. I'm working on a
version that uses the normal *set_cpus_allowed*() interface.
> +/*
> + * The migrate_disable/enable() fastpath updates only the tasks migrate
> + * disable count which is sufficient as long as the task stays on the CPU.
> + *
> + * When a migrate disabled task is scheduled out it can become subject to
> + * load balancing. To prevent this, update task::cpus_ptr to point to the
> + * current CPUs cpumask and set task::nr_cpus_allowed to 1.
> + *
> + * If task::cpus_ptr does not point to task::cpus_mask then the update has
> + * been done already. This check is also used in migrate_enable() as an
> + * indicator to restore task::cpus_ptr to point to task::cpus_mask
> + */
> +static inline void sched_migration_ctrl(struct task_struct *prev, int cpu)
> +{
> + if (!prev->migration_ctrl.disable_cnt ||
> + prev->cpus_ptr != &prev->cpus_mask)
> + return;
> +
> + prev->cpus_ptr = cpumask_of(cpu);
> + update_nr_migratory(prev, -1);
> + prev->nr_cpus_allowed = 1;
> +}
So this thing is called from schedule(), with only rq->lock held, and
that violates the locking rules for changing the affinity.
I have a comment that explains how it's broken and why it's sort-of
working.
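To make the state change in the quoted hook easier to follow, here is a toy userspace model of it. This is a sketch only, not kernel code: every `toy_*` name is hypothetical, the cpumask is reduced to a plain bitmask, and it deliberately contains no locking at all. The thing to notice is that toy_sched_out() writes the same two fields (cpus_ptr and nr_cpus_allowed) that the regular affinity-change path writes.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* Toy stand-in for struct cpumask: one bit per CPU. */
typedef unsigned int toy_mask_t;

/* Hypothetical model of the task fields touched by the quoted patch. */
struct toy_task {
	toy_mask_t cpus_mask;        /* affinity requested by the user */
	const toy_mask_t *cpus_ptr;  /* mask the scheduler actually consults */
	int nr_cpus_allowed;
	int disable_cnt;             /* migrate-disable nesting depth */
};

/* One single-bit mask per CPU, playing the role of cpumask_of(cpu). */
static const toy_mask_t toy_cpumask_of[TOY_NR_CPUS] = {
	1u << 0, 1u << 1, 1u << 2, 1u << 3
};

/* Count the set bits in a mask. */
static int toy_weight(toy_mask_t m)
{
	int n = 0;

	for (; m; m &= m - 1)
		n++;
	return n;
}

static void toy_task_init(struct toy_task *p, toy_mask_t mask)
{
	p->cpus_mask = mask;
	p->cpus_ptr = &p->cpus_mask;
	p->nr_cpus_allowed = toy_weight(mask);
	p->disable_cnt = 0;
}

/* Fast path: only the nesting count changes. */
static void toy_migrate_disable(struct toy_task *p)
{
	p->disable_cnt++;
}

/* Models the schedule-out hook: pin the task to the CPU it ran on. */
static void toy_sched_out(struct toy_task *p, int cpu)
{
	if (!p->disable_cnt || p->cpus_ptr != &p->cpus_mask)
		return;

	p->cpus_ptr = &toy_cpumask_of[cpu];
	p->nr_cpus_allowed = 1;
}

/* The outermost enable restores the pointer if the hook redirected it. */
static void toy_migrate_enable(struct toy_task *p)
{
	assert(p->disable_cnt > 0);

	if (--p->disable_cnt)
		return;

	if (p->cpus_ptr != &p->cpus_mask) {
		p->cpus_ptr = &p->cpus_mask;
		p->nr_cpus_allowed = toy_weight(p->cpus_mask);
	}
}
```

In the model, cpus_ptr != &cpus_mask doubles as the "was scheduled out while disabled" flag, exactly as the quoted comment describes; the pending-request handling is left out.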
> +void migrate_disable(void)
> +{
> + unsigned long flags;
> +
> + if (!current->migration_ctrl.disable_cnt) {
> + raw_spin_lock_irqsave(&current->pi_lock, flags);
> + current->migration_ctrl.disable_cnt++;
> + raw_spin_unlock_irqrestore(&current->pi_lock, flags);
> + } else {
> + current->migration_ctrl.disable_cnt++;
> + }
> +}
That pi_lock seems unfortunate, and it isn't obvious what the point of
it is.
> +void migrate_enable(void)
> +{
> + struct task_migrate_data *pending;
> + struct task_struct *p = current;
> + struct rq_flags rf;
> + struct rq *rq;
> +
> + if (WARN_ON_ONCE(p->migration_ctrl.disable_cnt <= 0))
> + return;
> +
> + if (p->migration_ctrl.disable_cnt > 1) {
> + p->migration_ctrl.disable_cnt--;
> + return;
> + }
> +
> + raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
> + p->migration_ctrl.disable_cnt = 0;
> + pending = p->migration_ctrl.pending;
> + p->migration_ctrl.pending = NULL;
> +
> + /*
> + * If the task was never scheduled out while in the migrate
> + * disabled region and there is no migration request pending,
> + * return.
> + */
> + if (!pending && p->cpus_ptr == &p->cpus_mask) {
> + raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
> + return;
> + }
> +
> + rq = __task_rq_lock(p, &rf);
> + /* Was it scheduled out while in a migrate disabled region? */
> + if (p->cpus_ptr != &p->cpus_mask) {
> + /* Restore the tasks CPU mask and update the weight */
> + p->cpus_ptr = &p->cpus_mask;
> + p->nr_cpus_allowed = cpumask_weight(&p->cpus_mask);
> + update_nr_migratory(p, 1);
> + }
> +
> + /* If no migration request is pending, no further action required. */
> + if (!pending) {
> + task_rq_unlock(rq, p, &rf);
> + return;
> + }
> +
> + /* Migrate self to the requested target */
> + pending->res = set_cpus_allowed_ptr_locked(p, pending->mask,
> + pending->check, rq, &rf);
> + complete(pending->done);
> +}
So, what I'm missing with all this are the design constraints for this
trainwreck. Because the 'sane' solution was having migrate_disable()
imply cpus_read_lock(). But that didn't fly because we can't have
migrate_disable() / migrate_enable() schedule for raisins.
And if I'm not mistaken, the above migrate_enable() *does* require being
able to schedule, and our favourite piece of futex:
raw_spin_lock_irq(&q.pi_state->pi_mutex.wait_lock);
spin_unlock(q.lock_ptr);
is broken. Consider that spin_unlock() doing migrate_enable() with a
pending sched_setaffinity().
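The ordering problem can be replayed in a small userspace model. Again a sketch under assumptions, not kernel code: all `toy_*` names are hypothetical, the raw spinlock is reduced to an "atomic depth" counter, and spin_lock()/spin_unlock() model only the migrate_disable()/migrate_enable() side effect referred to above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-context state; none of these names are kernel APIs. */
struct toy_ctx {
	int atomic_depth;       /* > 0 while a raw spinlock is held */
	int disable_cnt;        /* migrate-disable nesting depth */
	bool affinity_pending;  /* a sched_setaffinity() request is waiting */
	bool sched_in_atomic;   /* set if we needed to schedule while atomic */
};

static void toy_raw_spin_lock_irq(struct toy_ctx *c)
{
	c->atomic_depth++;
}

static void toy_raw_spin_unlock_irq(struct toy_ctx *c)
{
	c->atomic_depth--;
}

/* Assumption: spin_lock() implies migrate_disable(); model only that part. */
static void toy_spin_lock(struct toy_ctx *c)
{
	c->disable_cnt++;
}

static void toy_migrate_enable(struct toy_ctx *c)
{
	if (--c->disable_cnt)
		return;

	if (c->affinity_pending) {
		/* Honouring the request means moving CPUs, i.e. scheduling. */
		if (c->atomic_depth > 0)
			c->sched_in_atomic = true;
		c->affinity_pending = false;
	}
}

static void toy_spin_unlock(struct toy_ctx *c)
{
	toy_migrate_enable(c);
}

/*
 * Replays the futex ordering quoted above. Returns true if the
 * sequence would have to schedule while the raw lock is held.
 */
static bool toy_futex_sequence(bool affinity_pending)
{
	struct toy_ctx c = { 0 };

	toy_spin_lock(&c);              /* q.lock_ptr, taken earlier */
	c.affinity_pending = affinity_pending;
	toy_raw_spin_lock_irq(&c);      /* pi_mutex.wait_lock */
	toy_spin_unlock(&c);            /* migrate_enable() under the raw lock */
	toy_raw_spin_unlock_irq(&c);

	return c.sched_in_atomic;
}
```

With no request pending the sequence is harmless in the model; with one pending, the outermost migrate_enable() lands inside the raw-locked, IRQ-disabled region.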
Let me ponder this more..