Message-ID: <20201116193149.GW3371@techsingularity.net>
Date: Mon, 16 Nov 2020 19:31:49 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Will Deacon <will@...nel.org>, Davidlohr Bueso <dave@...olabs.net>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: Loadavg accounting error on arm64
On Mon, Nov 16, 2020 at 03:20:05PM +0100, Peter Zijlstra wrote:
> > I think this is at least one possibility. I think at least that one
> > should only be explicitly set on WF_MIGRATED and explicitly cleared in
> > sched_ttwu_pending. While I haven't audited it fully, it might be enough
> > to avoid a double write outside of the rq lock on the bitfield but I
> > still need to think more about the ordering of sched_contributes_to_load
> > and whether it's ordered by p->on_cpu or not.
>
> The scenario you're worried about is something like:
>
> 	CPU0					CPU1
>
> 	schedule()
> 	  prev->sched_contributes_to_load = X;
> 	  deactivate_task(prev);
>
> 						try_to_wake_up()
> 						  if (p->on_rq &&) // false
> 						  if (smp_load_acquire(&p->on_cpu) && // true
> 						      ttwu_queue_wakelist())
> 							p->sched_remote_wakeup = Y;
>
> 	  smp_store_release(&prev->on_cpu, 0);
>
Yes.
> And then the stores of X and Y clobber one another.. Hummph, seems
> reasonable. One quick thing to test would be something like this:
>
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 7abbdd7f3884..9844e541c94c 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -775,7 +775,9 @@ struct task_struct {
>  	unsigned			sched_reset_on_fork:1;
>  	unsigned			sched_contributes_to_load:1;
>  	unsigned			sched_migrated:1;
> +	unsigned			:0;
>  	unsigned			sched_remote_wakeup:1;
> +	unsigned			:0;
>  #ifdef CONFIG_PSI
>  	unsigned			sched_psi_wake_requeue:1;
>  #endif

And this works; the load average now decays as expected:
986.01 1008.17 1013.15 2/855 1212
362.19 824.70 949.75 1/856 1564
133.19 674.65 890.32 1/864 1958
49.04 551.89 834.61 2/871 2339
18.33 451.54 782.41 1/867 2686
6.77 369.37 733.45 1/866 2929
2.55 302.16 687.55 1/864 2931
0.97 247.18 644.52 1/860 2933
0.48 202.23 604.20 1/849 2935

I should have gone with this after rereading the warning in the
"anti-guarantees" section of memory-barriers.txt that all fields in a
given bitfield must be protected by the same lock :(

sched_psi_wake_requeue can probably stay with the other three fields,
given they are all under the rq lock, but sched_remote_wakeup needs to
move out.
--
Mel Gorman
SUSE Labs