Message-ID: <20201116091054.GL3371@techsingularity.net>
Date: Mon, 16 Nov 2020 09:10:54 +0000
From: Mel Gorman <mgorman@...hsingularity.net>
To: Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>
Cc: Davidlohr Bueso <dave@...olabs.net>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Loadavg accounting error on arm64
Hi,
I got cc'd on an internal bug report filed against 5.8 and 5.9 kernels
saying that loadavg was "exploding" on arm64 on machines acting as build
servers. It happened on at least two different arm64 variants. That setup
is complex to replicate but fortunately the problem can be reproduced by
running hackbench-process-pipes while heavily overcommitting a machine
with 96 logical CPUs and then checking if loadavg drops afterwards. With
an MMTests clone, I reproduced it as follows:
./run-mmtests.sh --config configs/config-workload-hackbench-process-pipes --no-monitor testrun; \
for i in `seq 1 60`; do cat /proc/loadavg; sleep 60; done
Load should drop to 10 after about 10 minutes and it does on x86-64, but
it remained at 200+ on arm64.
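
For anyone who wants to automate the check instead of eyeballing the
output of the loop above, here is a minimal sketch in C. It assumes the
usual /proc/loadavg layout (three load averages followed by other fields,
e.g. "0.20 0.18 0.12 1/80 11206"). The helper names parse_loadavg() and
load_has_drained() are hypothetical and not part of MMTests.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the 1, 5 and 15 minute load averages from a
 * line in /proc/loadavg format. Returns 0 on success, -1 if the line does
 * not start with three numbers.
 */
int parse_loadavg(const char *line, double *l1, double *l5, double *l15)
{
    return sscanf(line, "%lf %lf %lf", l1, l5, l15) == 3 ? 0 : -1;
}

/*
 * Hypothetical check: returns 1 if the 1 minute load average is at or
 * below the threshold, 0 if it is still above it, -1 on a parse error.
 */
int load_has_drained(const char *line, double threshold)
{
    double l1, l5, l15;

    if (parse_loadavg(line, &l1, &l5, &l15) != 0)
        return -1;
    return l1 <= threshold;
}
```

A caller would read one line from /proc/loadavg with fgets() and pass it
to load_has_drained() with a threshold of 10, matching the expectation
stated above.
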
The reproduction case simply hammers the case where a task can be
descheduling while also being woken by another task at the same time. It
takes a long time to run but it makes the problem very obvious. The
expectation is that after hackbench has been running and saturating the
machine for a long time, loadavg drops back down once it finishes.
Commit dbfb089d360b ("sched: Fix loadavg accounting race") fixed a loadavg
accounting race in the generic case. Later it was documented why the
ordering of when p->sched_contributes_to_load is read/updated relative
to p->on_cpu matters. This is critical when a task is descheduling at the
same time it is being activated on another CPU. While the loads/stores
happen under the RQ lock, the RQ lock on its own does not give any
guarantees on the task state.
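
The ordering requirement can be modelled in userspace with C11 atomics.
This is only a simplified sketch, not the scheduler code: struct
task_model and both functions are hypothetical stand-ins, and the model
assumes the descheduling side writes the flag before releasing on_cpu
while the waking side reads the flag only after observing on_cpu == 0.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Simplified stand-in for the two task fields discussed above. This is
 * NOT the kernel's struct task_struct.
 */
struct task_model {
    bool contributes_to_load;   /* models p->sched_contributes_to_load */
    atomic_int on_cpu;          /* models p->on_cpu, 1 while still running */
};

/*
 * Descheduling side: record the accounting decision first, then publish
 * "off the CPU" with release ordering so the flag write cannot be
 * observed after the on_cpu store.
 */
void model_deschedule(struct task_model *p, bool contributes)
{
    p->contributes_to_load = contributes;
    atomic_store_explicit(&p->on_cpu, 0, memory_order_release);
}

/*
 * Waking side: spin with acquire ordering until the task is off the CPU,
 * and only then read the flag. The acquire pairs with the release above.
 */
bool model_wakeup_reads_flag(struct task_model *p)
{
    while (atomic_load_explicit(&p->on_cpu, memory_order_acquire))
        ;
    return p->contributes_to_load;
}
```

If either side is weakened to a relaxed access, the C11 model no longer
guarantees that the waker sees the flag value written by the descheduling
side, which is the kind of stale read that would skew the accounting.
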
Over the weekend I convinced myself that it must be because the
implementations of smp_load_acquire and smp_store_release do not appear
to implement acquire/release semantics, because I didn't find anything
arm64-specific that was playing with p->state behind the scheduler's back
(I could have missed it if it was in an assembly portion as I can't
reliably read arm assembler). Similarly, it's not clear why the arm64
implementation does not call smp_acquire__after_ctrl_dep in the
smp_load_acquire implementation. Even when it was introduced, the arm64
implementation differed significantly from the arm implementation in
terms of what barriers it used, for non-obvious reasons.
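
As a rough C11 analogy (not the kernel macros themselves), acquire
ordering on a load can be expressed in two ways: as a single load that
carries the ordering, or as a plain load followed by a separate barrier.
The function names below are hypothetical and the sketch makes no claim
about what any particular architecture emits for either form.

```c
#include <stdatomic.h>

/* Variant A: one load that carries acquire ordering itself. */
int load_acquire_direct(atomic_int *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

/*
 * Variant B: a relaxed load followed by an acquire fence. Accesses after
 * the fence cannot be reordered before the load, so a release store that
 * this load observes is still synchronized with.
 */
int load_acquire_fenced(atomic_int *p)
{
    int v = atomic_load_explicit(p, memory_order_relaxed);

    atomic_thread_fence(memory_order_acquire);
    return v;
}
```

Both variants give the caller acquire ordering in the C11 model, so two
implementations using different barriers are not necessarily a sign that
one of them is broken.
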
Unfortunately, making that work similarly to the arch-independent version
did not help, and it does not help that I know nothing about the arm64
memory model.
I'll be looking again today to see if I can find a mistake in the ordering for
how sched_contributes_to_load is handled but again, the lack of knowledge
on the arm64 memory model means I'm a bit stuck and a second set of eyes
would be nice :(
--
Mel Gorman
SUSE Labs