[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1585904125-2819-1-git-send-email-laoar.shao@gmail.com>
Date: Fri, 3 Apr 2020 04:55:25 -0400
From: Yafang Shao <laoar.shao@...il.com>
To: hannes@...xchg.org, peterz@...radead.org, akpm@...ux-foundation.org
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Yafang Shao <laoar.shao@...il.com>,
Daniel Drake <drake@...lessm.com>,
Suren Baghdasaryan <surenb@...gle.com>
Subject: [PATCH] psi: fix randomized calculation in record_times()
In record_times() we use 'now' and groupc->state_start to calculate the
delta as bellow,
delta = now - groupc->state_start;
But note that groupc->state_start may be not initialized yet, IOW, the
state_start may be 0 currently. If state_start is 0, this calculation is
same with assigning the lower 32-bit of 'now' to delta, that is a random
value. To fix this value, we should initialize groupc->state_start before.
After we calculate the delta, we will assign 'now' to
groupc->state_start then,
groupc->state_start = now;
This will cause the same issue if groupc->state_start will not be used in a
long period. Let's take an example. We create a cgroup foo and run tasks
in it. Some of these tasks enter into memstall and state_start is set.
Then we move all of these tasks out of cgroup foo for more than (1 << 32)
nsecs, and then move them in. That will cause the same issue as above.
The root cause of these issues is that we don't initialize the state_start
properly. To fix it, we should record how many tasks in this per cpu
psi_group. If there's no task in it, we just set state_start and don't
calculate the delta, that means it is the begin of the pressure. To
avoid redundant calculating the total number of tasks in this per cpu
psi_group, a new member 'total_tasks' is introduced in struct
psi_group_cpu, which is the sum of array members in tasks[].
Fixes: eb414681d5a0 ("psi: pressure stall information for CPU, memory, and IO")
Cc: Johannes Weiner <hannes@...xchg.org>
Cc: Peter Zijlstra (Intel) <peterz@...radead.org>
Cc: Daniel Drake <drake@...lessm.com>
Cc: Suren Baghdasaryan <surenb@...gle.com>
Signed-off-by: Yafang Shao <laoar.shao@...il.com>
---
include/linux/psi_types.h | 2 ++
kernel/sched/psi.c | 13 ++++++++++---
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index 4b7258495a04..b42cbfdb15e9 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -69,6 +69,8 @@ struct psi_group_cpu {
/* States of the tasks belonging to this group */
unsigned int tasks[NR_PSI_TASK_COUNTS];
+ /* Sum of above array members */
+ unsigned int total_tasks;
/* Aggregate pressure state derived from the tasks */
u32 state_mask;
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 8f45cdb6463b..7061529dc406 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -690,7 +690,10 @@ static void psi_group_change(struct psi_group *group, int cpu,
*/
write_seqcount_begin(&groupc->seq);
- record_times(groupc, cpu, false);
+ if (groupc->total_tasks)
+ record_times(groupc, cpu, false);
+ else
+ groupc->state_start = cpu_clock(cpu);
for (t = 0, m = clear; m; m &= ~(1 << t), t++) {
if (!(m & (1 << t)))
@@ -703,11 +706,15 @@ static void psi_group_change(struct psi_group *group, int cpu,
psi_bug = 1;
}
groupc->tasks[t]--;
+ groupc->total_tasks--;
}
- for (t = 0; set; set &= ~(1 << t), t++)
- if (set & (1 << t))
+ for (t = 0; set; set &= ~(1 << t), t++) {
+ if (set & (1 << t)) {
groupc->tasks[t]++;
+ groupc->total_tasks++;
+ }
+ }
/* Calculate state mask representing active states */
for (s = 0; s < NR_PSI_STATES; s++) {
--
2.18.2
Powered by blists - more mailing lists