Message-ID: <YY9rohAcZ/1IGNDd@hirez.programming.kicks-ass.net>
Date: Sat, 13 Nov 2021 08:39:14 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Brian Chen <brianchen118@...il.com>, brianc118@...com,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] psi: fix PSI_MEM_FULL state when tasks are in memstall
and doing reclaim

On Fri, Nov 12, 2021 at 11:53:20AM -0500, Johannes Weiner wrote:
> On Wed, Nov 10, 2021 at 09:33:12PM +0000, Brian Chen wrote:
> > We've noticed cases where tasks in a cgroup are stalled on memory but
> > there is little memory FULL pressure since tasks stay on the runqueue
> > in reclaim.
> >
> > A simple example involves a single threaded program that keeps leaking
> > and touching large amounts of memory. It runs in a cgroup with swap
> > enabled, memory.high set at 10M and cpu.max ratio set at 5%. Though
> > there is significant CPU pressure and memory SOME, there is barely any
> > memory FULL since the task enters reclaim and stays on the runqueue.
> > However, this memory-bound task is effectively stalled on memory and
> > we expect memory FULL to match memory SOME in this scenario.
> >
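> > A minimal reproducer along these lines (sketch only; the cgroup path
> > and exact limits are illustrative):
> >
> > 	/*
> > 	 * Leak and touch memory forever. Run inside a cgroup with swap
> > 	 * enabled and, e.g.:
> > 	 *   echo 10M > /sys/fs/cgroup/test/memory.high
> > 	 *   echo "5000 100000" > /sys/fs/cgroup/test/cpu.max   # ~5%
> > 	 */
> > 	#include <stdlib.h>
> > 	#include <string.h>
> >
> > 	int main(void)
> > 	{
> > 		for (;;) {
> > 			char *p = malloc(1 << 20);	/* leak 1M per loop */
> >
> > 			if (p)
> > 				memset(p, 1, 1 << 20);	/* fault it in */
> > 		}
> > 	}
> >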
> > The code is confused about memstall && running, thinking there is a
> > stalled task and a productive task when there's only one task: a
> > reclaimer that's counted as both. To fix this, we redefine the
> > condition for PSI_MEM_FULL to check that all running tasks are in an
> > active memstall instead of checking that there are no running tasks.
> >
> > case PSI_MEM_FULL:
> > - return unlikely(tasks[NR_MEMSTALL] && !tasks[NR_RUNNING]);
> > + return unlikely(tasks[NR_MEMSTALL] &&
> > + tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
> >
> > This will capture reclaimers. It will also capture tasks that called
> > psi_memstall_enter() and are about to sleep, but this should be
> > negligible noise.
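> >
> > For reference, the annotation window in question is the usual
> > psi_memstall_enter()/psi_memstall_leave() pair around reclaim,
> > roughly (sketch of the calling pattern, not a specific call site):
> >
> > 	unsigned long pflags;
> >
> > 	psi_memstall_enter(&pflags);
> > 	/* reclaim; the task typically stays on the runqueue here */
> > 	psi_memstall_leave(&pflags);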
> >
> > Signed-off-by: Brian Chen <brianchen118@...il.com>
>
> Acked-by: Johannes Weiner <hannes@...xchg.org>
>
> This bug essentially causes us to count memory-some in walltime and
> memory-full in tasktime, which can be quite confusing and misleading
> in combined CPU and memory pressure situations.
>
> The fix looks good to me, thanks Brian.
>
> The bug's been there since the initial psi commit, so I don't think a
> stable backport is warranted.
>
> Peter, absent objections, can you please pick this up through -tip?

Yep can do. Note that our psi_group_cpu data structure is now completely
filled (the extra tasks state filled the last hole):

struct psi_group_cpu {
	seqcount_t                 seq __attribute__((__aligned__(64))); /*     0     4 */
	unsigned int               tasks[5];             /*     4    20 */
	u32                        state_mask;           /*    24     4 */
	u32                        times[7];             /*    28    28 */
	u64                        state_start;          /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	u32                        times_prev[2][7] __attribute__((__aligned__(64))); /*    64    56 */

	/* size: 128, cachelines: 2, members: 6 */
	/* padding: 8 */
	/* forced alignments: 2 */
} __attribute__((__aligned__(64)));
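
If we ever want the build to catch this structure outgrowing two
cachelines, something like the following would do (illustrative sketch,
not part of the patch):

	/* Illustrative: assumes the 64-byte alignment shown above. */
	static_assert(sizeof(struct psi_group_cpu) == 128,
		      "psi_group_cpu no longer fits in two cachelines");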