Message-ID: <YY9rohAcZ/1IGNDd@hirez.programming.kicks-ass.net>
Date: Sat, 13 Nov 2021 08:39:14 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Johannes Weiner <hannes@...xchg.org>
Cc: Brian Chen <brianchen118@...il.com>, brianc118@...com,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
open list <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] psi: fix PSI_MEM_FULL state when tasks are in memstall
and doing reclaim

On Fri, Nov 12, 2021 at 11:53:20AM -0500, Johannes Weiner wrote:
> On Wed, Nov 10, 2021 at 09:33:12PM +0000, Brian Chen wrote:
> > We've noticed cases where tasks in a cgroup are stalled on memory but
> > there is little memory FULL pressure since tasks stay on the runqueue
> > in reclaim.
> >
> > A simple example involves a single threaded program that keeps leaking
> > and touching large amounts of memory. It runs in a cgroup with swap
> > enabled, memory.high set at 10M and cpu.max ratio set at 5%. Though
> > there is significant CPU pressure and memory SOME, there is barely any
> > memory FULL since the task enters reclaim and stays on the runqueue.
> > However, this memory-bound task is effectively stalled on memory and
> > we expect memory FULL to match memory SOME in this scenario.
> >
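> > A minimal reproducer along these lines (sketch only; the cgroup path
> > and exact limits are illustrative):
> >
> > 	/*
> > 	 * Leak and touch memory forever. Run inside a cgroup with swap
> > 	 * enabled and, e.g.:
> > 	 *   echo 10M > /sys/fs/cgroup/test/memory.high
> > 	 *   echo "5000 100000" > /sys/fs/cgroup/test/cpu.max   # ~5%
> > 	 */
> > 	#include <stdlib.h>
> > 	#include <string.h>
> >
> > 	int main(void)
> > 	{
> > 		for (;;) {
> > 			char *p = malloc(1 << 20);	/* leak 1M per loop */
> >
> > 			if (p)
> > 				memset(p, 1, 1 << 20);	/* fault it in */
> > 		}
> > 	}
> >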
> > The code is confused about memstall && running, thinking there is a
> > stalled task and a productive task when there's only one task: a
> > reclaimer that's counted as both. To fix this, we redefine the
> > condition for PSI_MEM_FULL to check that all running tasks are in an
> > active memstall instead of checking that there are no running tasks.
> >
> > case PSI_MEM_FULL:
> > - return unlikely(tasks[NR_MEMSTALL] && !tasks[NR_RUNNING]);
> > + return unlikely(tasks[NR_MEMSTALL] &&
> > + tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
> >
> > This will capture reclaimers. It will also capture tasks that called
> > psi_memstall_enter() and are about to sleep, but this should be
> > negligible noise.
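> >
> > For reference, the annotation window in question is the usual
> > psi_memstall_enter()/psi_memstall_leave() pair around reclaim,
> > roughly (sketch of the calling pattern, not a specific call site):
> >
> > 	unsigned long pflags;
> >
> > 	psi_memstall_enter(&pflags);
> > 	/* reclaim; the task typically stays on the runqueue here */
> > 	psi_memstall_leave(&pflags);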
> >
> > Signed-off-by: Brian Chen <brianchen118@...il.com>
>
> Acked-by: Johannes Weiner <hannes@...xchg.org>
>
> This bug essentially causes us to count memory-some in walltime and
> memory-full in tasktime, which can be quite confusing and misleading
> in combined CPU and memory pressure situations.
>
> The fix looks good to me, thanks Brian.
>
> The bug's been there since the initial psi commit, so I don't think a
> stable backport is warranted.
>
> Peter, absent objections, can you please pick this up through -tip?

Yep can do. Note that our psi_group_cpu data structure is now completely
filled (the extra tasks state filled the last hole):

struct psi_group_cpu {
	seqcount_t                 seq __attribute__((__aligned__(64))); /*     0     4 */
	unsigned int               tasks[5];             /*     4    20 */
	u32                        state_mask;           /*    24     4 */
	u32                        times[7];             /*    28    28 */
	u64                        state_start;          /*    56     8 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	u32                        times_prev[2][7] __attribute__((__aligned__(64))); /*    64    56 */

	/* size: 128, cachelines: 2, members: 6 */
	/* padding: 8 */
	/* forced alignments: 2 */
} __attribute__((__aligned__(64)));
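
If we ever want the build to catch this structure outgrowing two
cachelines, something like the following would do (illustrative sketch,
not part of the patch):

	/* Illustrative: assumes the 64-byte alignment shown above. */
	static_assert(sizeof(struct psi_group_cpu) == 128,
		      "psi_group_cpu no longer fits in two cachelines");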