linux-kernel - Re: [PATCH] mm/memcontrol: fix a data race in scan count


Message-Id: <20200209202840.2bf97ffcfa811550d733c461@linux-foundation.org>
Date:   Sun, 9 Feb 2020 20:28:40 -0800
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Qian Cai <cai@....pw>
Cc:     hannes@...xchg.org, mhocko@...nel.org, vdavydov.dev@...il.com,
        cgroups@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm/memcontrol: fix a data race in scan count

On Wed,  5 Feb 2020 22:49:45 -0500 Qian Cai <cai@....pw> wrote:

> struct mem_cgroup_per_node mz.lru_zone_size[zone_idx][lru] could be
> accessed concurrently as noticed by KCSAN,
> 
> ...
>
>  Reported by Kernel Concurrency Sanitizer on:
>  CPU: 95 PID: 50964 Comm: cc1 Tainted: G        W  O L    5.5.0-next-20200204+ #6
>  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> 
> The write is under lru_lock, but the read is lockless. The scan
> count is used to determine how aggressively the anon and file LRU lists
> should be scanned. Load tearing could generate an inefficient heuristic,
> so fix it by adding READ_ONCE() for the read.
> 
> ...
>
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -533,7 +533,7 @@ unsigned long mem_cgroup_get_zone_lru_size(struct lruvec *lruvec,
>  	struct mem_cgroup_per_node *mz;
>  
>  	mz = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
> -	return mz->lru_zone_size[zone_idx][lru];
> +	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
>  }

I worry about the readability/maintainability of these things.  A naive
reader who comes upon this code will wonder "why the heck is it using
READ_ONCE?".  A possibly lengthy trawl through the git history will
reveal the reason but that's rather unkind.  Wouldn't a simple

	/* modified under lru_lock, so use READ_ONCE */

improve the situation?
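
For illustration only, a userspace sketch of how the accessor might read
with such a comment in place.  The struct name, the array sizes, the
pthread mutex standing in for lru_lock, and the READ_ONCE() stand-in are
all assumptions made so the example compiles on its own; none of them are
the kernel's definitions.

```c
#include <assert.h>
#include <pthread.h>

/*
 * Userspace stand-in for the kernel's READ_ONCE() (assumption: scalar
 * types only, GCC/Clang __typeof__).  The volatile access asks the
 * compiler for a single load that it will not tear or refetch.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define NR_ZONES 4	/* hypothetical sizes, for the sketch only */
#define NR_LRU   5

/* Hypothetical, cut-down analogue of struct mem_cgroup_per_node. */
struct per_node {
	pthread_mutex_t lru_lock;
	unsigned long lru_zone_size[NR_ZONES][NR_LRU];
};

/* Writer side: the counter is only ever modified with lru_lock held. */
static void update_lru_size(struct per_node *mz, int zone_idx, int lru,
			    long nr_pages)
{
	assert(zone_idx >= 0 && zone_idx < NR_ZONES);
	assert(lru >= 0 && lru < NR_LRU);

	pthread_mutex_lock(&mz->lru_lock);
	mz->lru_zone_size[zone_idx][lru] += nr_pages;
	pthread_mutex_unlock(&mz->lru_lock);
}

/* Reader side: lockless, so the load is marked and the reason is stated. */
static unsigned long get_zone_lru_size(struct per_node *mz, int zone_idx,
				       int lru)
{
	/* modified under lru_lock, so use READ_ONCE */
	return READ_ONCE(mz->lru_zone_size[zone_idx][lru]);
}
```

The sketch mirrors the patch in marking only the read side; the write is
left as a plain store under the lock.  The one-line comment records why
the load is marked, so a reader does not have to go to the git history
to find out.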