Date:   Thu, 5 Apr 2018 15:36:40 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Roman Gushchin <guro@...com>
Cc:     linux-mm@...ck.org, Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Vladimir Davydov <vdavydov.dev@...il.com>,
        Tejun Heo <tj@...nel.org>, kernel-team@...com,
        cgroups@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 2/4] mm: memory.low hierarchical behavior

On Thu, Apr 05, 2018 at 07:59:19PM +0100, Roman Gushchin wrote:
> This patch aims to address an issue in current memory.low semantics,
> which makes it hard to use it in a hierarchy, where some leaf memory
> cgroups are more valuable than others.
> 
> For example, there are memcgs A, A/B, A/C, A/D and A/E:
> 
>   A      A/memory.low = 2G, A/memory.current = 6G
>  //\\
> BC  DE   B/memory.low = 3G  B/memory.current = 2G
>          C/memory.low = 1G  C/memory.current = 2G
>          D/memory.low = 0   D/memory.current = 2G
>          E/memory.low = 10G E/memory.current = 0
> 
> If we apply memory pressure, B, C and D are reclaimed at
> the same pace while A's usage exceeds 2G.
> This is obviously wrong, as B's usage is fully below B's memory.low,
> and C has 1G of protection as well.
> Also, A is pushed below its 2G memory.low, which is wrong as well.
> 
> A simple bash script (provided below) can be used to reproduce
> the problem. Current results are:
>   A:    1430097920
>   A/B:  711929856
>   A/C:  717426688
>   A/D:  741376
>   A/E:  0
> 
> To address the issue, a concept of effective memory.low is introduced.
> Effective memory.low is always equal to or less than the original
> memory.low. When there is no memory.low overcommitment (and also for
> top-level cgroups), the two values are equal.
> Otherwise it's a share of the parent's effective memory.low, proportional
> to the cgroup's memory.low usage divided by the sum of its siblings'
> memory.low usages (by memory.low usage I mean the size of actually
> protected memory: memory.current if memory.current < memory.low,
> 0 otherwise).
> It's necessary to track the actual usage, because otherwise an empty
> cgroup with memory.low set (A/E in my example) will affect actual
> memory distribution, which makes no sense. To avoid traversing
> the cgroup tree twice, page_counters code is reused.
> 
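For readers following along, the calculation described above can be
sketched in bash. This is only an illustration of the description, not
the kernel code. The helper names low_usage and effective_low are made
up, all values are in MiB, and three things are assumptions: that
"overcommitment" means the children's memory.low values sum to more than
the parent's effective memory.low, that the usage sum includes the cgroup
itself, and the fallback used when no child has any protected usage. The
usage figures for B and C are a hypothetical snapshot taken after C has
already been reclaimed below its memory.low.

```bash
#!/bin/bash
# Illustrative sketch only; helper names and sample values are hypothetical.

# Protected ("memory.low") usage as defined in the description:
# memory.current if memory.current < memory.low, 0 otherwise.
low_usage() {
    local current=$1 low=$2
    if (( current < low )); then
        echo "$current"
    else
        echo 0
    fi
}

# effective_low PARENT_ELOW LOW USAGE USAGE_SUM LOW_SUM
#   USAGE_SUM: sum of low_usage over all children of the parent
#              (assumed to include the cgroup itself)
#   LOW_SUM:   sum of memory.low over the same children
effective_low() {
    local parent_elow=$1 low=$2 usage=$3 usage_sum=$4 low_sum=$5
    local elow
    if (( low_sum <= parent_elow )); then
        elow=$low              # assumed meaning of "no overcommitment"
    elif (( usage_sum == 0 )); then
        elow=$parent_elow      # assumption: no protected usage to divide by
    else
        elow=$(( parent_elow * usage / usage_sum ))
    fi
    # Effective memory.low never exceeds the configured memory.low.
    if (( elow > low )); then
        elow=$low
    fi
    echo "$elow"
}

# Hierarchy from the example, in MiB; A's effective memory.low is 2048.
parent_elow=2048
b=$(low_usage 1536 3072)    # B: hypothetical memory.current of 1536
c=$(low_usage 512 1024)     # C: hypothetical memory.current of 512
d=$(low_usage 1 0)          # D: memory.low = 0, so nothing is protected
e=$(low_usage 0 10240)      # E: empty cgroup, contributes nothing
usage_sum=$(( b + c + d + e ))
low_sum=$(( 3072 + 1024 + 0 + 10240 ))

echo "B: $(effective_low "$parent_elow" 3072 "$b" "$usage_sum" "$low_sum")"
echo "C: $(effective_low "$parent_elow" 1024 "$c" "$usage_sum" "$low_sum")"
echo "D: $(effective_low "$parent_elow" 0 "$d" "$usage_sum" "$low_sum")"
echo "E: $(effective_low "$parent_elow" 10240 "$e" "$usage_sum" "$low_sum")"
```

With this snapshot the sketch prints 1536 for B, 512 for C, and 0 for
both D and E, so the shares add up to A's 2048 and the empty cgroup E
takes nothing away from its siblings despite its 10G setting.
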
> Calculating effective memory.low can be done in the reclaim path,
> as we conveniently traverse the cgroup tree from top to bottom and
> check memory.low on each level. So, it's a perfect place to calculate
> effective memory.low and save it for use by the child cgroups.
> 
> This also eliminates the need to traverse the cgroup tree from bottom
> to top each time to check whether the parent's guarantee is exceeded.
> 
> Setting/resetting effective memory.low is intentionally racy, but
> it's fine and shouldn't lead to any significant differences in
> actual memory distribution.
> 
> With this patch applied, the results match the expectations:
>   A:    2147930112
>   A/B:  1428721664
>   A/C:  718393344
>   A/D:  815104
>   A/E:  0
> 
> Test script:
>   #!/bin/bash
> 
>   CGPATH="/sys/fs/cgroup"
> 
>   truncate /file1 --size 2G
>   truncate /file2 --size 2G
>   truncate /file3 --size 2G
>   truncate /file4 --size 50G
> 
>   mkdir "${CGPATH}/A"
>   echo "+memory" > "${CGPATH}/A/cgroup.subtree_control"
>   mkdir "${CGPATH}/A/B" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/E"
> 
>   echo 2G > "${CGPATH}/A/memory.low"
>   echo 3G > "${CGPATH}/A/B/memory.low"
>   echo 1G > "${CGPATH}/A/C/memory.low"
>   echo 0 > "${CGPATH}/A/D/memory.low"
>   echo 10G > "${CGPATH}/A/E/memory.low"
> 
>   echo $$ > "${CGPATH}/A/B/cgroup.procs" && vmtouch -qt /file1
>   echo $$ > "${CGPATH}/A/C/cgroup.procs" && vmtouch -qt /file2
>   echo $$ > "${CGPATH}/A/D/cgroup.procs" && vmtouch -qt /file3
>   echo $$ > "${CGPATH}/cgroup.procs" && vmtouch -qt /file4
> 
>   echo "A:   " `cat "${CGPATH}/A/memory.current"`
>   echo "A/B: " `cat "${CGPATH}/A/B/memory.current"`
>   echo "A/C: " `cat "${CGPATH}/A/C/memory.current"`
>   echo "A/D: " `cat "${CGPATH}/A/D/memory.current"`
>   echo "A/E: " `cat "${CGPATH}/A/E/memory.current"`
> 
>   rmdir "${CGPATH}/A/B" "${CGPATH}/A/C" "${CGPATH}/A/D" "${CGPATH}/A/E"
>   rmdir "${CGPATH}/A"
>   rm /file1 /file2 /file3 /file4
> 
> Signed-off-by: Roman Gushchin <guro@...com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Johannes Weiner <hannes@...xchg.org>
> Cc: Michal Hocko <mhocko@...nel.org>
> Cc: Vladimir Davydov <vdavydov.dev@...il.com>
> Cc: Tejun Heo <tj@...nel.org>
> Cc: kernel-team@...com
> Cc: linux-mm@...ck.org
> Cc: cgroups@...r.kernel.org
> Cc: linux-kernel@...r.kernel.org

Acked-by: Johannes Weiner <hannes@...xchg.org>
