linux-kernel - Re: [patch 4/8] memcg: rework soft limit reclaim

Message-ID: <BANLkTimuRks4+h=Kjt2Lzc-s-XsAHCH9vg@mail.gmail.com>
Date:	Thu, 2 Jun 2011 22:25:29 -0700
From:	Ying Han <yinghan@...gle.com>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Daisuke Nishimura <nishimura@....nes.nec.co.jp>,
	Balbir Singh <balbir@...ux.vnet.ibm.com>,
	Michal Hocko <mhocko@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>,
	Minchan Kim <minchan.kim@...il.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Mel Gorman <mgorman@...e.de>, Greg Thelen <gthelen@...gle.com>,
	Michel Lespinasse <walken@...gle.com>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [patch 4/8] memcg: rework soft limit reclaim

On Thu, Jun 2, 2011 at 2:55 PM, Ying Han <yinghan@...gle.com> wrote:
> On Tue, May 31, 2011 at 11:25 PM, Johannes Weiner <hannes@...xchg.org> wrote:
>> Currently, soft limit reclaim is entered from kswapd, where it selects
>> the memcg with the biggest soft limit excess in absolute bytes, and
>> reclaims pages from it with maximum aggressiveness (priority 0).
>>
>> This has the following disadvantages:
>>
>>    1. because of the aggressiveness, kswapd can be stalled on a memcg
>>    that is hard to reclaim from for a long time, sending the rest of
>>    the allocators into direct reclaim in the meantime.
>>
>>    2. it only considers the biggest offender (in absolute bytes, no
>>    less, so very unhandy for setups with different-sized memcgs) and
>>    does not apply any pressure at all on other memcgs in excess.
>>
>>    3. because it is only invoked from kswapd, the soft limit is
>>    meaningful during global memory pressure, but it is not taken into
>>    account during hierarchical target reclaim where it could allow
>>    prioritizing memcgs as well.  So while it does hierarchical
>>    reclaim once triggered, it is not a truly hierarchical mechanism.
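To make point 2 concrete, here is a minimal userspace model of the selection described above. It is not kernel code: struct toy_memcg, soft_limit_excess() and pick_biggest_offender() are hypothetical names. It picks the single memcg with the largest soft limit excess in absolute bytes, as the old kswapd path is described as doing:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memcg and its res_counter. */
struct toy_memcg {
	const char *name;
	unsigned long long usage;      /* bytes currently charged */
	unsigned long long soft_limit; /* soft limit in bytes */
};

/* Bytes above the soft limit, or 0 when within it. */
static unsigned long long soft_limit_excess(const struct toy_memcg *m)
{
	return m->usage > m->soft_limit ? m->usage - m->soft_limit : 0;
}

/*
 * Old-style selection: return the one memcg with the largest absolute
 * excess, or NULL if no memcg exceeds its soft limit. Every other memcg
 * in excess is ignored for this round.
 */
static const struct toy_memcg *
pick_biggest_offender(const struct toy_memcg *v, size_t n)
{
	const struct toy_memcg *best = NULL;
	unsigned long long best_excess = 0;

	assert(v != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		unsigned long long e = soft_limit_excess(&v[i]);

		if (e > best_excess) {
			best_excess = e;
			best = &v[i];
		}
	}
	return best;
}
```

With a small group at three times its 10 MiB soft limit (20 MiB over) and a large group 64 MiB over a 4 GiB soft limit, the large group is chosen and the small one receives no extra pressure at all.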
>>
>> Here is a different approach.  Instead of having a soft limit reclaim
>> cycle separate from the rest of reclaim, this patch ensures that each
>> time a group of memcgs is reclaimed - be it because of global memory
>> pressure or because of a hard limit - memcgs that exceed their soft
>> limit, or contribute to the soft limit excess of one of their parents,
>> are reclaimed from at a higher priority than their siblings.
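The hierarchical check this relies on, mem_cgroup_soft_limit_exceeded() in the patch, can be modelled outside the kernel as a walk up the parent chain. This is a sketch under simplifying assumptions: struct toy_memcg, toy_root_memcg and toy_excess() are hypothetical stand-ins for struct mem_cgroup, root_mem_cgroup and res_counter_soft_limit_excess(), and each usage value is assumed to already include its children's charges:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a memcg hierarchy; not kernel types. */
struct toy_memcg {
	struct toy_memcg *parent;      /* NULL at the top of the tree */
	unsigned long long usage;      /* bytes, including children */
	unsigned long long soft_limit; /* bytes */
};

/* Stand-in for root_mem_cgroup, which is never treated as in excess. */
static struct toy_memcg *toy_root_memcg;

static bool toy_excess(const struct toy_memcg *m)
{
	return m->usage > m->soft_limit;
}

/*
 * Same loop shape as mem_cgroup_soft_limit_exceeded(): climb from @mem
 * towards @root and report whether any memcg on the way, @root
 * included, is over its soft limit.
 */
static bool toy_soft_limit_exceeded(const struct toy_memcg *root,
				    const struct toy_memcg *mem)
{
	for (;;) {
		if (mem == toy_root_memcg)
			return false;
		if (toy_excess(mem))
			return true;
		if (mem == root)
			return false;
		mem = mem->parent;
		if (!mem)
			return false;
	}
}
```

A child within its own soft limit still counts as being in excess when a parent between it and the reclaim root is over its limit; when the child itself is the reclaim root, only its own limit matters.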
>>
>> This results in the following:
>>
>>    1. all relevant memcgs are scanned with increasing priority during
>>    memory pressure.  The primary goal is to free pages, not to punish
>>    soft limit offenders.
>>
>>    2. increased pressure is applied to all memcgs in excess of their
>>    soft limit, not only the biggest offender.
>>
>>    3. the soft limit becomes meaningful for target reclaim as well,
>>    where it allows prioritizing children of a hierarchy when the
>>    parent hits its limit.
>>
>>    4. direct reclaim now also applies increased soft limit pressure,
>>    not just kswapd anymore.
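Why lowering the priority value by one means more pressure can be shown with a rough model. It assumes the per-LRU scan target is about lru_pages >> priority and that priority counts down from DEF_PRIORITY (12) towards 0, as in mm/vmscan.c of this period; the toy_* names are hypothetical. The shrink_zone() hunk does not guard against priority 0, where it would yield -1, and shifting by a negative amount is undefined in C, so this sketch clamps at 0:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed starting priority, matching the kernel's DEF_PRIORITY. */
#define TOY_DEF_PRIORITY 12

/*
 * Effective priority for one memcg, following the shrink_zone() hunk:
 * one step more aggressive when the memcg (hierarchically) exceeds its
 * soft limit. Clamped at 0 here so the shift below stays defined.
 */
static int toy_effective_priority(int priority, bool soft_limit_exceeded)
{
	assert(priority >= 0 && priority <= TOY_DEF_PRIORITY);

	int epriority = priority;

	if (soft_limit_exceeded && epriority > 0)
		epriority -= 1;
	return epriority;
}

/* Rough per-LRU scan target: doubles each time the priority drops by one. */
static unsigned long toy_scan_target(unsigned long lru_pages, int priority)
{
	return lru_pages >> priority;
}
```

A memcg in excess therefore gets roughly twice the scan target of its siblings at each priority level, while the siblings are still scanned.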
>>
>> Signed-off-by: Johannes Weiner <hannes@...xchg.org>
>> ---
>>  include/linux/memcontrol.h |    7 +++++++
>>  mm/memcontrol.c            |   26 ++++++++++++++++++++++++++
>>  mm/vmscan.c                |    8 ++++++--
>>  3 files changed, 39 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 8f402b9..7d99e87 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -104,6 +104,7 @@ extern void mem_cgroup_end_migration(struct mem_cgroup *mem,
>>  struct mem_cgroup *mem_cgroup_hierarchy_walk(struct mem_cgroup *,
>>                                             struct mem_cgroup *);
>>  void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *, struct mem_cgroup *);
>> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *, struct mem_cgroup *);
>>
>>  /*
>>  * For memory reclaim.
>> @@ -345,6 +346,12 @@ static inline void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *r,
>>  {
>>  }
>>
>> +static inline bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root,
>> +                                                 struct mem_cgroup *mem)
>> +{
>> +       return false;
>> +}
>> +
>>  static inline void
>>  mem_cgroup_print_oom_info(struct mem_cgroup *memcg, struct task_struct *p)
>>  {
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 983efe4..94f77cc3 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -1460,6 +1460,32 @@ void mem_cgroup_stop_hierarchy_walk(struct mem_cgroup *root,
>>                css_put(&mem->css);
>>  }
>>
>> +/**
>> + * mem_cgroup_soft_limit_exceeded - check if a memcg (hierarchically)
>> + *                                  exceeds a soft limit
>> + * @root: highest ancestor of @mem to consider
>> + * @mem: memcg to check for excess
>> + *
>> + * The function indicates whether @mem has exceeded its own soft
>> + * limit, or contributes to the soft limit excess of one of its
>> + * parents in the hierarchy below @root.
>> + */
>> +bool mem_cgroup_soft_limit_exceeded(struct mem_cgroup *root,
>> +                                   struct mem_cgroup *mem)
>> +{
>> +       for (;;) {
>> +               if (mem == root_mem_cgroup)
>> +                       return false;
>> +               if (res_counter_soft_limit_excess(&mem->res))
>> +                       return true;
>> +               if (mem == root)
>> +                       return false;
>> +               mem = parent_mem_cgroup(mem);
>> +               if (!mem)
>> +                       return false;
>> +       }
>> +}
>> +
>>  static unsigned long mem_cgroup_reclaim(struct mem_cgroup *mem,
>>                                        gfp_t gfp_mask,
>>                                        unsigned long flags)
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index c7d4b44..0163840 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -1988,9 +1988,13 @@ static void shrink_zone(int priority, struct zone *zone,
>>                unsigned long reclaimed = sc->nr_reclaimed;
>>                unsigned long scanned = sc->nr_scanned;
>>                unsigned long nr_reclaimed;
>> +               int epriority = priority;
>> +
>> +               if (mem_cgroup_soft_limit_exceeded(root, mem))
>> +                       epriority -= 1;
>
> Here we grant the ability to shrink from all the memcgs, but only raise
> the priority for those that exceed the soft_limit. That is a design
> change for the "soft_limit", which gives a hint as to which memcgs to
> reclaim from first under global memory pressure.


Basically, we shouldn't reclaim from a memcg under its soft_limit
unless we have trouble reclaiming pages from others. Something like the
following makes more sense:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index bdc2fd3..b82ba8c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1989,6 +1989,8 @@ restart:
        throttle_vm_writeout(sc->gfp_mask);
 }

+#define MEMCG_SOFTLIMIT_RECLAIM_PRIORITY       2
+
 static void shrink_zone(int priority, struct zone *zone,
                                struct scan_control *sc)
 {
@@ -2001,13 +2003,13 @@ static void shrink_zone(int priority, struct zone *zone,
                unsigned long reclaimed = sc->nr_reclaimed;
                unsigned long scanned = sc->nr_scanned;
                unsigned long nr_reclaimed;
-               int epriority = priority;

-               if (mem_cgroup_soft_limit_exceeded(root, mem))
-                       epriority -= 1;
+               if (!mem_cgroup_soft_limit_exceeded(root, mem) &&
+                               priority > MEMCG_SOFTLIMIT_RECLAIM_PRIORITY)
+                       continue;

                sc->mem_cgroup = mem;
-               do_shrink_zone(epriority, zone, sc);
+               do_shrink_zone(priority, zone, sc);
                mem_cgroup_count_reclaim(mem, current_is_kswapd(),
                                         mem != root, /* limit or hierarchy? */
                                         sc->nr_scanned - scanned,
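
For comparison, the rule proposed in the diff above can be modelled as a single predicate. This is a sketch: proposed_should_reclaim(), proposed_levels_scanned() and TOY_DEF_PRIORITY are hypothetical names, while MEMCG_SOFTLIMIT_RECLAIM_PRIORITY and its value come from the diff. It also counts how many priority levels, from 12 down to 0, would touch a given memcg if reclaim never finished early:

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold from the proposed diff. */
#define MEMCG_SOFTLIMIT_RECLAIM_PRIORITY 2
/* Assumed starting priority, matching the kernel's DEF_PRIORITY. */
#define TOY_DEF_PRIORITY 12

/*
 * Inverse of the proposed "continue" condition: reclaim from the memcg
 * if it is in (hierarchical) soft limit excess, or if priority has
 * already dropped to the threshold or below.
 */
static bool proposed_should_reclaim(int priority, bool soft_limit_exceeded)
{
	assert(priority >= 0 && priority <= TOY_DEF_PRIORITY);
	return soft_limit_exceeded ||
	       priority <= MEMCG_SOFTLIMIT_RECLAIM_PRIORITY;
}

/* Priority levels, DEF_PRIORITY down to 0, at which the memcg would be
 * scanned, assuming reclaim never bails out early. */
static int proposed_levels_scanned(bool soft_limit_exceeded)
{
	int levels = 0;

	for (int priority = TOY_DEF_PRIORITY; priority >= 0; priority--)
		if (proposed_should_reclaim(priority, soft_limit_exceeded))
			levels++;
	return levels;
}
```

Under this rule a memcg in excess is scanned at all 13 levels, while one within its soft limit is only scanned at priorities 2, 1 and 0, that is, once reclaiming from the others is proving difficult.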

--Ying
>
> --Ying
>
>
>>
>>                sc->mem_cgroup = mem;
>> -               do_shrink_zone(priority, zone, sc);
>> +               do_shrink_zone(epriority, zone, sc);
>>                mem_cgroup_count_reclaim(mem, current_is_kswapd(),
>>                                         mem != root, /* limit or hierarchy? */
>>                                         sc->nr_scanned - scanned,
>> @@ -2480,7 +2484,7 @@ loop_again:
>>                         * Call soft limit reclaim before calling shrink_zone.
>>                         * For now we ignore the return value
>>                         */
>> -                       mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
>> +                       //mem_cgroup_soft_limit_reclaim(zone, order, sc.gfp_mask);
>>
>>                        /*
>>                         * We put equal pressure on every zone, unless
>> --
>> 1.7.5.2
>>
>>
>