Message-Id: <xr93woubj3ur.fsf@gthelen.svl.corp.google.com>
Date:   Tue, 03 Jul 2018 16:29:00 -0700
From:   Greg Thelen <gthelen@...gle.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Shakeel Butt <shakeelb@...gle.com>, linux-mm@...ck.org,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] memcg, oom: move out_of_memory back to the charge path

Michal Hocko <mhocko@...nel.org> wrote:

> On Tue 03-07-18 00:08:05, Greg Thelen wrote:
>> Michal Hocko <mhocko@...nel.org> wrote:
>> 
>> > On Fri 29-06-18 11:59:04, Greg Thelen wrote:
>> >> Michal Hocko <mhocko@...nel.org> wrote:
>> >> 
>> >> > On Thu 28-06-18 16:19:07, Greg Thelen wrote:
>> >> >> Michal Hocko <mhocko@...nel.org> wrote:
>> >> > [...]
>> >> >> > +	if (mem_cgroup_out_of_memory(memcg, mask, order))
>> >> >> > +		return OOM_SUCCESS;
>> >> >> > +
>> >> >> > +	WARN(1,"Memory cgroup charge failed because of no reclaimable memory! "
>> >> >> > +		"This looks like a misconfiguration or a kernel bug.");
>> >> >> 
>> >> >> I'm not sure if the warning should be here or be so strongly worded.  It
>> >> >> seems like the current task could be oom reaped with MMF_OOM_SKIP and
>> >> >> thus mem_cgroup_out_of_memory() will return false.  So there's nothing
>> >> >> alarming in that case.
>> >> >
>> >> > If the task is reaped then its charges should be released as well and
>> >> > that means that we should get below the limit. Sure there is some room
>> >> > for races but this should be still unlikely. Maybe I am just
>> >> > underestimating though.
>> >> >
>> >> > What would you suggest instead?
>> >> 
>> >> I suggest checking MMF_OOM_SKIP or deleting the warning.
>> >
>> > So what do you do when you have MMF_OOM_SKIP task? Do not warn? Checking
>> > for all the tasks would be quite expensive and remembering that from the
>> > task selection not nice either. Why do you think it would help much?
>> 
>> I assume we could just check current's MMF_OOM_SKIP - no need to check
>> all tasks.
>
> I still do not follow. If you are after a single task memcg then we
> should be ok. try_charge has a bail-out for oom victims
> 	if (unlikely(tsk_is_oom_victim(current) ||
> 		     fatal_signal_pending(current) ||
> 		     current->flags & PF_EXITING))
> 		goto force;
>
> regardless of MMF_OOM_SKIP. So if there is a single process in the
> memcg, we kill it and the oom reaper kicks in and sets MMF_OOM_SKIP then
> we should bail out there. Or do I miss your intention?

For a single task memcg it seems that racing process cgroup migration
could trigger the new warning (I have attempted to reproduce this):

Processes A,B in memcg M1,M2.  M1 is oom.

  Process A[M1]               Process B[M2]

  M1 is oom
  try_charge(M1)
                              Move A M1=>M2
  mem_cgroup_oom()
  mem_cgroup_out_of_memory()
    out_of_memory()
      select_bad_process()
        sees nothing in M1
      return 0
    return 0
  WARN()
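
The interleaving above can be replayed in a toy userspace model. This is only a sketch under stated assumptions: `toy_task`, `toy_select_bad_process()`, `toy_charge_oom_would_warn()` and `toy_replay_migration_race()` are hypothetical names invented for illustration, not kernel APIs, and victim selection is reduced to "is any task still in the memcg".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and helpers are NOT the kernel's. */
struct toy_task {
	int memcg_id;	/* memcg the task currently belongs to */
};

/* Stand-in for select_bad_process(): is any task left in this memcg? */
static bool toy_select_bad_process(const struct toy_task *tasks, size_t n,
				   int memcg_id)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].memcg_id == memcg_id)
			return true;
	return false;
}

/*
 * Stand-in for the charge-path OOM handling in the quoted patch.
 * Returns true when the WARN() would be reached.
 */
static bool toy_charge_oom_would_warn(const struct toy_task *tasks, size_t n,
				      int memcg_id)
{
	if (toy_select_bad_process(tasks, n, memcg_id))
		return false;	/* a victim exists: the OOM_SUCCESS path */
	return true;		/* no victim found: falls through to WARN() */
}

/* Replays the diagram: A charges M1, B moves A to M2, A then OOMs on M1. */
static bool toy_replay_migration_race(void)
{
	enum { M1 = 1, M2 = 2 };
	struct toy_task tasks[] = { { M1 } /* A */, { M2 } /* B */ };

	int charged_memcg = tasks[0].memcg_id;	/* try_charge(M1) */
	tasks[0].memcg_id = M2;			/* Move A M1=>M2 */

	/* mem_cgroup_oom() still targets M1, which is now empty. */
	return toy_charge_oom_would_warn(tasks, 2, charged_memcg);
}
```

In this model `toy_replay_migration_race()` returns true: the charge path still targets M1 after its only task has left, so no victim is found and the warning is reached.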


Another variant might be possible, this time with global oom:

Processes A,B in memcg M1,M2.  M1 is oom.

  Process A[M1]               Process B[M2]

  try_charge()
                              trigger global oom
                              reaper sets A.MMF_OOM_SKIP
  mem_cgroup_oom()
  mem_cgroup_out_of_memory()
    out_of_memory()
      select_bad_process()
        sees nothing in M1
      return 0
    return 0
  WARN()
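
A similar toy model can cover this variant, together with one possible reading of the "check current's MMF_OOM_SKIP" suggestion from earlier in the thread. It is a sketch, not kernel code: `toy_mm_task`, its `oom_skip` field and the three helpers are hypothetical names, and `oom_skip` merely stands in for MMF_OOM_SKIP being set on the task's mm.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and helpers are NOT the kernel's. */
struct toy_mm_task {
	int memcg_id;	/* memcg the task belongs to */
	bool oom_skip;	/* models MMF_OOM_SKIP set by the oom reaper */
};

/* Victim selection ignores tasks that were already reaped. */
static bool toy_has_eligible_victim(const struct toy_mm_task *tasks, size_t n,
				    int memcg_id)
{
	for (size_t i = 0; i < n; i++)
		if (tasks[i].memcg_id == memcg_id && !tasks[i].oom_skip)
			return true;
	return false;
}

/* Behaviour of the quoted patch: warn whenever no victim is found. */
static bool toy_warns_as_posted(const struct toy_mm_task *tasks, size_t n,
				int memcg_id)
{
	return !toy_has_eligible_victim(tasks, n, memcg_id);
}

/*
 * Assumed form of the suggestion: stay quiet when the charging task
 * itself has already been reaped, otherwise behave as posted.
 */
static bool toy_warns_with_skip_check(const struct toy_mm_task *tasks,
				      size_t n,
				      const struct toy_mm_task *current_task)
{
	if (current_task->oom_skip)
		return false;
	return !toy_has_eligible_victim(tasks, n, current_task->memcg_id);
}
```

With A in M1 already marked `oom_skip` by a global oom, `toy_warns_as_posted()` reports a warning while `toy_warns_with_skip_check()` does not. Checking only the current task keeps the test cheap, which is the point made above about not needing to check all tasks.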


These seem unlikely, so I'm fine with taking a wait-and-see approach.