linux-kernel - Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <202003170318.02H3IpSx047471@www262.sakura.ne.jp>
Date:   Tue, 17 Mar 2020 12:18:51 +0900
From:   Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
To:     David Rientjes <rientjes@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        Vlastimil Babka <vbabka@...e.cz>,
        Michal Hocko <mhocko@...nel.org>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org
Subject: Re: [patch] mm, oom: prevent soft lockup on memcg oom for UP systems

David Rientjes wrote:
> On Sat, 14 Mar 2020, Tetsuo Handa wrote:
> > If current thread is an OOM victim, schedule_timeout_killable(1) will give other
> > threads (including the OOM reaper kernel thread) CPU time to run, by leaving
> > try_charge() path due to should_force_charge() == true and reaching do_exit() path
> > instead of returning to userspace code doing "for (;;);".
> > 
> > Unless the problem is that current thread cannot reach should_force_charge() check,
> > schedule_timeout_killable(1) should work.
> > 
> 
> No need to yield if current is the oom victim, allowing the oom reaper to 
> run when it may not actually be able to free memory is not required.  It 
> increases the likelihood that some other process schedules and is unable 
> to yield back due to the memcg oom condition such that the victim doesn't 
> get a chance to run again.
> 
> This happens because the victim is allowed to overcharge but other 
> processes attached to an oom memcg hierarchy simply fail the charge.  We 
> are then reliant on all memory chargers in the kernel to yield if their 
> charges fail due to oom.  It's the only way to allow the victim to 
> eventually run.
> 
> So the only change that I would make to your patch is to do this in 
> mem_cgroup_out_of_memory() instead:
> 
> 	if (!fatal_signal_pending(current))
> 		schedule_timeout_killable(1);
> 
> So we don't have this reliance on all other memory chargers to yield when 
> their charge fails and there is no delay for victims themselves.

I see. You want below functions for environments where current thread can
fail to resume execution for long if current thread once reschedules (e.g.
UP kernel, many threads contended on one CPU).

/*
 * Give other threads CPU time, unless current thread was already killed.
 * Used when we prefer killed threads to continue execution (in a hope that
 * killed threads terminate quickly) over giving other threads CPU time.
 */
signed long __sched schedule_timeout_killable_expedited(signed long timeout)
{
	if (unlikely(fatal_signal_pending(current)))
		return timeout;
	return schedule_timeout_killable(timeout);
}

/*
 * Latency reduction via explicit rescheduling in places that are safe,
 * but becomes no-op if current thread was already killed. Used when we
 * prefer killed threads to continue execution (in a hope that killed
 * threads terminate quickly) over giving other threads CPU time.
 */
int cond_resched_expedited(void)
{
	if (unlikely(fatal_signal_pending(current)))
		return 0;
	return cond_resched();
}

> 
>  [ I'll still propose my change that adds cond_resched() to 
>    shrink_node_memcgs() because we can see need_resched set for a 
>    prolonged period of time without scheduling. ]

As long as there is schedule_timeout_killable(), I'm fine with adding
cond_resched() in other places.

> 
> If you agree, I'll propose your patch with a changelog that indicates it 
> can fix the soft lockup issue for UP and can likely get a tested-by for 
> it.
> 

Please go ahead.