Message-ID: <87364hwf7d.fsf@x220.int.ebiederm.org>
Date:   Thu, 20 Aug 2020 09:43:18 -0500
From:   ebiederm@...ssion.com (Eric W. Biederman)
To:     Oleg Nesterov <oleg@...hat.com>
Cc:     Michal Hocko <mhocko@...e.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        christian.brauner@...ntu.com, mingo@...nel.org,
        peterz@...radead.org, tglx@...utronix.de, esyr@...hat.com,
        christian@...lner.me, areber@...hat.com, shakeelb@...gle.com,
        cyphar@...har.com, adobriyan@...il.com, akpm@...ux-foundation.org,
        gladkov.alexey@...il.com, walken@...gle.com,
        daniel.m.jordan@...cle.com, avagin@...il.com,
        bernd.edlinger@...mail.de, john.johansen@...onical.com,
        laoar.shao@...il.com, timmurray@...gle.com, minchan@...nel.org,
        kernel-team@...roid.com, linux-kernel@...r.kernel.org,
        linux-fsdevel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH 1/1] mm, oom_adj: don't loop through tasks in __set_oom_adj when not necessary

Oleg Nesterov <oleg@...hat.com> writes:

> On 08/20, Eric W. Biederman wrote:
>>
>> --- a/fs/exec.c
>> +++ b/fs/exec.c
>> @@ -1139,6 +1139,10 @@ static int exec_mmap(struct mm_struct *mm)
>>  	vmacache_flush(tsk);
>>  	task_unlock(tsk);
>>  	if (old_mm) {
>> +		mm->oom_score_adj = old_mm->oom_score_adj;
>> +		mm->oom_score_adj_min = old_mm->oom_score_adj_min;
>> +		if (tsk->vfork_done)
>> +			mm->oom_score_adj = tsk->vfork_oom_score_adj;
>
> too late, ->vfork_done is NULL after mm_release().

Good point.  

> And this can race with __set_oom_adj(). Yes, the current code is racy too,
> but this change adds another race, __set_oom_adj() could already observe
> ->mm != NULL and update mm->oom_score_adj.

I am not certain about the races, but we should be able to do something like:

in exec_mmap:
	if (old_mm) {
		mm->oom_score_adj = old_mm->oom_score_adj;
		mm->oom_score_adj_min = old_mm->oom_score_adj_min;
		if (tsk->signal->vfork_oom_score_adj_set) {
			mm->oom_score_adj = tsk->signal->vfork_oom_score_adj;
			tsk->signal->vfork_oom_score_adj_set = false;
		}
	}

in __set_oom_adj:
	if (mm) {
		mm->oom_score_adj = oom_adj;
		tsk->signal->vfork_oom_score_adj_set = false;
	} else {
		tsk->signal->vfork_oom_score_adj = oom_adj;
		tsk->signal->vfork_oom_score_adj_set = true;
	}
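A minimal userspace model of the flag scheme above, to make the data flow concrete. It rests on loud assumptions: `model_mm`, `model_signal`, `model_task`, `model_set_oom_adj()` and `model_exec_mmap()` are hypothetical stand-ins, not kernel structures or functions, and passing `mm == NULL` to the setter is assumed to mean "this task is a vfork child still borrowing its parent's mm". No locking is modeled, so it says nothing about the races discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Range limits as defined in the kernel's uapi <linux/oom.h>. */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000

/* Hypothetical userspace model; NOT the kernel's real structures. */
struct model_mm {
	short oom_score_adj;
	short oom_score_adj_min;
};

struct model_signal {
	short vfork_oom_score_adj;    /* value requested by a vfork child */
	bool vfork_oom_score_adj_set; /* true while that value is pending */
};

struct model_task {
	struct model_mm *mm;          /* may still be the parent's mm after vfork */
	struct model_signal *signal;
};

/*
 * Model of the proposed __set_oom_adj() branch. Assumption of this model:
 * `mm` is the mm the setter may update, or NULL when the task is a vfork
 * child still sharing its parent's mm.
 */
void model_set_oom_adj(struct model_task *tsk, struct model_mm *mm,
		       short oom_adj)
{
	assert(oom_adj >= OOM_SCORE_ADJ_MIN && oom_adj <= OOM_SCORE_ADJ_MAX);
	if (mm) {
		mm->oom_score_adj = oom_adj;
		tsk->signal->vfork_oom_score_adj_set = false; /* drop stale value */
	} else {
		tsk->signal->vfork_oom_score_adj = oom_adj;   /* park it for exec */
		tsk->signal->vfork_oom_score_adj_set = true;
	}
}

/* Model of the proposed exec_mmap() hunk: inherit, then apply a pending value. */
void model_exec_mmap(struct model_task *tsk, struct model_mm *mm)
{
	struct model_mm *old_mm = tsk->mm;

	tsk->mm = mm;
	if (old_mm) {
		mm->oom_score_adj = old_mm->oom_score_adj;
		mm->oom_score_adj_min = old_mm->oom_score_adj_min;
		if (tsk->signal->vfork_oom_score_adj_set) {
			mm->oom_score_adj = tsk->signal->vfork_oom_score_adj;
			tsk->signal->vfork_oom_score_adj_set = false;
		}
	}
}
```

In this model, a vfork child calling `model_set_oom_adj(&child, NULL, 500)` leaves the parent's mm untouched; `model_exec_mmap()` then gives the child's new mm the value 500 and clears the flag. A later call with a non-NULL `mm` also clears any stale pending value, matching the clearing rule for signal_struct.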

There might even be a special oom_score_adj value we can use instead of
a separate flag.  I am just not familiar enough with oom_score_adj to know.
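One possible shape for the "special value" idea, as a hedged sketch: oom_score_adj is limited to OOM_SCORE_ADJ_MIN..OOM_SCORE_ADJ_MAX (-1000..1000), so a value outside that range could mean "nothing pending". The sentinel `VFORK_OOM_SCORE_ADJ_UNSET` (picked here as `SHRT_MIN`), the `model_signal_sentinel` struct and the `pending_*` helpers are all hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Range limits as defined in the kernel's uapi <linux/oom.h>. */
#define OOM_SCORE_ADJ_MIN (-1000)
#define OOM_SCORE_ADJ_MAX 1000
/* Hypothetical sentinel: outside [-1000, 1000], so never a real setting. */
#define VFORK_OOM_SCORE_ADJ_UNSET SHRT_MIN

/* Hypothetical model: one short replaces the value + bool pair. */
struct model_signal_sentinel {
	short vfork_oom_score_adj;
};

/* Mark "nothing pending". */
void pending_init(struct model_signal_sentinel *sig)
{
	sig->vfork_oom_score_adj = VFORK_OOM_SCORE_ADJ_UNSET;
}

/* Park a value requested while the task still shares its parent's mm. */
void pending_store(struct model_signal_sentinel *sig, short oom_adj)
{
	assert(oom_adj >= OOM_SCORE_ADJ_MIN && oom_adj <= OOM_SCORE_ADJ_MAX);
	sig->vfork_oom_score_adj = oom_adj;
}

/* If a value is pending, copy it to *out, reset to the sentinel, return true. */
bool pending_take(struct model_signal_sentinel *sig, short *out)
{
	if (sig->vfork_oom_score_adj == VFORK_OOM_SCORE_ADJ_UNSET)
		return false;
	*out = sig->vfork_oom_score_adj;
	sig->vfork_oom_score_adj = VFORK_OOM_SCORE_ADJ_UNSET;
	return true;
}
```

Folding the flag into the value removes the need to keep two fields in sync. Whether this fits with oom_score_adj_min is still the open question.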

We should be able to do something like that, where we know whether the
value is set and only use it if so.  A subsequent __set_oom_adj that does
not observe vfork_done set will clear the value in signal_struct.

We have to be a bit careful to get the details right, but it should be
straightforward.


Michal also has a point about oom_score_adj_min, and I really don't
understand the oom logic well enough to guess how that should work.


Although, to deal with some of the races, it probably only makes sense
to call complete_vfork_done in exec after the new mm has been installed
and while exec_update_mutex is held.  I don't think anyone ever
anticipated using vfork_done as a flag.
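The ordering argument can be checked with a small deterministic model that splits exec into two steps (completing the vfork and installing the new mm) and runs the setter at every point between them. Everything here is hypothetical: `model_mm`, `model_task`, `set_adj()`, `exec_step()` and `setter_can_hit_parent()` are illustrative names, the setter is modeled as one atomic step (roughly what taking the same lock, such as exec_update_mutex, on both paths would provide, which is itself an assumption), and only one property is checked: whether the parent's mm can be written.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical deterministic model; NOT kernel code. */
struct model_mm {
	short oom_score_adj;
};

struct model_task {
	struct model_mm *mm;  /* parent's mm until exec installs a new one */
	bool vfork_pending;   /* stands in for tsk->vfork_done != NULL */
	short deferred_adj;   /* value parked while vfork_pending */
};

/* Setter, modeled as a single atomic step. */
static void set_adj(struct model_task *t, short adj)
{
	if (t->vfork_pending)
		t->deferred_adj = adj;      /* don't touch a borrowed mm */
	else
		t->mm->oom_score_adj = adj;
}

/* Exec as two steps; complete_first selects which action happens first. */
static void exec_step(struct model_task *t, struct model_mm *new_mm,
		      bool complete_first, int step)
{
	assert(step == 0 || step == 1);
	if ((step == 0) == complete_first)
		t->vfork_pending = false;   /* complete_vfork_done() */
	else
		t->mm = new_mm;             /* install the new mm */
}

/* Returns true if some interleaving lets the setter write the parent's mm. */
bool setter_can_hit_parent(bool complete_first)
{
	for (int k = 0; k <= 2; k++) {      /* setter runs after k exec steps */
		struct model_mm parent = { 0 }, child = { 0 };
		struct model_task t = { &parent, true, 0 };

		for (int s = 0; s < 2; s++) {
			if (s == k)
				set_adj(&t, 500);
			exec_step(&t, &child, complete_first, s);
		}
		if (k == 2)
			set_adj(&t, 500);
		if (parent.oom_score_adj != 0)
			return true;
	}
	return false;
}
```

With completion first (as when vfork_done is already NULL after mm_release()), the interleaving after the first step writes the parent's mm; with the new mm installed first, no interleaving in this model does.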

Eric
