lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <f3f665a5-0e80-2953-7ab6-258f5b7d0225@redhat.com>
Date:   Tue, 8 Mar 2022 17:24:19 -0700
From:   Nico Pache <npache@...hat.com>
To:     Michal Hocko <mhocko@...e.com>
Cc:     Waiman Long <longman@...hat.com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, akpm@...ux-foundation.org,
        jsavitz@...hat.com, peterz@...radead.org, tglx@...utronix.de,
        mingo@...hat.com, dvhart@...radead.org, dave@...olabs.net,
        andrealmeid@...labora.com
Subject: Re: [PATCH v3] mm/oom: do not oom reap task with an unresolved robust
 futex



On 3/3/22 00:48, Michal Hocko wrote:
> On Wed 02-03-22 12:26:45, Nico Pache wrote:
>>
>>
>> On 3/2/22 09:24, Michal Hocko wrote:
>>> Sorry, this has slipped through cracks.
>>>
>>> On Mon 14-02-22 15:39:31, Nico Pache wrote:
>>> [...]
>>>> We've recently been discussing the following if statement in __oom_reap_task_mm:
>>>> 	if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED))
>>>>
>>>> Given the comment above it, and some of the upstream discussion the original
>>>> RFC, we are struggling to see why this should be a `||` and not an `&&`. If we
>>>> only want to reap anon memory and reaping shared memory can be dangerous is this
>>>> statement incorrect?
>>>>
>>>> We have a patch queued up to make this change, but wanted to get your opinion on
>>>> why this was originally designed this way in case we are missing something.
>>>
>>> I do not really see why this would be wrong. Private file backed
>>> mappings can contain a reapable memory as well. I do not see how this
>>> would solve the futex issue
>> We were basing our discussion around the following comment:
>> /*
>>  * Only anonymous pages have a good chance to be dropped
>>  * without additional steps which we cannot afford as we
>>  * are OOM already.
>>  *
>>  * We do not even care about fs backed pages because all
>>  * which are reclaimable have already been reclaimed and
>>  * we do not want to block exit_mmap by keeping mm ref
>>  * count elevated without a good reason.
>>  */
>>
>> So changing to an && would align the functionality with this comment by ignoring
>> fs backed pages, and additionally it prevents shared mappings from being reaped.
>> We have tested this change and found we can no longer reproduce the issue. In
>> our case we allocate the mutex on a MAP_SHARED|MAP_ANONYMOUS mmap so the if-
>> statement in question would no longer return true after the && change.
>>
>> If it is the case that private fs backed pages matter perhaps we want something
>> like this:
>> 	if ((vma_is_anonymous(vma) && !(vma->vm_flags & VM_SHARED))
>> 	||(!vma_is_anonymous(vma) && !(vma->vm_flags & VM_SHARED)))
>>
>> or more simply:
>> 	if(!(vma->vm_flags & VM_SHARED))
>>
>> to exclude all VM_SHARED mappings.
> 
> I would have to think about that some more but I do not really see how
> this is related to the futex issue. In other words what kind of problem
> does this solve?
> 

We had a misunderstanding of what vma_is_anonymous actually checks for... It
returns true if the VMA is PRIVATE|ANONYMOUS. We may follow up with a patch to
change the name of this function or at least add a comment at the top of the
function to be more descriptive. Furthermore, we ended up being able to
reproduce this issue on the && kernel.

We have also found the actual cause of the issue, and we'll post that fix. Its
related to the glibc allocation done for pthreads as we discussed earlier in
this thread. The mapping that stores the futex robust list is in userspace and a
race occurs between the oom_reap_task_mm and the exit path that handles the
futex cleanup.

Cheers,
-- Nico

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ