lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <14ed5468-156f-6ac4-ba1e-283a88bab917@oracle.com>
Date:   Tue, 9 Feb 2021 10:56:20 -0800
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Miaohe Lin <linmiaohe@...wei.com>, akpm@...ux-foundation.org
Cc:     almasrymina@...gle.com, rientjes@...gle.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] hugetlb_cgroup: fix unbalanced css_put for shared
 mappings

On 2/8/21 7:27 PM, Miaohe Lin wrote:
> On 2021/2/9 3:52, Mike Kravetz wrote:
>> On 1/23/21 1:31 AM, Miaohe Lin wrote:
>>> The current implementation of hugetlb_cgroup for shared mappings could have
>>> different behavior. Consider the following two scenarios:
>>>
>>> 1.Assume initial css reference count of hugetlb_cgroup is 1:
>>>   1.1 Call hugetlb_reserve_pages with from = 1, to = 2. So css reference
>>> count is 2 associated with 1 file_region.
>>>   1.2 Call hugetlb_reserve_pages with from = 2, to = 3. So css reference
>>> count is 3 associated with 2 file_region.
>>>   1.3 coalesce_file_region will coalesce these two file_regions into one.
>>> So css reference count is 3 associated with 1 file_region now.
>>>
>>> 2.Assume initial css reference count of hugetlb_cgroup is 1 again:
>>>   2.1 Call hugetlb_reserve_pages with from = 1, to = 3. So css reference
>>> count is 2 associated with 1 file_region.
>>>
>>> Therefore, we might have one file_region while holding one or more css
>>> reference counts. This inconsistency could lead to unbalanced css_put().
>>> If we do css_put one by one (i.g. hole punch case), scenario 2 would put
>>> one more css reference. If we do css_put all together (i.g. truncate case),
>>> scenario 1 will leak one css reference.
>>
>> Sorry for the delay in replying.  This is tricky code and I needed some quiet
>> time to study it.
>>
> 
> That's fine. I was trying to catch more buggy case too.
> 
>> I agree that the issue described exists.  Can you describe what a user would
>> see in the above imbalance scenarios?  What happens if we do one too many
>> css_put calls?  What happens if we leak the reference and do not do the
>> required number of css_puts?
>>
> 
> The imbalanced css_get/css_put would result in a non-zero reference when we try to
> destroy the hugetlb cgroup. The hugetlb cgroup dir is removed __but__ associated
> resource is not freed. This might result in OOM or can not create a new hugetlb cgroup
> in a really busy workload finally.
> 
>> The code changes look correct.
>>
>> I just wish this code was not so complicated.  I think the private mapping
>> case could be simplified to only take a single css_ref per reserve map.
> 
> Could you explain this more?
> It seems one reserve map already takes a single css_ref. And a hugepage outside
> reservation would take a single css_ref too.

Let me preface this by saying that my cgroup knowledge is limited.
For private mappings, all reservations will be associated with the same cgroup.
This is because, only one process can access the mapping.  Since there is only
one process, we only need to hold one css reference.  Individual counters can
be incremented as needed without increasing the css reference count.  We
take a reference when the reserv map is created and drop the reference when it
is deleted.

This does not work for shared mappings as you can have multiple processes in
multiple cgroups taking reservations on the same file.  This is why you need
per-reservation reference accounting in this case.

-- 
Mike Kravetz

> 
>> However, for shared mappings we need to track each individual reservation
>> which adds the complexity.  I can not think of a better way to do things.
>>
> 
> I can't figure out one too. And the fix might make the code more complex. :(
> 
>> Please update commit message with an explanation of what users might see
>> because of this issue and resubmit as a patch.
>>
> 
> Will do. Thanks.
> 
>> Thanks,
>>
> 
> Many thanks for reply. :)
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ