linux-kernel - Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs

Message-ID: <a4ed0dce-3741-9a93-fc67-6955ecac2454@oracle.com>
Date:   Mon, 28 Aug 2017 11:51:28 -0700
From:   Mike Kravetz <mike.kravetz@...cle.com>
To:     Michal Hocko <mhocko@...nel.org>
Cc:     Nadav Amit <namit@...are.com>,
        Nadia Yvette Chambers <nyc@...omorphy.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Eric Biggers <ebiggers3@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] hugetlbfs: change put_page/unlock_page order in
 hugetlbfs_fallocate()

On 08/28/2017 11:09 AM, Michal Hocko wrote:
> On Mon 28-08-17 10:45:58, Mike Kravetz wrote:
>> Adding Andrew, Michal on CC
>>
>> On 08/27/2017 01:08 PM, Nadav Amit wrote:
>>> Mike Kravetz <mike.kravetz@...cle.com> wrote:
>>>
>>>> On 08/26/2017 12:11 PM, Nadav Amit wrote:
>>>>> hugetlbfs_fallocate() currently performs put_page() before unlock_page().
>>>>> This scenario opens a small time window, from the time the page is added
>>>>> to the page cache, until it is unlocked, in which the page might be
>>>>> removed from the page-cache by another core. If the page is removed
>>>>> during this time window, it might cause a memory corruption, as the
>>>>> wrong page will be unlocked.
>>>>>
>>>>> It is arguable whether this scenario can happen in a real system, and
>>>>> there are several mitigating factors. The issue was found by code
>>>>> inspection (actually grep), and not by actually triggering the flow.
>>>>> Yet, since putting the page before unlocking is incorrect it should be
>>>>> fixed, if only to prevent future breakage or someone copy-pasting this
>>>>> code.
>>>>>
>>>>> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()")
>>>>>
>>>>> cc: Eric Biggers <ebiggers3@...il.com>
>>>>> cc: Mike Kravetz <mike.kravetz@...cle.com>
>>>>>
>>>>> Signed-off-by: Nadav Amit <namit@...are.com>
>>>>
>>>> Thank you Nadav.
>>>
>>> No problem.
>>>
>>>>
>>>> Reviewed-by: Mike Kravetz <mike.kravetz@...cle.com>
>>>>
>>>> Since hugetlbfs is an in memory filesystem, the only way one 'should' be
>>>> able to remove a page (file content) is through an inode operation such as
>>>> truncate, hole punch, or unlink.  That was the basis for my response that
>>>> the inode lock would be required for page freeing.
>>>>
>>>> Eric's question about sys_fadvise64(POSIX_FADV_DONTNEED) is interesting.
>>>> I was expecting to see a check for hugetlbfs pages and exit (without
>>>> modification) if encountered.  A quick review of the code did not find
>>>> any such checks.
>>>>
>>>> I'll take a closer look to determine exactly how hugetlbfs files are
>>>> handled.  IMO, there should be something similar to the DAX check where
>>>> the routine quickly exits.
>>>
>>> I did not cc stable when submitting the patch, based on your previous
>>> response. Let me know if you want me to send v2 which does so.
>>
>> I still do not believe there is a need to change this in stable.  Your patch
>> should be sufficient to ensure we do the right thing going forward.
>>
>> Looking at and testing the sys_fadvise64(POSIX_FADV_DONTNEED) code with
>> hugetlbfs does indeed show a more general problem.  One can use
>> sys_fadvise64() to remove a huge page from a hugetlbfs file. :(  This does
>> not go through the special hugetlbfs page handling code, but rather the
>> normal mm paths.  As a result hugetlbfs accounting (like reserve counts)
>> gets out of sync and the hugetlbfs filesystem may become unusable.  Sigh!!!
>>
>> I will address this issue in a separate patch.
> 
> I didn't check very carefully but it seems that
> http://ozlabs.org/~akpm/mmotm/broken-out/mm-fadvise-avoid-fadvise-for-fs-without-backing-device.patch
> should help here, right?

Thanks Michal.

Yes, that patch addresses the above issue with hugetlbfs.  I was also
wondering if there were similar issues with other in-memory filesystems.
Looks like there are.
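
For anyone who wants to poke at this from userspace, the call in question
is the ordinary posix_fadvise() wrapper around sys_fadvise64(). Below is a
minimal sketch, not the test that was actually run; the helper name
drop_file_cache() is made up for illustration, and the path is whatever
file you want to try, for example one on a hugetlbfs or tmpfs mount.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for posix_fadvise() under strict -std= modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask the kernel to drop cached pages for the whole file at `path`.
 * Returns 0 on success or a positive errno value on failure.
 * drop_file_cache() is a hypothetical helper, not a kernel or libc API.
 */
int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    /* offset 0 with len 0 means "from the start to the end of the file" */
    int ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return ret; /* posix_fadvise() returns the error number directly */
}
```

A return of 0 only says the advice was accepted. Whether the filesystem's
own accounting survives has to be checked separately.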

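As for the ordering issue in Nadav's patch description quoted above, here
is a toy single-threaded model of why unlock-before-put is the safe order.
It is not kernel code: struct toy_page, the toy_* helpers and the
racing_removal flag are all invented for illustration, and the "race" is
just a second put injected between the two calls.

```c
#include <stdbool.h>

/* Toy stand-in for struct page; none of this is kernel code. */
struct toy_page {
    int refcount;  /* references held (our allocation ref + page cache ref) */
    bool locked;
    bool freed;    /* set instead of really freeing, so misuse is observable */
};

void toy_put_page(struct toy_page *p)
{
    if (--p->refcount == 0)
        p->freed = true;
}

/* Returns false when called on a page that has already been freed. */
bool toy_unlock_page(struct toy_page *p)
{
    if (p->freed)
        return false;
    p->locked = false;
    return true;
}

/*
 * Old order: drop our reference, then unlock. racing_removal models
 * another CPU dropping the page cache's reference in between.
 */
bool put_then_unlock(struct toy_page *p, bool racing_removal)
{
    toy_put_page(p);
    if (racing_removal)
        toy_put_page(p);
    return toy_unlock_page(p);
}

/* Patched order: unlock while we still hold our own reference. */
bool unlock_then_put(struct toy_page *p, bool racing_removal)
{
    bool ok = toy_unlock_page(p);
    if (racing_removal)
        toy_put_page(p);
    toy_put_page(p);
    return ok;
}
```

Starting from refcount 2 and locked, put_then_unlock() with the racing
removal ends up unlocking a freed page, while unlock_then_put() does not.
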
-- 
Mike Kravetz