lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e3383f71-6b33-9e44-d66e-aa889a2f183d@redhat.com>
Date:   Wed, 7 Jun 2023 21:27:53 +0200
From:   David Hildenbrand <david@...hat.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>,
        Gerd Hoffmann <kraxel@...hat.com>
Cc:     Junxiao Chang <junxiao.chang@...el.com>, akpm@...ux-foundation.org,
        kirill.shutemov@...ux.intel.com, mhocko@...e.com,
        jmarchan@...hat.com, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, muchun.song@...ux.dev,
        Vivek Kasireddy <vivek.kasireddy@...el.com>,
        Dongwon Kim <dongwon.kim@...el.com>,
        James Houghton <jthoughton@...gle.com>,
        dri-devel@...ts.freedesktop.org
Subject: Re: [PATCH] mm: fix hugetlb page unmap count balance issue

On 17.05.23 00:34, Mike Kravetz wrote:
> On 05/15/23 10:04, Mike Kravetz wrote:
>> On 05/12/23 16:29, Mike Kravetz wrote:
>>> On 05/12/23 14:26, James Houghton wrote:
>>>> On Fri, May 12, 2023 at 12:20 AM Junxiao Chang <junxiao.chang@...el.com> wrote:
>>>>
>>>> This alone doesn't fix mapcounting for PTE-mapped HugeTLB pages. You
>>>> need something like [1]. I can resend it if that's what we should be
>>>> doing, but this mapcounting scheme doesn't work when the page structs
>>>> have been freed.
>>>>
>>>> It seems like it was a mistake to include support for hugetlb memfds in udmabuf.
>>>
>>> IIUC, it was added with commit 16c243e99d33 udmabuf: Add support for mapping
>>> hugepages (v4).  Looks like it was never sent to linux-mm?  That is unfortunate
>>> as hugetlb vmemmap freeing went in at about the same time.  And, as you have
>>> noted udmabuf will not work if hugetlb vmemmap freeing is enabled.
>>>
>>> Sigh!
>>>
>>> Trying to think of a way forward.
>>> -- 
>>> Mike Kravetz
>>>
>>>>
>>>> [1]: https://lore.kernel.org/linux-mm/20230306230004.1387007-2-jthoughton@google.com/
>>>>
>>>> - James
>>
>> Adding people and list on Cc: involved with commit 16c243e99d33.
>>
>> There are several issues with trying to map tail pages of hugetllb pages
>> not taken into account with udmabuf.  James spent quite a bit of time trying
>> to understand and address all the issues with the HGM code.  While using
>> the scheme proposed by James, may be an approach to the mapcount issue there
>> are also other issues that need attention.  For example, I do not see how
>> the fault code checks the state of the hugetlb page (such as poison) as none
>> of that state is carried in tail pages.
>>
>> The more I think about it, the more I think udmabuf should treat hugetlb
>> pages as hugetlb pages.  They should be mapped at the appropriate level
>> in the page table.  Of course, this would impose new restrictions on the
>> API (mmap and ioctl) that may break existing users.  I have no idea how
>> extensively udmabuf is being used with hugetlb mappings.
> 
> Verified that using udmabug on a hugetlb mapping with vmemmap optimization will
> BUG as:
> 
> [14106.812312] BUG: unable to handle page fault for address: ffffea000a7c4030
> [14106.813704] #PF: supervisor write access in kernel mode
> [14106.814791] #PF: error_code(0x0003) - permissions violation
> [14106.815921] PGD 27fff9067 P4D 27fff9067 PUD 27fff8067 PMD 17ec34067 PTE 8000000285dab021
> [14106.818489] Oops: 0003 [#1] PREEMPT SMP PTI
> [14106.819345] CPU: 2 PID: 2313 Comm: udmabuf Not tainted 6.4.0-rc1-next-20230508+ #44
> [14106.820906] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc37 04/01/2014
> [14106.822679] RIP: 0010:page_add_file_rmap+0x2e/0x270
> 
> I started looking more closely at the driver and I do not fully understand the
> usage model.  I took clues from the selftest and driver.  It seems the first
> step is to create a buffer via the UDMABUF_CREATE ioctl.  This will copy 4K
> pages from the page cache to an array associated with a file.  I did note that
> hugetlb and shm behavior is different here as the driver can not add missing
> hugetlb pages to the cache as it does with shm.  However, what seems more
> concerning is that there is nothing to prevent the pages from being replaced
> in the cache before being added to a udmabuf mapping.  This means udmabuf
> mapping and original memfd could be operating on a different set of pages.
> Is this acceptable, or somehow prevented?
> 
> In my role, I am more interested in udmabuf handling of hugetlb pages.
> Trying to use individual 4K pages of hugetlb pages is something that
> should be avoided here.  Would it be acceptable to change code so that
> only whole hugetlb pages are used by udmabuf?  If not, then perhaps the
> existing hugetlb support can be removed?

I'm wondering if that VMA shouldn't be some kind of special mapping 
(VM_PFNMAP), such that the struct page is entirely ignored?

I'm quite confused and concerned when I read that code (what the hell is 
it doing with shmem/hugetlb pages? why does the mapcount even get adjusted?)

This all has a bad smell to it, I hope I'm missing something important ...

-- 
Cheers,

David / dhildenb

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ