Message-ID: <276d2b85-0d5a-4bff-a408-604d823efef0@vivo.com>
Date: Mon, 7 Apr 2025 16:55:40 +0800
From: Huan Yang <link@...o.com>
To: Muchun Song <muchun.song@...ux.dev>
Cc: bingbu.cao@...ux.intel.com, Christoph Hellwig <hch@....de>,
Matthew Wilcox <willy@...radead.org>, Gerd Hoffmann <kraxel@...hat.com>,
Vivek Kasireddy <vivek.kasireddy@...el.com>,
Sumit Semwal <sumit.semwal@...aro.org>,
Christian König <christian.koenig@....com>,
Andrew Morton <akpm@...ux-foundation.org>,
Uladzislau Rezki <urezki@...il.com>, Shuah Khan <shuah@...nel.org>,
linux-kernel@...r.kernel.org, dri-devel@...ts.freedesktop.org,
linux-media@...r.kernel.org, linaro-mm-sig@...ts.linaro.org,
linux-mm@...ck.org, linux-kselftest@...r.kernel.org,
opensource.kernel@...o.com
Subject: Re: CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP is broken, was Re: [RFC PATCH 0/6] Deep talk about folio vmap
On 2025/4/7 15:22, Muchun Song wrote:
>
>> On Apr 7, 2025, at 15:09, Huan Yang <link@...o.com> wrote:
>>
>>
>> On 2025/4/7 14:43, Muchun Song wrote:
>>>> On Apr 7, 2025, at 11:37, Muchun Song <muchun.song@...ux.dev> wrote:
>>>>
>>>>
>>>>
>>>>> On Apr 7, 2025, at 11:21, Huan Yang <link@...o.com> wrote:
>>>>>
>>>>>
>>>>> On 2025/4/7 10:57, Muchun Song wrote:
>>>>>>> On Apr 7, 2025, at 09:59, Huan Yang <link@...o.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2025/4/4 18:07, Muchun Song wrote:
>>>>>>>>> On Apr 4, 2025, at 17:38, Muchun Song <muchun.song@...ux.dev> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Apr 4, 2025, at 17:01, Christoph Hellwig <hch@....de> wrote:
>>>>>>>>>>
>>>>>>>>>> After the btrfs compressed bio discussion I think the hugetlb changes that
>>>>>>>>>> skip the tail pages are fundamentally unsafe in the current kernel.
>>>>>>>>>>
>>>>>>>>>> That is because the bio_vec representation assumes tail pages do exist, so
>>>>>>>>>> as soon as you are doing direct I/O that generates a bvec starting beyond
>>>>>>>>>> the present head page things will blow up. Other users of bio_vecs might
>>>>>>>>>> do the same, but the way the block bio_vecs are generated is very susceptible
>>>>>>>>>> to that. So we'll first need to sort that out and a few other things
>>>>>>>>>> before we can even think of enabling such a feature.
>>>>>>>>>>
>>>>>>>>> I would like to express my gratitude to Christoph for including me in the
>>>>>>>>> thread. I have carefully read the cover letter in [1], which indicates
>>>>>>>>> that an issue has arisen due to the improper use of `vmap_pfn()`. I'm
>>>>>>>>> wondering if we could consider using `vmap()` instead. In the HVO scenario,
>>>>>>>>> the tail struct pages do **exist**, but they are read-only. I've examined
>>>>>>>>> the code of `vmap()`, and it appears that it only reads the struct page.
>>>>>>>>> Therefore, it seems feasible for us to use `vmap()` (I am not an expert in
>>>>>>>>> udmabuf). Right?
>>>>>>>> I believe my stance is correct. I've also reviewed another thread in [2].
>>>>>>>> Allow me to clarify and correct the viewpoints you presented. You stated:
>>>>>>>> "
>>>>>>>> So by HVO, it also not backed by pages, only contains folio head, each
>>>>>>>> tail pfn's page struct go away.
>>>>>>>> "
>>>>>>>> This statement is entirely inaccurate. The tail pages do not cease to exist;
>>>>>>>> rather, they are read-only. For your specific use-case, please use `vmap()`
>>>>>>>> to resolve the issue at hand. If you wish to gain a comprehensive understanding
>>>>>>> I see the document gives a simple diagram illustrating this:
>>>>>>>
>>>>>>> +-----------+ ---virt_to_page---> +-----------+   mapping to   +-----------+
>>>>>>> |           |                     |     0     | -------------> |     0     |
>>>>>>> |           |                     +-----------+                +-----------+
>>>>>>> |           |                     |     1     | -------------> |     1     |
>>>>>>> |           |                     +-----------+                +-----------+
>>>>>>> |           |                     |     2     | ----------------^ ^ ^ ^ ^ ^
>>>>>>> |           |                     +-----------+                   | | | | |
>>>>>>> |           |                     |     3     | ------------------+ | | | |
>>>>>>> |           |                     +-----------+                     | | | |
>>>>>>> |           |                     |     4     | --------------------+ | | |
>>>>>>> |    PMD    |                     +-----------+                       | | |
>>>>>>> |   level   |                     |     5     | ----------------------+ | |
>>>>>>> |  mapping  |                     +-----------+                         | |
>>>>>>> |           |                     |     6     | ------------------------+ |
>>>>>>> |           |                     +-----------+                           |
>>>>>>> |           |                     |     7     | --------------------------+
>>>>>>> |           |                     +-----------+
>>>>>>> |           |
>>>>>>> |           |
>>>>>>> |           |
>>>>>>> +-----------+
>>>>>>>
>>>>>>> If I understand correctly, the struct pages of tails 2-7 are freed, so if I only need to map
>>>>>>> pages 2-7, can vmap() still do this correctly?
>>>>>> The answer is you can. It is essential to distinguish between virtual
>>>>> Thanks for your reply, but I still don't understand. For example, suppose I need to vmap
>>>>> pages 2-7 of an HVO hugetlb folio:
>>>>>
>>>>> struct page **pages = kvmalloc_array(6, sizeof(*pages), GFP_KERNEL);
>>>>>
>>>>> if (!pages)
>>>>> 	return -ENOMEM;
>>>>>
>>>>> for (i = 2; i < 8; ++i)
>>>>> 	pages[i - 2] = folio_page(folio, i); /* collect pages 2-7 into pages[0..5] */
>>>>>
>>>>> void *vaddr = vmap(pages, 6, 0, PAGE_KERNEL);
>>>>>
>>>>> Without HVO, this works. With HVO enabled, does "folio_page(folio, i)" just
>>>>> return the head page? And how can vmap correctly map each page?
>>>> Why do you think folio_page(folio, i) (i ≠ 0) returns the head page?
>>>> Is that speculation, or have you tested it? Please base it on the actual situation
>>>> instead of indulging in wild thoughts.
>>> By the way, in case you truly struggle to comprehend the fundamental
>>> aspects of HVO, I would like to summarize for you the user-visible
>>> behaviors in comparison to the situation where HVO is disabled.
>>>
>>> HVO Status    Tail Page Structures    Head Page Structures
>>> Enabled       Read-Only (RO)          Read-Write (RW)
>>> Disabled      Read-Write (RW)         Read-Write (RW)
>>>
>>> The sole distinction between the two scenarios lies in whether the
>>> tail page structures are allowed to be written or not. Please refrain
>>> from getting bogged down in the details of the implementation of HVO.
>> Thanks. I ran a test and found that I had completely misunderstood it.
>>
>> Even with HVO enabled, where the tail struct pages are freed and point to the head, the linear mapping
>> still exists, so page_to_pfn() and page_to_virt() (and their folio variants) still work when computing
>> any needed page from the head page, as folio_page() does:
>>
>> hvo head 0xfffff9de849d0000, pfn=0x127400, wish offset_pfn 0x1275f1, idx 497 is 0xfffff9de849d7c40, pfn=0x1275f1.
>>
>> vmap does not need to touch the pages' contents; it just converts each page to a pfn, so it works well.
> You are able to read those tail page structures. The reason why vmap can
> function is not that it doesn't read those page structures. What I mean
> is that vmap will still work even if it does read the page structures,
> because those tail page structures do indeed exist.
>
>> BTW, even if vmap did need to access the input struct page itself, it points to the head page, so I guess
>> that would not affect anything.
> Allow me to clarify this for you to ensure that we have a shared understanding.
> Those tail page structures (virtual addresses in the vmemmap area) are mapped
> to the same page frame (physical page) to which the head page structures (virtual
> addresses in the vmemmap area) are mapped. It is analogous to the shared-mapping
> mechanism in the user space.
Thank you for your answer. I think I understand it now.
HVO does not release the vmemmap virtual addresses of the tail struct pages; it only remaps those VAs to the physical page backing the head struct page (vmemmap_remap_pte()).
So:
1. Any arithmetic on struct page pointers still works, so we get the right pfn and so on.
2. Any read through these VAs still works, so we get the correct folio info, but we cannot modify it (the mapping is PAGE_KERNEL_RO).
What I misunderstood before was thinking that the vmemmap virtual addresses of the struct pages were also freed. How foolish of me. :(
Thanks,
Huan Yang
>
>> If I still misunderstand anything, please correct me. :)
>>
>> Muchun, thank you for your patience,
>>
>> Huan Yang
>>
>>> Thanks,
>>> Muchun.
>>>
>>>> Thanks,
>>>> Muchun.
>>>>
>>>>> Please correct me. :)
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Huan Yang
>>>>>
>>>>>> address (VA) and physical address (PA). The VAs of tail struct pages
>>>>>> aren't freed but remapped to the physical page mapped by the VA of the
>>>>>> head struct page (since contents of those tail physical pages are the
>>>>>> same). Thus, the freed pages are the physical pages mapped by original
>>>>>> tail struct pages, not their virtual addresses. Moreover, while it
>>>>>> is possible to read the tail struct pages through these virtual addresses,
>>>>>> any write operations are prohibited, since the kernel is expected to
>>>>>> write only to the head struct page of a compound page and only read
>>>>>> the tail struct pages. BTW, the folio infrastructure is also based on
>>>>>> this assumption.
>>>>>>
>>>>>> Thanks,
>>>>>> Muchun.
>>>>>>
>>>>>>> Or if I still misunderstand something, please correct me.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Huan Yang
>>>>>>>
>>>>>>>> of the fundamentals of HVO, I kindly suggest a thorough review of the document
>>>>>>>> in [3].
>>>>>>>>
>>>>>>>> [2] https://lore.kernel.org/lkml/5229b24f-1984-4225-ae03-8b952de56e3b@vivo.com/#t
>>>>>>>> [3] Documentation/mm/vmemmap_dedup.rst
>>>>>>>>
>>>>>>>>> [1] https://lore.kernel.org/linux-mm/20250327092922.536-1-link@vivo.com/T/#m055b34978cf882fd44d2d08d929b50292d8502b4
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Muchun.
>
>