Message-ID: <043a4f2d-e08d-4cea-a73d-819586509b12@linux.alibaba.com>
Date: Wed, 8 May 2024 17:06:28 +0800
From: Baolin Wang <baolin.wang@...ux.alibaba.com>
To: David Hildenbrand <david@...hat.com>, Ryan Roberts
<ryan.roberts@....com>, akpm@...ux-foundation.org, hughd@...gle.com
Cc: willy@...radead.org, ioworker0@...il.com, wangkefeng.wang@...wei.com,
ying.huang@...el.com, 21cnbao@...il.com, shy828301@...il.com,
ziy@...dia.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/8] mm: memory: extend finish_fault() to support large
folio
On 2024/5/8 15:15, David Hildenbrand wrote:
> On 08.05.24 05:44, Baolin Wang wrote:
>>
>>
>> On 2024/5/7 18:37, Ryan Roberts wrote:
>>> On 06/05/2024 09:46, Baolin Wang wrote:
>>>> Add large folio mapping establishment support for finish_fault() as a
>>>> preparation, to support multi-size THP allocation of anonymous shmem
>>>> pages in the following patches.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@...ux.alibaba.com>
>>>> ---
>>>> mm/memory.c | 43 +++++++++++++++++++++++++++++++++----------
>>>> 1 file changed, 33 insertions(+), 10 deletions(-)
>>>>
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index eea6e4984eae..936377220b77 100644
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -4747,9 +4747,12 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>> {
>>>> struct vm_area_struct *vma = vmf->vma;
>>>> struct page *page;
>>>> + struct folio *folio;
>>>> vm_fault_t ret;
>>>> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
>>>> !(vma->vm_flags & VM_SHARED);
>>>> + int type, nr_pages, i;
>>>> + unsigned long addr = vmf->address;
>>>> /* Did we COW the page? */
>>>> if (is_cow)
>>>> @@ -4780,24 +4783,44 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>>>> return VM_FAULT_OOM;
>>>> }
>>>> + folio = page_folio(page);
>>>> + nr_pages = folio_nr_pages(folio);
>>>> +
>>>> + if (unlikely(userfaultfd_armed(vma))) {
>>>> + nr_pages = 1;
>>>> + } else if (nr_pages > 1) {
>>>> + unsigned long start = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
>>>> + unsigned long end = start + nr_pages * PAGE_SIZE;
>>>> +
>>>> + /* In case the folio size in the page cache extends beyond the VMA limits. */
>>>> + addr = max(start, vma->vm_start);
>>>> + nr_pages = (min(end, vma->vm_end) - addr) >> PAGE_SHIFT;
>>>> +
>>>> + page = folio_page(folio, (addr - start) >> PAGE_SHIFT);
>>>
>>> I still don't really follow the logic in this else if block. Isn't it
>>> possible that finish_fault() gets called with a page from a folio that
>>> isn't aligned with vmf->address?
>>>
>>> For example, let's say we have a file whose size is 64K and which is
>>> cached in a single large folio in the page cache. But the file is mapped
>>> into a process at VA 16K to 80K. Let's say we fault on the first page
>>> (VA=16K). You will calculate
>>
>> For shmem, this doesn't happen because the VA is aligned with the
>> hugepage size in the shmem_get_unmapped_area() function. See patch 7.
>
> Does that cover mremap() and MAP_FIXED as well?
Good point. Thanks for pointing this out.
> We should try doing this as cleanly as possible, to prepare for the
> future / corner cases.
Sure. Let me re-think about the algorithm.