Message-ID: <03bd739f-cceb-4024-a2fb-5331ba258d36@bytedance.com>
Date: Thu, 5 Dec 2024 11:23:01 +0800
From: Qi Zheng <zhengqi.arch@...edance.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Qi Zheng <zhengqi.arch@...edance.com>, david@...hat.com, hughd@...gle.com,
willy@...radead.org, muchun.song@...ux.dev, vbabka@...nel.org,
peterx@...hat.com, mgorman@...e.de, catalin.marinas@....com,
will@...nel.org, dave.hansen@...ux.intel.com, luto@...nel.org,
peterz@...radead.org, x86@...nel.org, lorenzo.stoakes@...cle.com,
zokeefe@...gle.com, rientjes@...gle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 09/11] mm: pgtable: reclaim empty PTE page in
madvise(MADV_DONTNEED)
On 2024/12/5 06:47, Jann Horn wrote:
> On Wed, Dec 4, 2024 at 11:36 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>>
>> On Wed, 4 Dec 2024 19:09:49 +0800 Qi Zheng <zhengqi.arch@...edance.com> wrote:
>>> As a first step, this commit aims to synchronously free the empty PTE
>>> pages in madvise(MADV_DONTNEED) case. We will detect and free empty PTE
>>> pages in zap_pte_range(), and will add zap_details.reclaim_pt to exclude
>>> cases other than madvise(MADV_DONTNEED).
>>>
>>> Once an empty PTE page is detected, we first try to hold the pmd lock within
>>> the pte lock. If successful, we clear the pmd entry directly (fast path).
>>> Otherwise, we wait until the pte lock is released, then re-hold the pmd
>>> and pte locks and loop PTRS_PER_PTE times to check pte_none() to re-detect
>>> whether the PTE page is empty and free it (slow path).
>>
>> "wait until the pte lock is released" sounds nasty. I'm not
>> immediately seeing the code which does this. Please provide more
>> description?
>
> It's worded a bit confusingly, but it's fine; a better description
> might be "if try_get_and_clear_pmd() fails to trylock the PMD lock
> (against lock order), then later, after we have dropped the PTE lock,
> try_to_free_pte() takes the PMD and PTE locks in the proper lock
> order".
>
> The "wait until the pte lock is released" part is just supposed to
> mean that the try_to_free_pte() call is placed after the point where
> the PTE lock has been dropped (which makes it possible to take the PMD
> lock). It does not refer to waiting for other threads.
Yes. Thanks for helping to clarify my vague statement!
>
>>> +void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
>>> +		     struct mmu_gather *tlb)
>>> +{
>>> +	pmd_t pmdval;
>>> +	spinlock_t *pml, *ptl;
>>> +	pte_t *start_pte, *pte;
>>> +	int i;
>>> +
>>> +	pml = pmd_lock(mm, pmd);
>>> +	start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
>>> +	if (!start_pte)
>>> +		goto out_ptl;
>>> +	if (ptl != pml)
>>> +		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>>> +
>>> +	/* Check if it is an empty PTE page */
>>> +	for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) {
>>> +		if (!pte_none(ptep_get(pte)))
>>> +			goto out_ptl;
>>> +	}
>>
>> Are there any worst-case situations in which we'll spend unacceptable
>> amounts of time running this loop?
>
> This loop is just over a single page table, that should be no more
> expensive than what we already do in other common paths like
> zap_pte_range().
Agree.