Message-ID: <03bd739f-cceb-4024-a2fb-5331ba258d36@bytedance.com>
Date: Thu, 5 Dec 2024 11:23:01 +0800
From: Qi Zheng <zhengqi.arch@...edance.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
Qi Zheng <zhengqi.arch@...edance.com>, david@...hat.com, hughd@...gle.com,
willy@...radead.org, muchun.song@...ux.dev, vbabka@...nel.org,
peterx@...hat.com, mgorman@...e.de, catalin.marinas@....com,
will@...nel.org, dave.hansen@...ux.intel.com, luto@...nel.org,
peterz@...radead.org, x86@...nel.org, lorenzo.stoakes@...cle.com,
zokeefe@...gle.com, rientjes@...gle.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 09/11] mm: pgtable: reclaim empty PTE page in
madvise(MADV_DONTNEED)
On 2024/12/5 06:47, Jann Horn wrote:
> On Wed, Dec 4, 2024 at 11:36 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>>
>> On Wed, 4 Dec 2024 19:09:49 +0800 Qi Zheng <zhengqi.arch@...edance.com> wrote:
>>> As a first step, this commit aims to synchronously free the empty PTE
>>> pages in madvise(MADV_DONTNEED) case. We will detect and free empty PTE
>>> pages in zap_pte_range(), and will add zap_details.reclaim_pt to exclude
>>> cases other than madvise(MADV_DONTNEED).
>>>
>>> Once an empty PTE page is detected, we first try to hold the pmd lock within
>>> the pte lock. If successful, we clear the pmd entry directly (fast path).
>>> Otherwise, we wait until the pte lock is released, then re-hold the pmd
>>> and pte locks and loop PTRS_PER_PTE times to check pte_none() to re-detect
>>> whether the PTE page is empty and free it (slow path).
>>
>> "wait until the pte lock is released" sounds nasty. I'm not
>> immediately seeing the code which does this. Please provide more
>> description?
>
> It's worded a bit confusingly, but it's fine; a better description
> might be "if try_get_and_clear_pmd() fails to trylock the PMD lock
> (against lock order), then later, after we have dropped the PTE lock,
> try_to_free_pte() takes the PMD and PTE locks in the proper lock
> order".
>
> The "wait until the pte lock is released" part is just supposed to
> mean that the try_to_free_pte() call is placed after the point where
> the PTE lock has been dropped (which makes it possible to take the PMD
> lock). It does not refer to waiting for other threads.
Yes. Thanks for helping to clarify my vague statement!
>
>>> +void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
>>> +		     struct mmu_gather *tlb)
>>> +{
>>> +	pmd_t pmdval;
>>> +	spinlock_t *pml, *ptl;
>>> +	pte_t *start_pte, *pte;
>>> +	int i;
>>> +
>>> +	pml = pmd_lock(mm, pmd);
>>> +	start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
>>> +	if (!start_pte)
>>> +		goto out_ptl;
>>> +	if (ptl != pml)
>>> +		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>>> +
>>> +	/* Check if it is an empty PTE page */
>>> +	for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) {
>>> +		if (!pte_none(ptep_get(pte)))
>>> +			goto out_ptl;
>>> +	}
>>
>> Are there any worst-case situations in which we'll spend unacceptable
>> amounts of time running this loop?
>
> This loop is just over a single page table, that should be no more
> expensive than what we already do in other common paths like
> zap_pte_range().
Agree.