lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAG48ez1HjoZiyk+_JOxcA5eM797vCmzvXaEVUgd6Mkze-aykbg@mail.gmail.com>
Date: Wed, 4 Dec 2024 23:47:30 +0100
From: Jann Horn <jannh@...gle.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Qi Zheng <zhengqi.arch@...edance.com>, david@...hat.com, hughd@...gle.com, 
	willy@...radead.org, muchun.song@...ux.dev, vbabka@...nel.org, 
	peterx@...hat.com, mgorman@...e.de, catalin.marinas@....com, will@...nel.org, 
	dave.hansen@...ux.intel.com, luto@...nel.org, peterz@...radead.org, 
	x86@...nel.org, lorenzo.stoakes@...cle.com, zokeefe@...gle.com, 
	rientjes@...gle.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v4 09/11] mm: pgtable: reclaim empty PTE page in madvise(MADV_DONTNEED)

On Wed, Dec 4, 2024 at 11:36 PM Andrew Morton <akpm@...ux-foundation.org> wrote:
>
> On Wed,  4 Dec 2024 19:09:49 +0800 Qi Zheng <zhengqi.arch@...edance.com> wrote:
> > As a first step, this commit aims to synchronously free the empty PTE
> > pages in madvise(MADV_DONTNEED) case. We will detect and free empty PTE
> > pages in zap_pte_range(), and will add zap_details.reclaim_pt to exclude
> > cases other than madvise(MADV_DONTNEED).
> >
> > Once an empty PTE is detected, we first try to hold the pmd lock within
> > the pte lock. If successful, we clear the pmd entry directly (fast path).
> > Otherwise, we wait until the pte lock is released, then re-hold the pmd
> > and pte locks and loop PTRS_PER_PTE times to check pte_none() to re-detect
> > whether the PTE page is empty and free it (slow path).
>
> "wait until the pte lock is released" sounds nasty.  I'm not
> immediately seeing the code which does this.  PLease provide more
> description?

It's worded a bit confusingly, but it's fine; a better description
might be "if try_get_and_clear_pmd() fails to trylock the PMD lock
(against lock order), then later, after we have dropped the PTE lock,
try_to_free_pte() takes the PMD and PTE locks in the proper lock
order".

The "wait until the pte lock is released" part is just supposed to
mean that the try_to_free_pte() call is placed after the point where
the PTE lock has been dropped (which makes it possible to take the PMD
lock). It does not refer to waiting for other threads.

> > +void try_to_free_pte(struct mm_struct *mm, pmd_t *pmd, unsigned long addr,
> > +                  struct mmu_gather *tlb)
> > +{
> > +     pmd_t pmdval;
> > +     spinlock_t *pml, *ptl;
> > +     pte_t *start_pte, *pte;
> > +     int i;
> > +
> > +     pml = pmd_lock(mm, pmd);
> > +     start_pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pmdval, &ptl);
> > +     if (!start_pte)
> > +             goto out_ptl;
> > +     if (ptl != pml)
> > +             spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
> > +
> > +     /* Check if it is empty PTE page */
> > +     for (i = 0, pte = start_pte; i < PTRS_PER_PTE; i++, pte++) {
> > +             if (!pte_none(ptep_get(pte)))
> > +                     goto out_ptl;
> > +     }
>
> Are there any worst-case situations in which we'll spend uncceptable
> mounts of time running this loop?

This loop is just over a single page table, that should be no more
expensive than what we already do in other common paths like
zap_pte_range().

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ