Message-ID: <CAJuCfpFC05vCwAONO7YxG=LhqteyYmOy1Nprg2NyjQ6hKaHgOA@mail.gmail.com>
Date: Tue, 27 Jun 2023 09:23:26 -0700
From: Suren Baghdasaryan <surenb@...gle.com>
To: Peter Xu <peterx@...hat.com>
Cc: akpm@...ux-foundation.org, willy@...radead.org, hannes@...xchg.org,
mhocko@...e.com, josef@...icpanda.com, jack@...e.cz,
ldufour@...ux.ibm.com, laurent.dufour@...ibm.com,
michel@...pinasse.org, liam.howlett@...cle.com, jglisse@...gle.com,
vbabka@...e.cz, minchan@...gle.com, dave@...olabs.net,
punit.agrawal@...edance.com, lstoakes@...il.com, hdanton@...a.com,
apopple@...dia.com, ying.huang@...el.com, david@...hat.com,
yuzhao@...gle.com, dhowells@...hat.com, hughd@...gle.com,
viro@...iv.linux.org.uk, brauner@...nel.org,
pasha.tatashin@...een.com, linux-mm@...ck.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
kernel-team@...roid.com
Subject: Re: [PATCH v3 7/8] mm: drop VMA lock before waiting for migration
On Tue, Jun 27, 2023 at 8:49 AM Peter Xu <peterx@...hat.com> wrote:
>
> On Mon, Jun 26, 2023 at 09:23:20PM -0700, Suren Baghdasaryan wrote:
> > migration_entry_wait does not need VMA lock, therefore it can be
> > dropped before waiting.
>
> Hmm, I'm not sure..
>
> Note that we're still dereferencing *vmf->pmd while waiting, and *pmd lives
> in the page table, which IIUC is only guaranteed to exist while the vma is
> still there. Without either the mmap or vma lock, I don't see what makes
> sure the pgtable is always there. E.g. IIUC a race can happen where unmap()
> runs right after vma_end_read() below but before pmdp_get_lockless() (inside
> migration_entry_wait()); pmdp_get_lockless() can then read random data if
> the pgtable has been freed.
That sounds correct. I thought the ptl would keep the pmd stable, but there
is a window between vma_end_read() and spin_lock(ptl) in which the page
table can be freed from under us. I think it would work if we called
vma_end_read() after spin_lock(ptl), but that requires code refactoring.
I'll probably drop this optimization from the patchset for now to keep
things simple and get back to it later.
>
> >
> > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
> > ---
> > mm/memory.c | 14 ++++++++++++--
> > 1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 5caaa4c66ea2..bdf46fdc58d6 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3715,8 +3715,18 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  	entry = pte_to_swp_entry(vmf->orig_pte);
> >  	if (unlikely(non_swap_entry(entry))) {
> >  		if (is_migration_entry(entry)) {
> > -			migration_entry_wait(vma->vm_mm, vmf->pmd,
> > -					     vmf->address);
> > +			/* Save mm in case VMA lock is dropped */
> > +			struct mm_struct *mm = vma->vm_mm;
> > +
> > +			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
> > +				/*
> > +				 * No need to hold VMA lock for migration.
> > +				 * WARNING: vma can't be used after this!
> > +				 */
> > +				vma_end_read(vma);
> > +				ret |= VM_FAULT_COMPLETED;
> > +			}
> > +			migration_entry_wait(mm, vmf->pmd, vmf->address);
> >  		} else if (is_device_exclusive_entry(entry)) {
> >  			vmf->page = pfn_swap_entry_to_page(entry);
> >  			ret = remove_device_exclusive_entry(vmf);
> > --
> > 2.41.0.178.g377b9f9a00-goog
> >
>
> --
> Peter Xu
>