linux-kernel - Re: [PATCH v2 1/2] mm/migrate_device.c: Copy pte dirty bit to page

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <YwgHcjBmldFiI71O@xz-m1.local>
Date:   Thu, 25 Aug 2022 19:36:18 -0400
From:   Peter Xu <peterx@...hat.com>
To:     Alistair Popple <apopple@...dia.com>
Cc:     "Huang, Ying" <ying.huang@...el.com>,
        Nadav Amit <nadav.amit@...il.com>,
        huang ying <huang.ying.caritas@...il.com>,
        Linux MM <linux-mm@...ck.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Sierra Guiza, Alejandro (Alex)" <alex.sierra@....com>,
        Felix Kuehling <Felix.Kuehling@....com>,
        Jason Gunthorpe <jgg@...dia.com>,
        John Hubbard <jhubbard@...dia.com>,
        David Hildenbrand <david@...hat.com>,
        Ralph Campbell <rcampbell@...dia.com>,
        Matthew Wilcox <willy@...radead.org>,
        Karol Herbst <kherbst@...hat.com>,
        Lyude Paul <lyude@...hat.com>, Ben Skeggs <bskeggs@...hat.com>,
        Logan Gunthorpe <logang@...tatee.com>, paulus@...abs.org,
        linuxppc-dev@...ts.ozlabs.org, stable@...r.kernel.org
Subject: Re: [PATCH v2 1/2] mm/migrate_device.c: Copy pte dirty bit to page

On Fri, Aug 26, 2022 at 08:09:28AM +1000, Alistair Popple wrote:
> > I looked at some of the callers, it seems not all of them are ready to
> > handle that (__kvmppc_svm_page_out() or svm_migrate_vma_to_vram()).  Is it
> > safe?  Do the callers need to always properly handle that (unless the
> > migration is only a best-effort, but it seems not always the case).
> 
> Migration is always best effort. Callers need to be prepared to handle
> failure of a particular page to migrate, but I could believe not all of
> them are.

Ok, I see that ppc list is in the loop, hopefully this issue is aware since
afaict ppc will sigbus when migrate_vma_setup() fails, otoh the svm code
just dumps some device error (and I didn't check upper the stack from there).

> 
> > Besides, since I read the old code of prepare(), I saw this comment:
> >
> > -		if (!(migrate->src[i] & MIGRATE_PFN_LOCKED)) {
> > -			/*
> > -			 * Because we are migrating several pages there can be
> > -			 * a deadlock between 2 concurrent migration where each
> > -			 * are waiting on each other page lock.
> > -			 *
> > -			 * Make migrate_vma() a best effort thing and backoff
> > -			 * for any page we can not lock right away.
> > -			 */
> > -			if (!trylock_page(page)) {
> > -				migrate->src[i] = 0;
> > -				migrate->cpages--;
> > -				put_page(page);
> > -				continue;
> > -			}
> > -			remap = false;
> > -			migrate->src[i] |= MIGRATE_PFN_LOCKED;
> > -		}
> >
> > I'm a bit curious whether that deadlock mentioned in the comment is
> > observed in reality?
> >
> > If the page was scanned in the same address space, logically the lock order
> > should be guaranteed (if both page A&B, both threads should lock in order).
> > I think the order can be changed if explicitly did so (e.g. fork() plus
> > mremap() for anonymous here) but I just want to make sure I get the whole
> > point of it.
> 
> You seem to have the point of it. The trylock_page() is to avoid
> deadlock, and failure is always an option for migration. Drivers can
> always retry if they really need the page to migrate, although success
> is never guaranteed. For example the page might be pinned (or have
> swap-cache allocated to it, but I'm hoping to at least get that fixed).

If so I'd suggest even more straightforward document for either this
trylock() or on the APIs (e.g. for migrate_vma_setup()).  This behavior is
IMHO hiding deep and many people may not realize.  I'll comment in the
comment update patch.

Thanks.

-- 
Peter Xu