

Message-ID: <87ill937qe.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Tue, 27 Sep 2022 09:51:21 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Alistair Popple <apopple@...dia.com>
Cc:     Yang Shi <shy828301@...il.com>, <linux-mm@...ck.org>,
        <linux-kernel@...r.kernel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Zi Yan <ziy@...dia.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Oscar Salvador <osalvador@...e.de>,
        Matthew Wilcox <willy@...radead.org>
Subject: Re: [RFC 2/6] mm/migrate_pages: split unmap_and_move() to _unmap()
 and _move()

Alistair Popple <apopple@...dia.com> writes:

> Yang Shi <shy828301@...il.com> writes:
>
>> On Mon, Sep 26, 2022 at 2:37 AM Alistair Popple <apopple@...dia.com> wrote:
>>>
>>>
>>> Huang Ying <ying.huang@...el.com> writes:
>>>
>>> > This is a preparation patch to batch the page unmapping and moving
>>> > for the normal pages and THP.
>>> >
>>> > In this patch, unmap_and_move() is split to migrate_page_unmap() and
>>> > migrate_page_move().  So, we can batch _unmap() and _move() in
>>> > different loops later.  To pass some information between unmap and
>>> > move, the original unused newpage->mapping and newpage->private are
>>> > used.
>>>
>>> This looks like it could cause a deadlock between two threads migrating
>>> the same pages if force == true && mode != MIGRATE_ASYNC as
>>> migrate_page_unmap() will call lock_page() while holding the lock on
>>> other pages in the list. Therefore the two threads could deadlock if the
>>> pages are in a different order.
>>
>> It seems unlikely to me since the page has to be isolated from the lru
>> before migration. Isolation from the lru is atomic, so the two threads
>> are unlikely to see the same pages on both lists.
>
> Oh thanks! That is a good point and I agree since lru isolation is
> atomic the two threads won't see the same pages. migrate_vma_setup()
> does LRU isolation after locking the page which is why the potential
> exists there. We could potentially switch that around but given
> ZONE_DEVICE pages aren't on an lru it wouldn't help much.
>
>> But there might be other cases which may incur deadlock, for example,
>> filesystem writeback IIUC. Some filesystems may lock a bunch of pages
>> then write them back in a batch. The same pages may be on the
>> migration list and they are also dirty and seen by writeback. I'm not
>> sure whether I miss something that could prevent such a deadlock from
>> happening.
>
> I'm not overly familiar with that area but I would assume any filesystem
> code doing this would already have to deal with deadlock potential.

Thank you very much for pointing this out.  I think the deadlock is a
real issue.  In any case, we shouldn't forbid other places in the kernel
from locking 2 pages at the same time.

The simplest solution is to batch page migration only if mode ==
MIGRATE_ASYNC.  Then we may consider falling back to non-batch mode if
mode != MIGRATE_ASYNC and trylocking a page fails.
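
To make the idea concrete, here is a rough userspace sketch, not kernel
code and not the patch itself.  pthread mutexes stand in for the page
lock, and all names (model_page, model_mode, model_lock_batch,
model_unlock_batch) are hypothetical and exist only for illustration.
The rule it models: never sleep on a lock while already holding another
one from the same batch.  A failed trylock ends the batch; only when
nothing is held yet may the synchronous mode block, which amounts to the
non-batch path.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the migration mode and a lockable page. */
enum model_mode { MODEL_ASYNC, MODEL_SYNC };

struct model_page {
	pthread_mutex_t lock;	/* models the page lock */
};

/*
 * Lock as many of pages[0..n) as possible as one batch and return how
 * many were locked (always a prefix of the array).
 *
 * While at least one lock is held we only trylock, so two callers
 * walking the same pages in different orders cannot block on each
 * other while each holds a lock the other wants.
 */
static size_t model_lock_batch(struct model_page **pages, size_t n,
			       enum model_mode mode)
{
	size_t locked = 0;

	for (size_t i = 0; i < n; i++) {
		if (pthread_mutex_trylock(&pages[i]->lock) == 0) {
			locked++;
			continue;
		}
		/*
		 * Contended.  Blocking is only safe if we hold nothing,
		 * and only the synchronous mode is allowed to block.
		 * The result is then a batch of one page.
		 */
		if (mode == MODEL_SYNC && locked == 0) {
			pthread_mutex_lock(&pages[i]->lock);
			locked++;
		}
		break;
	}
	return locked;
}

/* Release the first 'locked' pages taken by model_lock_batch(). */
static void model_unlock_batch(struct model_page **pages, size_t locked)
{
	for (size_t i = 0; i < locked; i++) {
		int err = pthread_mutex_unlock(&pages[i]->lock);

		assert(err == 0);
	}
}
```

A caller would process the returned prefix, unlock it, and then retry
the remaining pages, so a contended page shrinks the batch instead of
stalling the caller while it holds other locks.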

Best Regards,
Huang, Ying

[snip]