Message-ID: <YCw5d2BTsFgq/mZa@google.com>
Date:   Tue, 16 Feb 2021 13:30:31 -0800
From:   Minchan Kim <minchan@...nel.org>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     Andrew Morton <akpm@...ux-foundation.org>,
        linux-mm <linux-mm@...ck.org>,
        LKML <linux-kernel@...r.kernel.org>, cgoldswo@...eaurora.org,
        linux-fsdevel@...r.kernel.org, mhocko@...e.com, david@...hat.com,
        vbabka@...e.cz, viro@...iv.linux.org.uk, joaodias@...gle.com
Subject: Re: [RFC 1/2] mm: disable LRU pagevec during the migration
 temporarily

On Tue, Feb 16, 2021 at 06:22:42PM +0000, Matthew Wilcox wrote:
> On Tue, Feb 16, 2021 at 09:03:47AM -0800, Minchan Kim wrote:
> > The LRU pagevec holds a reference on each page until the pagevec is drained.
> > This can prevent migration, since the page's refcount is then higher
> > than the migration logic expects. To mitigate the issue,
> > callers of migrate_pages drain the LRU pagevec via migrate_prep or
> > lru_add_drain_all before calling migrate_pages.
> > 
> > However, that is not enough: pages that enter the pagevec after the
> > drain can still stay there and keep preventing page migration. Some
> > callers of migrate_pages retry with LRU draining, so the page may
> > migrate on the next attempt, but this is still fragile because it does
> > not close the fundamental race between new LRU pages entering the
> > pagevec and migration. The migration failure can therefore end up
> > causing a contiguous memory allocation failure.
> 
> Have you been able to gather any numbers on this?  eg does migration
> now succeed 5% more often?
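The mechanism in the quoted patch description can be sketched as a small userspace model in C. This is not the kernel implementation: `model_page`, `lru_add_pvec`, `model_lru_cache_add()`, `model_drain()` and the `lru_add_disabled` switch are hypothetical stand-ins for the page refcount, the per-CPU LRU-add pagevec, and the RFC's idea of bypassing the pagevec while migration runs. It shows that draining releases the pagevec's reference, that a page arriving after the drain is pinned again, and that flushing immediately while batching is disabled leaves no extra reference behind.

```c
/*
 * Userspace model, not kernel code. All names are hypothetical stand-ins:
 * model_page ~ a page and its refcount, lru_add_pvec ~ the per-CPU LRU-add
 * pagevec, lru_add_disabled ~ the RFC's "disable the pagevec during
 * migration" idea.
 */
#include <assert.h>
#include <stdbool.h>

#define PAGEVEC_SIZE 15 /* small fixed batch size (assumed) */

struct model_page {
	int refcount;
	bool on_lru;
};

static struct {
	int nr;
	struct model_page *pages[PAGEVEC_SIZE];
} lru_add_pvec;

static bool lru_add_disabled; /* hypothetical switch set around migration */

/* Move batched pages onto the LRU and drop the pagevec's references. */
static void model_drain(void)
{
	for (int i = 0; i < lru_add_pvec.nr; i++) {
		lru_add_pvec.pages[i]->on_lru = true;
		lru_add_pvec.pages[i]->refcount--;
	}
	lru_add_pvec.nr = 0;
}

/* Batch a page for the LRU; the pagevec pins it until the next drain. */
static void model_lru_cache_add(struct model_page *page)
{
	assert(lru_add_pvec.nr < PAGEVEC_SIZE);
	page->refcount++;
	lru_add_pvec.pages[lru_add_pvec.nr++] = page;
	/* Flush at once when the batch is full or batching is disabled. */
	if (lru_add_disabled || lru_add_pvec.nr == PAGEVEC_SIZE)
		model_drain();
}

/* Migration gives up if the page holds more references than expected. */
static bool model_can_migrate(const struct model_page *page, int expected_refs)
{
	return page->refcount == expected_refs;
}

/*
 * Drain up front (the role migrate_prep()/lru_add_drain_all() play), then
 * let a page arrive in the pagevec before migration runs. Returns whether
 * that late page is still migratable.
 */
static bool model_late_page_migratable(bool disable_pagevec)
{
	struct model_page late = { .refcount = 1, .on_lru = false };

	lru_add_disabled = disable_pagevec;
	model_drain();              /* the caller's up-front drain */
	model_lru_cache_add(&late); /* page arrives after the drain */
	bool ok = model_can_migrate(&late, 1);

	lru_add_disabled = false;
	model_drain();              /* tidy up so &late is not left in the pagevec */
	return ok;
}
```

For example, adding a page whose refcount is 1 makes `model_can_migrate(&p, 1)` false until `model_drain()` runs; `model_late_page_migratable(false)` returns false, while `model_late_page_migratable(true)` returns true because the late page is flushed straight to the LRU.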

What I measured was how many times migrate_pages retried in force mode,
using the debug code below.
The test launched Android apps while CMA allocations ran in the background.
There were about 500 CMA allocations in total during the test,
and the debug code logged about 400 retries.
With this patchset (plus a bug fix), the retry count dropped below 30.
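Normalising the reported figures per CMA allocation gives a rough rate. This is back-of-the-envelope arithmetic on the approximate numbers above (about 500 allocations, about 400 logged retries before, fewer than 30 after), not an additional measurement:

```latex
\frac{400}{500} = 0.8 \ \text{retries per allocation (before)}, \qquad
\frac{30}{500} = 0.06 \ \text{retries per allocation (upper bound, after)}
```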

Here is the debug code:
diff --git a/mm/migrate.c b/mm/migrate.c
index 04a98bb2f568..caa661be2d16 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1459,6 +1459,11 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
                                                private, page, pass > 2, mode,
                                                reason);

+                       if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
+                               printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
+                               dump_page(page, "fail to migrate");
+                       }
+
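The debug hunk only counts failures in the later passes of migrate_pages, where `pass > 2` is passed as the force argument. A userspace sketch of that retry structure follows, assuming (as in the loop this hunk patches) a limit of 10 passes that continues while any page returns -EAGAIN. `retry_page`, `busy_passes`, `model_unmap_and_move()` and `model_migrate_pages()` are hypothetical names, and a page is simply treated as pinned for a fixed number of passes.

```c
/*
 * Userspace model of the pass structure around the debug hunk. Not kernel
 * code; all names are hypothetical. Migration is retried for up to
 * MODEL_MAX_PASSES passes while any page returns -EAGAIN, and passes after
 * the third (pass > 2) run in force mode.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_MAX_PASSES 10 /* assumed to match the kernel loop limit */

struct retry_page {
	int busy_passes; /* passes during which an extra reference lingers */
	bool migrated;
};

/* Stand-in for unmap_and_move(): -EAGAIN while the page is still pinned. */
static int model_unmap_and_move(struct retry_page *page, int pass, bool force)
{
	(void)force; /* ignored here; the model has no locks or writeback */
	if (pass < page->busy_passes)
		return -EAGAIN;
	page->migrated = true;
	return 0;
}

/* Returns how many failures the debug hunk would log (rc != 0, pass > 2). */
static int model_migrate_pages(struct retry_page *pages, int n)
{
	int forced_failures = 0;
	int retry = 1;

	assert(n >= 0);
	for (int pass = 0; pass < MODEL_MAX_PASSES && retry; pass++) {
		retry = 0;
		for (int i = 0; i < n; i++) {
			if (pages[i].migrated)
				continue; /* already moved off the list */
			int rc = model_unmap_and_move(&pages[i], pass, pass > 2);
			if (rc == -EAGAIN)
				retry++;
			if (rc && pass > 2)
				forced_failures++; /* what the debug printk would report */
		}
	}
	return forced_failures;
}
```

With this model, a page pinned for 5 passes is reported twice (passes 3 and 4), while a page that stays pinned for the whole run is reported in all 7 forced passes and never migrates, which is the kind of count the reply above reduces from about 400 to under 30.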
