Message-ID: <874jr5atqf.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Wed, 01 Mar 2023 14:08:56 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Hugh Dickins <hughd@...gle.com>
Cc:     Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, "Xu, Pengfei" <pengfei.xu@...el.com>,
        Christoph Hellwig <hch@....de>,
        Stefan Roesch <shr@...kernel.io>, Tejun Heo <tj@...nel.org>,
        Xin Hao <xhao@...ux.alibaba.com>, Zi Yan <ziy@...dia.com>,
        Yang Shi <shy828301@...il.com>,
        Baolin Wang <baolin.wang@...ux.alibaba.com>,
        Matthew Wilcox <willy@...radead.org>,
        Mike Kravetz <mike.kravetz@...cle.com>
Subject: Re: [PATCH 3/3] migrate_pages: try migrate in batch asynchronously firstly

Hugh Dickins <hughd@...gle.com> writes:

> On Tue, 28 Feb 2023, Huang, Ying wrote:
>> Hugh Dickins <hughd@...gle.com> writes:
>> > On Fri, 24 Feb 2023, Huang Ying wrote:
>> >> 
>> >> diff --git a/mm/migrate.c b/mm/migrate.c
>> >> index 91198b487e49..c17ce5ee8d92 100644
>> >> --- a/mm/migrate.c
>> >> +++ b/mm/migrate.c
>> >> @@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
>> >>  	return rc;
>> >>  }
>> >>  
>> >> +static int migrate_pages_sync(struct list_head *from, new_page_t get_new_page,
>> >> +		free_page_t put_new_page, unsigned long private,
>> >> +		enum migrate_mode mode, int reason, struct list_head *ret_folios,
>> >> +		struct list_head *split_folios, struct migrate_pages_stats *stats)
>> >> +{
>> >> +	int rc, nr_failed = 0;
>> >> +	LIST_HEAD(folios);
>> >> +	struct migrate_pages_stats astats;
>> >> +
>> >> +	memset(&astats, 0, sizeof(astats));
>> >> +	/* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
>> >> +	rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
>> >> +				 reason, &folios, split_folios, &astats,
>> >> +				 NR_MAX_MIGRATE_PAGES_RETRY);
>> >
>> > I wonder if that and below would better be NR_MAX_MIGRATE_PAGES_RETRY / 2.
>> >
>> > Though I've never got down to adjusting that number (and it's not a job
>> > to be done in this set of patches), those 10 retries sometimes terrify
>> > me, from a latency point of view.  They can have such different weights:
>> > in the unmapped case, 10 retries is okay; but when a pinned page is mapped
>> > into 1000 processes, the thought of all that unmapping and TLB flushing
>> > and remapping is terrifying.
>> >
>> > Since you're retrying below, halve both numbers of retries for now?
>> 
>> Yes.  These are reasonable concerns.
>> 
>> And in the original implementation, we only wait to lock page and wait
>> the writeback to complete if pass > 2.  This is kind of trying to
>> migrate asynchronously for 3 times before the real synchronous
>> migration.  So, should we delete the "force" logic (in
>> migrate_folio_unmap()), and try to migrate asynchronously for 3 times in
>> batch before migrating synchronously for 7 times one by one?
>
> Oh, that's a good idea (but please don't imagine I've thought it through):
> I hadn't realized the way in which your migrate_pages_sync() addition is
> kind of duplicating the way that the "force" argument conditions behaviour.
> It would be very appealing to delete the "force" argument now if you can.

Sure.  Will do that in the next version.
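
To make the proposal concrete, here is a minimal userspace sketch of the
retry split discussed above: a few asynchronous passes over the whole
batch, then synchronous passes one folio at a time.  It is not the kernel
code.  All names (toy_folio, toy_try_migrate(), toy_migrate(),
TOY_ASYNC_PASSES, TOY_SYNC_PASSES) are hypothetical, and the "busy" and
"needs_sync" fields are assumed stand-ins for transient failures and for
folios that can only be migrated by waiting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed budgets, taken from the "3 async + 7 sync" numbers in this thread. */
enum { TOY_ASYNC_PASSES = 3, TOY_SYNC_PASSES = 7 };
enum toy_mode { TOY_ASYNC, TOY_SYNC };

/* Toy model of a folio; none of these fields exist in the kernel. */
struct toy_folio {
	bool needs_sync;	/* can only succeed if we are allowed to wait */
	int busy;		/* attempts that fail before one succeeds */
	bool migrated;
};

/* One migration attempt on one folio in the given mode. */
bool toy_try_migrate(struct toy_folio *f, enum toy_mode mode)
{
	if (mode == TOY_ASYNC && f->needs_sync)
		return false;	/* would have to wait: give up in async mode */
	if (f->busy > 0) {
		f->busy--;	/* transient failure, may succeed on a later pass */
		return false;
	}
	f->migrated = true;
	return true;
}

/* Returns the number of folios that could not be migrated. */
size_t toy_migrate(struct toy_folio *folios, size_t n)
{
	size_t nr_failed = 0;

	assert(folios != NULL || n == 0);

	/* Phase 1: asynchronous passes over the whole batch. */
	for (int pass = 0; pass < TOY_ASYNC_PASSES; pass++)
		for (size_t i = 0; i < n; i++)
			if (!folios[i].migrated)
				toy_try_migrate(&folios[i], TOY_ASYNC);

	/* Phase 2: synchronous passes, one remaining folio at a time. */
	for (size_t i = 0; i < n; i++) {
		for (int pass = 0; pass < TOY_SYNC_PASSES && !folios[i].migrated; pass++)
			toy_try_migrate(&folios[i], TOY_SYNC);
		if (!folios[i].migrated)
			nr_failed++;
	}
	return nr_failed;
}
```

In this model only the folios that the asynchronous passes could not
handle ever reach the one-by-one synchronous phase, and the total number
of attempts per folio stays at 10.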

> But aside from that, you've also made me wonder (again, please remember I
> don't have a good picture of the new migrate_pages() sequence in my head)
> whether you have already made a *great* strike against my 10 retries
> terror.  Am I reading it right, that the unmapping is now done on the
> first try, and the remove_migration_ptes after the last try (all the
> pages involved having remained locked throughout)?

Yes.  You are right.  Now, unmapping and moving are two separate steps,
and they are retried separately.  After a folio has been unmapped
successfully, we will not remap/unmap it 10 times if the folio is pinned
and therefore fails to move (in migrate_folio_move()).  So the latency
caused by retrying is much better now.  But I am still inclined to keep
the total number of retries as before.  Do you agree?
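
As a rough illustration of why the split helps, the following userspace
sketch counts rmap operations for one pinned folio under two simplified
models: one where every failed pass unmaps and remaps the folio, and one
where the folio is unmapped once, only the move is retried, and the
mappings are restored once at the end.  The names (rmap_cost,
cost_combined_retry(), cost_split_retry()) are hypothetical, and the cost
model (one operation per mapping process) is an assumption, not a
measurement.

```c
#include <assert.h>

/* Assumed cost model: one unmap/remap operation per mapping process. */
struct rmap_cost {
	long unmaps;
	long remaps;
};

/* Every failed pass unmaps the folio, fails to move it, and remaps it. */
struct rmap_cost cost_combined_retry(long nr_mappers, int nr_retry)
{
	struct rmap_cost c = { 0, 0 };

	assert(nr_mappers >= 0 && nr_retry > 0);
	for (int pass = 0; pass < nr_retry; pass++) {
		c.unmaps += nr_mappers;
		c.remaps += nr_mappers;
	}
	return c;
}

/* Unmap once, retry only the move, then restore the mappings once. */
struct rmap_cost cost_split_retry(long nr_mappers, int nr_retry)
{
	struct rmap_cost c = { 0, 0 };

	assert(nr_mappers >= 0 && nr_retry > 0);
	c.unmaps = nr_mappers;
	/* The nr_retry move attempts touch no page tables in this model. */
	c.remaps = nr_mappers;
	return c;
}
```

For the case mentioned earlier in the thread, a pinned folio mapped into
1000 processes with 10 retries, the first model performs 10000 unmaps and
the second performs 1000.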

Best Regards,
Huang, Ying
