Message-ID: <alpine.DEB.2.21.1910221734470.126424@chino.kir.corp.google.com>
Date: Tue, 22 Oct 2019 17:52:07 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Waiman Long <longman@...hat.com>
cc: Michal Hocko <mhocko@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>,
Roman Gushchin <guro@...com>, Vlastimil Babka <vbabka@...e.cz>,
Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
Jann Horn <jannh@...gle.com>, Song Liu <songliubraving@...com>,
Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
Rafael Aquini <aquini@...hat.com>, Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading
/proc/pagetypeinfo

On Tue, 22 Oct 2019, Waiman Long wrote:

> >>> and used nr_free to compute the missing count. Since MIGRATE_MOVABLE
> >>> is usually the largest one on large memory systems, this is the one
> >>> to be skipped. Since the printing order is migration-type => order, we
> >>> will have to store the counts in an internal 2D array before printing
> >>> them out.
> >>>
> >>> Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the
> >>> zone lock for too long blocking out other zone lock waiters from being
> >>> run. This can be problematic for systems with large amount of memory.
> >>> So a check is added to temporarily release the lock and reschedule if
> >>> more than 64k of list entries have been iterated for each order. With
> >>> a MAX_ORDER of 11, the worst case will be iterating about 700k of list
> >>> entries before releasing the lock.
> >> But you are still iterating through the whole free_list at once so if it
> >> gets really large then this is still possible. I think it would be
> >> preferable to use per migratetype nr_free if it doesn't cause any
> >> regressions.
> >>
> > Yes, it is still theoretically possible. I will take a further look at
> > having per-migrate type nr_free. BTW, there is one more place where the
> > free lists are being iterated with zone lock held - mark_free_pages().
>
> Looking deeper into the code, the exact migration type is not stored in
> the page itself. An initial movable page can be stolen to be put into
> another migration type. So in a delete or move from free_area, we don't
> know exactly what migration type the page is coming from. IOW, it is
> hard to get accurate counts of the number of entries in each lists.
>
I think the suggestion is to maintain an nr_free count of the free_list for
each order for each migratetype, so anytime a page is added to or deleted
from the list, that nr_free is adjusted. Then the free_area's nr_free
becomes the sum of its migratetypes' nr_free at that order. That's possible
to do if you track the migratetype per page, as you said, or like pcp pages
track it as part of page->index. It's a trade-off on whether you want to
impact performance by maintaining these new nr_frees anytime you
manipulate the freelists.
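
As a rough illustration of that accounting (a userspace model, not kernel
code): the nr_free_mt[] array, the mt_list_*() helpers and the shortened
migratetype enum below are hypothetical names invented for this sketch. It
assumes the caller always knows which migratetype list a page sits on, which
is exactly the tracking problem discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified subset of migratetypes; the real enum has more entries. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_TYPES
};

/* Model of one order's free_area with hypothetical per-type counters. */
struct free_area {
	unsigned long nr_free;                   /* total entries at this order */
	unsigned long nr_free_mt[MIGRATE_TYPES]; /* hypothetical: per-list counts */
};

/* A page is added to the free list of type mt. */
static void mt_list_add(struct free_area *area, enum migratetype mt)
{
	area->nr_free_mt[mt]++;
	area->nr_free++;
}

/* A page is removed from the free list of type mt. */
static void mt_list_del(struct free_area *area, enum migratetype mt)
{
	assert(area->nr_free_mt[mt] > 0);
	area->nr_free_mt[mt]--;
	area->nr_free--;
}

/* A page is stolen from one type's list and put on another's. */
static void mt_list_move(struct free_area *area,
			 enum migratetype from, enum migratetype to)
{
	assert(area->nr_free_mt[from] > 0);
	area->nr_free_mt[from]--;
	area->nr_free_mt[to]++;
	/* total nr_free is unchanged by a move */
}

/* The invariant: nr_free equals the sum of the per-type counts. */
static unsigned long mt_sum(const struct free_area *area)
{
	unsigned long sum = 0;

	for (size_t mt = 0; mt < MIGRATE_TYPES; mt++)
		sum += area->nr_free_mt[mt];
	return sum;
}
```

With counters like these, a /proc/pagetypeinfo reader would only need to
copy MAX_ORDER * MIGRATE_TYPES values under the zone lock instead of walking
the lists. The cost is the extra increment/decrement on every freelist
operation.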

I think Vlastimil and I discussed per order per migratetype nr_frees in
the past and it could be a worthwhile improvement for other reasons,
specifically it leads to heuristics that can be used to determine how
fragmented a certain migratetype is for a zone, i.e. a very quick way to
determine what ratio of pages over all MIGRATE_UNMOVABLE pageblocks are
free.
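
To make the ratio concrete, here is a minimal userspace sketch. It assumes
per order per migratetype counts are available and that a per-migratetype
pageblock count is kept somewhere; the zone_model struct, nr_pageblocks[]
and both helper names are hypothetical, and PAGEBLOCK_ORDER of 9 is just an
assumed value. The result is only an estimate, since a page on a given
free list does not have to sit in a pageblock of the same type.

```c
#include <stddef.h>

#define MAX_ORDER       11
#define PAGEBLOCK_ORDER 9	/* assumption: 512 base pages per pageblock */

/* Simplified subset of migratetypes; the real enum has more entries. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_TYPES
};

/* Hypothetical per-zone bookkeeping for this sketch. */
struct zone_model {
	unsigned long nr_free_mt[MAX_ORDER][MIGRATE_TYPES]; /* list lengths */
	unsigned long nr_pageblocks[MIGRATE_TYPES];         /* blocks per type */
};

/* Free base pages on the lists of type mt: each entry covers 2^order pages. */
static unsigned long free_pages_of_type(const struct zone_model *z,
					enum migratetype mt)
{
	unsigned long pages = 0;

	for (unsigned int order = 0; order < MAX_ORDER; order++)
		pages += z->nr_free_mt[order][mt] << order;
	return pages;
}

/* Percentage (rounded down) of pages in mt pageblocks that are free. */
static unsigned int free_percent_of_type(const struct zone_model *z,
					 enum migratetype mt)
{
	unsigned long total = z->nr_pageblocks[mt] << PAGEBLOCK_ORDER;

	if (total == 0)
		return 0;	/* no pageblocks of this type */
	return (unsigned int)(free_pages_of_type(z, mt) * 100 / total);
}
```

No list walking is involved, so this could be evaluated cheaply whenever a
fragmentation decision needs it.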

Or maybe there are other reasons why these nr_frees can't be maintained
anymore? (I had a patch to do it on 4.3.)

You may also find systems where MIGRATE_MOVABLE is not actually the
longest free_list compared to other migratetypes on a severely fragmented
system, so special casing MIGRATE_MOVABLE might not be the best way
forward.