linux-kernel - Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading /proc/pagetypeinfo

Message-ID: <alpine.DEB.2.21.1910221734470.126424@chino.kir.corp.google.com>
Date:   Tue, 22 Oct 2019 17:52:07 -0700 (PDT)
From:   David Rientjes <rientjes@...gle.com>
To:     Waiman Long <longman@...hat.com>
cc:     Michal Hocko <mhocko@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>,
        Roman Gushchin <guro@...com>, Vlastimil Babka <vbabka@...e.cz>,
        Konstantin Khlebnikov <khlebnikov@...dex-team.ru>,
        Jann Horn <jannh@...gle.com>, Song Liu <songliubraving@...com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        Rafael Aquini <aquini@...hat.com>, Mel Gorman <mgorman@...e.de>
Subject: Re: [PATCH] mm/vmstat: Reduce zone lock hold time when reading
 /proc/pagetypeinfo

On Tue, 22 Oct 2019, Waiman Long wrote:

> >>> and used nr_free to compute the missing count. Since MIGRATE_MOVABLE
> >>> is usually the largest one on large memory systems, this is the one
> >>> to be skipped. Since the printing order is migration-type => order, we
> >>> will have to store the counts in an internal 2D array before printing
> >>> them out.
> >>>
> >>> Even by skipping the MIGRATE_MOVABLE pages, we may still be holding the
> >>> zone lock for too long blocking out other zone lock waiters from being
> >>> run. This can be problematic for systems with large amount of memory.
> >>> So a check is added to temporarily release the lock and reschedule if
> >>> more than 64k of list entries have been iterated for each order. With
> >>> a MAX_ORDER of 11, the worst case will be iterating about 700k of list
> >>> entries before releasing the lock.
> >> But you are still iterating through the whole free_list at once so if it
> >> gets really large then this is still possible. I think it would be
> >> preferable to use per migratetype nr_free if it doesn't cause any
> >> regressions.
> >>
> > Yes, it is still theoretically possible. I will take a further look at
> > having per-migrate type nr_free. BTW, there is one more place where the
> > free lists are being iterated with zone lock held - mark_free_pages().
> 
> Looking deeper into the code, the exact migration type is not stored in
> the page itself. An initial movable page can be stolen to be put into
> another migration type. So in a delete or move from free_area, we don't
> know exactly what migration type the page is coming from. IOW, it is
> hard to get accurate counts of the number of entries in each lists.
> 

I think the suggestion is to maintain an nr_free count for the free_list of
each migratetype at each order, so that any time a page is added to or
deleted from a list, that list's nr_free is adjusted.  The free_area's
nr_free then becomes the sum of the per-migratetype nr_free counts at that
order.  That's possible to do if you track the migratetype per page, as you
said, or the way pcp pages track it as part of page->index.  The trade-off
is the performance cost of maintaining these new nr_frees any time you
manipulate the freelists.
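
To make the bookkeeping concrete, here is a minimal userspace sketch of the
counting scheme described above.  It is not kernel code: every model_* name
and the reduced migratetype enum are hypothetical, and it assumes the caller
already knows which free_list a page sits on, which is exactly the hard part
discussed in this thread.

```c
#include <assert.h>

#define MODEL_MAX_ORDER 11	/* orders 0..10, matching a MAX_ORDER of 11 */

/* Simplified, hypothetical subset of the kernel's migratetypes. */
enum model_migratetype {
	MODEL_UNMOVABLE,
	MODEL_MOVABLE,
	MODEL_RECLAIMABLE,
	MODEL_NR_TYPES
};

/* One counter per free_list instead of a single counter per free_area. */
struct model_free_area {
	unsigned long nr_free_mt[MODEL_NR_TYPES];
};

static struct model_free_area model_area[MODEL_MAX_ORDER];

/* Called wherever an entry is linked onto the free_list for (order, mt). */
static void model_add(unsigned int order, enum model_migratetype mt)
{
	model_area[order].nr_free_mt[mt]++;
}

/* Called wherever an entry is unlinked; mt must be the list it was on. */
static void model_del(unsigned int order, enum model_migratetype mt)
{
	assert(model_area[order].nr_free_mt[mt] > 0);
	model_area[order].nr_free_mt[mt]--;
}

/* Moving an entry between lists (e.g. after stealing) keeps the sum fixed. */
static void model_move(unsigned int order, enum model_migratetype from,
		       enum model_migratetype to)
{
	model_del(order, from);
	model_add(order, to);
}

/* The per-order nr_free becomes the sum of the per-migratetype counts. */
static unsigned long model_nr_free(unsigned int order)
{
	unsigned long sum = 0;
	int mt;

	for (mt = 0; mt < MODEL_NR_TYPES; mt++)
		sum += model_area[order].nr_free_mt[mt];
	return sum;
}
```

With counters like these, a reader such as /proc/pagetypeinfo could print
the values directly rather than walking each list under the zone lock; the
cost is the extra increment or decrement on every freelist manipulation.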

I think Vlastimil and I discussed per-order, per-migratetype nr_frees in
the past, and it could be a worthwhile improvement for other reasons.
Specifically, it enables heuristics for determining how fragmented a
certain migratetype is for a zone, i.e. a very quick way to determine what
ratio of pages over all MIGRATE_UNMOVABLE pageblocks are free.
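
As a rough illustration of such a heuristic, the sketch below turns the
per-order counts for one migratetype into a free ratio.  The sketch_* names,
the pageblock order of 9, and the caller-supplied pageblock count are all
assumptions for the example; it also assumes the per-list counts are a fair
approximation of the free pages inside pageblocks of that type.

```c
#define SKETCH_MAX_ORDER	11	/* orders 0..10 */
#define SKETCH_PAGEBLOCK_ORDER	9	/* assumed: 512 base pages per pageblock */

/* Free base pages of one migratetype: an order-N entry covers 2^N pages. */
unsigned long sketch_free_pages(const unsigned long nr_free[SKETCH_MAX_ORDER])
{
	unsigned long pages = 0;
	unsigned int order;

	for (order = 0; order < SKETCH_MAX_ORDER; order++)
		pages += nr_free[order] << order;
	return pages;
}

/*
 * Share of free pages, in per-mille, across nr_pageblocks pageblocks of
 * the migratetype.  Returns 0 when there are no such pageblocks.
 */
unsigned long sketch_free_permille(const unsigned long nr_free[SKETCH_MAX_ORDER],
				   unsigned long nr_pageblocks)
{
	unsigned long total = nr_pageblocks << SKETCH_PAGEBLOCK_ORDER;

	if (!total)
		return 0;
	return sketch_free_pages(nr_free) * 1000 / total;
}
```

The point is only that the computation needs no list walk and no zone lock
hold proportional to list length once the counts exist.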

Or maybe there are other reasons why these nr_frees can't be maintained 
anymore?  (I had a patch to do it on 4.3.)

On a severely fragmented system, you may also find that MIGRATE_MOVABLE is
not actually the longest free_list among the migratetypes, so special
casing MIGRATE_MOVABLE might not be the best way forward.