linux-kernel - Re: [PATCH 8/8] mm: Remove __GFP

Message-ID: <f6505442-98a9-12e4-b2cd-0fa83874c159@suse.cz>
Date:   Thu, 19 Oct 2017 15:42:12 +0200
From:   Vlastimil Babka <vbabka@...e.cz>
To:     Mel Gorman <mgorman@...hsingularity.net>,
        Andrew Morton <akpm@...ux-foundation.org>
Cc:     Linux-MM <linux-mm@...ck.org>,
        Linux-FSDevel <linux-fsdevel@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, Jan Kara <jack@...e.cz>,
        Andi Kleen <ak@...ux.intel.com>,
        Dave Hansen <dave.hansen@...el.com>,
        Dave Chinner <david@...morbit.com>
Subject: Re: [PATCH 8/8] mm: Remove __GFP_COLD

On 10/18/2017 09:59 AM, Mel Gorman wrote:
> As the page free path makes no distinction between cache hot and cold
> pages, there is no real useful ordering of pages in the free list that
> allocation requests can take advantage of. Judging from the users of
> __GFP_COLD, it is likely that a number of them are the result of copying
> other sites instead of actually measuring the impact. Remove the
> __GFP_COLD parameter which simplifies a number of paths in the page
> allocator.
> 
> This is potentially controversial but bear in mind that the size of the
> per-cpu pagelists versus modern cache sizes means that the whole per-cpu
> list can often fit in the L3 cache. Hence, there is only a potential benefit
> for microbenchmarks that alloc/free pages in a tight loop. It's even worse
> when THP is taken into account which has little or no chance of getting a
> cache-hot page as the per-cpu list is bypassed and the zeroing of multiple
> pages will thrash the cache anyway.
> 
> The truncate microbenchmarks are not shown as this patch affects the
> allocation path and not the free path. A page fault microbenchmark was
> tested but it showed no significant difference which is not surprising given
> that the __GFP_COLD branches are a minuscule percentage of the fault path.
> 
> Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>

I've updated patch https://marc.info/?l=linux-mm&m=150831216224521&w=2 on top
of this. It's a small non-functional change, so it might even be folded.

----8<----
From b002266c1a826805a50087db851f93e7a87ceb2f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@...e.cz>
Date: Tue, 17 Oct 2017 16:03:02 +0200
Subject: [PATCH] mm, page_alloc: simplify list handling in rmqueue_bulk()

The rmqueue_bulk() function fills an empty pcplist with pages from the free
list. It tries to hand the pages to the caller in increasing pfn order, because
that leads to better performance with some I/O controllers, as explained in
e084b2d95e48 ("page-allocator: preserve PFN ordering when __GFP_COLD is set").

To preserve the order, it's sufficient to add pages to the tail of the list
as they are retrieved. The current code instead adds to the head of the list,
but then updates the list head pointer to the last added page, in each step.
This does result in the same order, but is needlessly confusing and potentially
wasteful, with no apparent benefit. This patch simplifies the code and adjusts
the comment accordingly.

Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
---
 mm/page_alloc.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e60d8a37e3f..4e1b3c686cc2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2330,16 +2330,16 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			continue;
 
 		/*
-		 * Split buddy pages returned by expand() are received here
-		 * in physical page order. The page is added to the callers and
-		 * list and the list head then moves forward. From the callers
-		 * perspective, the linked list is ordered by page number in
-		 * some conditions. This is useful for IO devices that can
-		 * merge IO requests if the physical pages are ordered
-		 * properly.
+		 * Split buddy pages returned by expand() are received here in
+		 * physical page order. The page is added to the tail of the
+		 * caller's list. From the caller's perspective, the linked
+		 * list is ordered by page number under some conditions when
+		 * traversed in the forward direction from the head, thus also
+		 * in physical page order. This is useful for IO devices that
+		 * can merge IO requests if the physical pages are ordered
+		 * properly.
 		 */
-		list_add(&page->lru, list);
-		list = &page->lru;
+		list_add_tail(&page->lru, list);
 		alloced++;
 		if (is_migrate_cma(get_pcppage_migratetype(page)))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-- 
2.14.2
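
For illustration only, and not part of the patch: the changelog's claim that
"add to the head, then move the list pointer to the added page" yields the same
order as "add to the tail" can be checked with a small userspace sketch. The
struct list_head and the list_add()/list_add_tail() helpers below are simplified
stand-ins that I am assuming mirror the kernel's <linux/list.h> semantics; struct
fake_page, NR_PAGES and fill_and_collect() are hypothetical names made up for
this example. As in rmqueue_bulk(), the sketch assumes the list starts out
empty; with a non-empty list the two variants would place the new pages at
opposite ends.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Link @new between two known consecutive entries. */
static void list_insert(struct list_head *new, struct list_head *prev,
			struct list_head *next)
{
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}

/* Insert @new right after @head (assumed to match the kernel's list_add). */
static void list_add(struct list_head *new, struct list_head *head)
{
	list_insert(new, head, head->next);
}

/* Insert @new right before @head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *new, struct list_head *head)
{
	list_insert(new, head->prev, head);
}

/* Hypothetical stand-in for struct page; only the pfn matters here. */
struct fake_page {
	unsigned long pfn;
	struct list_head lru;
};

#define NR_PAGES 4

/*
 * Fill an empty list with NR_PAGES pages in increasing pfn order, using
 * either the old scheme (use_tail == 0) or the new one (use_tail != 0),
 * then walk the list forward from the head and store the pfns in @out.
 * Returns the number of entries found.
 */
int fill_and_collect(int use_tail, unsigned long *out)
{
	struct fake_page pages[NR_PAGES];
	struct list_head head;
	struct list_head *list = &head;
	struct list_head *pos;
	int i, n = 0;

	init_list_head(&head);

	for (i = 0; i < NR_PAGES; i++) {
		pages[i].pfn = 100UL + (unsigned long)i;
		if (use_tail) {
			/* New code: always append at the tail. */
			list_add_tail(&pages[i].lru, &head);
		} else {
			/* Old code: add after the cursor, then advance it. */
			list_add(&pages[i].lru, list);
			list = &pages[i].lru;
		}
	}

	for (pos = head.next; pos != &head; pos = pos->next) {
		struct fake_page *page = (struct fake_page *)
			((char *)pos - offsetof(struct fake_page, lru));

		assert(n < NR_PAGES);
		out[n++] = page->pfn;
	}
	return n;
}
```

Calling fill_and_collect(0, a) and fill_and_collect(1, b) from a small main()
should fill both arrays with the same increasing pfn sequence, which is the
behaviour the patch relies on when it replaces the two statements with a single
list_add_tail().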