Message-ID: <f9bca947-f88e-51a7-fdaf-4403fda1b783@intel.com>
Date: Thu, 11 Jul 2019 11:21:49 -0700
From: Dave Hansen <dave.hansen@...el.com>
To: Nitesh Narayan Lal <nitesh@...hat.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
pbonzini@...hat.com, lcapitulino@...hat.com, pagupta@...hat.com,
wei.w.wang@...el.com, yang.zhang.wz@...il.com, riel@...riel.com,
david@...hat.com, mst@...hat.com, dodgen@...gle.com,
konrad.wilk@...cle.com, dhildenb@...hat.com, aarcange@...hat.com,
alexander.duyck@...il.com, john.starks@...rosoft.com,
mhocko@...e.com
Subject: Re: [RFC][Patch v11 1/2] mm: page_hinting: core infrastructure

On 7/10/19 12:51 PM, Nitesh Narayan Lal wrote:
> +static void bm_set_pfn(struct page *page)
> +{
> +	struct zone *zone = page_zone(page);
> +	int zone_idx = page_zonenum(page);
> +	unsigned long bitnr = 0;
> +
> +	lockdep_assert_held(&zone->lock);
> +	bitnr = pfn_to_bit(page, zone_idx);
> +	/*
> +	 * TODO: fix possible underflows.
> +	 */
> +	if (free_area[zone_idx].bitmap &&
> +	    bitnr < free_area[zone_idx].nbits &&
> +	    !test_and_set_bit(bitnr, free_area[zone_idx].bitmap))
> +		atomic_inc(&free_area[zone_idx].free_pages);
> +}
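For reference, I'm assuming pfn_to_bit() works out to roughly this (my
paraphrase, possibly modulo an order shift, not the patch verbatim):

	bitnr = page_to_pfn(page) - free_area[zone_idx].base_pfn;

so where base_pfn lands determines which PFNs a bitmap can represent at
all, and a PFN below base_pfn is presumably the underflow that TODO is
worried about.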
Let's say I have two NUMA nodes, each with ZONE_NORMAL and ZONE_MOVABLE
and each zone with 1GB of memory:
	Node:     0         1
	NORMAL    0->1GB    2->3GB
	MOVABLE   1->2GB    3->4GB
This code will allocate two bitmaps. The ZONE_NORMAL bitmap will
represent data from 0->3GB and the ZONE_MOVABLE bitmap will represent
data from 1->4GB. That's the result of this code:
> +	if (free_area[zone_idx].base_pfn) {
> +		free_area[zone_idx].base_pfn =
> +			min(free_area[zone_idx].base_pfn,
> +			    zone->zone_start_pfn);
> +		free_area[zone_idx].end_pfn =
> +			max(free_area[zone_idx].end_pfn,
> +			    zone->zone_start_pfn +
> +			    zone->spanned_pages);
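Plugging the example layout into that min()/max() (my arithmetic):

	NORMAL:  base_pfn = min(0GB, 2GB) = 0GB
	         end_pfn  = max(1GB, 3GB) = 3GB  -> bitmap spans 0->3GB
	MOVABLE: base_pfn = min(1GB, 3GB) = 1GB
	         end_pfn  = max(2GB, 4GB) = 4GB  -> bitmap spans 1->4GB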
But that means that both bitmaps will have space for PFNs in the other
zone type, which is completely bogus. This is a fundamental problem:
the data structures are built per zone *type* when they should be built
per zone.
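Something like the following would keep each bitmap scoped to a single
zone (completely untested sketch; the hint_* fields are names I just
made up, not anything in the patch):

	/* in struct zone, instead of one global free_area[] per zone type: */
	unsigned long	*hint_bitmap;	/* one bit per page in this zone */
	unsigned long	 hint_nbits;	/* == zone->spanned_pages */
	atomic_t	 hint_free_pages;

	static void bm_set_pfn(struct page *page)
	{
		struct zone *zone = page_zone(page);
		unsigned long bitnr;

		lockdep_assert_held(&zone->lock);
		/* offset within *this* zone; no shared base_pfn to underflow */
		bitnr = page_to_pfn(page) - zone->zone_start_pfn;
		if (zone->hint_bitmap && bitnr < zone->hint_nbits &&
		    !test_and_set_bit(bitnr, zone->hint_bitmap))
			atomic_inc(&zone->hint_free_pages);
	}

That also makes the sizing trivial: each bitmap is exactly
zone->spanned_pages bits, with no overlap between zone types, and
bitnr can never underflow because a page's PFN is always >= its own
zone's zone_start_pfn.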