linux-kernel - Re: [PATCH 1/4] x86_64: Switch to SPARSE

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0704021258500.31698@schroedinger.engr.sgi.com>
Date:	Mon, 2 Apr 2007 13:11:04 -0700 (PDT)
From:	Christoph Lameter <clameter@....com>
To:	Dave Hansen <hansendc@...ibm.com>
cc:	Andi Kleen <ak@...e.de>, linux-kernel@...r.kernel.org,
	linux-arch@...r.kernel.org, Martin Bligh <mbligh@...gle.com>,
	linux-mm@...ck.org,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>
Subject: Re: [PATCH 1/4] x86_64: Switch to SPARSE_VIRTUAL

On Mon, 2 Apr 2007, Dave Hansen wrote:

> MAX_ORDER, and the section size is at least MAX_ORDER.  If we *did* have
> this, then the page allocator would already be broken for these
> nodes. ;)

Ahh... Ok.

> So, this SPARSE_VIRTUAL does introduce a new dependency, which Andi
> calculated above.  But, in reality, I don't think it's a big deal.  Just
> to spell it out a bit more, if this:
> 
> 	VMEMMAP_MAPPING_SIZE/sizeof(struct page) * PAGE_SIZE
> 
> (where VMEMMAP_MAPPING_SIZE is PMD_SIZE in your case) is any larger than
> the granularity on which your NUMA nodes are divided, then you might
> have a problem with mem_map for one NUMA node getting allocated on
> another.  

This is only a problem if

1. We are not on NUMA emulation. In that case: Who cares. The SPARSEMEM
   sections make sure that the MAX_ORDER blocks do not overlap.

2. There is a hole less than 128 MB between the nodes.

3. The maximum overlap can then be theoretically less than 2M in terms of
   page structs. That is less than 128MB can overlap at the beginning of a 
   node. Typically the start of a node gets used for allocation system
   control areas. I.e. node data, vmemmap 2M blocks etc. For those we
   only use the page structs during bootstrap. They are not performance
   critical. We can just ignore the problem for those.

   In order for this to become a problem the overlap would need to
   more than the management data at the front of a node. The larger 
   the zone is the more 2M blocks will be allocated from the beginning a 
   node and the less we can actually get into this situation.

If this is an actual problem then we could take out this particular
2M page and replace it with single 4K pages that can be individually
placed. Yaw.... Too complex.

I think we can ignore this. The only problem could be reduced performance
accessing page structs of some small portion of a node.

> It might be worth a comment, or at least some kind of WARN_ON().
> Perhaps we can stick something in online_page() to check if:
> 
> 	page_to_nid(page) == page_to_nid(virt_to_page(page))

Could do that but the check is going to be too agressive. Check would 
have to be done after all control information has been allocated. WARN_ON 
would be sufficient since this is not going to impact functionality.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/