Message-ID: <alpine.DEB.2.02.1405291555120.9336@chino.kir.corp.google.com>
Date:	Thu, 29 May 2014 16:01:55 -0700 (PDT)
From:	David Rientjes <rientjes@...gle.com>
To:	Marcelo Tosatti <mtosatti@...hat.com>
cc:	Li Zefan <lizefan@...wei.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Lai Jiangshan <laijs@...fujitsu.com>,
	Mel Gorman <mgorman@...e.de>, Tejun Heo <tj@...nel.org>,
	Christoph Lameter <cl@...ux.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] page_alloc: skip cpuset enforcement for lower zone
 allocations (v4)

On Thu, 29 May 2014, Marcelo Tosatti wrote:

> Zone specific allocations, such as GFP_DMA32, should not be restricted
> to cpusets allowed node list: the zones which such allocations demand
> might be contained in particular nodes outside the cpuset node list.
> 
> Necessary for the following usecase:
> - driver which requires zone specific memory (such as KVM, which
> requires root pagetable at paddr < 4GB).
> - user wants to limit allocations of application to nodeX, and nodeX has
> no memory < 4GB.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@...hat.com>
> 
> diff --git a/kernel/cpuset.c b/kernel/cpuset.c
> index 3d54c41..3bbc23f 100644
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -2374,6 +2374,7 @@ static struct cpuset *nearest_hardwall_ancestor(struct cpuset *cs)
>   * variable 'wait' is not set, and the bit ALLOC_CPUSET is not set
>   * in alloc_flags.  That logic and the checks below have the combined
>   * affect that:
> + *	gfp_zone(mask) < policy_zone - any node ok
>   *	in_interrupt - any node ok (current task context irrelevant)
>   *	GFP_ATOMIC   - any node ok
>   *	TIF_MEMDIE   - any node ok
> @@ -2392,6 +2393,10 @@ int __cpuset_node_allowed_softwall(int node, gfp_t gfp_mask)
>  
>  	if (in_interrupt() || (gfp_mask & __GFP_THISNODE))
>  		return 1;
> +#ifdef CONFIG_NUMA
> +	if (gfp_zone(gfp_mask) < policy_zone)
> +		return 1;
> +#endif
>  	might_sleep_if(!(gfp_mask & __GFP_HARDWALL));
>  	if (node_isset(node, current->mems_allowed))
>  		return 1;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5dba293..a0ce1ba 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2726,6 +2726,11 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
>  retry_cpuset:
>  	cpuset_mems_cookie = read_mems_allowed_begin();
>  
> +#ifdef CONFIG_NUMA
> +	if (gfp_zone(gfp_mask) < policy_zone)
> +		nodemask = &node_states[N_ONLINE];
> +#endif
> +
>  	/* The preferred zone is used for statistics later */
>  	first_zones_zonelist(zonelist, high_zoneidx,
>  				nodemask ? : &cpuset_current_mems_allowed,

There are still three issues with this: two are minor, and one needs more
thought:

 (1) this affects more than cpusets, which are all the changelog mentions;
     it also bypasses mempolicies for GFP_DMA and GFP_DMA32 allocations,
     since nodemask is non-NULL in the page allocator whenever there is an
     effective mempolicy.  That may be precisely what you're trying to do
     (do the same for mempolicies as you're doing for cpusets), but the
     comment now in the code refers specifically to cpusets.  Can you make
     a case for the mempolicy exception as well?  Otherwise, we'll need to do

	if (!nodemask && gfp_zone(gfp_mask) < policy_zone)
		nodemask = &node_states[N_ONLINE];
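A small user-space model may make the difference in (1) concrete. It is a sketch under stated assumptions, not kernel code: nodemasks are plain bitmasks, the zone enum is cut down to three entries, ZONE_NORMAL is assumed to be policy_zone, and nodemask_v4() and nodemask_suggested() are hypothetical names. A NULL nodemask stands for "no mempolicy, fall back to the cpuset's mems_allowed". Like the snippet above, it widens to the online nodes.

```c
#include <assert.h>
#include <stddef.h> /* NULL, for callers that pass "no mempolicy" */

/* Model only: a nodemask is a bitmask of node ids. */
typedef unsigned long nodemask_t;

/* Lower zones have lower indices, as in the kernel's enum zone_type. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL };

/* Assumption: ZONE_NORMAL is the policy zone on this machine. */
static const enum zone_type policy_zone = ZONE_NORMAL;

/* Assumed topology: nodes 0 and 1 are online. */
static const nodemask_t online_nodes = 0x3;

/* v4 as posted: any lower-zone request replaces the nodemask, so a
 * mempolicy's (non-NULL) nodemask is thrown away as well. */
static const nodemask_t *nodemask_v4(enum zone_type zone,
                                     const nodemask_t *nodemask)
{
    assert(zone <= ZONE_NORMAL); /* guard the model's three-zone range */
    if (zone < policy_zone)
        nodemask = &online_nodes;
    return nodemask;
}

/* Suggested form: widen only when no mempolicy supplied a nodemask
 * (NULL means "use the cpuset's mems_allowed"). */
static const nodemask_t *nodemask_suggested(enum zone_type zone,
                                            const nodemask_t *nodemask)
{
    assert(zone <= ZONE_NORMAL); /* guard the model's three-zone range */
    if (!nodemask && zone < policy_zone)
        nodemask = &online_nodes;
    return nodemask;
}
```

With a mempolicy bound to node 1 (mask 0x2), nodemask_v4(ZONE_DMA32, &policy) returns the online mask 0x3 and drops the policy, while nodemask_suggested() hands back the policy's own mask; only a NULL nodemask gets widened.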

And the two minor issues:

 (2) this should be &node_states[N_MEMORY], not &node_states[N_ONLINE],
     since memoryless nodes should not be included.  Note that
     guarantee_online_mems() looks at N_MEMORY, and that without cpusets
     cpuset_current_mems_allowed is defined as node_states[N_MEMORY].

 (3) this doesn't need to come after the "retry_cpuset" label: the gfp
     mask doesn't change, so there's no reason to check it again each time
     we go back to relook at the allowed cpuset mask.
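Points (2) and (3) can be sketched the same way. Again this is a model, not the kernel's code: N_ONLINE and N_MEMORY are stood in for by fixed bitmasks with a hypothetical memoryless node 2, the retry loop is a simplification of the cookie check that follows read_mems_allowed_begin(), and pick_nodes() and its parameters are hypothetical. It combines the !nodemask check from (1) with N_MEMORY and tests the zone once, before the retry_cpuset label.

```c
#include <assert.h>
#include <stddef.h> /* NULL, for callers that pass "no mempolicy" */

/* Model only: a nodemask is a bitmask of node ids. */
typedef unsigned long nodemask_t;

/* Lower zones have lower indices, as in the kernel's enum zone_type. */
enum zone_type { ZONE_DMA, ZONE_DMA32, ZONE_NORMAL };

/* Assumption: ZONE_NORMAL is the policy zone on this machine. */
static const enum zone_type policy_zone = ZONE_NORMAL;

/* Assumed topology: nodes 0-2 are online, but node 2 has no memory. */
static const nodemask_t node_online = 0x7; /* stands in for N_ONLINE */
static const nodemask_t node_memory = 0x3; /* stands in for N_MEMORY */

/*
 * Returns the nodes an allocation may use.  cpuset_snapshots[i] is what
 * the cpuset's mems_allowed reads as on pass i; the model simply retries
 * once per snapshot.  *zone_checks counts how often the zone is tested.
 */
static nodemask_t pick_nodes(enum zone_type zone, const nodemask_t *nodemask,
                             const nodemask_t *cpuset_snapshots,
                             int nsnapshots, int *zone_checks)
{
    nodemask_t allowed;
    int pass = 0;

    assert(nsnapshots > 0);
    /* Model sanity check: memory nodes are a subset of online nodes. */
    assert((node_memory & ~node_online) == 0);

    /* Point (3): the zone never changes across retries, so test it
     * once, before the retry label. */
    (*zone_checks)++;
    if (!nodemask && zone < policy_zone)
        nodemask = &node_memory; /* point (2): N_MEMORY, not N_ONLINE */

retry_cpuset:
    /* Only the cpuset's view of memory can change between passes. */
    allowed = nodemask ? *nodemask : cpuset_snapshots[pass];
    if (++pass < nsnapshots)
        goto retry_cpuset;
    return allowed;
}
```

For a DMA32 request with no mempolicy the result is 0x3, so the memoryless node 2 is never offered, and *zone_checks ends at 1 however many passes run; a ZONE_NORMAL request still follows whatever the cpuset reads as on the last pass.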