linux-kernel - Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or suspending

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1244800695.7172.115.camel@pasglop>
Date:	Fri, 12 Jun 2009 19:58:15 +1000
From:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:	Pekka J Enberg <penberg@...helsinki.fi>
Cc:	Ingo Molnar <mingo@...e.hu>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, npiggin@...e.de,
	akpm@...ux-foundation.org, cl@...ux-foundation.org,
	torvalds@...ux-foundation.org
Subject: Re: [PATCH v2] slab,slub: ignore __GFP_WAIT if we're booting or 
 suspending

On Fri, 2009-06-12 at 12:45 +0300, Pekka J Enberg wrote:
> On Fri, 12 Jun 2009, Benjamin Herrenschmidt wrote:
> > Take a break, take a step back, and look at the big picture. Do you
> > really want to find all the needles in the haystack or just make sure
> > you wear gloves when handling the hay ? :-)
> 
> Well, I would like to find the needles but I think we should do it with 
> gloves on.
> 
> If everyone is happy with this version of Ben's patch, I'm going to just 
> apply it and push it to Linus.

Thanks :-) Looks right at first glance. I'll test tomorrow.

Cheers,
Ben.

> 			Pekka
> 
> >From 7760451b006b165bce052622af7316b54bd5a122 Mon Sep 17 00:00:00 2001
> From: Benjamin Herrenschmidt <benh@...nel.crashing.org>
> Date: Fri, 12 Jun 2009 12:39:58 +0300
> Subject: [PATCH] Sanitize "gfp" flags during boot
> 
> With the recent shuffle of initialization order to move memory related
> inits earlier, various subtle breakage was introduced in archs like
> powerpc due to code somewhat assuming that GFP_KERNEL can be used as
> soon as the allocators are up. This is not true because any __GFP_WAIT
> allocation will cause interrupts to be enabled, which can be fatal if
> it happens too early.
> 
> This isn't trivial to fix on every call site. For example, powerpc's
> ioremap implementation needs to be called early. For that, it uses two
> different mechanisms to carve out virtual space. Before memory init,
> by moving down VMALLOC_END, and then, by calling get_vm_area().
> Unfortunately, the later does GFK_KERNEL allocations. But we can't do
> anything else because once vmalloc's been initialized, we can no longer
> safely move VMALLOC_END to carve out space.
> 
> There are other examples, wehere can can be called either very early
> or later on when devices are hot-plugged. It would be a major pain for
> such code to have to "know" whether it's in a context where it should
> use GFP_KERNEL or GFP_NOWAIT.
> 
> Finally, by having the ability to silently removed __GFP_WAIT from
> allocations, we pave the way for suspend-to-RAM to use that feature
> to also remove __GFP_IO from allocations done after suspending devices
> has started. This is important because such allocations may hang if
> devices on the swap-out path have been suspended, but not-yet suspended
> drivers don't know about it, and may deadlock themselves by being hung
> into a kmalloc somewhere while holding a mutex for example.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh@...nel.crashing.org>
> Signed-off-by: Pekka Enberg <penberg@...helsinki.fi>
> ---
>  include/linux/gfp.h |    9 +++++++++
>  init/main.c         |    1 +
>  mm/page_alloc.c     |   18 ++++++++++++++++++
>  mm/slab.c           |    9 +++++++++
>  mm/slub.c           |    3 +++
>  5 files changed, 40 insertions(+), 0 deletions(-)
> 
> diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> index 0bbc15f..8d2ea79 100644
> --- a/include/linux/gfp.h
> +++ b/include/linux/gfp.h
> @@ -99,6 +99,13 @@ struct vm_area_struct;
>  /* 4GB DMA on some platforms */
>  #define GFP_DMA32	__GFP_DMA32
>  
> +extern gfp_t gfp_allowed_bits;
> +
> +static inline gfp_t gfp_sanitize(gfp_t gfp_flags)
> +{
> +	return gfp_flags & gfp_allowed_bits;
> +}
> +
>  /* Convert GFP flags to their corresponding migrate type */
>  static inline int allocflags_to_migratetype(gfp_t gfp_flags)
>  {
> @@ -245,4 +252,6 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
>  void drain_all_pages(void);
>  void drain_local_pages(void *dummy);
>  
> +void mm_late_init(void);
> +
>  #endif /* __LINUX_GFP_H */
> diff --git a/init/main.c b/init/main.c
> index b3e8f14..34c6e7e 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -640,6 +640,7 @@ asmlinkage void __init start_kernel(void)
>  				 "enabled early\n");
>  	early_boot_irqs_on();
>  	local_irq_enable();
> +	mm_late_init();
>  
>  	/*
>  	 * HACK ALERT! This is early. We're enabling the console before
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 7760ef9..a42e4c7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -77,6 +77,13 @@ int percpu_pagelist_fraction;
>  int pageblock_order __read_mostly;
>  #endif
>  
> +/*
> + * We set up the page allocator and the slab allocator early on with interrupts
> + * disabled. Therefore, make sure that we sanitize GFP flags accordingly before
> + * everything is up and running.
> + */
> +gfp_t gfp_allowed_bits = ~(__GFP_WAIT|__GFP_FS | __GFP_IO);
> +
>  static void __free_pages_ok(struct page *page, unsigned int order);
>  
>  /*
> @@ -1473,6 +1480,9 @@ __alloc_pages_internal(gfp_t gfp_mask, unsigned int order,
>  	unsigned long did_some_progress;
>  	unsigned long pages_reclaimed = 0;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	lockdep_trace_alloc(gfp_mask);
>  
>  	might_sleep_if(wait);
> @@ -4728,3 +4738,11 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  }
>  #endif
> +
> +void mm_late_init(void)
> +{
> +	/*
> +	 * Interrupts are enabled now so all GFP allocations are safe.
> +	 */
> +	gfp_allowed_bits = __GFP_BITS_MASK;
> +}
> diff --git a/mm/slab.c b/mm/slab.c
> index f46b65d..87b166e 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2791,6 +2791,9 @@ static int cache_grow(struct kmem_cache *cachep,
>  	gfp_t local_flags;
>  	struct kmem_list3 *l3;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	/*
>  	 * Be lazy and only check for valid flags here,  keeping it out of the
>  	 * critical path in kmem_cache_alloc().
> @@ -3212,6 +3215,9 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags)
>  	void *obj = NULL;
>  	int nid;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	if (flags & __GFP_THISNODE)
>  		return NULL;
>  
> @@ -3434,6 +3440,9 @@ __cache_alloc(struct kmem_cache *cachep, gfp_t flags, void *caller)
>  	unsigned long save_flags;
>  	void *objp;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfp_mask = gfp_sanitize(gfp_mask);
> +
>  	lockdep_trace_alloc(flags);
>  
>  	if (slab_should_failslab(cachep, flags))
> diff --git a/mm/slub.c b/mm/slub.c
> index 3964d3c..5c646f7 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1512,6 +1512,9 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	/* We handle __GFP_ZERO in the caller */
>  	gfpflags &= ~__GFP_ZERO;
>  
> +	/* Sanitize flags so we don't enable irqs too early during boot */
> +	gfpflags = gfp_sanitize(gfpflags);
> +
>  	if (!c->page)
>  		goto new_slab;
>  

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/