lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 10 Feb 2012 11:19:13 +0000
From:	Mel Gorman <mel@....ul.ie>
To:	Marek Szyprowski <m.szyprowski@...sung.com>
Cc:	linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
	linux-media@...r.kernel.org, linux-mm@...ck.org,
	linaro-mm-sig@...ts.linaro.org,
	'Michal Nazarewicz' <mina86@...a86.com>,
	'Kyungmin Park' <kyungmin.park@...sung.com>,
	'Russell King' <linux@....linux.org.uk>,
	'Andrew Morton' <akpm@...ux-foundation.org>,
	'KAMEZAWA Hiroyuki' <kamezawa.hiroyu@...fujitsu.com>,
	'Daniel Walker' <dwalker@...eaurora.org>,
	'Arnd Bergmann' <arnd@...db.de>,
	'Jesse Barker' <jesse.barker@...aro.org>,
	'Jonathan Corbet' <corbet@....net>,
	'Shariq Hasnain' <shariq.hasnain@...aro.org>,
	'Chunsang Jeong' <chunsang.jeong@...aro.org>,
	'Dave Hansen' <dave@...ux.vnet.ibm.com>,
	'Benjamin Gaignard' <benjamin.gaignard@...aro.org>,
	'Rob Clark' <rob.clark@...aro.org>,
	'Ohad Ben-Cohen' <ohad@...ery.com>
Subject: Re: [PATCH 11/15] mm: trigger page reclaim in alloc_contig_range()
 to stabilize watermarks

On Wed, Feb 08, 2012 at 04:14:46PM +0100, Marek Szyprowski wrote:
> > > <SNIP>
> > > +static int __reclaim_pages(struct zone *zone, gfp_t gfp_mask, int count)
> > > +{
> > > +	enum zone_type high_zoneidx = gfp_zone(gfp_mask);
> > > +	struct zonelist *zonelist = node_zonelist(0, gfp_mask);
> > > +	int did_some_progress = 0;
> > > +	int order = 1;
> > > +	unsigned long watermark;
> > > +
> > > +	/*
> > > +	 * Increase level of watermarks to force kswapd do his job
> > > +	 * to stabilize at new watermark level.
> > > +	 */
> > > +	min_free_kbytes += count * PAGE_SIZE / 1024;
> > 
> > There is a risk of overflow here although it is incredibly
> > small. Still, a potentially nicer way of doing this was
> > 
> > count << (PAGE_SHIFT - 10)
> > 
> > > +	setup_per_zone_wmarks();
> > > +
> > 
> > Nothing prevents two or more processes updating the wmarks at the same
> > time which is racy and unpredictable. Today it is not much of a problem
> > but CMA makes this path hotter than it was and you may see weirdness
> > if two processes are updating zonelists at the same time. Swap-over-NFS
> > actually starts with a patch that serialises setup_per_zone_wmarks()
> > 
> > You also potentially have a BIG problem here if this happens
> > 
> > min_free_kbytes = 32768
> > Process a: min_free_kbytes  += 65536
> > Process a: start direct reclaim
> > echo 16374 > /proc/sys/vm/min_free_kbytes
> > Process a: exit direct_reclaim
> > Process a: min_free_kbytes -= 65536
> > 
> > min_free_kbytes now wraps negative and the machine hangs.
> > 
> > The damage is confined to CMA though so I am not going to lose sleep
> > over it but you might want to consider at least preventing parallel
> > updates to min_free_kbytes from proc.
> 
> Right. This approach was definitely too hacky. What do you think about replacing 
> it with the following code (I assume that setup_per_zone_wmarks() serialization 
> patch will be merged anyway so I skipped it here):
> 

It's part of a larger series and the rest of that series is
controversial. That single patch can be split out obviously so feel free
to add it to your series and stick your Signed-off-by on the end of it.

> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 82f4fa5..bb9ae41 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -371,6 +371,13 @@ struct zone {
>         /* see spanned/present_pages for more description */
>         seqlock_t               span_seqlock;
>  #endif
> +#ifdef CONFIG_CMA
> +       /*
> +        * CMA needs to increase watermark levels during the allocation
> +        * process to make sure that the system is not starved.
> +        */
> +       unsigned long           min_cma_pages;
> +#endif
>         struct free_area        free_area[MAX_ORDER];
> 
>  #ifndef CONFIG_SPARSEMEM
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 824fb37..1ca52f0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5044,6 +5044,11 @@ void setup_per_zone_wmarks(void)
> 
>                 zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + (tmp >> 2);
>                 zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + (tmp >> 1);
> +#ifdef CONFIG_CMA
> +               zone->watermark[WMARK_MIN] += zone->min_cma_pages;
> +               zone->watermark[WMARK_LOW] += zone->min_cma_pages;
> +               zone->watermark[WMARK_HIGH] += zone->min_cma_pages;
> +#endif
>                 setup_zone_migrate_reserve(zone);
>                 spin_unlock_irqrestore(&zone->lock, flags);
>         }

This is better in that it is not vunerable to parallel updates of
min_free_kbytes. It would be slightly tidier to introduce something
like cma_wmark_pages() that returns min_cma_pages if CONFIG_CMA and 0
otherwise. Use the helper to get right of this ifdef CONFIG_CMA within
setup_per_zone_wmarks().

You'll still have the problem of kswapd not taking CMA pages properly into
account when deciding whether to reclaim or not though.

-- 
Mel Gorman
SUSE Labs
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ