Message-ID: <20100915184434.18e2d933@notabene>
Date:	Wed, 15 Sep 2010 18:44:34 +1000
From:	Neil Brown <neilb@...e.de>
To:	Wu Fengguang <fengguang.wu@...el.com>
Cc:	Rik van Riel <riel@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	Li Shaohua <shaohua.li@...el.com>
Subject: Re: Deadlock possibly caused by too_many_isolated.

On Wed, 15 Sep 2010 16:28:43 +0800
Wu Fengguang <fengguang.wu@...el.com> wrote:

> Neil,
> 
> Sorry for the rushed and imaginary ideas this morning..
> 
> > @@ -1101,6 +1101,12 @@ static unsigned long shrink_inactive_lis
> >  	int lumpy_reclaim = 0;
> >  
> >  	while (unlikely(too_many_isolated(zone, file, sc))) {
> > +		if ((sc->gfp_mask & GFP_IOFS) != GFP_IOFS)
> > +			/* Not allowed to do IO, so mustn't wait
> > +			 * on processes that might try to
> > +			 */
> > +			return SWAP_CLUSTER_MAX;
> > +
> 
> The above patch should behave like this: it returns SWAP_CLUSTER_MAX
> to trick every caller up the chain into believing "enough pages have
> been reclaimed". So __alloc_pages_direct_reclaim() sees a non-zero
> *did_some_progress and goes on to call get_page_from_freelist(). That
> normally fails because the task didn't really scan the LRU lists.
> However, it can succeed: when many processes are doing concurrent
> direct reclaim, it may get lucky and grab a free page reclaimed by
> another task. What's more, if it does fail to get a free page, the
> upper layer __alloc_pages_slowpath() will keep calling
> __alloc_pages_direct_reclaim() again. So, sooner or later it will
> succeed in "stealing" a free page reclaimed by other tasks.
> 
> In summary, the patch behavior for !__GFP_IO/FS is
> - won't do any page reclaim
> - won't fail the page allocation (unexpected)
> - will wait and steal one free page from others (unreasonable)
> 
> So it will address the problem you encountered; however, it sounds
> like pretty unexpected and illogical behavior, right?
> 
> I believe the patch below will address the problem equally well.
> What do you think?

Thank you for the detailed explanation.  I agree with your reasoning and
now understand why your patch is sufficient.

I will get it tested and let you know how that goes.

Thanks,
NeilBrown


> 
> Thanks,
> Fengguang
> ---
> 
> mm: Avoid possible deadlock caused by too_many_isolated()
> 
> Neil finds that if too_many_isolated() returns true while performing
> direct reclaim we can end up waiting for other threads to complete their
> direct reclaim.  If those threads are allowed to enter the FS or IO to
> free memory, but this thread is not, then it is possible that those
> threads will be waiting on this thread and so we get a circular
> deadlock.
> 
> some task enters direct reclaim with GFP_KERNEL
>   => too_many_isolated() false
>     => vmscan and run into dirty pages
>       => pageout()
>         => take some FS lock
> 	  => fs/block code does GFP_NOIO allocation
> 	    => enter direct reclaim again
> 	      => too_many_isolated() true
> 		=> wait for others to make progress; however, the other
> 		   tasks may in turn be waiting for the FS lock..
> 
> The fix is to let !__GFP_IO and !__GFP_FS direct reclaims enjoy higher
> priority than normal ones, by giving them a higher throttle threshold.
> 
> Now !__GFP_IO/FS reclaims won't be waiting for __GFP_IO/FS reclaims to
> progress. They will be blocked only when there are too many concurrent
> !__GFP_IO/FS reclaims; however, that's very unlikely because IO-less
> direct reclaims are able to progress much faster, and they won't
> deadlock each other. The threshold is raised high enough for them, so
> that there can be sufficient parallel progress of !__GFP_IO/FS reclaims.
> 
> Reported-by: NeilBrown <neilb@...e.de>
> Signed-off-by: Wu Fengguang <fengguang.wu@...el.com>
> ---
>  mm/vmscan.c |    5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> --- linux-next.orig/mm/vmscan.c	2010-09-15 11:58:58.000000000 +0800
> +++ linux-next/mm/vmscan.c	2010-09-15 15:36:14.000000000 +0800
> @@ -1141,36 +1141,39 @@ int isolate_lru_page(struct page *page)
>  	return ret;
>  }
>  
>  /*
>   * Are there way too many processes in the direct reclaim path already?
>   */
>  static int too_many_isolated(struct zone *zone, int file,
>  		struct scan_control *sc)
>  {
>  	unsigned long inactive, isolated;
> +	int ratio;
>  
>  	if (current_is_kswapd())
>  		return 0;
>  
>  	if (!scanning_global_lru(sc))
>  		return 0;
>  
>  	if (file) {
>  		inactive = zone_page_state(zone, NR_INACTIVE_FILE);
>  		isolated = zone_page_state(zone, NR_ISOLATED_FILE);
>  	} else {
>  		inactive = zone_page_state(zone, NR_INACTIVE_ANON);
>  		isolated = zone_page_state(zone, NR_ISOLATED_ANON);
>  	}
>  
> -	return isolated > inactive;
> +	ratio = sc->gfp_mask & (__GFP_IO | __GFP_FS) ? 1 : 8;
> +
> +	return isolated > inactive * ratio;
>  }
>  
>  /*
>   * TODO: Try merging with migrations version of putback_lru_pages
>   */
>  static noinline_for_stack void
>  putback_lru_pages(struct zone *zone, struct scan_control *sc,
>  				unsigned long nr_anon, unsigned long nr_file,
>  				struct list_head *page_list)
>  {
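
For illustration, the threshold check introduced by the patch above can
be modelled in userspace roughly as follows. This is only a sketch under
stated assumptions: MODEL_GFP_IO, MODEL_GFP_FS, model_too_many_isolated()
and model_self_check() are hypothetical names with made-up flag values,
not kernel symbols, and the kswapd and scanning_global_lru() early exits
are left out. Only the ratio logic (1 when either flag is set, 8
otherwise) is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for __GFP_IO and __GFP_FS; values are illustrative. */
#define MODEL_GFP_IO 0x1u
#define MODEL_GFP_FS 0x2u

/*
 * Model of the patched check: a reclaimer is throttled when the number
 * of isolated pages exceeds the inactive count scaled by a ratio.
 * As in the diff, the ratio is 1 if the mask has __GFP_IO or __GFP_FS,
 * and 8 only when both are clear.
 */
bool model_too_many_isolated(unsigned long inactive, unsigned long isolated,
			     unsigned int gfp_mask)
{
	unsigned long ratio;

	ratio = (gfp_mask & (MODEL_GFP_IO | MODEL_GFP_FS)) ? 1 : 8;

	return isolated > inactive * ratio;
}

/* Small self-check with made-up page counts. */
void model_self_check(void)
{
	/* A reclaimer allowed to do IO and FS work is throttled at 1x. */
	assert(model_too_many_isolated(100, 150, MODEL_GFP_IO | MODEL_GFP_FS));
	/* With both flags clear the same counts do not throttle. */
	assert(!model_too_many_isolated(100, 150, 0));
	/* It is throttled only once isolated exceeds 8x inactive. */
	assert(model_too_many_isolated(100, 801, 0));
}
```

Note that in this model, as in the diff, a mask with only one of the two
flags set still gets ratio 1; the raised threshold applies only when
neither IO nor FS is allowed.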
