linux-kernel - Re: [PATCH 02/12] vmscan: shrinker->nr updates race and go wrong

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20110620012531.GN561@dastard>
Date:	Mon, 20 Jun 2011 11:25:31 +1000
From:	Dave Chinner <david@...morbit.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
	linux-mm@...ck.org, xfs@....sgi.com
Subject: Re: [PATCH 02/12] vmscan: shrinker->nr updates race and go wrong

On Mon, Jun 20, 2011 at 09:46:54AM +0900, KOSAKI Motohiro wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 48e3fbd..dce2767 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -251,17 +251,29 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> >  		unsigned long total_scan;
> >  		unsigned long max_pass;
> >  		int shrink_ret = 0;
> > +		long nr;
> > +		long new_nr;
> >  
> > +		/*
> > +		 * copy the current shrinker scan count into a local variable
> > +		 * and zero it so that other concurrent shrinker invocations
> > +		 * don't also do this scanning work.
> > +		 */
> > +		do {
> > +			nr = shrinker->nr;
> > +		} while (cmpxchg(&shrinker->nr, nr, 0) != nr);
> > +
> > +		total_scan = nr;
> >  		max_pass = do_shrinker_shrink(shrinker, shrink, 0);
> >  		delta = (4 * nr_pages_scanned) / shrinker->seeks;
> >  		delta *= max_pass;
> >  		do_div(delta, lru_pages + 1);
> > -		shrinker->nr += delta;
> > -		if (shrinker->nr < 0) {
> > +		total_scan += delta;
> > +		if (total_scan < 0) {
> >  			printk(KERN_ERR "shrink_slab: %pF negative objects to "
> >  			       "delete nr=%ld\n",
> > -			       shrinker->shrink, shrinker->nr);
> > -			shrinker->nr = max_pass;
> > +			       shrinker->shrink, total_scan);
> > +			total_scan = max_pass;
> >  		}
> >  
> >  		/*
> > @@ -269,13 +281,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> >  		 * never try to free more than twice the estimate number of
> >  		 * freeable entries.
> >  		 */
> > -		if (shrinker->nr > max_pass * 2)
> > -			shrinker->nr = max_pass * 2;
> > +		if (total_scan > max_pass * 2)
> > +			total_scan = max_pass * 2;
> >  
> > -		total_scan = shrinker->nr;
> > -		shrinker->nr = 0;
> >  
> > -		trace_mm_shrink_slab_start(shrinker, shrink, nr_pages_scanned,
> > +		trace_mm_shrink_slab_start(shrinker, shrink, nr, nr_pages_scanned,
> >  					lru_pages, max_pass, delta, total_scan);
> >  
> >  		while (total_scan >= SHRINK_BATCH) {
> > @@ -295,8 +305,19 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> >  			cond_resched();
> >  		}
> >  
> > -		shrinker->nr += total_scan;
> > -		trace_mm_shrink_slab_end(shrinker, shrink_ret, total_scan);
> > +		/*
> > +		 * move the unused scan count back into the shrinker in a
> > +		 * manner that handles concurrent updates. If we exhausted the
> > +		 * scan, there is no need to do an update.
> > +		 */
> > +		do {
> > +			nr = shrinker->nr;
> > +			new_nr = total_scan + nr;
> > +			if (total_scan <= 0)
> > +				break;
> > +		} while (cmpxchg(&shrinker->nr, nr, new_nr) != nr);
> > +
> > +		trace_mm_shrink_slab_end(shrinker, shrink_ret, nr, new_nr);
> >  	}
> >  	up_read(&shrinker_rwsem);
> >  out:
> 
> Looks great fix. Please remove tracepoint change from this patch and send it
> to -stable. iow, I expect I'll ack your next spin.

I don't believe such a change belongs in -stable. This code has been
buggy for many years and as I mentioned it actually makes existing
bad shrinker behaviour worse. I don't test stable kernels, so I've
got no idea what side effects it will have outside of this series.
I'm extremely hesitant to change VM behaviour in stable kernels
without having tested first, so I'm not going to push it for stable
kernels.

If you want it in stable kernels, then you can always let
stable@...nel.org know once the commits are in the mainline tree and
you've tested them...

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/