Message-ID: <20160509084155.GA507@swordfish>
Date: Mon, 9 May 2016 17:52:31 +0900
From: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To: Minchan Kim <minchan@...nel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
"[4.3+]" <stable@...r.kernel.org>
Subject: Re: [PATCH] zsmalloc: fix zs_can_compact() integer overflow
Hello,
On (05/09/16 17:07), Minchan Kim wrote:
[..]
> > Depending on the circumstances, OBJ_ALLOCATED can become less
> > than OBJ_USED, which can result in either very high or negative
> > `total_scan' value calculated in do_shrink_slab().
>
> So, do you see pr_err("shrink_slab: %pF negative objects xxxx)
> in vmscan.c and skip shrinking?
yes
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
: vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
> It would be better to explain what's the result without this patch
> and end-user effect for going -stable.
the problem is that not every overflowed value returned from
zs_can_compact() is detected in do_shrink_slab():
	freeable = shrinker->count_objects(shrinker, shrinkctl);
	if (freeable == 0)
		return 0;

	/*
	 * copy the current shrinker scan count into a local variable
	 * and zero it so that other concurrent shrinker invocations
	 * don't also do this scanning work.
	 */
	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);

	total_scan = nr;
	delta = (4 * nr_scanned) / shrinker->seeks;
	delta *= freeable;
	do_div(delta, nr_eligible + 1);
	total_scan += delta;
	if (total_scan < 0) {
		pr_err("shrink_slab: %pF negative objects to delete nr=%ld\n",
		       shrinker->scan_objects, total_scan);
		total_scan = freeable;
	}
this calculation can hide the shrinker->count_objects() error. I added
some debugging code (on x86_64), and the output was:
[ 59.041959] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[ 59.041963] vmscan: >> but total_scan > 0: 92679974445502
[ 59.041964] vmscan: >> resulting total_scan: 92679974445502
[ 59.192734] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[ 59.192737] vmscan: >> but total_scan > 0: 5830197242006811
[ 59.192738] vmscan: >> resulting total_scan: 5830197242006811
[ 59.259805] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[ 59.259809] vmscan: >> but total_scan > 0: 23649671889371219
[ 59.259810] vmscan: >> resulting total_scan: 23649671889371219
[ 76.279767] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[ 76.279770] vmscan: >> but total_scan > 0: 895907920044174
[ 76.279771] vmscan: >> resulting total_scan: 895907920044174
[ 84.807837] vmscan: >> OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
[ 84.807841] vmscan: >> but total_scan > 0: 22634041808232578
[ 84.807842] vmscan: >> resulting total_scan: 22634041808232578
so we can end up with insanely huge total_scan, which is then used in
this while loop:
	while (total_scan >= batch_size ||
	       total_scan >= freeable) {
		unsigned long ret;
		unsigned long nr_to_scan = min(batch_size, total_scan);

		shrinkctl->nr_to_scan = nr_to_scan;
		ret = shrinker->scan_objects(shrinker, shrinkctl);
		if (ret == SHRINK_STOP)
			break;
		freed += ret;

		count_vm_events(SLABS_SCANNED, nr_to_scan);
		total_scan -= nr_to_scan;

		cond_resched();
	}
`total_scan >= batch_size' stays true for a very long time, I guess.
`total_scan >= freeable' is also true for quite some time: freeable is
< 0 while total_scan is, for example, 18446744073709551615. so it all
comes down to the shrinker->scan_objects() == SHRINK_STOP test, which
is, I assume, a bit too weak to rely on. that's why I Cc'd -stable.
[..]
> > @@ -2262,10 +2262,13 @@ static void SetZsPageMovable(struct zs_pool *pool, struct zspage *zspage)
>
> It seems this patch is based on my old page migration work?
> It's not go to the mainline yet but your patch which fixes the bug should
> be supposed to go to the -stable. So, I hope this patch first.
oops... my fat fingers! good catch, thanks! I have two versions: for -next and
-mmots (with your LRU rework applied, indeed). somehow I managed to cd to the
wrong dir. sorry, will resend.
-ss