Message-ID: <20160509234205.GB4426@bbox>
Date:	Tue, 10 May 2016 08:42:05 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	<linux-kernel@...r.kernel.org>, <linux-mm@...ck.org>,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
	"[4.3+]" <stable@...r.kernel.org>
Subject: Re: [PATCH v2] zsmalloc: fix zs_can_compact() integer overflow

On Mon, May 09, 2016 at 11:00:52PM +0900, Sergey Senozhatsky wrote:
> zs_can_compact() has two race conditions in its core calculation:
> 
> unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
> 				zs_stat_get(class, OBJ_USED);
> 
> 1) classes are not locked, so the numbers of allocated and used
>    objects can change by the concurrent ops happening on other CPUs
> 2) shrinker invokes it from preemptible context
> 
> Depending on the circumstances, thus, OBJ_ALLOCATED can become
> less than OBJ_USED, which can result in either very high or
> negative `total_scan' value calculated later in do_shrink_slab().
> 
> do_shrink_slab() has some logic to prevent those cases:
> 
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
> 
> However, due to the way `total_scan' is calculated, not every
> shrinker->count_objects() overflow can be spotted and handled.
> To demonstrate the latter, I added some debugging code to do_shrink_slab()
> (x86_64) and the results were:
> 
>  vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
>  vmscan: but total_scan > 0: 92679974445502
>  vmscan: resulting total_scan: 92679974445502
> [..]
>  vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
>  vmscan: but total_scan > 0: 22634041808232578
>  vmscan: resulting total_scan: 22634041808232578
> 
> Even though shrinker->count_objects() has returned an overflowed value,
> the resulting `total_scan' is positive, and, what is more worrisome, it
> is insanely huge. This value is getting used later on in
> shrinker->scan_objects() loop:
> 
>         while (total_scan >= batch_size ||
>                total_scan >= freeable) {
>                 unsigned long ret;
>                 unsigned long nr_to_scan = min(batch_size, total_scan);
> 
>                 shrinkctl->nr_to_scan = nr_to_scan;
>                 ret = shrinker->scan_objects(shrinker, shrinkctl);
>                 if (ret == SHRINK_STOP)
>                         break;
>                 freed += ret;
> 
>                 count_vm_events(SLABS_SCANNED, nr_to_scan);
>                 total_scan -= nr_to_scan;
> 
>                 cond_resched();
>         }
> 
> `total_scan >= batch_size' is true for a very-very long time and
> `total_scan >= freeable' is also true for quite some time, because
> `freeable < 0' and `total_scan' is large enough, for example,
> 22634041808232578. The only break condition, in the given scheme of
> things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
> bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
> 
> To fix the issue, take a pool stat snapshot and use it instead of
> racy zs_stat_get() calls.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
> Cc: Minchan Kim <minchan@...nel.org>
> Cc: <stable@...r.kernel.org>        [4.3+]
Acked-by: Minchan Kim <minchan@...nel.org>

Thanks!