Message-ID: <20150604053056.GA662@swordfish>
Date:	Thu, 4 Jun 2015 14:30:56 +0900
From:	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
To:	Minchan Kim <minchan@...nel.org>
Cc:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org,
	Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
Subject: Re: [RFC][PATCH 07/10] zsmalloc: introduce auto-compact support

On (06/04/15 13:57), Minchan Kim wrote:
> On Sat, May 30, 2015 at 12:05:25AM +0900, Sergey Senozhatsky wrote:
> > perform class compaction in zs_free(), if zs_free() has created
> > a ZS_ALMOST_EMPTY page. this is the most trivial `policy'.
> 
> Finally, I realized your intention.
> 
> Actually, I had a plan to add /sys/block/zram0/compact_threshold_ratio,
> which would mean compacting automatically when
> compr_data_size/mem_used_total is below the threshold, but I didn't try
> because it could be done by a user tool.
> 
> Another reason I didn't try the approach is that it could scan all of
> the zs_objects repeatedly without freeing any zspage in some corner
> cases, which could be a big overhead we should prevent, so we might add
> some heuristic. As an example, we could delay the next few compaction
> attempts when the previous few attempts have all failed.

this is why I use zs_can_compact() -- to bail out of zs_compact() as soon
as possible, so useless scans are minimized (well, at least that's the
expectation). I'm also thinking of a threshold-based solution -- do class
auto-compaction only if we can free X pages, for example.
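
a rough userspace sketch of that threshold idea, not the patch itself. `struct class_stats`, `estimate_freeable_pages()` and `should_auto_compact()` are hypothetical names made up for illustration; the only thing taken from the discussion is that the estimate comes from the gap between allocated and used objects.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one size class (not struct size_class). */
struct class_stats {
	unsigned long obj_allocated;    /* object slots in all zspages of the class */
	unsigned long obj_used;         /* slots that currently hold an object */
	unsigned long objs_per_zspage;  /* maxobj_per_zspage for this class */
	unsigned long pages_per_zspage; /* pages backing one zspage */
};

/* Estimate how many pages an ideal compaction pass could release. */
static unsigned long estimate_freeable_pages(const struct class_stats *c)
{
	unsigned long unused;

	assert(c->objs_per_zspage > 0);
	if (c->obj_allocated <= c->obj_used)
		return 0;

	unused = c->obj_allocated - c->obj_used;
	/* Only whole zspages can be released, so round down. */
	return unused / c->objs_per_zspage * c->pages_per_zspage;
}

/* Assumed policy: compact only if at least min_pages can be freed. */
static bool should_auto_compact(const struct class_stats *c,
				unsigned long min_pages)
{
	return estimate_freeable_pages(c) >= min_pages;
}
```

with 104 allocated slots, 74 used, 8 objects per zspage and 2 pages per zspage, the estimate is 6 pages, so a limit of 4 pages would trigger compaction and a limit of 8 would not.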

the problem of compaction is that there is no compaction until you trigger
it.
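
for comparison, triggering it from userspace (the compact_threshold_ratio idea quoted above) might look roughly like the sketch below. the function names and the threshold value are assumptions; the sketch also assumes the first three mm_stat fields are orig_data_size, compr_data_size and mem_used_total, and that writing 1 to the compact node starts compaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Decide from one mm_stat line whether compaction should be triggered.
 * Assumed field order: orig_data_size compr_data_size mem_used_total ...
 */
static bool below_compact_threshold(const char *mm_stat_line, double threshold)
{
	unsigned long long orig, compr, used;

	assert(mm_stat_line != NULL);
	if (sscanf(mm_stat_line, "%llu %llu %llu", &orig, &compr, &used) != 3)
		return false;
	if (used == 0)
		return false;

	return (double)compr / (double)used < threshold;
}

/*
 * Read mm_stat and, if the ratio is below the threshold, write "1" to the
 * compact node. Returns 1 if compaction was triggered, 0 if it was not
 * needed, -1 on I/O error.
 */
static int maybe_compact(const char *mm_stat_path, const char *compact_path,
			 double threshold)
{
	char line[256];
	FILE *f = fopen(mm_stat_path, "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (!below_compact_threshold(line, threshold))
		return 0;

	f = fopen(compact_path, "w");
	if (!f)
		return -1;
	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 1 : -1;
}
```

a periodic caller would pass /sys/block/zram0/mm_stat and /sys/block/zram0/compact; it still only compacts when something runs it, which is the point made above.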

and fragmented classes are not necessarily a win. if writes don't happen
to a fragmented class-X (and we basically can't tell if they will, nor can
we estimate; it's up to I/O and data patterns, compression algorithm, etc.)
then class-X stays fragmented w/o any use.


> It's the simple design mm/compaction.c uses to prevent pointless
> overhead, but historically it has caused pain several times and required
> more complicated logic, and it's still painful.
> 
> Another thing I found recently is that it's not always a win for zram
> when zsmalloc is not fragmented. The fragmented space could be used
> for storing upcoming compressed objects, although it is wasted space
> at the moment. But if we don't have any holes (i.e., fragmented space)
> because of frequent compaction, zsmalloc has to allocate a new zspage,
> which could be allocated on a movable pageblock as a fallback for the
> nonmovable pageblock request on a system under high memory pressure,
> so it accelerates the fragmentation problem of system memory.

yes, but compaction almost always leaves classes fragmented. I think
it's a corner case when the number of unused allocated objects is
exactly the same as the number of objects that we migrated, and the
number of migrated objects is exactly N*maxobj_per_zspage, so that we
leave the class w/o any unused objects (OBJ_ALLOCATED == OBJ_USED).
in general, classes still have 'holes' after compaction.
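
the arithmetic behind this, as a small sketch. it assumes an idealized compaction pass that releases every zspage it can empty, and `holes_after_compaction()` is a made-up helper, not something in mm/zsmalloc.c.

```c
#include <assert.h>

/*
 * Unused slots left in a class after an idealized compaction pass.
 * Every released zspage removes maxobj_per_zspage unused slots, so what
 * remains is the part that does not fill a whole zspage.
 */
static unsigned long holes_after_compaction(unsigned long obj_allocated,
					    unsigned long obj_used,
					    unsigned long maxobj_per_zspage)
{
	assert(maxobj_per_zspage > 0);
	assert(obj_allocated >= obj_used);

	return (obj_allocated - obj_used) % maxobj_per_zspage;
}
```

e.g. 104 allocated, 74 used and 8 objects per zspage leave 6 holes; the class ends up w/o holes only when the unused count is an exact multiple of maxobj_per_zspage (104 allocated, 80 used).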


> So, I want to pass the policy to userspace.
> If we find it's really a problem for userspace, then we need to think
> about it more.

well, it can go under a config option -- "aggressive compaction" or
"automatic compaction".

	-ss

> Thanks.
> 
> > 
> > probably it would make sense for zs_can_compact() to return an
> > estimated number of pages that potentially will be freed and to
> > trigger auto-compaction only when it's above some limit (e.g. at
> > least 4 zs pages); or put it under a config option.
> > 
> > this also tweaks __zs_compact() -- we can't do reschedule
> > anymore, waiting for new pages in the current class. so we
> > compact as much as we can and return immediately if compaction
> > is not possible anymore.
> > 
> > auto-compaction is not a replacement of manual compaction.
> > 
> > compiled linux kernel with auto-compaction:
> > 
> > cat /sys/block/zram0/mm_stat
> > 2339885056 1601034235 1624076288        0 1624076288    19961     1106
> > 
> > performing additional manual compaction:
> > 
> > echo 1 > /sys/block/zram0/compact
> > cat /sys/block/zram0/mm_stat
> > 2339885056 1601034235 1624051712        0 1624076288    19961     1114
> > 
> > manual compaction was able to migrate additional 8 objects. so
> > auto-compaction is 'good enough'.
> > 
> > TEST
> > 
> > this test copies a 1.3G linux kernel tar to mounted zram disk,
> > and extracts it.
> > 
> > w/auto-compaction:
> > 
> > cat /sys/block/zram0/mm_stat
> >  1171456    26006    86016        0    86016    32781        0
> > 
> > time tar xf linux-3.10.tar.gz -C linux
> > 
> > real    0m16.970s
> > user    0m15.247s
> > sys     0m8.477s
> > 
> > du -sh linux
> > 2.0G    linux
> > 
> > cat /sys/block/zram0/mm_stat
> > 3547353088 2993384270 3011088384        0 3011088384    24310      108
> > 
> > =====================================================================
> > 
> > w/o auto compaction:
> > 
> > cat /sys/block/zram0/mm_stat
> >  1171456    26000    81920        0    81920    32781        0
> > 
> > time tar xf linux-3.10.tar.gz -C linux
> > 
> > real    0m16.983s
> > user    0m15.267s
> > sys     0m8.417s
> > 
> > du -sh linux
> > 2.0G    linux
> > 
> > cat /sys/block/zram0/mm_stat
> > 3548917760 2993566924 3011317760        0 3011317760    23928        0
> > 
> > =====================================================================
> > 
> > iozone shows that auto-compacted code runs faster in several
> > tests, which is hardly trustworthy. anyway.
> > 
> > iozone -t 3 -R -r 16K -s 60M -I +Z
> > 
> >        test           base       auto-compact (compacted 66123 objs)
> >    Initial write   1603682.25          1645112.38
> >          Rewrite   2502243.31          2256570.31
> >             Read   7040860.00          7130575.00
> >          Re-read   7036490.75          7066744.25
> >     Reverse Read   6617115.25          6155395.50
> >      Stride read   6705085.50          6350030.38
> >      Random read   6668497.75          6350129.38
> >   Mixed workload   5494030.38          5091669.62
> >     Random write   2526834.44          2500977.81
> >           Pwrite   1656874.00          1663796.94
> >            Pread   3322818.91          3359683.44
> >           Fwrite   4090124.25          4099773.88
> >            Fread   10358916.25         10324409.75
> > 
> > Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
> > ---
> >  mm/zsmalloc.c | 25 +++++++++++++------------
> >  1 file changed, 13 insertions(+), 12 deletions(-)
> > 
> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > index c2a640a..70bf481 100644
> > --- a/mm/zsmalloc.c
> > +++ b/mm/zsmalloc.c
> > @@ -1515,34 +1515,28 @@ static void __zs_compact(struct zs_pool *pool, struct size_class *class)
> >  
> >  		while ((dst_page = isolate_target_page(class))) {
> >  			cc.d_page = dst_page;
> > -			/*
> > -			 * If there is no more space in dst_page, resched
> > -			 * and see if anyone had allocated another zspage.
> > -			 */
> > +
> >  			if (!migrate_zspage(pool, class, &cc))
> > -				break;
> > +				goto out;
> >  
> >  			putback_zspage(pool, class, dst_page);
> >  		}
> >  
> > -		/* Stop if we couldn't find slot */
> > -		if (dst_page == NULL)
> > +		if (!dst_page)
> >  			break;
> > -
> >  		putback_zspage(pool, class, dst_page);
> >  		putback_zspage(pool, class, src_page);
> > -		spin_unlock(&class->lock);
> > -		cond_resched();
> > -		spin_lock(&class->lock);
> >  	}
> >  
> > +out:
> > +	if (dst_page)
> > +		putback_zspage(pool, class, dst_page);
> >  	if (src_page)
> >  		putback_zspage(pool, class, src_page);
> >  
> >  	spin_unlock(&class->lock);
> >  }
> >  
> > -
> >  unsigned long zs_get_total_pages(struct zs_pool *pool)
> >  {
> >  	return atomic_long_read(&pool->pages_allocated);
> > @@ -1741,6 +1735,13 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
> >  	unpin_tag(handle);
> >  
> >  	free_handle(pool, handle);
> > +
> > +	/*
> > +	 * actual fullness might have changed, __zs_compact() checks
> > +	 * if compaction makes sense
> > +	 */
> > +	if (fullness == ZS_ALMOST_EMPTY)
> > +		__zs_compact(pool, class);
> >  }
> >  EXPORT_SYMBOL_GPL(zs_free);
> >  
> > -- 
> > 2.4.2.337.gfae46aa
> > 
> 
> -- 
> Kind regards,
> Minchan Kim
> 