Message-ID: <VI1PR0802MB25289788BFADB136CE08AADA8FDA0@VI1PR0802MB2528.eurprd08.prod.outlook.com>
Date: Wed, 21 Nov 2018 12:06:02 +0000
From: Dave Rodgman <dave.rodgman@....com>
To: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: nd <nd@....com>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"davem@...emloft.net" <davem@...emloft.net>,
Matt Sealey <Matt.Sealey@....com>,
"nitingupta910@...il.com" <nitingupta910@...il.com>,
"rpurdie@...nedhand.com" <rpurdie@...nedhand.com>,
"markus@...rhumer.com" <markus@...rhumer.com>,
"minchan@...nel.org" <minchan@...nel.org>,
"sergey.senozhatsky.work@...il.com"
<sergey.senozhatsky.work@...il.com>,
Sonny Rao <sonnyrao@...gle.com>
Subject: [PATCH 0/6] lib/lzo: performance improvements

This patch series introduces performance improvements for lzo.
The improvements fall into two categories: general Arm-specific optimisations
(e.g., more efficient memory access); and the introduction of a special case
for handling runs of zeros (which is a common case for zram) using run-length
encoding.
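
As a rough illustration of the zero-run idea (this is not the actual lzo-rle
bitstream format, which is defined by the patches themselves), a generic
run-length encoder for zero bytes might look like the following sketch:

        #include <stddef.h>
        #include <stdint.h>

        /*
         * Illustrative only: encode zero runs as (0x00, run length) pairs
         * and any other byte as (0x01, literal).  The caller must provide
         * a dst buffer of at least 2 * len bytes.
         */
        static size_t zero_rle_encode(const uint8_t *src, size_t len,
                                      uint8_t *dst)
        {
                size_t in = 0, out = 0;

                while (in < len) {
                        if (src[in] == 0) {
                                size_t run = 1;

                                /* extend the run, capped at one byte */
                                while (in + run < len && src[in + run] == 0 &&
                                       run < 255)
                                        run++;
                                dst[out++] = 0x00;
                                dst[out++] = (uint8_t)run;
                                in += run;
                        } else {
                                dst[out++] = 0x01;
                                dst[out++] = src[in++];
                        }
                }
                return out;
        }

The real lzo-rle encoding integrates zero runs into the existing LZO
bitstream rather than using a standalone scheme like the above.
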
The introduction of RLE modifies the bitstream such that it can't be decoded
by old versions of lzo (the new lzo-rle can correctly decode old bitstreams).
To avoid possible issues where data is persisted on disk (e.g., squashfs), the
final patch in this series separates lzo-rle into a separate algorithm
alongside lzo, so that the new lzo-rle is (by default) only used for zram and
must be explicitly selected for other use-cases. This final patch could be
omitted if the consensus is that we'd rather avoid proliferation of lzo
variants.
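
If that final patch is kept, non-zram users would opt in explicitly. A minimal
sketch of doing so via the kernel crypto API, assuming the new algorithm is
registered under the name "lzo-rle":

        #include <linux/crypto.h>
        #include <linux/err.h>

        static int example_use_lzo_rle(void)
        {
                struct crypto_comp *tfm;

                /* Request "lzo-rle" explicitly; plain "lzo" keeps the
                 * existing bitstream for on-disk users such as squashfs. */
                tfm = crypto_alloc_comp("lzo-rle", 0, 0);
                if (IS_ERR(tfm))
                        return PTR_ERR(tfm);

                /* ... crypto_comp_compress() / crypto_comp_decompress()
                 * as for any other compression algorithm ... */

                crypto_free_comp(tfm);
                return 0;
        }
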
Overall, performance is improved by around 1.1 - 4.8x (data-dependent: data
with many zero runs shows higher improvement). Under real-world testing with
zram, time spent in (de)compression during swapping is reduced by around 27%.

The graph below shows the weighted round-trip throughput of lzo, lz4 and
lzo-rle, for randomly generated 4k chunks of data with varying levels of
entropy. (To calculate weighted round-trip throughput, compression performance
is emphasised to reflect the fact that zram does around 2.25x more compression
than decompression.) Results and overall trends are fairly similar for the
unweighted case.
https://drive.google.com/file/d/18GU4pgRVCLNN7wXxynz-8R2ygrY2IdyE/view
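
Purely for illustration, one plausible way to compute such a weighted figure
is sketched below; the exact formula behind the graph is not reproduced here.

        #include <stdio.h>

        /*
         * Hypothetical helper, not taken from the series: weight
         * compression time by 2.25x to mirror the ratio of compress to
         * decompress operations seen under zram.
         */
        static double weighted_roundtrip_mbps(double bytes_per_op,
                                              double t_comp_s,
                                              double t_decomp_s)
        {
                double total_bytes = bytes_per_op * (2.25 + 1.0);
                double total_time  = 2.25 * t_comp_s + t_decomp_s;

                return total_bytes / total_time / 1e6;
        }

        int main(void)
        {
                /* e.g. a 4 KiB chunk: 10us to compress, 4us to decompress */
                printf("%.1f MB/s\n",
                       weighted_roundtrip_mbps(4096.0, 10e-6, 4e-6));
                return 0;
        }
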
Contributors:
Dave Rodgman <dave.rodgman@....com>
Matt Sealey <matt.sealey@....com>