Message-ID: <5C09448C.8010506@oberhumer.com>
Date: Thu, 6 Dec 2018 16:47:24 +0100
From: "Markus F.X.J. Oberhumer" <markus@...rhumer.com>
To: Dave Rodgman <dave.rodgman@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>
Cc: "herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>,
"davem@...emloft.net" <davem@...emloft.net>,
Matt Sealey <Matt.Sealey@....com>,
"nitingupta910@...il.com" <nitingupta910@...il.com>,
"minchan@...nel.org" <minchan@...nel.org>,
"sergey.senozhatsky.work@...il.com"
<sergey.senozhatsky.work@...il.com>,
"sonnyrao@...gle.com" <sonnyrao@...gle.com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
nd <nd@....com>, "sfr@...b.auug.org.au" <sfr@...b.auug.org.au>
Subject: Re: [PATCH v4 0/7] lib/lzo: performance improvements
On 2018-11-30 15:26, Dave Rodgman wrote:
> This patch series introduces performance improvements for lzo.
>
> The previous version of this patchset is here:
> https://lkml.org/lkml/2018/11/30/807
>
> This version of the patchset fixes a maybe-used-uninitialized warning
> (although the previous version was still safe).
>
> Dave
Hi Dave,
As indicated in my previous mail, please split your series into three
distinct pull requests.
Request 1 - ARM64 improvements; acked by me
[PATCH 1/8] lib/lzo: tidy-up ifdefs
[PATCH 3/8] lib/lzo: enable 64-bit CTZ on Arm
[PATCH 4/8] lib/lzo: 64-bit CTZ on arm64
[PATCH 5/8] lib/lzo: fast 8-byte copy on arm64
are simple arch patches that give a nice speedup on ARM64 and should
get merged ASAP.
Request 2 - add COPY16; *NOT* acked by me
[PATCH 2/8] lib/lzo: clean-up by introducing COPY16
is still not correct because of possible overlapping copies. I'll
address this on the weekend.
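To illustrate the overlap problem, here is a rough, purely illustrative
sketch (this is not the COPY16 macro from the patch, just a schematic
userspace-style example): LZO match copies read from already-produced
output at op - dist, so when dist is smaller than the copy width the
source and destination ranges overlap and a wide copy reads bytes it
has not written yet.

/*
 * Illustrative sketch only, not the patch's COPY16 macro.
 * A 16-byte chunked copy is only safe when the match distance is at
 * least 16; for shorter distances the copy must proceed byte by byte
 * so that bytes written earlier in the loop are visible to later
 * iterations (e.g. dist == 1 replicates a single byte).
 */
#include <stddef.h>
#include <string.h>

static void copy_match(unsigned char *op, size_t dist, size_t len)
{
	const unsigned char *m_pos = op - dist;

	if (dist >= 16) {
		/* 16-byte chunks never overlap their destination here */
		while (len >= 16) {
			memcpy(op, m_pos, 16);
			op += 16;
			m_pos += 16;
			len -= 16;
		}
	}
	/* overlapping case (dist < 16) and the tail */
	while (len--)
		*op++ = *m_pos++;
}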
Request 3 - add lzo-rle; *NOT* acked by me
[PATCH 6/8] lib/lzo: implement run-length encoding
[PATCH 7/8] lib/lzo: separate lzo-rle from lzo
[PATCH 8/8] zram: default to lzo-rle instead of lzo
This can *NOT* be applied in its current form.
It (1) silently changes the compressed data format, (2) crashes on MIPS,
(3) makes compression and decompression of typical data 10% slower on
X86_64 in our internal benchmarks, and (4) has to be carefully checked
for buffer overflows.
I understand that we want some optimizations for data with many zeros, as
in the typical ZRAM use case, but the implementation clearly needs some
more work. I'll also have a look over the weekend - e.g. I have a nice
idea how to deal with (1).
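For the zero-run case, the general idea looks roughly like the made-up
illustration below - this is neither the lzo-rle implementation from
this series nor my planned fix for (1); the token format and names are
invented for the example:

/*
 * Made-up illustration of run-length encoding for zero-heavy data,
 * not the lzo-rle code from this series.  A long run of zero bytes is
 * replaced by a short (tag, length) token; short runs are left to the
 * normal literal/match path.
 */
#include <stddef.h>

#define ZERO_RUN_TAG	0x00	/* hypothetical token tag */
#define ZERO_RUN_MIN	4	/* shorter runs are not worth a token */

/*
 * Returns the number of input bytes consumed (0 if no token emitted);
 * on success exactly two output bytes are written.
 */
static size_t encode_zero_run(const unsigned char *in, size_t in_len,
			      unsigned char *out)
{
	size_t run = 0;

	while (run < in_len && run < 255 && in[run] == 0)
		run++;

	if (run < ZERO_RUN_MIN)
		return 0;

	out[0] = ZERO_RUN_TAG;
	out[1] = (unsigned char)run;
	return run;
}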
As a final comment, I question the quality of your benchmarks - combining
arch-related ARM64 improvements and algorithmic changes into one
benchmark comparison is just unprofessional marketing.
Cheers,
Markus
--
Markus Oberhumer, <markus@...rhumer.com>, http://www.oberhumer.com/