Message-ID: <CAO7dBbUvGRzwJKdcds2sr6T_KXnyubyOHtJe6kGG77eEcT1q0g@mail.gmail.com>
Date: Tue, 12 Mar 2024 16:28:15 +0800
From: Tao Liu <ltao@...hat.com>
To: Dave Rodgman <dave.rodgman@....com>, "markus@...rhumer.com" <markus@...rhumer.com>
Cc: "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Matt Sealey <Matt.Sealey@....com>,
"davem@...emloft.net" <davem@...emloft.net>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
"herbert@...dor.apana.org.au" <herbert@...dor.apana.org.au>, "minchan@...nel.org" <minchan@...nel.org>,
"nitingupta910@...il.com" <nitingupta910@...il.com>, "rpurdie@...nedhand.com" <rpurdie@...nedhand.com>,
"sergey.senozhatsky.work@...il.com" <sergey.senozhatsky.work@...il.com>,
"sonnyrao@...gle.com" <sonnyrao@...gle.com>,
"akpm@...ux-foundation.org" <akpm@...ux-foundation.org>, "sfr@...b.auug.org.au" <sfr@...b.auug.org.au>, nd <nd@....com>
Subject: Re: [PATCH v5 0/3]: lib/lzo: run-length encoding support
On Fri, Mar 8, 2024 at 8:32 PM Dave Rodgman <dave.rodgman@....com> wrote:
>
> Hi Tao,
>
>
> I don’t see any reason for the upstream LZO library not to pick up the lzo-rle algorithm from the kernel, and I would expect the same performance benefit in userspace. This is really a question for Markus (the owner/maintainer of that library).
>
Hi Markus,
Is it possible to port the lzo-rle algorithm to the lzo library, so
userspace programs such as crash-utility or drgn can use it to
decompress the kernel data? Thanks in advance!
>
> I think the simplest short-term option would be to pull in the lzo library as source into crash-utility, and carry a patch against it to add support for lzo-rle.
Hi Dave,
Thanks for the suggestion! I agree with your short-term option; this
is what we are planning to do for now. Once lzo-rle has been integrated
into the lzo library, we can delete the patch from the crash-utility
code.
Thanks,
Tao Liu
>
>
> Dave
>
>
> From: Tao Liu <ltao@...hat.com>
> Date: Friday, 8 March 2024 at 03:26
> To: Dave Rodgman <dave.rodgman@....com>
> Cc: linux-kernel@...r.kernel.org <linux-kernel@...r.kernel.org>, Matt Sealey <Matt.Sealey@....com>, davem@...emloft.net <davem@...emloft.net>, gregkh@...uxfoundation.org <gregkh@...uxfoundation.org>, herbert@...dor.apana.org.au <herbert@...dor.apana.org.au>, markus@...rhumer.com <markus@...rhumer.com>, minchan@...nel.org <minchan@...nel.org>, nitingupta910@...il.com <nitingupta910@...il.com>, rpurdie@...nedhand.com <rpurdie@...nedhand.com>, sergey.senozhatsky.work@...il.com <sergey.senozhatsky.work@...il.com>, sonnyrao@...gle.com <sonnyrao@...gle.com>, akpm@...ux-foundation.org <akpm@...ux-foundation.org>, sfr@...b.auug.org.au <sfr@...b.auug.org.au>, nd <nd@....com>
> Subject: Re: [PATCH v5 0/3]: lib/lzo: run-length encoding support
>
> Hi Dave,
>
> On Tue, Feb 05, 2019 at 03:59:59PM +0000, Dave Rodgman wrote:
> > Hi,
> >
> > Following on from the previous lzo-rle patchset:
> >
> > https://lkml.org/lkml/2018/11/30/972
> >
> > This patchset contains only the RLE patches, and should be applied on top of
> > the non-RLE patches ( https://lkml.org/lkml/2019/2/5/366 ).
> >
>
> Sorry to revive such an old patchset and discussion.
> I have a few questions about lzo-rle support and hope you can give me some
> directions. Thanks in advance!
>
> 1) Is lzo-rle suitable for a userspace library? I've checked the current
> userspace lzo library, lzo-2.10, and it seems to have no lzo-rle support
> (please correct me if I'm wrong). If lzo-rle has better performance in the
> kernel, is it possible to implement it in userspace and gain better
> performance there as well?
>
> 2) Yulong TANG has encountered a problem: the crash utility cannot
> decompress lzo-rle compressed zram from kernel 5.1 onwards [1]. Since the
> current lzo library has no lzo-rle support, crash has to import the kernel
> source code directly, which is not good for crash utility code
> maintenance. It would be better if we could update the lzo library with
> lzo-rle support. I guess that not only crash but also other kernel
> debugging tools running in userspace, such as drgn, may need this
> feature.
>
> Do you have any suggestions for these?
>
> [1]: https://www.mail-archive.com/devel@lists.crash-utility.osci.io/msg00475.html
>
>
> Thanks,
> Tao Liu
>
>
> >
> > Previously, some questions were raised around the RLE patches. I've done some
> > additional benchmarking to answer these questions. In short:
> >
> > - RLE offers significant additional performance (data-dependent)
> > - I didn't measure any regressions that were clearly outside the noise
> >
> >
> > One concern with this patchset was around performance - specifically, measuring
> > RLE impact separately from Matt Sealey's patches (CTZ & fast copy). I have done
> > some additional benchmarking which I hope clarifies the benefits of each part
> > of the patchset.
> >
> > Firstly, I've captured some memory via /dev/fmem from a Chromebook with many
> > tabs open which is starting to swap, and then split this into 4178 4k pages.
> > I've excluded the all-zero pages (as zram does), and also the no-zero pages
> > (which won't tell us anything about RLE performance). This should give a
> > realistic test dataset for zram. What I found was that the data is VERY
> > bimodal: 44% of pages in this dataset contain 5% or fewer zeros, and 44%
> > contain over 90% zeros (30% if you include the no-zero pages). This supports
> > the idea of special-casing zeros in zram.
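
The page filtering described above could be sketched roughly as follows. This is only an illustration, not the script used for the benchmark: it assumes "zeros" means zero-valued bytes (the message does not say whether bytes or words were counted), and the names PAGE_BYTES, classify_page and zero_percent are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096  /* 4k pages, as in the description above */

/* Hypothetical page categories for this sketch. */
enum page_kind { PAGE_ALL_ZERO, PAGE_NO_ZERO, PAGE_MIXED };

/* Count the zero-valued bytes in a buffer (assumption: zeros are
 * counted per byte, not per word). */
static size_t count_zero_bytes(const uint8_t *page, size_t len)
{
    size_t zeros = 0;

    for (size_t i = 0; i < len; i++)
        zeros += (page[i] == 0);
    return zeros;
}

/* All-zero and no-zero pages are the ones excluded from the dataset;
 * only PAGE_MIXED pages would be kept for an RLE benchmark. */
static enum page_kind classify_page(const uint8_t *page, size_t len)
{
    size_t zeros = count_zero_bytes(page, len);

    if (zeros == len)
        return PAGE_ALL_ZERO;
    if (zeros == 0)
        return PAGE_NO_ZERO;
    return PAGE_MIXED;
}

/* Percentage of zero bytes in the page, rounded down; 0 for an
 * empty buffer. */
static unsigned zero_percent(const uint8_t *page, size_t len)
{
    if (len == 0)
        return 0;
    return (unsigned)(count_zero_bytes(page, len) * 100 / len);
}
```

Bucketing the PAGE_MIXED pages by zero_percent() would then give a histogram in which a bimodal split like the one reported could be observed.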
> >
> > Next, I've benchmarked four variants of lzo on these pages (on 64-bit Arm at
> > max frequency): baseline LZO; baseline + Matt Sealey's patches (aka MS);
> > baseline + RLE only; baseline + MS + RLE. Numbers are for weighted roundtrip
> > throughput (the weighting reflects that zram does more compression than
> > decompression).
> >
> > https://drive.google.com/file/d/1VLtLjRVxgUNuWFOxaGPwJYhl_hMQXpHe/view?usp=sharing
> >
> > Matt's patches help in all cases for Arm (and have no effect on Intel), as expected.
> >
> > RLE also behaves as expected: with few zeros present, it makes no difference;
> > above ~75%, it gives a good improvement (50 - 300 MB/s on top of the benefit
> > from Matt's patches).
> >
> > Best performance is seen with both MS and RLE patches.
> >
> > Finally, I have benchmarked the same dataset on an x86-64 device. Here, the
> > MS patches make no difference (as expected); RLE helps, much as it does on Arm.
> > There were no definite regressions; allowing for observational error, 0.1%
> > (3/4178) of cases had a regression > 1 standard deviation, of which the largest
> > was 4.6% (1.2 standard deviations). I think this is probably within the noise.
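
One way to express a "regression > 1 standard deviation" check is sketched below. The message does not describe how the figures were actually computed, so everything here is an assumption: the sketch supposes several repeated throughput measurements per page for the baseline, uses the sample standard deviation of those runs as the noise estimate, and the function names are hypothetical. It compares squared values to avoid needing sqrt().

```c
#include <stddef.h>

/* Arithmetic mean of n samples (n must be >= 1). */
static double mean(const double *x, size_t n)
{
    double sum = 0.0;

    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum / (double)n;
}

/* Sample variance with an (n - 1) denominator (n must be >= 2). */
static double sample_variance(const double *x, size_t n)
{
    double m = mean(x, n);
    double ss = 0.0;

    for (size_t i = 0; i < n; i++)
        ss += (x[i] - m) * (x[i] - m);
    return ss / (double)(n - 1);
}

/* Returns 1 if the candidate throughput is more than k standard
 * deviations below the baseline mean, 0 otherwise. The comparison
 * drop > k * stddev is done on squares, which is equivalent because
 * both sides are non-negative at that point. */
static int is_regression(const double *baseline, size_t n,
                         double candidate, double k)
{
    double drop = mean(baseline, n) - candidate;

    if (drop <= 0.0)
        return 0;  /* candidate is at least as fast as baseline */
    return drop * drop > k * k * sample_variance(baseline, n);
}
```

Under this reading, a page would count towards the 3/4178 figure when is_regression() returns 1 for k = 1.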
> >
> > https://drive.google.com/file/d/1xCUVwmiGD0heEMx5gcVEmLBI4eLaageV/view?usp=sharing
> >
> > One point to note is that the graphs show RLE appears to help very slightly
> > with no zeros present! This is because the extra code causes the clang
> > optimiser to change code layout in a way that happens to have a significant
> > benefit. Taking baseline LZO and adding a do-nothing line like
> > "__builtin_prefetch(out_len);" immediately before the "goto next" has the same
> > effect. So this is a real, but basically spurious effect - it's small enough
> > not to upset the overall findings.
> >
> > Dave
> >
> >