Date:   Mon, 21 Dec 2020 14:46:28 -0800
From:   Shakeel Butt <shakeelb@...gle.com>
To:     "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
Cc:     Vitaly Wool <vitaly.wool@...sulko.com>,
        Minchan Kim <minchan@...nel.org>,
        Mike Galbraith <efault@....de>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        NitinGupta <ngupta@...are.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock

On Mon, Dec 21, 2020 at 1:30 PM Song Bao Hua (Barry Song)
<song.bao.hua@...ilicon.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Shakeel Butt [mailto:shakeelb@...gle.com]
> > Sent: Tuesday, December 22, 2020 10:03 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>
> > Cc: Vitaly Wool <vitaly.wool@...sulko.com>; Minchan Kim <minchan@...nel.org>;
> > Mike Galbraith <efault@....de>; LKML <linux-kernel@...r.kernel.org>; linux-mm
> > <linux-mm@...ck.org>; Sebastian Andrzej Siewior <bigeasy@...utronix.de>;
> > NitinGupta <ngupta@...are.org>; Sergey Senozhatsky
> > <sergey.senozhatsky.work@...il.com>; Andrew Morton
> > <akpm@...ux-foundation.org>
> > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> >
> > On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song)
> > <song.bao.hua@...ilicon.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Shakeel Butt [mailto:shakeelb@...gle.com]
> > > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > > To: Vitaly Wool <vitaly.wool@...sulko.com>
> > > > Cc: Minchan Kim <minchan@...nel.org>; Mike Galbraith <efault@....de>;
> > > > LKML <linux-kernel@...r.kernel.org>; linux-mm <linux-mm@...ck.org>;
> > > > Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>;
> > > > Sebastian Andrzej Siewior <bigeasy@...utronix.de>;
> > > > NitinGupta <ngupta@...are.org>; Sergey Senozhatsky
> > > > <sergey.senozhatsky.work@...il.com>; Andrew Morton
> > > > <akpm@...ux-foundation.org>
> > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > >
> > > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool <vitaly.wool@...sulko.com> wrote:
> > > > >
> > > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim <minchan@...nel.org> wrote:
> > > > > >
> > > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > > zsmalloc takes a bit spinlock in its _map() callback and releases
> > > > > > > it only in unmap(), which is unsafe and leads to zswap complaining
> > > > > > > about scheduling in atomic context.
> > > > > > >
> > > > > > > To fix that, and to improve the RT properties of zsmalloc, remove
> > > > > > > that bit spinlock completely and use a bit flag instead.
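The pattern under discussion, and the direction of the patch, look roughly
like this. This is an illustrative sketch, not the actual zsmalloc code or
the actual patch: only HANDLE_PIN_BIT mirrors a real zsmalloc name, and the
sleeping wait below stands in for whatever mechanism the real change uses.

#include <linux/bit_spinlock.h>
#include <linux/bitops.h>
#include <linux/sched.h>

#define HANDLE_PIN_BIT	0	/* mirrors zsmalloc's handle pin flag */

/*
 * Today: a bit spinlock taken in map() and only dropped in unmap().
 * Preemption is disabled for the whole window, so a caller that sleeps
 * in between (e.g. zswap waiting on a crypto request) is scheduling in
 * atomic context.
 */
static void handle_pin(unsigned long *handle)
{
	bit_spin_lock(HANDLE_PIN_BIT, handle);
}

static void handle_unpin(unsigned long *handle)
{
	bit_spin_unlock(HANDLE_PIN_BIT, handle);
}

/*
 * Proposed direction: an open-coded bit flag instead, leaving
 * preemption enabled so the holder may sleep while the page is mapped.
 */
static void handle_pin_flag(unsigned long *handle)
{
	while (test_and_set_bit_lock(HANDLE_PIN_BIT, handle))
		schedule();	/* simplified; a real version would park on
				 * a waitqueue rather than yield-and-retry */
}

static void handle_unpin_flag(unsigned long *handle)
{
	clear_bit_unlock(HANDLE_PIN_BIT, handle);
}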
> > > > > >
> > > > > > I don't want such open-coded locking here.
> > > > > >
> > > > > > I see from Mike's patch that a recent zswap change introduced the
> > > > > > lockdep splat, and that you want to change zsmalloc to fix that
> > > > > > zswap bug by introducing this patch, which allows preemption to
> > > > > > stay enabled.
> > > > >
> > > > > That understanding is upside down. The zswap code you are referring
> > > > > to is not buggy. You may claim that it is suboptimal, but there is
> > > > > nothing wrong with taking a mutex.
> > > > >
> > > >
> > > > Is this suboptimal for everyone, or just for the hardware
> > > > accelerators? Sorry, I am not very familiar with the crypto API. If I
> > > > select lzo or lz4 as the zswap compressor, will the [de]compression be
> > > > async or sync?
> > >
> > > Right now, in the crypto subsystem, new drivers are required to be
> > > written against the async APIs. The old sync API can't be used by new
> > > accelerator drivers, as it isn't supported for them at all.
> > >
> > > Old drivers are synchronous, but they've got async wrappers so they can
> > > be driven through the async APIs, e.g.:
> > > crypto: acomp - add support for lz4 via scomp
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> > >
> > > crypto: acomp - add support for lzo via scomp
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> > >
> > > so they support the async APIs, but they still work in sync mode, as
> > > those old drivers don't sleep.
> > >
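For concreteness, this is the shape of those scomp-backed algorithms,
condensed from the referenced crypto/lzo.c commit; only the decompress
callback is shown, lzo_scomp_decompress is my naming, and error handling
is trimmed:

#include <crypto/internal/scompress.h>
#include <linux/lzo.h>
#include <linux/module.h>

static int lzo_scomp_decompress(struct crypto_scomp *tfm, const u8 *src,
				unsigned int slen, u8 *dst,
				unsigned int *dlen, void *ctx)
{
	size_t out_len = *dlen;

	/* plain synchronous CPU decompression; never sleeps */
	if (lzo1x_decompress_safe(src, slen, dst, &out_len) != LZO_E_OK)
		return -EINVAL;

	*dlen = out_len;
	return 0;
}

static struct scomp_alg lzo_scomp = {
	.decompress	= lzo_scomp_decompress,
	/* .alloc_ctx, .free_ctx and .compress omitted here */
	.base		= {
		.cra_name	 = "lzo",
		.cra_driver_name = "lzo-scomp",
		.cra_module	 = THIS_MODULE,
	},
};

/*
 * crypto_register_scomp(&lzo_scomp) exposes this behind the acomp
 * interface: callers use the async API, but every request completes
 * inline on the calling CPU.
 */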
> >
> > Good to know that those are sync, because I want them to be sync.
> > Please note that zswap is a cache in front of real swap, and the load
> > operation is latency-sensitive because it sits in the page fault path
> > and directly impacts applications. I doubt that synchronously
> > decompressing a 4k page on a CPU will be costlier than asynchronously
> > decompressing the same page on a hardware accelerator.
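A back-of-envelope check (illustrative figures, not measurements): LZO-class
software decompression runs on the order of 1 GB/s per core, so one page
costs roughly

	4096 B / (1 GB/s) ~= 4 us

of CPU time, and usually less. An accelerator round trip has to pay request
setup, DMA, and interrupt/completion overhead before the page is usable,
which is easily in the same microsecond range, so for a single page fault
the synchronous CPU path is hard to beat.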
>
> If you read the old paper
> https://www.ibm.com/support/pages/new-linux-zswap-compression-functionality
> it says:
> Because the hardware accelerator speeds up compression, looking at the zswap
> metrics we observed that there were more store and load requests in a given
> amount of time, which filled up the zswap pool faster than a software
> compression run. Because of this behavior, we set the max_pool_percent
> parameter to 30 for the hardware compression runs - this means that zswap
> can use up to 30% of the 10GB of total memory.
>
> So by using hardware accelerators, we get a chance to speed up compression
> while decreasing CPU utilization.
>

I don't care much about compression. It's decompression, or more
specifically the latency of decompression, that I really care about.

Compression happens on reclaim, so its latency is not really an issue.
Reclaim can be pressure-based or proactive. I think async batched
compression by accelerators makes a lot of sense, though I doubt zswap is
the right layer for it. To me, adding "async batched compression support
by accelerators" to zram looks more natural, as the kernel already has
async block I/O support (see the sketch below).
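A minimal sketch of that existing async block I/O pattern, against the
5.10-era bio API; compress_write_done() and the bdev/page/sector
parameters are placeholders of mine, not anything zram defines:

#include <linux/bio.h>
#include <linux/blkdev.h>

/* completion callback: runs asynchronously once the device finishes */
static void compress_write_done(struct bio *bio)
{
	/* e.g. mark the page clean / recycle the compressed buffer here */
	bio_put(bio);
}

static void submit_async_write(struct block_device *bdev,
			       struct page *page, sector_t sector)
{
	struct bio *bio = bio_alloc(GFP_NOIO, 1);

	bio_set_dev(bio, bdev);
	bio->bi_iter.bi_sector = sector;
	bio->bi_opf = REQ_OP_WRITE;
	bio->bi_end_io = compress_write_done;
	bio_add_page(bio, page, PAGE_SIZE, 0);
	submit_bio(bio);	/* returns immediately; I/O completes async */
}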

For decompression, I would like as low a latency as possible, which I
think is only achievable by doing the decompression synchronously on a
CPU.
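That is essentially what the crypto_wait_req() pattern gives you: issue
the acomp request, then block until it completes; with a sync,
scomp-backed algorithm the wait returns immediately. A minimal sketch,
assuming "lzo" and linear buffers; decompress_page_sync() is my naming,
and allocating the tfm per call is for brevity only:

#include <crypto/acompress.h>
#include <linux/crypto.h>
#include <linux/scatterlist.h>

static int decompress_page_sync(const void *src, unsigned int slen,
				void *dst, unsigned int dlen)
{
	struct crypto_acomp *tfm;
	struct acomp_req *req;
	struct scatterlist sg_in, sg_out;
	DECLARE_CRYPTO_WAIT(wait);
	int ret;

	tfm = crypto_alloc_acomp("lzo", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	req = acomp_request_alloc(tfm);
	if (!req) {
		crypto_free_acomp(tfm);
		return -ENOMEM;
	}

	sg_init_one(&sg_in, src, slen);
	sg_init_one(&sg_out, dst, dlen);
	acomp_request_set_params(req, &sg_in, &sg_out, slen, dlen);
	acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				   crypto_req_done, &wait);

	/* blocks the caller until the (possibly async) op completes */
	ret = crypto_wait_req(crypto_acomp_decompress(req), &wait);

	acomp_request_free(req);
	crypto_free_acomp(tfm);
	return ret;
}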
