Date:   Mon, 21 Dec 2020 21:25:40 +0000
From:   "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To:     Shakeel Butt <shakeelb@...gle.com>
CC:     Vitaly Wool <vitaly.wool@...sulko.com>,
        Minchan Kim <minchan@...nel.org>,
        Mike Galbraith <efault@....de>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        NitinGupta <ngupta@...are.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: RE: [PATCH] zsmalloc: do not use bit_spin_lock



> -----Original Message-----
> From: Shakeel Butt [mailto:shakeelb@...gle.com]
> Sent: Tuesday, December 22, 2020 10:03 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>
> Cc: Vitaly Wool <vitaly.wool@...sulko.com>; Minchan Kim <minchan@...nel.org>;
> Mike Galbraith <efault@....de>; LKML <linux-kernel@...r.kernel.org>; linux-mm
> <linux-mm@...ck.org>; Sebastian Andrzej Siewior <bigeasy@...utronix.de>;
> NitinGupta <ngupta@...are.org>; Sergey Senozhatsky
> <sergey.senozhatsky.work@...il.com>; Andrew Morton
> <akpm@...ux-foundation.org>
> Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> 
> On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song)
> <song.bao.hua@...ilicon.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Shakeel Butt [mailto:shakeelb@...gle.com]
> > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > To: Vitaly Wool <vitaly.wool@...sulko.com>
> > > Cc: Minchan Kim <minchan@...nel.org>; Mike Galbraith <efault@....de>; LKML
> > > <linux-kernel@...r.kernel.org>; linux-mm <linux-mm@...ck.org>; Song Bao Hua
> > > (Barry Song) <song.bao.hua@...ilicon.com>; Sebastian Andrzej Siewior
> > > <bigeasy@...utronix.de>; NitinGupta <ngupta@...are.org>; Sergey Senozhatsky
> > > <sergey.senozhatsky.work@...il.com>; Andrew Morton
> > > <akpm@...ux-foundation.org>
> > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > >
> > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool <vitaly.wool@...sulko.com> wrote:
> > > >
> > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim <minchan@...nel.org> wrote:
> > > > >
> > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > zsmalloc takes bit spinlock in its _map() callback and releases it
> > > > > > only in unmap() which is unsafe and leads to zswap complaining
> > > > > > about scheduling in atomic context.
> > > > > >
> > > > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > > > bit spinlock completely and use a bit flag instead.
> > > > >
> > > > > I don't want to use such open code for the lock.
> > > > >
> > > > > I see from Mike's patch, recent zswap change introduced the lockdep
> > > > > splat bug and you want to improve zsmalloc to fix the zswap bug and
> > > > > introduce this patch with allowing preemption enabling.
> > > >
> > > > This understanding is upside down. The code in zswap you are referring
> > > > to is not buggy.  You may claim that it is suboptimal but there is
> > > > nothing wrong in taking a mutex.
> > > >
> > >
> > > Is this suboptimal for all or just the hardware accelerators? Sorry, I
> > > am not very familiar with the crypto API. If I select lzo or lz4 as a
> > > zswap compressor will the [de]compression be async or sync?
> >
> > Right now, in the crypto subsystem, new drivers are required to be written
> > against the async APIs. The old sync API can't be used in new accelerator
> > drivers, as it is not supported there at all.
> >
> > Old drivers used to be sync, but they have got async wrappers to support the
> > async APIs. E.g.:
> > crypto: acomp - add support for lz4 via scomp
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> >
> > crypto: acomp - add support for lzo via scomp
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> >
> > so they are supporting async APIs but they are still working in sync mode
> > as those old drivers don't sleep.
> >
> 
> Good to know that those are sync because I want them to be sync.
> Please note that zswap is a cache in front of a real swap and the load
> operation is latency sensitive as it comes in the page fault path and
> directly impacts the applications. I doubt decompressing synchronously
> a 4k page on a cpu will be costlier than asynchronously decompressing
> the same page from hardware accelerators.

If you read the old paper:
https://www.ibm.com/support/pages/new-linux-zswap-compression-functionality

"Because the hardware accelerator speeds up compression, looking at the zswap
metrics we observed that there were more store and load requests in a given
amount of time, which filled up the zswap pool faster than a software
compression run. Because of this behavior, we set the max_pool_percent
parameter to 30 for the hardware compression runs - this means that zswap
can use up to 30% of the 10GB of total memory."

So by using hardware accelerators, we get a chance to speed up compression
while decreasing CPU utilization.

BTW, if it is not easy to change zsmalloc, one quick workaround we might do
in zswap is to add the below after applying Mike's original patch:

if (in_atomic()) /* for zsmalloc */
	while (!try_wait_for_completion(&req->done))
		; /* busy-poll: we must not sleep here */
else /* for zbud, z3fold */
	crypto_wait_req(....);

crypto_wait_req() is actually doing wait_for_completion():
static inline int crypto_wait_req(int err, struct crypto_wait *wait)
{
	switch (err) {
	case -EINPROGRESS:
	case -EBUSY:
		wait_for_completion(&wait->completion);
		reinit_completion(&wait->completion);
		err = wait->err;
		break;
	}

	return err;
}
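
To make the difference between the two waits concrete, here is a rough
userspace sketch of the idea, not kernel code. It assumes pthreads as a
stand-in for the kernel's completion machinery; struct completion_model and
all model_*() names are hypothetical and exist only for this illustration.
The polling path never blocks on the condition variable, which mirrors
try_wait_for_completion(), while the other path sleeps, which mirrors
wait_for_completion(). Note the model still takes a mutex in the polling
path, so it only illustrates the control flow, not real atomic-context rules.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct completion_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done;	/* number of pending completions */
};

void model_init(struct completion_model *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Models complete(): record one completion and wake one sleeper. */
void model_complete(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Models try_wait_for_completion(): consume a completion if one is
 * pending, otherwise return false right away without sleeping. */
bool model_try_wait(struct completion_model *c)
{
	bool ok = false;

	pthread_mutex_lock(&c->lock);
	if (c->done) {
		c->done--;
		ok = true;
	}
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Models wait_for_completion(): sleep until a completion is pending. */
void model_wait(struct completion_model *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}

/* Models the proposed workaround: poll when sleeping is not allowed
 * (atomic_ctx plays the role of in_atomic()), otherwise sleep. */
void model_wait_req(struct completion_model *c, bool atomic_ctx)
{
	if (atomic_ctx) {
		while (!model_try_wait(c))
			; /* busy-poll */
	} else {
		model_wait(c);
	}
}
```

The trade-off is the same as in the workaround: the polling branch burns CPU
until the request finishes, while the sleeping branch gives the CPU away but
is only legal where scheduling is allowed.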

Thanks
Barry
