Message-ID: <CAM4kBBKD6MAOaBvwC_Wedf_zmzmt-gm=TrAF1Lh7pVbNtcsFZg@mail.gmail.com>
Date:   Tue, 22 Dec 2020 01:59:51 +0100
From:   Vitaly Wool <vitaly.wool@...sulko.com>
To:     "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
Cc:     Shakeel Butt <shakeelb@...gle.com>,
        Minchan Kim <minchan@...nel.org>,
        Mike Galbraith <efault@....de>,
        LKML <linux-kernel@...r.kernel.org>,
        linux-mm <linux-mm@...ck.org>,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        NitinGupta <ngupta@...are.org>,
        Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>,
        Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock

On Tue, Dec 22, 2020 at 12:37 AM Song Bao Hua (Barry Song)
<song.bao.hua@...ilicon.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Song Bao Hua (Barry Song)
> > Sent: Tuesday, December 22, 2020 11:38 AM
> > To: 'Vitaly Wool' <vitaly.wool@...sulko.com>
> > Cc: Shakeel Butt <shakeelb@...gle.com>; Minchan Kim <minchan@...nel.org>; Mike
> > Galbraith <efault@....de>; LKML <linux-kernel@...r.kernel.org>; linux-mm
> > <linux-mm@...ck.org>; Sebastian Andrzej Siewior <bigeasy@...utronix.de>;
> > NitinGupta <ngupta@...are.org>; Sergey Senozhatsky
> > <sergey.senozhatsky.work@...il.com>; Andrew Morton
> > <akpm@...ux-foundation.org>
> > Subject: RE: [PATCH] zsmalloc: do not use bit_spin_lock
> >
> >
> >
> > > -----Original Message-----
> > > From: Vitaly Wool [mailto:vitaly.wool@...sulko.com]
> > > Sent: Tuesday, December 22, 2020 11:12 AM
> > > To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>
> > > > Cc: Shakeel Butt <shakeelb@...gle.com>; Minchan Kim <minchan@...nel.org>; Mike
> > > > Galbraith <efault@....de>; LKML <linux-kernel@...r.kernel.org>; linux-mm
> > > <linux-mm@...ck.org>; Sebastian Andrzej Siewior <bigeasy@...utronix.de>;
> > > NitinGupta <ngupta@...are.org>; Sergey Senozhatsky
> > > <sergey.senozhatsky.work@...il.com>; Andrew Morton
> > > <akpm@...ux-foundation.org>
> > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > >
> > > On Mon, Dec 21, 2020 at 10:30 PM Song Bao Hua (Barry Song)
> > > <song.bao.hua@...ilicon.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Shakeel Butt [mailto:shakeelb@...gle.com]
> > > > > Sent: Tuesday, December 22, 2020 10:03 AM
> > > > > To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>
> > > > > > Cc: Vitaly Wool <vitaly.wool@...sulko.com>; Minchan Kim <minchan@...nel.org>;
> > > > > > Mike Galbraith <efault@....de>; LKML <linux-kernel@...r.kernel.org>; linux-mm
> > > > > <linux-mm@...ck.org>; Sebastian Andrzej Siewior <bigeasy@...utronix.de>;
> > > > > NitinGupta <ngupta@...are.org>; Sergey Senozhatsky
> > > > > <sergey.senozhatsky.work@...il.com>; Andrew Morton
> > > > > <akpm@...ux-foundation.org>
> > > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > > >
> > > > > On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song)
> > > > > <song.bao.hua@...ilicon.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Shakeel Butt [mailto:shakeelb@...gle.com]
> > > > > > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > > > > > To: Vitaly Wool <vitaly.wool@...sulko.com>
> > > > > > > > Cc: Minchan Kim <minchan@...nel.org>; Mike Galbraith <efault@....de>; LKML
> > > > > > > > <linux-kernel@...r.kernel.org>; linux-mm <linux-mm@...ck.org>; Song Bao Hua
> > > > > > > > (Barry Song) <song.bao.hua@...ilicon.com>; Sebastian Andrzej Siewior
> > > > > > > > <bigeasy@...utronix.de>; NitinGupta <ngupta@...are.org>; Sergey Senozhatsky
> > > > > > > > <sergey.senozhatsky.work@...il.com>; Andrew Morton
> > > > > > > > <akpm@...ux-foundation.org>
> > > > > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > > > > >
> > > > > > > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool <vitaly.wool@...sulko.com> wrote:
> > > > > > > >
> > > > > > > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim <minchan@...nel.org> wrote:
> > > > > > > > >
> > > > > > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > > > > > > zsmalloc takes bit spinlock in its _map() callback and releases it
> > > > > > > > > > > only in unmap() which is unsafe and leads to zswap complaining
> > > > > > > > > > > about scheduling in atomic context.
> > > > > > > > > >
> > > > > > > > > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > > > > > > > > bit spinlock completely and use a bit flag instead.
> > > > > > > > >
> > > > > > > > > I don't want to use such open code for the lock.
> > > > > > > > >
> > > > > > > > > > I see from Mike's patch, recent zswap change introduced the lockdep
> > > > > > > > > > splat bug and you want to improve zsmalloc to fix the zswap bug and
> > > > > > > > > > introduce this patch with allowing preemption enabling.
> > > > > > > >
> > > > > > > > This understanding is upside down. The code in zswap you are referring
> > > > > > > > to is not buggy.  You may claim that it is suboptimal but there is
> > > > > > > > nothing wrong in taking a mutex.
> > > > > > > >
> > > > > > >
> > > > > > > > Is this suboptimal for all or just the hardware accelerators? Sorry, I
> > > > > > > > am not very familiar with the crypto API. If I select lzo or lz4 as a
> > > > > > > > zswap compressor will the [de]compression be async or sync?
> > > > > >
> > > > > > > Right now, in crypto subsystem, new drivers are required to write based on
> > > > > > > async APIs. The old sync API can't work in new accelerator drivers as they
> > > > > > > are not supported at all.
> > > > > >
> > > > > > > Old drivers are used to sync, but they've got async wrappers to support async
> > > > > > > APIs. Eg.
> > > > > > crypto: acomp - add support for lz4 via scomp
> > > > > >
> > > > >
> > >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> > > > > crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> > > > > >
> > > > > > crypto: acomp - add support for lzo via scomp
> > > > > >
> > > > >
> > >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/
> > > > > crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> > > > > >
> > > > > > > so they are supporting async APIs but they are still working in sync mode
> > > > > > > as those old drivers don't sleep.
> > > > > >
> > > > >
> > > > > Good to know that those are sync because I want them to be sync.
> > > > > Please note that zswap is a cache in front of a real swap and the load
> > > > > operation is latency sensitive as it comes in the page fault path and
> > > > > directly impacts the applications. I doubt decompressing synchronously
> > > > > a 4k page on a cpu will be costlier than asynchronously decompressing
> > > > > the same page from hardware accelerators.
> > > >
> > > > If you read the old paper:
> > > >
> > >
> > https://www.ibm.com/support/pages/new-linux-zswap-compression-functionalit
> > > y
> > > > Because the hardware accelerator speeds up compression, looking at the zswap
> > > > metrics we observed that there were more store and load requests in a given
> > > > amount of time, which filled up the zswap pool faster than a software
> > > > compression run. Because of this behavior, we set the max_pool_percent
> > > > parameter to 30 for the hardware compression runs - this means that zswap
> > > > can use up to 30% of the 10GB of total memory.
> > > >
> > > > So using hardware accelerators, we get a chance to speed up compression
> > > > while decreasing cpu utilization.
> > > >
> > > > > BTW, If it is not easy to change zsmalloc, one quick workaround we might do
> > > > > in zswap is adding the below after applying Mike's original patch:
> > > >
> > > > if(in_atomic()) /* for zsmalloc */
> > > > >         while(!try_wait_for_completion(&req->done));
> > > > else /* for zbud, z3fold */
> > > >         crypto_wait_req(....);
> > >
> > > I don't think I'm going to ack this, sorry.
> > >
> >
> > Fair enough. And I am also thinking if we can move zpool_unmap_handle()
> > quite after zpool_map_handle() as below:
> >
> >       dlen = PAGE_SIZE;
> >       src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
> >       if (zpool_evictable(entry->pool->zpool))
> >               src += sizeof(struct zswap_header);
> > +     zpool_unmap_handle(entry->pool->zpool, entry->handle);
> >
> >       acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> >       mutex_lock(acomp_ctx->mutex);
> >       sg_init_one(&input, src, entry->length);
> >       sg_init_table(&output, 1);
> >       sg_set_page(&output, page, PAGE_SIZE, 0);
> >       acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> >       ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> >       mutex_unlock(acomp_ctx->mutex);
> >
> > -     zpool_unmap_handle(entry->pool->zpool, entry->handle);
> >
> > Since src is always low memory and we only need its virtual address
> > to get the page of src in sg_init_one(). We don't actually read it
> > by CPU anywhere.
>
> The below code might be better:
>
>         dlen = PAGE_SIZE;
>         src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
>         if (zpool_evictable(entry->pool->zpool))
>                 src += sizeof(struct zswap_header);
>
>         acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
>
> +       zpool_unmap_handle(entry->pool->zpool, entry->handle);
>
>         mutex_lock(acomp_ctx->mutex);
>         sg_init_one(&input, src, entry->length);
>         sg_init_table(&output, 1);
>         sg_set_page(&output, page, PAGE_SIZE, 0);
>         acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
>         ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
>         mutex_unlock(acomp_ctx->mutex);
>
> -       zpool_unmap_handle(entry->pool->zpool, entry->handle);

I don't see how this is going to work since we can't guarantee src
will be a valid pointer after the zpool_unmap_handle() call, can we?
Could you please elaborate?
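
To make the concern concrete, here is a small userspace sketch, not kernel
code. Everything in it (struct toy_pool, toy_map(), toy_unmap() and the two
load helpers) is hypothetical and only models the assumption that a backend
may hand out a mapping window which gets recycled on unmap. One load helper
unmaps first and then looks at src, the other copies the data while it is
still mapped and only then unmaps:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a zpool backend. Assumption: map() exposes the
 * object through a scratch window that is only valid until unmap(). */
struct toy_pool {
	const char *object;	/* stored (compressed) bytes */
	size_t length;		/* number of stored bytes */
	char scratch[64];	/* mapping window, recycled on unmap */
};

static void *toy_map(struct toy_pool *p)
{
	assert(p->length <= sizeof(p->scratch));
	memcpy(p->scratch, p->object, p->length);
	return p->scratch;
}

static void toy_unmap(struct toy_pool *p)
{
	/* Simulate the window being reused by somebody else. */
	memset(p->scratch, 0xAA, sizeof(p->scratch));
}

/* Ordering under discussion: unmap first, then use src.
 * Returns 1 if src still holds the object's bytes, 0 otherwise. */
static int toy_load_unmap_first(struct toy_pool *p)
{
	char *src = toy_map(p);

	toy_unmap(p);
	return memcmp(src, p->object, p->length) == 0;
}

/* Alternative: take a private copy while mapped, then unmap.
 * Returns a malloc()ed buffer the caller must free, or NULL. */
static char *toy_load_copy_first(struct toy_pool *p)
{
	char *src = toy_map(p);
	char *tmp = malloc(p->length);

	if (tmp)
		memcpy(tmp, src, p->length);
	toy_unmap(p);
	return tmp;
}
```

With a pool holding a 16-byte object, toy_load_unmap_first() reports that
src no longer matches, while the buffer returned by toy_load_copy_first()
still does. The copy costs an allocation and a memcpy(), but the slow part
(taking the mutex, waiting for the request) could then run on the private
buffer without holding the mapping.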

~Vitaly

> >
> > > Best regards,
> > >    Vitaly
> > >
> > > > crypto_wait_req() is actually doing wait_for_completion():
> > > > static inline int crypto_wait_req(int err, struct crypto_wait *wait)
> > > > {
> > > >         switch (err) {
> > > >         case -EINPROGRESS:
> > > >         case -EBUSY:
> > > >                 wait_for_completion(&wait->completion);
> > > >                 reinit_completion(&wait->completion);
> > > >                 err = wait->err;
> > > >                 break;
> > > >         }
> > > >
> > > >         return err;
> > > > }
>
> Thanks
> Barry
