Message-ID: <CAB=BE-SQZA7gETEvxnHmy0FDQ182fUSRoa0bJBNouN33SFx3hQ@mail.gmail.com>
Date:   Thu, 23 Feb 2023 16:13:50 -0800
From:   Sandeep Dhavale <dhavale@...gle.com>
To:     Gao Xiang <hsiangkao@...ux.alibaba.com>
Cc:     Eric Biggers <ebiggers@...nel.org>, linux-erofs@...ts.ozlabs.org,
        LKML <linux-kernel@...r.kernel.org>,
        Nathan Huckleberry <nhuck@...gle.com>,
        Yue Hu <huyue2@...lpad.com>, kernel-team@...roid.com
Subject: Re: [PATCH v5] erofs: add per-cpu threads for decompression as an option

On Thu, Feb 23, 2023 at 11:08 AM Gao Xiang <hsiangkao@...ux.alibaba.com> wrote:
>
>
>
> On 2023/2/24 02:52, Gao Xiang wrote:
> > Hi Eric,
> >
> > On 2023/2/24 02:29, Eric Biggers wrote:
> >> Hi,
> >>
> >> On Wed, Feb 08, 2023 at 05:33:22PM +0800, Gao Xiang wrote:
> >>> From: Sandeep Dhavale <dhavale@...gle.com>
> >>>
> >>> Using a per-cpu thread pool, we can reduce scheduling latency compared
> >>> to the workqueue implementation.  With this patch, scheduling latency
> >>> and its variation are reduced, as the per-cpu threads are high-priority
> >>> kthread_workers.
> >>>
> >>> The results were evaluated on arm64 Android devices running the 5.10 kernel.
> >>
> >> I see that this patch was upstreamed.  Meanwhile, commit c25da5b7baf1d
> >> ("dm verity: stop using WQ_UNBOUND for verify_wq") was also upstreamed.
> >>
> >> Why is this more complex solution better than simply removing WQ_UNBOUND?
> >
> > I do think it's an issue specific to certain arm64 hardware (Qualcomm,
> > I assume, but I don't know), since WQ_UNBOUND decompression once worked
> > well compared with a per-cpu workqueue on the hardware I used at the
> > time (HiSilicon, and most x86_64 CPUs I tested).
> >
> > RT threads are also a good match for the softirq approach.  In
> > addition, many configurations work without dm-verity.
>
> Also, for dm-verity use cases, EROFS will reuse the dm-verity context
> directly rather than kicking off a new context.  Yet I'm not sure whether
> there are still users running EROFS without dm-verity, as I said above.
>
> Anyway, the original scheduling issue sounded strange to me at first
> (given my own landing experience), and I have no way to confirm these
> cases myself.  I just hope it can be resolved with the developers' input
> and ultimately benefit end users.
>
> I've already run my own stress tests with this new configuration as
> well, without any explicit regression.
>
Hi Eric,
From the dm-verity patch description for removing WQ_UNBOUND, it seems
Nathan saw EROFS wait time reduced by 51%, whereas the high-priority
per-cpu threads showed me scheduling latency reduced by ~80% on average.
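
For context, the per-cpu approach boils down to spawning one dedicated
high-priority kthread_worker per CPU, roughly like the sketch below
(illustrative only, not the exact patch; the thread name and error
handling are simplified):

    #include <linux/cpumask.h>
    #include <linux/err.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>
    #include <linux/threads.h>

    static struct kthread_worker *pcpu_worker[NR_CPUS];

    static int init_pcpu_workers(void)
    {
            struct kthread_worker *worker;
            unsigned int cpu;

            for_each_online_cpu(cpu) {
                    /* pin one dedicated worker thread to each CPU */
                    worker = kthread_create_worker_on_cpu(cpu, 0,
                                    "z_erofs_worker/%u", cpu);
                    if (IS_ERR(worker))
                            return PTR_ERR(worker);
                    /* the RT priority is what cuts scheduling latency */
                    sched_set_fifo_low(worker->task);
                    pcpu_worker[cpu] = worker;
            }
            return 0;
    }

Decompression work is then queued to the local CPU's worker with
kthread_queue_work() instead of going through a workqueue.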

So, from the descriptions at least, it does not look like the two patches
deliver equal benefits.  I can't argue with the smaller size and
complexity of removing WQ_UNBOUND; if it gave the same benefit, that
would have been great.
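
For comparison, the dm-verity change is essentially a one-flag diff:
dropping WQ_UNBOUND from alloc_workqueue() so work items run on the
submitting CPU instead of wherever the scheduler places an unbound
worker (the flags below are illustrative, not the exact commit):

    /* before: unbound pool, the scheduler picks the CPU */
    wq = alloc_workqueue("verify_wq",
                         WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE | WQ_UNBOUND,
                         num_online_cpus());

    /* after: per-cpu workers, better locality, less migration */
    wq = alloc_workqueue("verify_wq",
                         WQ_MEM_RECLAIM | WQ_CPU_INTENSIVE,
                         num_online_cpus());

Staying per-cpu avoids the wakeup and migration latency of unbound
workers, which can matter a lot on big.LITTLE arm64 parts.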

I will run the app launch tests again to compare the two approaches and
share the results.

Thanks,
Sandeep.


> >
> > I don't have more time to dig into it for now, but it's important to
> > resolve this problem on some arm64 hardware first.  Also, it's an
> > optional feature: if the root cause of the workqueue issue can be
> > resolved, we could consider dropping it then.
> >
> > Thanks,
> > Gao Xiang
> >
> >>
> >> - Eric
