Date:   Fri, 10 Mar 2023 11:09:55 -0800
From:   Nathan Huckleberry <nhuck@...gle.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Eric Biggers <ebiggers@...nel.org>, fsverity@...ts.linux.dev,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fsverity: Remove WQ_UNBOUND from fsverity read workqueue

On Fri, Mar 10, 2023 at 12:01 AM Hillf Danton <hdanton@...a.com> wrote:
>
> On 9 Mar 2023 21:11:47 -0800 Eric Biggers <ebiggers@...nel.org> wrote:
> > On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote:
> > > WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
> > > is problematic for latency sensitive workloads like I/O post-processing.
> > >
> > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> > > scheduler latency and improves app cold startup times by ~30ms.
> >
> > Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue
> > too, for the same reason?
> >
> > I'm still amazed that it's such a big improvement!  I don't really need it to
> > apply this patch, but it would be very interesting to know exactly why the
> > latency is so bad with WQ_UNBOUND.

My current guess for the root cause is excessive saving and restoring
of the FPSIMD state.
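
Assuming the hashing here is done with the CE/NEON SHA instructions (as
in [2]), each work item brackets its transform with a kernel-mode NEON
section, and that is where the FPSIMD save/restore happens. A rough
sketch of the pattern I mean -- the hash helpers below are illustrative,
not the real arm64 driver functions:

	#include <asm/neon.h>
	#include <asm/simd.h>

	static void hash_blocks(void *state, const u8 *data, int blocks)
	{
		if (!may_use_simd()) {
			/* hypothetical scalar fallback, e.g. from hardirq context */
			hash_blocks_generic(state, data, blocks);
			return;
		}

		kernel_neon_begin();   /* may save the current task's FPSIMD regs */
		hash_blocks_ce(state, data, blocks);   /* hypothetical CE transform */
		kernel_neon_end();     /* task's FPSIMD state restored lazily later */
	}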

> >
> > > This code was tested by running Android app startup benchmarks and
> > > measuring how long the fsverity workqueue spent in the ready queue.
> > >
> > > Before
> > > Total workqueue scheduler latency: 553800us
> > > After
> > > Total workqueue scheduler latency: 18962us
>
> Given the gap between the data above and the 15253 us in diagram [1], and
> the SHA instructions [2], could you say a bit more about your test setup?

The test I'm running opens the Android messaging app, whose APK is
validated with fsverity. It launches the app 25 times, dropping caches
before each launch. The benchmark produces a Perfetto trace, which we
use to compute the scheduler latency: we sum up the time that each
fsverity worker spent in the ready state. The test in [1] is similar,
but may be using a different APK. These tests are not in AOSP, so I
can't share a link to them, but I would expect fio on a ramdisk to
produce similarly good results.
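
For anyone who wants to reproduce this outside Android: the patch is
just the fsverity read workqueue losing its WQ_UNBOUND flag. A sketch
of the alloc_workqueue() call in fs/verity/verify.c (the exact flags
and max_active argument there may differ slightly from this):

	/* Before: unbound, so work items can run (and migrate) on any CPU. */
	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
						  WQ_UNBOUND | WQ_HIGHPRI,
						  num_online_cpus());

	/* After: per-CPU workqueue, still high priority. */
	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
						  WQ_HIGHPRI,
						  num_online_cpus());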

>
> [1] https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@google.com/
> [2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com/
