Date:   Fri, 10 Mar 2023 11:09:55 -0800
From:   Nathan Huckleberry <nhuck@...gle.com>
To:     Hillf Danton <hdanton@...a.com>
Cc:     Eric Biggers <ebiggers@...nel.org>, fsverity@...ts.linux.dev,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fsverity: Remove WQ_UNBOUND from fsverity read workqueue

On Fri, Mar 10, 2023 at 12:01 AM Hillf Danton <hdanton@...a.com> wrote:
>
> On 9 Mar 2023 21:11:47 -0800 Eric Biggers <ebiggers@...nel.org> wrote:
> > On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote:
> > > WQ_UNBOUND causes significant scheduler latency on ARM64/Android.  This
> > > is problematic for latency sensitive workloads like I/O post-processing.
> > >
> > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> > > scheduler latency and improves app cold startup times by ~30ms.
> >
> > Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue
> > too, for the same reason?
> >
> > I'm still amazed that it's such a big improvement!  I don't really need it to
> > apply this patch, but it would be very interesting to know exactly why the
> > latency is so bad with WQ_UNBOUND.

My current guess for the root cause is excessive saving and restoring
of the FPSIMD state.
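
Assuming the hashing here is done with the CE/NEON SHA instructions (as
in [2]), each work item brackets its transform with a kernel-mode NEON
section, and that is where the FPSIMD save/restore happens. A rough
sketch of the pattern I mean -- the hash helpers below are illustrative,
not the real arm64 driver functions:

	#include <asm/neon.h>
	#include <asm/simd.h>

	static void hash_blocks(void *state, const u8 *data, int blocks)
	{
		if (!may_use_simd()) {
			/* hypothetical scalar fallback, e.g. from hardirq context */
			hash_blocks_generic(state, data, blocks);
			return;
		}

		kernel_neon_begin();   /* may save the current task's FPSIMD regs */
		hash_blocks_ce(state, data, blocks);   /* hypothetical CE transform */
		kernel_neon_end();     /* task's FPSIMD state restored lazily later */
	}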

> >
> > > This code was tested by running Android app startup benchmarks and
> > > measuring how long the fsverity workqueue spent in the ready queue.
> > >
> > > Before
> > > Total workqueue scheduler latency: 553800us
> > > After
> > > Total workqueue scheduler latency: 18962us
>
> Given the gap between the data above and the 15253 us in diagram [1], and
> the SHA instructions [2], could you say a bit more about your test setup?

The test I'm running opens the Android messaging app, whose APK is
validated with fsverity. It launches the app 25 times, dropping caches
before each launch. The benchmark produces a Perfetto trace, which we
use to compute the scheduler latency: we sum up the time that each
fsverity worker spent in the ready state. The test in [1] is similar,
but may be using a different APK. These tests are not in AOSP, so I
can't share a link to them, but I would expect fio on a ramdisk to
produce similarly good results.
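
For anyone who wants to reproduce this outside Android: the patch is
just the fsverity read workqueue losing its WQ_UNBOUND flag. A sketch
of the alloc_workqueue() call in fs/verity/verify.c (the exact flags
and max_active argument there may differ slightly from this):

	/* Before: unbound, so work items can run (and migrate) on any CPU. */
	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
						  WQ_UNBOUND | WQ_HIGHPRI,
						  num_online_cpus());

	/* After: per-CPU workqueue, still high priority. */
	fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
						  WQ_HIGHPRI,
						  num_online_cpus());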

>
> [1] https://lore.kernel.org/linux-erofs/20230106073502.4017276-1-dhavale@google.com/
> [2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com/
