linux-kernel - Re: [RFC PATCH v0.1 0/9] UMCG early preview/RFC patchset

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <CAG48ez19esMTd4XFMqOk-XsEQOPHRJUYd0O-qiyDZv=aEHYnmg@mail.gmail.com>
Date:   Sat, 22 May 2021 01:08:53 +0200
From:   Jann Horn <jannh@...gle.com>
To:     Peter Oskolkov <posk@...gle.com>
Cc:     Andrei Vagin <avagin@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-api <linux-api@...r.kernel.org>,
        Paul Turner <pjt@...gle.com>, Ben Segall <bsegall@...gle.com>,
        Peter Oskolkov <posk@...k.io>,
        Joel Fernandes <joel@...lfernandes.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Jim Newsome <jnewsome@...project.org>
Subject: Re: [RFC PATCH v0.1 0/9] UMCG early preview/RFC patchset

On Fri, May 21, 2021 at 9:14 PM Peter Oskolkov <posk@...gle.com> wrote:
> On Fri, May 21, 2021 at 11:44 AM Andrei Vagin <avagin@...gle.com> wrote:
> > On Thu, May 20, 2021 at 11:36 AM Peter Oskolkov <posk@...gle.com> wrote:
> >>
> >> As indicated earlier in the FUTEX_SWAP patchset:
> >>
> >> https://lore.kernel.org/lkml/20200722234538.166697-1-posk@posk.io/
> >
> >
> > Hi Peter,
> >
> > Do you have benchmark results? How fast is it compared with futex_swap and the google switchto?
>
> Hi Andrei,
>
> I did not run benchmarks on the same machine/kernel, but umcg_swap
> between "core" tasks (your use case for gVisor) should be somewhat
> faster than futex_swap, as there is no reading from the userspace and
> no futex hash lookup/dequeue ops;

The futex code currently creates and destroys hash table elements on
wait/wake, which does involve locking, but you could probably avoid
that if you built a faster futex variant optimized for the
single-waiter case that uses a bit more kernel memory to keep a
persistent hash table element (with RCU freeing) per pre-registered
lock address around? Whether that'd be significantly faster, I don't
know.

(As a sidenote, the futex code could slow down if the number of futex
buckets isn't well-calibrated - meaning you have something like >200
distinct futex addresses per CPU core, see futex_init(). Then
futex_init() probably needs to be tuned a bit. Actually, on my work
laptop, this is what I see right now (not counting multiple waiters on
the same address in the same process, since they intentionally occupy
the same bucket):

# for tasks_dir in /proc/*/task; do cat $tasks_dir/*/syscall | grep
'^202 ' | cut -d' ' -f2 | sort | uniq; done | wc -l
1193
# cat /sys/devices/system/cpu/possible
0-3
# gdb -core=/proc/kcore -ex "print ((unsigned long *)(0x$(grep
__futex_data /proc/kallsyms | cut -d' ' -f1)))[1]" -batch
[...]
$1 = 1024

So the load factor of the futex hash table on this machine right now
is ~117%, which I think is quite a bit higher than you'd normally want
in a hash table? I don't know how representative that is though. Seems
to mostly come from the tons of Chrome processes.)