Message-ID: <87mtva4l6o.fsf@nanos.tec.linutronix.de>
Date:   Thu, 11 Mar 2021 00:56:47 +0100
From:   Thomas Gleixner <tglx@...utronix.de>
To:     "Eric W. Biederman" <ebiederm@...ssion.com>
Cc:     Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
        linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Oleg Nesterov <oleg@...hat.com>,
        Matt Fleming <matt@...eblueprint.co.uk>
Subject: Re: [PATCH] signal: Allow RT tasks to cache one sigqueue struct

On Wed, Mar 10 2021 at 15:57, Eric W. Biederman wrote:
> Thomas Gleixner <tglx@...utronix.de> writes:
>> IMO, not bothering with an extra counter and rlimit plus the required
>> atomic operations is just fine and having this for all tasks
>> unconditionally looks like a clear win.
>>
>> I'll post an updated version of this soonish.
>
> That looks like a good analysis.
>
> I see that there is a sigqueue_cachep.  As I recall there are per cpu
> caches and all kinds of other good stuff when using kmem_cache_alloc.
>
> Are those goodies falling down?
>
> I am just a little unclear on why a slab allocation is sufficiently
> problematic that we want to avoid it.

In the normal case it's not problematic at all, i.e. when the per-CPU
cache can directly fulfill the allocation in the fast path. Once that
fails you're off into latency land...

For the usual setup that's probably not an issue at all, but for real-time
processing it matters.
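
For illustration, the idea is roughly the following. This is a sketch
only; the field and helper names are invented here and all the locking
and context questions are left out, so don't mistake it for the actual
patch:

  /* Sketch: one spare sigqueue per task, reused for the next signal */
  struct task_struct {
          /* ... */
          struct sigqueue *sigqueue_cache; /* NULL or one spare entry */
  };

  static struct sigqueue *sigqueue_alloc_cached(struct task_struct *t,
                                                gfp_t gfp)
  {
          struct sigqueue *q = t->sigqueue_cache;

          /* Fast path: hand out the cached entry, no allocator involved */
          if (q) {
                  t->sigqueue_cache = NULL;
                  return q;
          }
          /* Slow path: kmem_cache_alloc() with all its latencies */
          return kmem_cache_alloc(sigqueue_cachep, gfp);
  }

  static void sigqueue_free_cached(struct task_struct *t,
                                   struct sigqueue *q)
  {
          /* Keep one entry around for the next signal, free the rest */
          if (!t->sigqueue_cache)
                  t->sigqueue_cache = q;
          else
                  kmem_cache_free(sigqueue_cachep, q);
  }

That way a task which gets signals periodically pays the allocator
latency at most once.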

Vs. the dedicated kmem cache for sigqueue: that's a red herring. By
default kmem caches are shared/merged, as I learned today, and if you want
dedicated ones you need to boot with 'slab_nomerge' on the kernel command
line.

So without that option (merging by default is of course not backwards
compatible, because the original behaviour was the other way around) the
sigqueue kmem cache might end up merged into a shared kmem cache. Just do:

  cat /proc/slabinfo | grep sig

and with the defaults you will find:

signal_cache        6440   6440   1152   28    8 : tunables    0    0    0 : slabdata    230    230      0
sighand_cache       3952   4035   2112   15    8 : tunables    0    0    0 : slabdata    269    269      0

But of course there is no way to figure out where your cache actually
landed. With 'slab_nomerge' you'll get:

sigqueue            3264   3264     80   51    1 : tunables    0    0    0 : slabdata     64     64      0
signal_cache        6440   6440   1152   28    8 : tunables    0    0    0 : slabdata    230    230      0
sighand_cache       3952   4035   2112   15    8 : tunables    0    0    0 : slabdata    269    269      0

Don't worry about the 'active objects' field. That's just bonkers
because SLUB has no proper accounting for active objects. That number is
useless ...

Not even CONFIG_SLUB_STATS=y will give you anything useful. I had to
hack my own statistics into the signal code to gather these numbers
!$@...^?#!
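
(Nothing fancy is required for that, by the way. A sketch of the kind
of ad-hoc accounting which does the job, with counter names invented
here:

  /* Crude hit/miss counters hacked into the sigqueue alloc path */
  static atomic_long_t sq_cache_hit, sq_cache_miss;

  /* in the allocation path, after checking the per-task cache: */
  if (q)
          atomic_long_inc(&sq_cache_hit);
  else
          atomic_long_inc(&sq_cache_miss);

  /* dump the counters via pr_info() or a debugfs file */

Crude, but unlike the slabinfo numbers it at least counts what you
actually care about.)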

But why am I not surprised? This stuff is optimized for High Frequency
Trading, which is useless by definition. Oh well...

Rant aside, there is no massive benefit to doing that caching in
general, but there is not much of a downside either, and for particular
use cases it's useful even outside of PREEMPT_RT.

IMO, having it there unconditionally is better than yet another piece of
special-cased hackery.

Thanks,

        tglx
