Date:   Wed, 25 Oct 2023 11:42:34 -0400
From:   Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:     Steven Rostedt <rostedt@...dmis.org>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ankur Arora <ankur.a.arora@...cle.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        linux-mm@...ck.org, x86@...nel.org, akpm@...ux-foundation.org,
        luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
        jon.grimm@....com, bharata@....com, raghavendra.kt@....com,
        boris.ostrovsky@...cle.com, konrad.wilk@...cle.com,
        jgross@...e.com, andrew.cooper3@...rix.com,
        Joel Fernandes <joel@...lfernandes.org>,
        Youssef Esmat <youssefesmat@...omium.org>,
        Vineeth Pillai <vineethrp@...gle.com>,
        Suleiman Souhlal <suleiman@...gle.com>,
        Ingo Molnar <mingo@...nel.org>,
        Daniel Bristot de Oliveira <bristot@...nel.org>
Subject: Re: [POC][RFC][PATCH] sched: Extended Scheduler Time Slice

On 2023-10-25 10:31, Steven Rostedt wrote:
> On Wed, 25 Oct 2023 15:55:45 +0200
> Peter Zijlstra <peterz@...radead.org> wrote:

[...]

After digging lore for context, here are some thoughts about the actual
proposal: AFAIU the intent here is to boost the scheduling slice for a
userspace thread running with a mutex held so it can complete faster,
and therefore reduce contention.

I suspect this is not completely unrelated to priority inheritance
futexes, except that one goal stated by Steven is to increase the
owner's slice without requiring a system call on the fast path.

Compared to PI futexes, I think Steven's proposal misses the part
where a thread waiting on a futex boosts the lock owner's priority
so it can complete faster. By making the lock owner selfishly claim
that it needs a larger scheduling slice, it opens the door to
scheduler disruption, and it's hard to come up with upper bounds
that work for all cases.

Hopefully I'm not oversimplifying if I state that we have mainly two
actors to consider:

[A] the lock owner thread

[B] threads that block trying to acquire the lock

The fast path here is [A]. [B] can go through a system call; I don't
think that matters at all.

So perhaps we can extend the rseq per-thread area with a field that
implements a "held locks" list that allows [A] to let the kernel know
that it is currently holding a set of locks (those can be chained when
locks are nested). It would be updated on lock/unlock with just a few
stores in userspace.
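
To make the "few stores" concrete, here is a minimal userspace sketch
of such a chained list. Everything in it is an assumption made for
illustration: struct held_lock_node, struct rseq_held_locks and the
helper names are hypothetical, a thread-local variable stands in for
the rseq per-thread area, and unlocks are assumed to be strictly
nested.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node describing one held lock; not an existing ABI. */
struct held_lock_node {
	uintptr_t lock_addr;		/* address of the userspace lock word */
	struct held_lock_node *next;	/* previously acquired (outer) lock */
};

/* Hypothetical stand-in for a new field in the rseq per-thread area. */
struct rseq_held_locks {
	struct held_lock_node *head;	/* innermost held lock, NULL if none */
};

static _Thread_local struct rseq_held_locks held_locks_area;

/* Publish a lock as held: a few plain stores, no system call. */
static inline void held_locks_push(struct held_lock_node *node, void *lock)
{
	node->lock_addr = (uintptr_t)lock;
	node->next = held_locks_area.head;
	/*
	 * Compiler-only fence: the node must be fully initialized before
	 * it is linked, because the reader (the kernel, on preemption)
	 * would run on the same CPU, much like a signal handler.
	 */
	atomic_signal_fence(memory_order_release);
	held_locks_area.head = node;
}

/* Retract the innermost lock; assumes strictly nested (LIFO) unlocks. */
static inline void held_locks_pop(struct held_lock_node *node)
{
	assert(held_locks_area.head == node);
	held_locks_area.head = node->next;
}

/* Count the published locks, as a reader walking the chain might. */
static inline size_t held_locks_count(void)
{
	size_t n = 0;

	for (struct held_lock_node *p = held_locks_area.head; p; p = p->next)
		n++;
	return n;
}
```

The node can live on the locker's stack or inside the lock wrapper, so
the lock/unlock paths allocate nothing.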

Those lock addresses could then be used as keys for private locks,
or transformed into inode/offset keys for shared-memory locks. Threads
[B] blocking trying to acquire the lock can call a system call which
would boost the lock owner's slice and/or priority for a given lock key.
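
As a rough illustration of the keying and of what the waiter-side
system call might record, the sketch below derives a key from a lock
address and keeps the largest requested slice per key. All names
(struct lock_key, boost_request(), boost_lookup(), the fixed-size
table) are hypothetical, the address-space id and inode number are
plain integers supplied by the caller, and only a slice boost is
modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum lock_key_kind { LOCK_KEY_PRIVATE, LOCK_KEY_SHARED };

/* Hypothetical lock key, loosely modeled on the two cases above. */
struct lock_key {
	enum lock_key_kind kind;
	uint64_t id;		/* private: address-space id; shared: inode */
	uint64_t offset;	/* private: virtual address; shared: file offset */
};

struct lock_boost {
	struct lock_key key;
	uint64_t slice_ns;	/* extra slice requested for the owner */
	bool in_use;
};

#define MAX_BOOSTS 16
static struct lock_boost boost_table[MAX_BOOSTS];

static inline struct lock_key lock_key_private(uint64_t mm_id, uintptr_t addr)
{
	return (struct lock_key){ .kind = LOCK_KEY_PRIVATE, .id = mm_id,
				  .offset = (uint64_t)addr };
}

/* Translate an address inside a file mapping into an inode/offset key. */
static inline struct lock_key lock_key_shared(uint64_t inode,
					      uint64_t map_file_off,
					      uintptr_t map_start,
					      uintptr_t addr)
{
	assert(addr >= map_start);
	return (struct lock_key){ .kind = LOCK_KEY_SHARED, .id = inode,
		.offset = map_file_off + (uint64_t)(addr - map_start) };
}

static inline bool lock_key_equal(const struct lock_key *a,
				  const struct lock_key *b)
{
	return a->kind == b->kind && a->id == b->id && a->offset == b->offset;
}

/* Waiter side: record the largest request per key; false if table full. */
static inline bool boost_request(struct lock_key key, uint64_t slice_ns)
{
	struct lock_boost *free_slot = NULL;

	for (size_t i = 0; i < MAX_BOOSTS; i++) {
		struct lock_boost *b = &boost_table[i];

		if (b->in_use && lock_key_equal(&b->key, &key)) {
			if (slice_ns > b->slice_ns)
				b->slice_ns = slice_ns;
			return true;
		}
		if (!b->in_use && !free_slot)
			free_slot = b;
	}
	if (!free_slot)
		return false;
	*free_slot = (struct lock_boost){ .key = key, .slice_ns = slice_ns,
					  .in_use = true };
	return true;
}

/* Return the active slice boost for a key, or 0 if none is recorded. */
static inline uint64_t boost_lookup(struct lock_key key)
{
	for (size_t i = 0; i < MAX_BOOSTS; i++)
		if (boost_table[i].in_use &&
		    lock_key_equal(&boost_table[i].key, &key))
			return boost_table[i].slice_ns;
	return 0;
}
```

With the shared form, two processes mapping the same file at different
addresses resolve the same lock to the same key.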

When the scheduler preempts [A], it would check whether the rseq
per-thread area has a "held locks" field set, look up the
slice/priority boosts that are currently active for each held lock,
and boost the task slice/priority accordingly.
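
The preemption-time check could look roughly like the following
userspace model. It is only a sketch under stated assumptions: the
struct and function names are hypothetical, locks are keyed by plain
address, lower prio values mean higher priority, and "take the largest
slice and the best priority over all held locks" is one possible
policy rather than something the proposal specifies. A real kernel
implementation would also have to read the list with user-access
helpers, since it lives in untrusted userspace memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical held-locks chain entry published by the lock owner. */
struct held_lock_node {
	uintptr_t lock_addr;
	const struct held_lock_node *next;
};

/* Hypothetical boost registered by waiters for one lock. */
struct active_boost {
	uintptr_t lock_addr;
	uint64_t slice_ns;	/* extra slice requested */
	int prio;		/* assumed: lower value = higher priority */
};

struct boost_result {
	uint64_t slice_ns;
	int prio;
};

/* Bound the walk: the chain is userspace data and may be corrupt. */
#define MAX_NESTING 8

/* Combine the boosts of every held lock into one slice/priority boost. */
static inline struct boost_result
boost_on_preempt(const struct held_lock_node *head,
		 const struct active_boost *boosts, size_t nr_boosts,
		 int base_prio)
{
	struct boost_result res = { .slice_ns = 0, .prio = base_prio };
	size_t depth = 0;

	assert(boosts != NULL || nr_boosts == 0);
	for (const struct held_lock_node *p = head;
	     p && depth < MAX_NESTING; p = p->next, depth++) {
		for (size_t i = 0; i < nr_boosts; i++) {
			if (boosts[i].lock_addr != p->lock_addr)
				continue;
			if (boosts[i].slice_ns > res.slice_ns)
				res.slice_ns = boosts[i].slice_ns;
			if (boosts[i].prio < res.prio)
				res.prio = boosts[i].prio;
		}
	}
	return res;
}
```

A task holding no locks, or holding only locks nobody is waiting on,
gets no boost at all, which is the property that keeps the owner from
selfishly extending its own slice.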

A scheme like this should allow lock priority inheritance without
requiring system calls on the userspace lock/unlock fast path.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
