linux-kernel - Re: [PATCH v2 7/9] sched: define TIF_ALLOW


Message-Id: <39998df7-8882-43ae-8c7e-936c24eb4041@app.fastmail.com>
Date:   Mon, 18 Sep 2023 20:21:11 -0700
From:   "Andy Lutomirski" <luto@...nel.org>
To:     "Ankur Arora" <ankur.a.arora@...cle.com>,
        "Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>,
        linux-mm@...ck.org, "the arch/x86 maintainers" <x86@...nel.org>
Cc:     "Andrew Morton" <akpm@...ux-foundation.org>,
        "Borislav Petkov" <bp@...en8.de>,
        "Dave Hansen" <dave.hansen@...ux.intel.com>,
        "H. Peter Anvin" <hpa@...or.com>, "Ingo Molnar" <mingo@...hat.com>,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        "Matthew Wilcox (Oracle)" <willy@...radead.org>, mgorman@...e.de,
        "Peter Zijlstra (Intel)" <peterz@...radead.org>,
        "Steven Rostedt" <rostedt@...dmis.org>,
        "Thomas Gleixner" <tglx@...utronix.de>,
        "Jon Grimm" <jon.grimm@....com>, "Bharata B Rao" <bharata@....com>,
        raghavendra.kt@....com, boris.ostrovsky@...cle.com,
        konrad.wilk@...cle.com,
        "Linus Torvalds" <torvalds@...ux-foundation.org>
Subject: Re: [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED

On Wed, Aug 30, 2023, at 11:49 AM, Ankur Arora wrote:
> On preempt_model_none() or preempt_model_voluntary() configurations
> rescheduling of kernel threads happens only when they allow it, and
> only at explicit preemption points, via calls to cond_resched() or
> similar.
>
> That leaves out contexts where it is not convenient to periodically
> call cond_resched() -- for instance when executing a potentially long
> running primitive (such as REP; STOSB.)
>

So I said this not too long ago in the context of Xen PV, but maybe it's time to ask it in general:

Why do we support anything other than full preempt?  I can think of two reasons, neither of which I think is very good:

1. Once upon a time, tracking preempt state was expensive.  But we fixed that.

2. Folklore suggests that there's a latency vs throughput tradeoff, and serious workloads, for some definition of serious, want throughput, so they should run without full preemption.

I think #2 is a bit silly.  If you want throughput, and you're busy waiting for a CPU that wants to run you, but it isn't running you because it's running some low-priority non-preemptible thing (because preempt is set to none or voluntary), you're not getting throughput.  If you want to keep some I/O resource busy to get throughput, but you have excessive latency getting scheduled, you don't get throughput.

If the actual problem is that there's a workload that performs better when scheduling is delayed (which preempt=none and preempt=voluntary do, essentially at random), then maybe someone should identify that workload and fix the scheduler.

So maybe we should just very strongly encourage everyone to run with full preempt and simplify the kernel?