linux-kernel - Re: [PATCH v3 0/3] Introduce per-task latency

Message-ID: <10f42efa-3750-491a-74fe-d84c9c4924e3@oracle.com>
Date:   Wed, 19 Feb 2020 12:16:59 -0500
From:   chris hyser <chris.hyser@...cle.com>
To:     David Laight <David.Laight@...LAB.COM>,
        Parth Shah <parth@...ux.ibm.com>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "patrick.bellasi@...bug.net" <patrick.bellasi@...bug.net>,
        "valentin.schneider@....com" <valentin.schneider@....com>,
        "dhaval.giani@...cle.com" <dhaval.giani@...cle.com>,
        "dietmar.eggemann@....com" <dietmar.eggemann@....com>
Cc:     "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "qais.yousef@....com" <qais.yousef@....com>,
        "pavel@....cz" <pavel@....cz>,
        "qperret@...rret.net" <qperret@...rret.net>,
        "pjt@...gle.com" <pjt@...gle.com>, "tj@...nel.org" <tj@...nel.org>
Subject: Re: [PATCH v3 0/3] Introduce per-task latency_nice for scheduler
 hints

On 2/19/20 6:18 AM, David Laight wrote:
> From: chris hyser
>> Sent: 18 February 2020 23:00
> ...
>> All, I was asked to take a look at the original latency_nice patchset.
>> First, to clarify objectives, Oracle is not
>> interested in trading throughput for latency.
>> What we found is that the DB has specific tasks which do very little but
>> need to do this as absolutely quickly as possible, ie extreme latency
>> sensitivity. Second, the key to latency reduction
>> in the task wakeup path seems to be limiting variations of "idle cpu" search.
>> The latter particularly interests me as an example of "platform size
>> based latency" which I believe to be important given all the varying size
>> VMs and containers.
> 
> From my experiments there are a few things that seem to affect latency
> of waking up real-time (SCHED_FIFO) tasks on a normal kernel:

Sorry. I was only ever talking about SCHED_OTHER, as per the original patchset. I realize the term "extreme latency
sensitivity" may have caused confusion. What that means to DB people is no doubt different from what it means to audio people. :-)

> 
> 1) The time taken for the (intel x86) cpu to wake up from monitor/mwait.
>     If the cpu is allowed to enter deeper sleep states this can take 900us.
>     Any changes to this are system-wide not process specific.
> 
> 2) If the cpu an RT process last ran on (ie the one it is woken on) is
>     running in kernel, the process switch won't happen until cond_resched()
>     is called.
>     On my system the code to flush the display frame buffer takes 3.3ms.
>     Compiling a kernel with CONFIG_PREEMPT=y will reduce this.
> 
> 3) If a hardware interrupt happens just after the process is woken
>     then you have to wait until it finishes and any 'softint' work
>     that is scheduled on the same cpu finishes.
>     The ethernet driver transmit completions and receive ring filling
>     can easily take 1ms.
>     Booting with 'threadirqs' might help this.
> 
> 4) If you need to acquire a lock/futex then you need to allow for the
>     process that holds it being delayed by a hardware interrupt (etc).
>     So even if the lock is only held for a few instructions it can take
>     a long time to acquire.
>     (I need to change some linked lists to arrays indexed by an atomically
>     incremented global index.)
> 
> FWIW I can't imagine how a database can have anything that is that
> latency sensitive.
> We are doing lots of channels of audio processing and have a lot of work
> to do within 10ms to avoid audible errors.

There are existing internal numbers, which I will ultimately have to reproduce, showing that simply short-cutting these
idle cpu searches has a significant benefit on DB performance on large hardware. However, that was for a different
patchset involving things I don't like, so I'm still exploring how to achieve similar results within the latency_nice
framework.

-chrish
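
Point 4 of the quoted list mentions replacing linked lists with arrays indexed by an atomically incremented global index, so that a writer never has to wait on a lock whose holder was delayed by an interrupt. A minimal sketch of that idea in C11 follows. It is not taken from the thread: the names event_log, event_log_append and LOG_CAPACITY are hypothetical, the capacity is arbitrary, and how readers learn that a slot has been filled is deliberately left out.

```c
#include <stdatomic.h>
#include <stddef.h>

#define LOG_CAPACITY 1024   /* hypothetical fixed capacity */

/* Hypothetical append-only array that stands in for a locked linked list. */
struct event_log {
    atomic_size_t next;        /* next free slot; only ever incremented */
    int slots[LOG_CAPACITY];
};

/*
 * Reserve a slot with a single atomic increment and store the value in it.
 * Returns the slot index, or -1 once the array is full.
 * No lock is taken, so a preempted or interrupted writer cannot block others.
 * Publishing the stored value to readers needs extra synchronisation that
 * this sketch does not show.
 */
static long event_log_append(struct event_log *log, int value)
{
    size_t idx = atomic_fetch_add_explicit(&log->next, 1,
                                           memory_order_relaxed);
    if (idx >= LOG_CAPACITY)
        return -1;
    log->slots[idx] = value;
    return (long)idx;
}
```

Each caller gets a distinct index from the atomic increment, so concurrent appends write to different slots. The trade-off is a fixed capacity and no removal, which is why this only suits some of the places a linked list would be used.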