linux-kernel - Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter real idle path

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <46086489-5a01-16e1-9314-70ae53c01952@gmail.com>
Date:   Thu, 16 Nov 2017 17:12:28 +0800
From:   Quan Xu <quan.xu0@...il.com>
To:     Thomas Gleixner <tglx@...utronix.de>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     Quan Xu <quan.xu03@...il.com>, kvm@...r.kernel.org,
        linux-doc@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        LKML <linux-kernel@...r.kernel.org>,
        virtualization@...ts.linux-foundation.org, x86@...nel.org,
        xen-devel@...ts.xenproject.org,
        Yang Zhang <yang.zhang.wz@...il.com>,
        Ingo Molnar <mingo@...hat.com>,
        "H. Peter Anvin" <hpa@...or.com>, Borislav Petkov <bp@...en8.de>,
        Kyle Huey <me@...ehuey.com>, Len Brown <len.brown@...el.com>,
        Andy Lutomirski <luto@...nel.org>,
        Tom Lendacky <thomas.lendacky@....com>,
        Tobias Klauser <tklauser@...tanz.ch>,
        Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [PATCH RFC v3 3/6] sched/idle: Add a generic poll before enter
 real idle path



On 2017-11-16 06:03, Thomas Gleixner wrote:
> On Wed, 15 Nov 2017, Peter Zijlstra wrote:
>
>> On Mon, Nov 13, 2017 at 06:06:02PM +0800, Quan Xu wrote:
>>> From: Yang Zhang <yang.zhang.wz@...il.com>
>>>
>>> Implement a generic idle poll which resembles the functionality
>>> found in arch/. Provide weak arch_cpu_idle_poll function which
>>> can be overridden by the architecture code if needed.
>> No, we want less of those magic hooks, not more.
>>
>>> Interrupts arrive which may not cause a reschedule in idle loops.
>>> In KVM guest, this costs several VM-exit/VM-entry cycles, VM-entry
>>> for interrupts and VM-exit immediately. Also this becomes more
>>> expensive than bare metal. Add a generic idle poll before enter
>>> real idle path. When a reschedule event is pending, we can bypass
>>> the real idle path.
>> Why not do a HV specific idle driver?
> If I understand the problem correctly then he wants to avoid the heavy
> lifting in tick_nohz_idle_enter() in the first place, but there is already
> an interesting quirk there which makes it exit early.  See commit
> 3c5d92a0cfb5 ("nohz: Introduce arch_needs_cpu"). The reason for this commit
> looks similar. But lets not proliferate that. I'd rather see that go away.

agreed.

Even we can get more benifit than commit 3c5d92a0cfb5 ("nohz: Introduce 
arch_needs_cpu")
in kvm guest. I won't proliferate that..

> But the irq_timings stuff is heading into the same direction, with a more
> complex prediction logic which should tell you pretty good how long that
> idle period is going to be and in case of an interrupt heavy workload this
> would skip the extra work of stopping and restarting the tick and provide a
> very good input into a polling decision.


interesting. I have tested with IRQ_TIMINGS related code, which seems 
not working so far.
Also I'd like to help as much as I can.
> This can be handled either in a HV specific idle driver or even in the
> generic core code. If the interrupt does not arrive then you can assume
> within the predicted time then you can assume that the flood stopped and
> invoke halt or whatever.
>
> That avoids all of that 'tunable and tweakable' x86 specific hackery and
> utilizes common functionality which is mostly there already.
here is some sample code. Poll for a while before enter halt in 
cpuidle_enter_state()
If I get a reschedule event, then don't try to enter halt.  (I hope this 
is the right direction as Peter mentioned in another email)

--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -210,6 +210,13 @@ int cpuidle_enter_state(struct cpuidle_device *dev, 
struct cpuidle_driver *drv,
                 target_state = &drv->states[index];
         }

+#ifdef CONFIG_PARAVIRT
+       paravirt_idle_poll();
+
+       if (need_resched())
+               return -EBUSY;
+#endif
+
         /* Take note of the planned idle state. */
         sched_idle_set_state(target_state);




thanks,

Quan
Alibaba Cloud