linux-kernel - Re: [PATCH][KVM][retry 1] Add support for Pause Filtering to AMD SVM

Message-ID: <4A083B69.6010702@redhat.com>
Date:	Mon, 11 May 2009 17:51:21 +0300
From:	Avi Kivity <avi@...hat.com>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Mark Langsdorf <mark.langsdorf@....com>,
	Joerg Roedel <joerg.roedel@....com>, kvm@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH][KVM][retry 1] Add support for Pause Filtering to AMD
 SVM

Ingo Molnar wrote:
> * Avi Kivity <avi@...hat.com> wrote:
>
>   
>>> I.e. this is a somewhat poor solution as far as scheduling goes.
>>> But I'm wondering what the CPU side does. Can REP-NOP really take
>>> thousands of cycles? If yes, under what circumstances?
>>>       
>> The guest is running rep-nop in a loop while trying to acquire a 
>> spinlock.  The hardware detects this (most likely, repeated 
>> rep-nop with the same rip) and exits.  We can program the loop 
>> count; obviously if we're spinning for only a short while it's 
>> better to keep spinning while hoping the lock will be released 
>> soon.
>>
>> The idea is to detect that the guest is not making forward 
>> progress and yield.  If I could tell the scheduler, you may charge 
>> me a couple of milliseconds, I promise not to sue, that would be 
>> ideal. [...]
>>     
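A minimal C model of the detection described above, written purely for illustration. It assumes, following the guess in the text, that the counter restarts whenever PAUSE is seen at a different rip, and it ignores any timing conditions the real hardware may apply. The names `pause_filter`, `pf_arm` and `pf_on_pause` are hypothetical and are not KVM or AMD interfaces; `threshold` stands in for the programmable loop count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a pause-filter counter (not real hardware logic). */
struct pause_filter {
    uint32_t threshold;  /* programmable loop count */
    uint32_t remaining;  /* PAUSEs left before the model forces an exit */
    uint64_t last_rip;   /* rip of the previous PAUSE seen */
};

static void pf_arm(struct pause_filter *pf, uint32_t threshold)
{
    assert(threshold > 0);
    pf->threshold = threshold;
    pf->remaining = threshold;
    pf->last_rip = 0;
}

/* Call once per guest PAUSE at the given rip. Returns true when the
 * model would cause a VM exit; the counter is then re-armed. */
static bool pf_on_pause(struct pause_filter *pf, uint64_t rip)
{
    if (rip != pf->last_rip) {
        /* Assumed: a PAUSE at a new rip is a new spin loop, so restart. */
        pf->last_rip = rip;
        pf->remaining = pf->threshold;
    }
    if (--pf->remaining == 0) {
        pf->remaining = pf->threshold;
        return true;
    }
    return false;
}
```

With a threshold of 3, the third consecutive PAUSE at the same rip triggers an exit, while a short spin that moves on to other code never does, which is the "keep spinning if it's only a short while" trade-off described above.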
>
> Ok, with such a waiver, who could refuse?
>
> This really needs a new kernel-internal scheduler API though, which 
> does a lot of fancy things to do:
>
>         se->vruntime += 1000000;
>
> i.e. add 1 msec worth of nanoseconds to the task's timeline (first
> remove it from the rbtree, then add it back, and nice-weight it as
> well).

I suspected it would be as simple as this.
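The nice-weighting part can be sketched as below, assuming the CFS convention that a nice-0 task has a load weight of 1024 (`NICE_0_LOAD`) and that vruntime advances by `delta * NICE_0_LOAD / weight`. The helper `weighted_penalty` is a hypothetical name; it only illustrates the arithmetic and is not a scheduler API.

```c
#include <assert.h>
#include <stdint.h>

#define NICE_0_LOAD      1024ULL     /* load weight of a nice-0 task */
#define PAUSE_PENALTY_NS 1000000ULL  /* 1 msec, as suggested above */

/* Convert a wall-clock penalty into vruntime for a task of the given
 * load weight: heavier (higher-priority) tasks are charged less. */
static uint64_t weighted_penalty(uint64_t delta_ns, uint64_t weight)
{
    assert(weight > 0);
    return delta_ns * NICE_0_LOAD / weight;
}
```

For a 1 msec penalty, a nice-0 task (weight 1024) is charged exactly 1,000,000 ns of vruntime, a task of weight 2048 is charged 500,000 ns, and one of weight 512 is charged 2,000,000 ns.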

> And only do it if there are other tasks running on this CPU or
> so.
>   

What would happen if there weren't?  I'd guess the task would continue 
running (but with a warped vruntime)?
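A user-space sketch of the whole operation (remove from the tree, add the nice-weighted penalty, re-insert), guarded so that it only happens when another task is runnable. It assumes a sorted array as a stand-in for the CFS rbtree, with the first entry being the task picked next; `struct runqueue`, `rq_enqueue`, `rq_dequeue` and `rq_charge_penalty` are hypothetical names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NICE_0_LOAD 1024ULL
#define MAX_TASKS   8

struct task {
    int id;
    uint64_t weight;    /* CFS-style load weight */
    uint64_t vruntime;  /* virtual runtime in ns */
};

/* Stand-in for the rbtree: kept sorted by vruntime, so tasks[0] is the
 * "leftmost" task the scheduler would run next. */
struct runqueue {
    struct task tasks[MAX_TASKS];
    size_t nr_running;
};

static void rq_enqueue(struct runqueue *rq, struct task t)
{
    assert(rq->nr_running < MAX_TASKS);
    size_t i = rq->nr_running++;
    /* Insertion sort step: shift larger vruntimes right. */
    while (i > 0 && rq->tasks[i - 1].vruntime > t.vruntime) {
        rq->tasks[i] = rq->tasks[i - 1];
        i--;
    }
    rq->tasks[i] = t;
}

static struct task rq_dequeue(struct runqueue *rq, size_t idx)
{
    assert(idx < rq->nr_running);
    struct task t = rq->tasks[idx];
    for (size_t i = idx; i + 1 < rq->nr_running; i++)
        rq->tasks[i] = rq->tasks[i + 1];
    rq->nr_running--;
    return t;
}

/* Charge the task at idx a weighted penalty, but only if some other task
 * could use the CPU. Returns 1 if the penalty was applied, 0 otherwise. */
static int rq_charge_penalty(struct runqueue *rq, size_t idx, uint64_t delta_ns)
{
    if (rq->nr_running < 2)
        return 0;
    struct task t = rq_dequeue(rq, idx);             /* remove from tree */
    assert(t.weight > 0);
    t.vruntime += delta_ns * NICE_0_LOAD / t.weight; /* nice-weighted */
    rq_enqueue(rq, t);                               /* add it back */
    return 1;
}
```

With two nice-0 tasks at vruntime 100 and 200, charging the first one 1 msec moves it to 1,000,100 ns, behind the second task, which becomes leftmost. In this sketch a lone task is simply left untouched, so its vruntime is not warped; whether the real helper should skip or still charge in that case is the open question above.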

> _That_ would be pretty efficient, and would do the right thing when 
> two (or more) vcpus run on the same CPU, and it would also do the 
> right thing if there are repeated VM-exits due to pause filtering.
>
> Please don't even think about using yield for this though - that will
> just add a huge hit to this task and won't result in any sane
> behavior - and yield is bound to some historic user-space behavior
> as well.
>
> A gradual and linear back-off from the current timeline is more of a 
> fair negotiation process between vcpus and results in more or less 
> sane (and fair) scheduling, and no unnecessary looping.
>
> You could even do an exponential backoff up to a limit of 1-10 msecs 
> or so, starting at 100 usecs.
>   

Good idea, it eliminates another variable to be tuned.
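The exponential variant might look like this: start at 100 usecs, double the penalty on each consecutive pause-filter exit, and saturate at a cap. The 10 msec cap is one pick from the suggested 1-10 msec range, and resetting through `backoff_reset` (for example once the vcpu stops taking pause exits) is an assumption; `struct pause_backoff` and its helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BACKOFF_START_NS 100000ULL    /* 100 usecs starting penalty */
#define BACKOFF_MAX_NS   10000000ULL  /* 10 msecs cap (assumed choice) */

/* Hypothetical per-vcpu state for consecutive pause-filter exits. */
struct pause_backoff {
    uint64_t next_ns;  /* penalty to charge on the next exit */
};

static void backoff_reset(struct pause_backoff *b)
{
    b->next_ns = BACKOFF_START_NS;
}

/* Return the penalty for this exit, then double it for the next one,
 * saturating at BACKOFF_MAX_NS (checked before doubling, so no overflow). */
static uint64_t backoff_on_exit(struct pause_backoff *b)
{
    uint64_t penalty = b->next_ns;
    b->next_ns = (penalty >= BACKOFF_MAX_NS / 2) ? BACKOFF_MAX_NS
                                                 : penalty * 2;
    return penalty;
}
```

Successive exits are charged 100, 200, 400, 800, 1600, 3200 and 6400 usecs, then 10 msecs for every further exit until the state is reset, so the only remaining knobs are the start value and the cap.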

-- 
error compiling committee.c: too many arguments to function
