linux-kernel - Re: [PATCH 0/5] Alter steal time reporting in KVM

Date:	Fri, 07 Dec 2012 09:50:46 -0600
From:	Michael Wolf <mjw@...ux.vnet.ibm.com>
To:	Glauber Costa <glommer@...allels.com>
CC:	Marcelo Tosatti <mtosatti@...hat.com>,
	linux-kernel@...r.kernel.org, riel@...hat.com, kvm@...r.kernel.org,
	peterz@...radead.org, mingo@...hat.com, anthony@...emonkey.ws
Subject: Re: [PATCH 0/5] Alter steal time reporting in KVM

On 12/05/2012 06:46 AM, Glauber Costa wrote:
> I am deeply sorry.
>
> I was busy first time I read this, so I postponed answering and ended up
> forgetting.
>
> Sorry
>>> include/linux/sched.h:
>>> unsigned long long run_delay; /* time spent waiting on a runqueue */
>>>
>>> So if you are out of the runqueue, you won't get steal time accounted,
>>> and then I truly fail to understand what you are doing.
>> So I looked at something like this in the past.  To make sure things
>> haven't changed I set up a cgroup on my test server running a kernel
>> built from the latest tip tree.
>>
>> [root]# cat cpu.cfs_quota_us
>> 50000
>> [root]# cat cpu.cfs_period_us
>> 100000
>> [root]# cat cpuset.cpus
>> 1
>> [root]# cat cpuset.mems
>> 0
>>
>> Next I put the PID from the cpu thread into tasks.  When I start a
>> script that will hog the cpu I see the following in top on the guest:
>> Cpu(s):  1.9%us,  0.0%sy,  0.0%ni,  0.0%id,  0.0%wa, 48.3%hi, 0.0%si, 49.8%st
>>
>> So the steal time here is in line with the bandwidth control settings.
> Ok. So I was wrong in my hunch that it would be outside the runqueue,
> therefore work automatically. Still, the host kernel has all the
> information in cgroups.
>
>> So then the steal time did not show on the guest.  You have no value
>> that needs to be passed around.  What I did not like about this
>> approach was
>> * only works for cfs bandwidth control.  If another type of hard limit
>>     was added to the kernel the code would potentially need to change.
> This is true for almost everything we have in the kernel!
> It is *very* unlikely for another bandwidth control mechanism to ever
> appear. If it ever does, it's *their* burden to make sure it works for
> steal time (provided it is merged). Code in tree gets precedence.

Ok, I will work on a patch that uses the cgroup bandwidth control
information to separate out the time.
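
To make the idea concrete, here is a rough sketch of the arithmetic in
Python. It is not the patch itself. The function name split_steal and
the single-vcpu assumption are mine, for illustration only. It takes the
cpu.cfs_quota_us and cpu.cfs_period_us values and splits an observed
steal percentage into the part explained by the cap and the remainder.

```python
def split_steal(steal_pct, quota_us, period_us):
    """Split observed steal (in percent of one cpu) into the share that the
    cfs bandwidth cap explains and the share that it does not.

    Hypothetical helper for illustration; assumes a single vcpu thread.
    """
    if period_us <= 0:
        raise ValueError("period_us must be positive")

    if quota_us < 0:
        # cpu.cfs_quota_us == -1 means no bandwidth limit is set,
        # so none of the steal time is explained by a cap.
        expected_pct = 0.0
    else:
        # Fraction of one cpu the group is entitled to, clamped to 1.0
        # because this sketch only models a single vcpu.
        entitlement = min(quota_us / period_us, 1.0)
        expected_pct = (1.0 - entitlement) * 100.0

    capped_pct = min(steal_pct, expected_pct)
    unexpected_pct = steal_pct - capped_pct
    return capped_pct, unexpected_pct


if __name__ == "__main__":
    # Values from the cgroup test quoted above: quota 50000, period 100000,
    # and 49.8% steal reported by top in the guest.
    print(split_steal(49.8, 50000, 100000))
```

With the numbers from my test, all of the 49.8% steal falls under the 50%
that the cap accounts for, so nothing is left over as unexpected steal.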

>
>> * This approach doesn't help if the limits are set by overcommitting the
>>     cpus.  It is my understanding that this is a common approach.
>>
> I can't say anything about commonality, but common or not, it is a
> *crazy* approach.
>
> When you simply overcommit, you have no way to differentiate between
> intended steal time and non-intended steal time. Moreover, when you
> overcommit, your cpu usage will vary over time. If two guests use the
> cpu to their full power, you will have 50 % each. But if one of them
> slows down, the other gets more. What is your entitlement value? How do
> you define this?
>
> And then if, after you define it, you end up using more than this, what
> is your cpu usage? 130 %?

Yes, exactly. You would ideally show a boosted amount of cpu.  However, to
do that you would need to either create a new tool or modify the current
accounting tools such as top.

My understanding is that in this case you are not capping so much as
guaranteeing a minimum level of performance.
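
As a small illustration of what a "boosted" figure would look like, the
sketch below expresses cpu usage as a percentage of the entitlement
rather than of the physical cpu. The function name and the 65% figure
are hypothetical, chosen only to reproduce the 130% from the question
above.

```python
def usage_vs_entitlement(used_pct, entitlement_pct):
    """Return cpu usage as a percentage of the guest's entitlement.

    used_pct        -- share of the physical cpu the guest actually got
    entitlement_pct -- share of the physical cpu the guest is guaranteed
    Both are percentages of one physical cpu (hypothetical inputs).
    """
    if entitlement_pct <= 0:
        raise ValueError("entitlement_pct must be positive")
    return 100.0 * used_pct / entitlement_pct


if __name__ == "__main__":
    # Two guests share one cpu with a 50% entitlement each. If the other
    # guest slows down and this one gets 65% of the cpu, it is running
    # above its entitlement.
    print(usage_vs_entitlement(65, 50))
```

A tool reporting this way would show values above 100% whenever the guest
gets more than its guaranteed minimum, which is why top as it stands
today could not display it without changes.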

>
>
> The only sane way to do it is to communicate this value to the kernel
> somehow. The bandwidth controller is the interface we have for that. So
> everybody that wants to *intentionally* overcommit needs to communicate
> this to the controller. IOW: Any sane configuration should be explicit
> about your capping.
>
>>>>>>          Add an ioctl to communicate the consign limit to the host.
>>> This definitely should go away.
>>>
>>> More specifically, *whatever* way we use to cap the processor, the host
>>> system will have all the information at all times.
>> I'm not understanding that comment.  If you are capping by simply
>> controlling the amount of overcommit on the host then wouldn't you
>> still need some value to indicate the desired amount?
> No, that is just crazy, and I don't like it a single bit.
>
> So in the light of it: Whatever capping mechanism we have, we need to be
> explicit about the expected entitlement. At this point, the kernel
> already knows what it is, and needs no extra ioctls or anything like that.
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
