linux-kernel - Re: [BUG] Paravirtual time accounting / IRQ time accounting


Message-ID: <20140322151540.9740576amzy5tdwk@intranet.cs.hku.hk>
Date:	Sat, 22 Mar 2014 15:15:40 +0800
From:	lwcheng@...hku.hk
To:	Rik van Riel <riel@...hat.com>
Cc:	Glauber Costa <glommer@...il.com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [BUG] Paravirtual time accounting / IRQ time accounting


Quoting Rik van Riel <riel@...hat.com>:

> On 03/20/2014 11:01 AM, Glauber Costa wrote:
>> On Wed, Mar 19, 2014 at 6:42 AM,  <lwcheng@...hku.hk> wrote:
>
>>> ------------
>>> [src/kernel/sched/core.c]
>>> static void update_rq_clock_task(struct rq *rq, s64 delta)
>>> {
>>>     ... ...
>>> #ifdef CONFIG_IRQ_TIME_ACCOUNTING
>>>     irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
>>>     ... ...
>>>     rq->prev_irq_time += irq_delta;
>>>     delta -= irq_delta;
>>> #endif
>>>
>>> #ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
>>>     if (static_key_false((&paravirt_steal_rq_enabled))) {
>>>         steal = paravirt_steal_clock(cpu_of(rq));
>>>         steal -= rq->prev_steal_time_rq;
>>>         ... ...
>>>         rq->prev_steal_time_rq += steal;
>>>         delta -= steal;
>>>     }
>>> #endif
>>>
>>>     rq->clock_task += delta;
>>>     ... ...
>>> }
>>> --
>>> "delta" -> the intended increment to rq->clock_task
>>> "irq_delta" -> the time spent on serving IRQ (hard + soft)
>>> "steal" -> the time stolen by the underlying hypervisor
>>> --
>>> "irq_delta" is calculated based on sched_clock_cpu(), which is vulnerable
>>> to VM scheduling delays.
>>
>> This looks like a real problem indeed. The main problem in searching
>> for a solution, is that of course not all of the irq time is steal
>> time and vice versa. In this case, we could subtract irq_time from
>> steal, and add only the steal part time that is in excess. I don't
>> think this is 100 % guaranteed, but maybe it is a good approximation.
>>
>> Rik, do you have an opinion on this ?
>
> The other way around may be better, since steal time (when it
> happens) is likely to be of "time slice" magnitude, while irq
> time will happen more frequently, and in dozens-of-microseconds
> intervals.
>
> Furthermore, we have no way to measure what the irq time is,
> except by looking at how much real time elapsed. For steal time,
> however, the hypervisor tells us exactly how much time was stolen.
>
> That means when steal time and irq time happen simultaneously,
> the amount of steal time should always be smaller than the
> calculated irq time for that period.
>
> actual irq_time = calculated irq time - reported steal time;
>
> --
> All rights reversed
>

I have observed that irq_time sometimes includes only "part" of steal_time.
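
To make the two cases concrete, here is a minimal user-space sketch in C.
It is not kernel code: struct sample and both helper functions are
hypothetical names invented for illustration, all values are nanoseconds,
and the clamping that the quoted update_rq_clock_task() elides behind
"... ..." is deliberately left out. The "adjusted" variant applies the
formula quoted above (actual irq_time = calculated irq time - reported
steal time), with a clamp at zero that is my own assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one clock update; all fields are nanoseconds. */
struct sample {
    int64_t delta;      /* elapsed time since the previous update */
    int64_t irq_delta;  /* irq time derived from the clock; may contain steal */
    int64_t steal;      /* steal time reported by the hypervisor */
};

/* Mirrors the quoted code path: irq time and steal time are both
 * subtracted in full, so steal hidden inside irq_delta is removed twice. */
static int64_t task_delta_naive(struct sample s)
{
    assert(s.delta >= 0 && s.irq_delta >= 0 && s.steal >= 0);
    return s.delta - s.irq_delta - s.steal;
}

/* Applies "actual irq_time = calculated irq time - reported steal time".
 * The clamp at zero is an assumption for the case where only part of the
 * steal time fell inside the irq window. */
static int64_t task_delta_adjusted(struct sample s)
{
    assert(s.delta >= 0 && s.irq_delta >= 0 && s.steal >= 0);
    int64_t irq = s.irq_delta - s.steal;
    if (irq < 0)
        irq = 0;
    return s.delta - irq - s.steal;
}
```

Take a 10 ms update with 20 us of real irq work and 4 ms of steal. If all
of the steal lands inside the irq window, irq_delta is 4,020,000 ns: the
naive version returns 1,980,000 ns, while the adjusted one returns the
expected 5,980,000 ns. If only 1 ms of that steal lands inside the irq
window, irq_delta is 1,020,000 ns: the subtraction goes negative, the
clamp yields zero irq time, and the adjusted version returns 6,000,000 ns,
so the 20 us of real irq work is lost. That is the partial-overlap case
I mean.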

As you said, irq_time comes in intervals of dozens of microseconds. In
VMs, where all the devices seen are virtual ones, irq_time seems to be
less useful than it is on physical hosts.

A quick (but not radical) solution might be to disable
CONFIG_IRQ_TIME_ACCOUNTING when CONFIG_PARAVIRT is selected and simply
adopt tick-based accounting instead (CONFIG_TICK_CPU_ACCOUNTING).

I am wondering what irq_time really *means* in VMs.

-Luwei

