Message-ID: <20140319174222.213976eu8b54ujk0@intranet.cs.hku.hk>
Date:	Wed, 19 Mar 2014 17:42:22 +0800
From:	lwcheng@...hku.hk
To:	linux-kernel@...r.kernel.org
Cc:	glommer@...il.com
Subject: [BUG] Paravirtual time accounting / IRQ time accounting

In consolidated environments, where multiple virtual machines (VMs) share
one CPU core, timekeeping becomes a problem for the guest OS.
Here I report my findings about the Linux process scheduler.


Description
------------
Linux CFS relies on rq->clock_task to charge each task, determine vruntime,
etc.

When CONFIG_IRQ_TIME_ACCOUNTING is enabled, the time spent servicing IRQs
is excluded from updates to rq->clock_task.
When CONFIG_PARAVIRT_TIME_ACCOUNTING is enabled, the time stolen by the
hypervisor is also excluded from updates to rq->clock_task.
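
To make the intended accounting concrete, here is a minimal userspace
sketch (not kernel code). The function name and the nanosecond figures are
made up for illustration; it only shows that rq->clock_task is meant to
advance by the elapsed time minus IRQ time minus stolen time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the intended behaviour, all values in nanoseconds.
 * elapsed  : wall-clock time since the last update of the run queue clock
 * irq_ns   : time spent servicing hard and soft IRQs in that interval
 * steal_ns : time the hypervisor ran something else on this CPU
 * Assumes the two deductions do not overlap.
 */
static int64_t intended_clock_task_increment(int64_t elapsed,
                                             int64_t irq_ns,
                                             int64_t steal_ns)
{
    assert(elapsed >= 0 && irq_ns >= 0 && steal_ns >= 0);
    assert(irq_ns + steal_ns <= elapsed);

    /* Only the remainder is time the current task really ran. */
    return elapsed - irq_ns - steal_ns;
}
```

For example, with 10 ms elapsed, 1 ms of IRQ time and 6 ms stolen, the task
should be charged for 3 ms.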

With *both* CONFIG_IRQ_TIME_ACCOUNTING and CONFIG_PARAVIRT_TIME_ACCOUNTING
enabled, I put three KVM guests on one core and ran hackbench in each guest.
I found that, in the guests, rq->clock_task stays *unchanged*. This breaks
CFS, which depends on that clock to charge tasks.
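
Why a stalled rq->clock_task matters to CFS can be sketched with a
simplified userspace model. The struct, the function and the weight constant
below are hypothetical and only mimic the general idea of charging a task
for the clock_task time elapsed since it was last charged; the real kernel
code is more involved.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NICE_0_WEIGHT 1024u  /* weight of a nice-0 task in this model */

/* Hypothetical, simplified stand-in for a scheduling entity. */
struct model_task {
    uint64_t exec_start;  /* clock_task value when the task was last charged */
    uint64_t vruntime;    /* weighted virtual runtime, in ns */
    uint32_t weight;      /* larger weight means slower vruntime growth */
};

/*
 * Charge the task for the clock_task time elapsed since exec_start and
 * return that elapsed time. Assumes clock_task never goes backwards.
 */
static uint64_t model_charge(struct model_task *t, uint64_t clock_task)
{
    assert(t->weight > 0);
    assert(clock_task >= t->exec_start);

    uint64_t delta_exec = clock_task - t->exec_start;

    t->vruntime += delta_exec * MODEL_NICE_0_WEIGHT / t->weight;
    t->exec_start = clock_task;
    return delta_exec;
}
```

In this model, calling model_charge() twice with the same clock_task value
charges nothing the second time, so a task that keeps running while
clock_task is stuck never accumulates vruntime.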
------------


Analysis
------------
[src/kernel/sched/core.c]
static void update_rq_clock_task(struct rq *rq, s64 delta)
{
     ... ...
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
     irq_delta = irq_time_read(cpu_of(rq)) - rq->prev_irq_time;
     ... ...
     rq->prev_irq_time += irq_delta;
     delta -= irq_delta;
#endif

#ifdef CONFIG_PARAVIRT_TIME_ACCOUNTING
     if (static_key_false((&paravirt_steal_rq_enabled))) {
         steal = paravirt_steal_clock(cpu_of(rq));
         steal -= rq->prev_steal_time_rq;
         ... ...
         rq->prev_steal_time_rq += steal;
         delta -= steal;
     }
#endif

     rq->clock_task += delta;
     ... ...
}
--
"delta" -> the intended increment to rq->clock_task
"irq_delta" -> the time spent servicing IRQs (hard + soft)
"steal" -> the time stolen by the underlying hypervisor
--
"irq_delta" is calculated from sched_clock_cpu() timestamps, which are
vulnerable to VM scheduling delays: if the VM is descheduled while IRQ time
is being measured, that interval is counted as IRQ time as well.
So "irq_delta" can include part or all of "steal", and the stolen time is
subtracted from "delta" twice.
I observe that [irq_delta + steal >> delta].
As a result, "delta" becomes zero. That is why rq->clock_task stops.
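
The double subtraction can be reproduced with a small userspace model (not
kernel code). It assumes that the elided lines of update_rq_clock_task()
clamp each deduction to what is left of "delta", which matches the
observation that "delta" ends up at zero rather than negative. The helper
names and the numbers are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of the elided lines: never deduct more than remains. */
static int64_t model_clamp(int64_t deduction, int64_t remaining)
{
    return deduction > remaining ? remaining : deduction;
}

/*
 * Hypothetical model of the two deductions, all values in nanoseconds.
 * irq_measured is IRQ time as seen from guest timestamps, so it may also
 * contain time during which the VM was not running at all.
 */
static int64_t model_clock_task_delta(int64_t delta,
                                      int64_t irq_measured,
                                      int64_t steal)
{
    assert(delta >= 0 && irq_measured >= 0 && steal >= 0);

    int64_t irq_delta = model_clamp(irq_measured, delta);
    delta -= irq_delta;          /* IRQ time accounting */

    steal = model_clamp(steal, delta);
    delta -= steal;              /* paravirt steal time accounting */

    return delta;                /* what would be added to clock_task */
}
```

With 10 ms elapsed, 1 ms of real IRQ time and 6 ms stolen, the model yields
3 ms when the two are measured separately. If the 6 ms of stolen time also
leaks into the IRQ measurement (7 ms measured), the same interval yields 0.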
------------

Please confirm this bug. Thanks.


Luwei Cheng
--
CS student
The University of Hong Kong