linux-kernel - Re: [PATCH] rcu/tree: consider time a VM was suspended

Message-ID: <YKbkAXELPxXJcsHA@google.com>
Date:   Fri, 21 May 2021 07:34:41 +0900
From:   Sergey Senozhatsky <senozhatsky@...omium.org>
To:     "Paul E. McKenney" <paulmck@...nel.org>
Cc:     Sergey Senozhatsky <senozhatsky@...omium.org>,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        Joel Fernandes <joel@...lfernandes.org>,
        Suleiman Souhlal <suleiman@...gle.com>, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] rcu/tree: consider time a VM was suspended

On (21/05/20 07:57), Paul E. McKenney wrote:
> > 
> > Sounds good. I can cook a patch and run some tests.
> > Or do you want to send a patch?
> 
> Given that you have the test setup, things might go faster if you do
> the patch, especially taking timezones into consideration.  Of course,
> if you run into difficulties, you know where to find me.

OK. Sounds good to me.

> > While VCPU-2 has PVCLOCK_GUEST_STOPPED set (resuming) and is in
> > check_cpu_stall(), the VCPU-3 is executing:
> > 
> > 	apic_timer_interrupt()
> > 	 tick_irq_enter()
> > 	  tick_do_update_jiffies64()
> > 	   do_timer()
> 
> OK, but the normal grace period time is way less than one second, and
> the stall timeout in mainline is 21 seconds, so that would be a -lot-
> of jiffies of skew.  Or does the restarting really take that long a time?

That's a good question. I see a huge jiffies spike in the logs.
I suspect that resuming a VM can take some time, especially on a "not
powerful at all" overcommitted host (more virtual CPUs than physical
ones).
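
The arithmetic behind the exchange above can be sketched in user space. This is only an illustrative model, not the kernel's check_cpu_stall(): the names jiffies_after() and stall_would_fire(), the HZ value of 1000, and the guest_was_stopped parameter are all assumptions made for the example. The 21-second timeout is the mainline figure quoted above. The comparison is written in the wrap-safe style of the kernel's time_after().

```c
#include <stdbool.h>

/* Assumed tick rate for the example; real kernels configure HZ at build time. */
#define HZ 1000UL
/* 21 seconds, the mainline stall timeout mentioned in the thread. */
#define STALL_TIMEOUT_JIFFIES (21UL * HZ)

/* Wrap-safe "a is after b", in the spirit of the kernel's time_after(). */
static bool jiffies_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical model of a stall check. Returns true if a stall would be
 * reported for a grace period that started at gp_start, given the current
 * jiffies value in now. guest_was_stopped stands in for a flag such as
 * PVCLOCK_GUEST_STOPPED: if the VM was suspended, the elapsed jiffies do
 * not reflect time the guest actually ran, so the check is skipped.
 */
static bool stall_would_fire(unsigned long now, unsigned long gp_start,
			     bool guest_was_stopped)
{
	if (guest_was_stopped)
		return false;
	return jiffies_after(now, gp_start + STALL_TIMEOUT_JIFFIES);
}
```

With these assumptions, a grace period that is half a second old does not trip the check, while a 30-second jump in jiffies (for example, one applied by another VCPU during resume) does, unless the suspended state is taken into account.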