Message-ID: <8ec039c6-d0fc-c7f4-72a4-ae677c9bbb68@suse.com>
Date:   Fri, 11 Jan 2019 15:01:10 +0100
From:   Juergen Gross <jgross@...e.com>
To:     Hans van Kranenburg <hans@...rrie.org>,
        linux-kernel@...r.kernel.org, xen-devel@...ts.xenproject.org,
        x86@...nel.org
Cc:     sstabellini@...nel.org, stable@...r.kernel.org, mingo@...hat.com,
        bp@...en8.de, hpa@...or.com, boris.ostrovsky@...cle.com,
        tglx@...utronix.de
Subject: Re: [Xen-devel] [PATCH v2] xen: Fix x86 sched_clock() interface for
 xen

On 11/01/2019 14:12, Hans van Kranenburg wrote:
> Hi,
> 
> On 1/11/19 1:08 PM, Juergen Gross wrote:
>> Commit f94c8d11699759 ("sched/clock, x86/tsc: Rework the x86 'unstable'
>> sched_clock() interface") broke Xen guest time handling across
>> migration:
>>
>> [  187.249951] Freezing user space processes ... (elapsed 0.001 seconds) done.
>> [  187.251137] OOM killer disabled.
>> [  187.251137] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
>> [  187.252299] suspending xenstore...
>> [  187.266987] xen:grant_table: Grant tables using version 1 layout
>> [18446743811.706476] OOM killer enabled.
>> [18446743811.706478] Restarting tasks ... done.
>> [18446743811.720505] Setting capacity to 16777216
>>
>> Fix that by setting xen_sched_clock_offset at resume time to ensure a
>> monotonic clock value.
>>
>> [...]
> 
> I'm throwing around a PV domU over a bunch of test servers with live
> migrate now, and in between the kernel logging, I'm seeing this:
> 
> [Fri Jan 11 13:58:42 2019] Freezing user space processes ... (elapsed
> 0.002 seconds) done.
> [Fri Jan 11 13:58:42 2019] OOM killer disabled.
> [Fri Jan 11 13:58:42 2019] Freezing remaining freezable tasks ...
> (elapsed 0.000 seconds) done.
> [Fri Jan 11 13:58:42 2019] suspending xenstore...
> [Fri Jan 11 13:58:42 2019] ------------[ cut here ]------------
> [Fri Jan 11 13:58:42 2019] Current state: 1
> [Fri Jan 11 13:58:42 2019] WARNING: CPU: 3 PID: 0 at
> kernel/time/clockevents.c:133 clockevents_switch_state+0x48/0xe0
> [Fri Jan 11 13:58:42 2019] Modules linked in:
> [Fri Jan 11 13:58:42 2019] CPU: 3 PID: 0 Comm: swapper/3 Not tainted
> 4.19.14+ #1
> [Fri Jan 11 13:58:42 2019] RIP: e030:clockevents_switch_state+0x48/0xe0
> [Fri Jan 11 13:58:42 2019] Code: 8b 0c cd 40 ee 00 82 e9 d6 5b d1 00 80
> 3d 8e 8d 43 01 00 75 17 89 c6 48 c7 c7 92 62 1f 82 c6 05 7c 8d 43 01 01
> e8 f8 22 f8 ff <0f> 0b 5b 5d f3 c3 83 e2 01 74 f7 48 8b 47 48 48 85 c0
> 74 69 48 89
> [Fri Jan 11 13:58:42 2019] RSP: e02b:ffffc90000787e30 EFLAGS: 00010082
> [Fri Jan 11 13:58:42 2019] RAX: 0000000000000000 RBX: ffff88805df94d80
> RCX: 0000000000000006
> [Fri Jan 11 13:58:42 2019] RDX: 0000000000000007 RSI: 0000000000000001
> RDI: ffff88805df963f0
> [Fri Jan 11 13:58:42 2019] RBP: 0000000000000004 R08: 0000000000000000
> R09: 0000000000000119
> [Fri Jan 11 13:58:42 2019] R10: 0000000000000020 R11: ffffffff82af4e2d
> R12: ffff88805df9ca40
> [Fri Jan 11 13:58:42 2019] R13: 0000000dd28d6ca6 R14: 0000000000000000
> R15: 0000000000000000
> [Fri Jan 11 13:58:42 2019] FS:  00007f34193ce040(0000)
> GS:ffff88805df80000(0000) knlGS:0000000000000000
> [Fri Jan 11 13:58:42 2019] CS:  e033 DS: 002b ES: 002b CR0: 0000000080050033
> [Fri Jan 11 13:58:42 2019] CR2: 00007f6220be50e1 CR3: 000000005ce5c000
> CR4: 0000000000002660
> [Fri Jan 11 13:58:42 2019] Call Trace:
> [Fri Jan 11 13:58:42 2019]  tick_program_event+0x4b/0x70
> [Fri Jan 11 13:58:42 2019]  hrtimer_try_to_cancel+0xa8/0x100
> [Fri Jan 11 13:58:42 2019]  hrtimer_cancel+0x10/0x20
> [Fri Jan 11 13:58:42 2019]  __tick_nohz_idle_restart_tick+0x45/0xd0
> [Fri Jan 11 13:58:42 2019]  tick_nohz_idle_exit+0x93/0xa0
> [Fri Jan 11 13:58:42 2019]  do_idle+0x149/0x260
> [Fri Jan 11 13:58:42 2019]  cpu_startup_entry+0x6a/0x70
> [Fri Jan 11 13:58:42 2019] ---[ end trace 519c07d1032908f8 ]---
> [Fri Jan 11 13:58:42 2019] xen:grant_table: Grant tables using version 1
> layout
> [Fri Jan 11 13:58:42 2019] OOM killer enabled.
> [Fri Jan 11 13:58:42 2019] Restarting tasks ... done.
> [Fri Jan 11 13:58:42 2019] Setting capacity to 6291456
> [Fri Jan 11 13:58:42 2019] Setting capacity to 10485760
> 
> This always happens on every *first* live migrate that I do after
> starting the domU.

Yeah, it's a WARN_ONCE().

And you didn't see it with v1 of the patch?

At first glance this might be another bug that is merely being exposed
by my patch.

I'm investigating further, but this might take some time. Could you
meanwhile check whether the same happens with kernel 5.0-rc1? That was
the one I tested with, and I didn't spot that WARN there.


Juergen
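
Archive note on the quoted log: the jump from [  187.266987] to
[18446743811.706476] is what a small negative nanosecond value looks like
when it is printed as an unsigned 64-bit number. The sketch below assumes
the printk timestamp comes from a 64-bit nanosecond counter; the helper
name as_signed_ns is hypothetical and exists only for this illustration.
It is not code from the patch.

```python
# Illustration only: reinterpret a printk-style timestamp as a signed
# 64-bit nanosecond count. Assumes a 64-bit nanosecond clock source.

U64 = 1 << 64


def as_signed_ns(seconds: int, micros: int) -> int:
    """Convert (seconds, microseconds) to ns and read it as a signed 64-bit value."""
    ns = (seconds * 1_000_000_000 + micros * 1_000) % U64
    # Values with the top bit set are negative in two's complement.
    return ns - U64 if ns >= (1 << 63) else ns


# Last timestamp before the migration, first timestamp after it,
# both taken from the log quoted in the commit message.
before = as_signed_ns(187, 266987)
after = as_signed_ns(18446743811, 706476)

print("before resume:", before, "ns")
print("after resume: ", after, "ns")
```

Read this way, the post-resume timestamp is roughly -262 seconds, so the
clock stepped backwards across the migration instead of staying monotonic.
That matches the symptom the patch addresses by setting
xen_sched_clock_offset at resume time.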
