Message-ID: <8e9fb0c37ae4a3f60b09b8da5d39dbf909ec038e.camel@infradead.org>
Date: Fri, 14 Feb 2025 12:04:38 +0000
From: David Woodhouse <dwmw2@...radead.org>
To: Thomas Gleixner <tglx@...utronix.de>, Thomas
 Weißschuh <thomas.weissschuh@...utronix.de>, "James E.J.
 Bottomley" <James.Bottomley@...senPartnership.com>, Helge Deller
 <deller@....de>, Andy Lutomirski <luto@...nel.org>, Vincenzo Frascino
 <vincenzo.frascino@....com>, Anna-Maria Behnsen <anna-maria@...utronix.de>,
 Frederic Weisbecker <frederic@...nel.org>,  Andrew Morton
 <akpm@...ux-foundation.org>, Catalin Marinas <catalin.marinas@....com>,
 Will Deacon <will@...nel.org>, Theodore Ts'o <tytso@....edu>, "Jason A.
 Donenfeld" <Jason@...c4.com>, Paul Walmsley <paul.walmsley@...ive.com>,
 Palmer Dabbelt <palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
 Huacai Chen <chenhuacai@...nel.org>, WANG Xuerui <kernel@...0n.name>,
 Russell King <linux@...linux.org.uk>, Heiko Carstens <hca@...ux.ibm.com>,
 Vasily Gorbik <gor@...ux.ibm.com>, Alexander Gordeev
 <agordeev@...ux.ibm.com>, Christian Borntraeger
 <borntraeger@...ux.ibm.com>, Sven Schnelle <svens@...ux.ibm.com>, Thomas
 Bogendoerfer <tsbogend@...ha.franken.de>, Michael Ellerman
 <mpe@...erman.id.au>, Nicholas Piggin <npiggin@...il.com>, Christophe Leroy
 <christophe.leroy@...roup.eu>, Naveen N Rao <naveen@...nel.org>, Madhavan
 Srinivasan <maddy@...ux.ibm.com>, Ingo Molnar <mingo@...hat.com>, Borislav
 Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
 x86@...nel.org, "H. Peter Anvin" <hpa@...or.com>,  Arnd Bergmann
 <arnd@...db.de>, Guo Ren <guoren@...nel.org>
Cc: linux-parisc@...r.kernel.org, linux-kernel@...r.kernel.org, 
 linux-arm-kernel@...ts.infradead.org, linux-riscv@...ts.infradead.org, 
 loongarch@...ts.linux.dev, linux-s390@...r.kernel.org, 
 linux-mips@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org, 
 linux-arch@...r.kernel.org, Nam Cao <namcao@...utronix.de>, 
 linux-csky@...r.kernel.org, "Ridoux, Julien" <ridouxj@...zon.com>, "Luu,
 Ryan" <rluu@...zon.com>, kvm <kvm@...r.kernel.org>
Subject: Re: [PATCH v3 00/18] vDSO: Introduce generic data storage

On Fri, 2025-02-14 at 12:34 +0100, Thomas Gleixner wrote:
> >  2. In kernel, asking KVM to populate the vmclock structure much like
> >     it does other pvclocks shared with the guest. KVM/x86 already uses
> >     pvclock_gtod_register_notifier() to hook changes; should we expand
> >     on that? The problem with that notifier is that it seems to be
> >     called far more frequently than I'd expect.
> 
> It's called once per tick to expose the continuous updates to the
> conversion factors and related internal data.

My recollection (a vague one) is that it's called, and reports
"changes", even when there *are* no changes to underlying conversion
factors. Something along the lines of "N ticks at 333 counts per tick,
then one tick at 334 counts per tick to catch up" because it can't
express the division factor completely without that discontinuity?
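
As a rough illustration of that pattern (a sketch only, not the kernel's
actual timekeeping code; the struct and function names are hypothetical),
spreading 1000 counts over 3 ticks with integer per-tick values yields
333, 333, 334 even though the underlying rate never changes:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical illustration, not kernel code: deliver `total` counts
 * every `period` ticks using only integer per-tick values.
 */
struct tick_spreader {
	uint64_t total;   /* counts per period, e.g. 1000 */
	uint64_t period;  /* ticks per period, e.g. 3 */
	uint64_t rem;     /* accumulated remainder, in 1/period counts */
};

/* Return the integer number of counts to report for the next tick. */
static uint64_t next_tick_counts(struct tick_spreader *s)
{
	assert(s->period > 0);

	uint64_t counts = s->total / s->period;  /* base value, e.g. 333 */

	s->rem += s->total % s->period;          /* fraction owed so far */
	if (s->rem >= s->period) {               /* a whole count is owed */
		s->rem -= s->period;
		counts++;                        /* catch-up tick, e.g. 334 */
	}
	return counts;
}
```

Each catch-up tick looks like a rate "change" to anything that only
compares the per-tick value.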

The actual 'error' caused by the apparent fluctuation in rate is
probably entirely negligible, but I am slightly concerned about the
steal time, if the hypervisor then spends stolen CPU time relaying all
those "changes" to the guest, and then the guest has to spend time
feeding the "changes" into its own timekeeping.

I'd like to strive for a mode where we adjust what we tell guests only
when adjtimex actually changes the real timing factors.
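
A minimal sketch of that gating, assuming the guest-visible data boils
down to a multiplier/shift pair plus a sequence counter (this layout and
these names are hypothetical, not the real vmclock or pvclock structures):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical conversion factors; not the real vmclock/pvclock layout. */
struct clock_factors {
	uint64_t mult;   /* fixed-point multiplier */
	uint32_t shift;  /* right shift applied after multiplying */
};

struct guest_clock_page {
	uint32_t seq;                  /* bumped only on a real update */
	struct clock_factors factors;  /* what the guest currently sees */
};

/*
 * Publish new factors only if they differ from what the guest already
 * has. Returns true if the guest-visible page was modified.
 */
static bool publish_if_changed(struct guest_clock_page *page,
			       const struct clock_factors *now)
{
	assert(page && now);

	if (page->factors.mult == now->mult &&
	    page->factors.shift == now->shift)
		return false;  /* no real change: nothing to relay */

	page->factors = *now;
	page->seq++;
	return true;
}
```

A real implementation would also need proper ordering around the sequence
counter so the guest never reads a half-updated pair; that is left out
here.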

In fact if we have a userspace tool like chrony feeding adjtimex based
on external NTP/PPS/whatever, that tool could probably calibrate a
stable host TSC directly against the external real time. And in that
mode maybe we don't even need to feed the guest from the kernel's
CLOCK_REALTIME; that would be just another conversion step to introduce
noise.

We might end up with the direct setup for dedicated hosting
environments, but I do also want to support the general-purpose QEMU-
based setup where we expose the host's CLOCK_REALTIME as efficiently as
possible.

How about this: A KVM feature to provide/populate the VMCLOCK, since
only KVM knows the precise TSC scaling (and can immediately flip the
VMCLOCK to report invalid state if the TSC becomes unreliable).

It can *either* be fed the precise TSC/realtime relationship from
userspace (maybe in a vmclock structure that *userspace* populates, so
all the kernel has to do is scale/offset to account for the guest TSC
being different from the host TSC).

Or it can be in 'automatic' mode, where it derives from the host's
timekeeping. At the moment that would have "too many" updates for my
liking, but we can worry about that later if necessary.
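
For the userspace-fed mode, the scale/offset step could look roughly like
the following. This is a sketch under assumptions: the fixed-point ratio
format, the parameter names and the helper itself are hypothetical, and
it relies on the GCC/Clang `unsigned __int128` extension for the wide
multiply.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a host TSC reading into the value the
 * guest would observe, given a fixed-point scaling ratio with
 * `frac_bits` fractional bits and a signed offset.
 */
static uint64_t host_to_guest_tsc(uint64_t host_tsc, uint64_t ratio,
				  unsigned int frac_bits, int64_t offset)
{
	assert(frac_bits < 64);

	/* 64x64 -> 128-bit multiply so the intermediate cannot overflow. */
	unsigned __int128 scaled = (unsigned __int128)host_tsc * ratio;

	/* Unsigned addition wraps, so a negative offset subtracts. */
	return (uint64_t)(scaled >> frac_bits) + (uint64_t)offset;
}
```

With 32 fractional bits, a ratio of `1ULL << 32` means "guest runs at the
host rate", so `host_to_guest_tsc(1000, 1ULL << 32, 32, -100)` gives 900.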

