lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240725163843-mutt-send-email-mst@kernel.org>
Date: Thu, 25 Jul 2024 16:50:48 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: David Woodhouse <dwmw2@...radead.org>
Cc: Richard Cochran <richardcochran@...il.com>,
	Peter Hilber <peter.hilber@...nsynergy.com>,
	linux-kernel@...r.kernel.org, virtualization@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org, linux-rtc@...r.kernel.org,
	"Ridoux, Julien" <ridouxj@...zon.com>, virtio-dev@...ts.linux.dev,
	"Luu, Ryan" <rluu@...zon.com>,
	"Chashper, David" <chashper@...zon.com>,
	"Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>,
	"Christopher S . Hall" <christopher.s.hall@...el.com>,
	Jason Wang <jasowang@...hat.com>, John Stultz <jstultz@...gle.com>,
	netdev@...r.kernel.org, Stephen Boyd <sboyd@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
	Marc Zyngier <maz@...nel.org>, Mark Rutland <mark.rutland@....com>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Alessandro Zummo <a.zummo@...ertech.it>,
	Alexandre Belloni <alexandre.belloni@...tlin.com>,
	qemu-devel <qemu-devel@...gnu.org>, Simon Horman <horms@...nel.org>
Subject: Re: [PATCH] ptp: Add vDSO-style vmclock support

On Thu, Jul 25, 2024 at 08:35:40PM +0100, David Woodhouse wrote:
> On Thu, 2024-07-25 at 12:38 -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
> > > The use case isn't necessarily for all users of gettimeofday(), of
> > > course; this is for those applications which *need* precision time.
> > > Like distributed databases which rely on timestamps for coherency, and
> > > users who get fined millions of dollars when LM messes up their clocks
> > > and they put wrong timestamps on financial transactions.
> > 
> > I would however worry that with all this pass through,
> > applications have to be coded to each hypervisor or even
> > version of the hypervisor.
> 
> Yes, that would be a problem. Which is why I feel it's so important to
> harmonise the contents of the shared memory, and I'm implementing it
> both QEMU and $DAYJOB, as well as aligning with virtio-rtc.


Writing an actual spec for this would be another thing that might help.

> I don't think the structure should be changing between hypervisors (and
> especially versions). We *will* see a progression from simply providing
> the disruption signal, to providing the full clock information so that
> guests don't have to abort transactions while they resync their clock.
> But that's perfectly fine.
> 
> And it's also entirely agnostic to the mechanism by which the memory
> region is *discovered*. It doesn't matter if it's ACPI, DT, a
> hypervisor enlightenment, a BAR of a simple PCI device, virtio, or
> anything else.
> 
> ACPI is one of the *simplest* options for a hypervisor and guest to
> implement, and doesn't prevent us from using the same structure in
> virtio-rtc. I'm happy enough using ACPI and letting virtio-rtc come
> along later.
> 
> > virtio has been developed with the painful experience that we keep
> > making mistakes, or coming up with new needed features,
> > and that maintaining forward and backward compatibility
> > becomes a whole lot harder than it seems in the beginning.
> 
> Yes. But as you note, this shared memory structure is a userspace ABI
> all of its own, so we get to make a completely *different* kind of
> mistake :)
> 


So, something I still don't completely understand.
Can't the VDSO thing be written to by kernel?
Let's say on LM, an interrupt triggers and kernel copies
data from a specific device to the VDSO.

Is that problematic somehow? I imagine there is a race where
userspace reads vdso after lm but before kernel updated
vdso - is that the concern?

Then can't we fix it by interrupting all CPUs right after LM?

To me that seems like a cleaner approach - we then compartmentalize
the ABI issue - kernel has its own ABI against userspace,
devices have their own ABI against kernel.
It'd mean we need a way to detect that interrupt was sent,
maybe yet another counter inside that structure.

WDYT?

By the way the same idea would work for snapshots -
some people wanted to expose that info to userspace, too.

-- 
MST


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ