netdev - Re: [PATCH] ptp: Add vDSO-style vmclock support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite for Android: free password hash cracker in your pocket

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20240726012933-mutt-send-email-mst@kernel.org>
Date: Fri, 26 Jul 2024 01:55:45 -0400
From: "Michael S. Tsirkin" <mst@...hat.com>
To: David Woodhouse <dwmw2@...radead.org>
Cc: Richard Cochran <richardcochran@...il.com>,
	Peter Hilber <peter.hilber@...nsynergy.com>,
	linux-kernel@...r.kernel.org, virtualization@...ts.linux.dev,
	linux-arm-kernel@...ts.infradead.org, linux-rtc@...r.kernel.org,
	"Ridoux, Julien" <ridouxj@...zon.com>, virtio-dev@...ts.linux.dev,
	"Luu, Ryan" <rluu@...zon.com>,
	"Chashper, David" <chashper@...zon.com>,
	"Mohamed Abuelfotoh, Hazem" <abuehaze@...zon.com>,
	"Christopher S . Hall" <christopher.s.hall@...el.com>,
	Jason Wang <jasowang@...hat.com>, John Stultz <jstultz@...gle.com>,
	netdev@...r.kernel.org, Stephen Boyd <sboyd@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Xuan Zhuo <xuanzhuo@...ux.alibaba.com>,
	Marc Zyngier <maz@...nel.org>, Mark Rutland <mark.rutland@....com>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	Alessandro Zummo <a.zummo@...ertech.it>,
	Alexandre Belloni <alexandre.belloni@...tlin.com>,
	qemu-devel <qemu-devel@...gnu.org>, Simon Horman <horms@...nel.org>
Subject: Re: [PATCH] ptp: Add vDSO-style vmclock support

On Fri, Jul 26, 2024 at 01:09:24AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 25, 2024 at 10:29:18PM +0100, David Woodhouse wrote:
> > > > > Then can't we fix it by interrupting all CPUs right after LM?
> > > > > 
> > > > > To me that seems like a cleaner approach - we then compartmentalize
> > > > > the ABI issue - kernel has its own ABI against userspace,
> > > > > devices have their own ABI against kernel.
> > > > > It'd mean we need a way to detect that interrupt was sent,
> > > > > maybe yet another counter inside that structure.
> > > > > 
> > > > > WDYT?
> > > > > 
> > > > > By the way the same idea would work for snapshots -
> > > > > some people wanted to expose that info to userspace, too.
> > 
> > Those people included me. I wanted to interrupt all the vCPUs, even the
> > ones which were in userspace at the moment of migration, and have the
> > kernel deal with passing it on to userspace via a different ABI.
> > 
> > It ends up being complex and intricate, and requiring a lot of new
> > kernel and userspace support. I gave up on it in the end for snapshots,
> > and didn't go there again for this.
> 
> Maybe become you insist on using ACPI?
> I see a fairly simple way to do it. For example, with virtio:
> 
> one vq per CPU, with a single outstanding buffer,
> callback copies from the buffer into the userspace
> visible memory.
> 
> Want me to show you the code?

Couldn't resist, so I wrote a bit of this code.
Fundamentally, we keep a copy of the hypervisor abi
in the device:

struct virtclk_info *vci {
	struct vmclock_abi abi;
};

each vq will has its own copy:

struct virtqueue_info {
	struct scatterlist sg[];
	struct vmclock_abi abi;
}

we add it during probe:
        sg_init_one(vqi->sg, &vqi->abi, sizeof(vqi->abi));
	virtqueue_add_inbuf(vq,
                        vqi->sg, 1,
                        &vq->vabi,
                        GFP_ATOMIC);



We set the affinity for each vq:

       for (i = 0; i < num_online_cpus(); i++)
               virtqueue_set_affinity(vi->vq[i], i);

(virtio net does it, and it handles cpu hotplug as well)

each vq callback would do:

static void vmclock_cb(struct virtqueue *vq)
{
        struct virtclk_info *vci = vq->vdev->priv;
        struct virtqueue_info *vqi = vq->priv;
	void *buf;
        unsigned int len;

	buf = virtqueue_get_buf(vq, &len);
	if (!buf)
		return;

	BUG_ON(buf != &vq->abi);

	spin_lock(vci->lock);
	if (memcmp(&vci->abi, &vqi->abi, sizeof(vqi->abi))) {
		memcpy(&vci->abi, &vqi->abi, sizeof(vqi->abi));
	}

	/* Update the userspace visible structure now */
	.....

	/* Re-add the buffer */
	virtqueue_add_inbuf(vq,
                        vqi->sg, 1,
                        &vqi->abi,
                        GFP_ATOMIC);

	spin_unlock(vi->lock);
}

That's it!
Where's the problem here?

-- 
MST