Message-ID: <c2eb6fd7e9a786749d70a17266a04fb50dbd5bb8.camel@linux.ibm.com>
Date: Fri, 17 Jan 2025 17:57:10 +0100
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Andrew Lunn <andrew@...n.ch>
Cc: dust.li@...ux.alibaba.com, Alexandra Winter <wintera@...ux.ibm.com>,
Julian Ruess <julianr@...ux.ibm.com>,
Wenjia Zhang <wenjia@...ux.ibm.com>, Jan Karcher <jaka@...ux.ibm.com>,
Gerd Bayer <gbayer@...ux.ibm.com>, Halil Pasic <pasic@...ux.ibm.com>,
"D. Wythe" <alibuda@...ux.alibaba.com>,
Tony Lu <tonylu@...ux.alibaba.com>, Wen Gu <guwen@...ux.alibaba.com>,
Peter Oberparleiter <oberpar@...ux.ibm.com>,
David Miller <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
Andrew Lunn <andrew+netdev@...n.ch>,
Thorsten Winkler <twinkler@...ux.ibm.com>, netdev@...r.kernel.org,
linux-s390@...r.kernel.org, Heiko Carstens <hca@...ux.ibm.com>,
Vasily Gorbik <gor@...ux.ibm.com>,
Alexander Gordeev <agordeev@...ux.ibm.com>,
Christian Borntraeger <borntraeger@...ux.ibm.com>,
Sven Schnelle <svens@...ux.ibm.com>, Simon Horman <horms@...nel.org>
Subject: Re: [RFC net-next 0/7] Provide an ism layer
On Fri, 2025-01-17 at 17:33 +0100, Andrew Lunn wrote:
> > Conceptually, kind of, but the existing s390-specific ISM device is a
> > bit special. But let me start with some background. On s390 (aka
> > mainframes), OSs including Linux run in so-called logical partitions
> > (LPARs), which are machine-hypervisor VMs that use partitioned,
> > non-paging memory. The fact that memory is partitioned is important
> > because it means LPARs cannot share physical memory by mapping it.
> >
> > Now, at a high level, an ISM device allows communication between two
> > such Linux LPARs on the same machine. The device is discovered as a PCI
> > device and allows Linux to take a buffer, called a DMB, map it in the
> > IOMMU, and generate a token specific to another LPAR that also sees an
> > ISM device sharing the same virtual channel identifier (VCHID). This
> > token can then be transferred out of band (e.g. as part of an extended
> > TCP handshake in SMC-D) to that other system. With the token, the other
> > system can use its ISM device to securely write into the original
> > system's DMB (authenticated by the token, the LPAR identity, and the
> > IOMMU mapping) at throughput and latency similar to doing a memcpy()
> > via a syscall.
> >
> > At the implementation level, the ISM device is actually a piece of
> > firmware, and the write to a remote DMB is a special case of our PCI
> > Store Block instruction (there is no real MMIO on s390; instead there
> > are special instructions). Sadly, there are a few more quirks, but in
> > principle you can think of it as redirecting writes to a part of the
> > ISM PCI device's BAR to the DMB in the peer system, if that makes sense.
> > There is, of course, also a mechanism to raise an interrupt on the
> > receiver when the write completes.
>
> So the s390 details are interesting, but as you say, it is
> special. Ideally, all the special should be hidden away inside the
> driver.
Yes, and it will be. There are some exceptions, e.g. for vfio-pci
pass-through, but that's not unusual and is why the concept of a
vfio-pci extension module already exists.
>
> So please take a step back. What is the abstract model?
I think my high-level description may be a good start. The abstract
model is the ability to share a memory buffer (DMB) for writing by a
communication partner, authenticated by a DMB token, plus extras such as
triggering an interrupt on write or via an explicit trigger. Alibaba
then added optional support for what they called attaching the buffer,
which means it becomes truly shared between the peers; IBM's ISM can't
support that. There are also a few more optional pieces, such as VLANs
and PNETIDs (don't ask). The idea for the new layer, then, is to define
this interface with operations and documentation.
>
> Can the abstract model be mapped onto CXL? Could it be used with GPU
> vRAM? An SoC with real shared memory between a pool of CPUs?
>
> Andrew
I'd think that, yes, one could implement such a mechanism on top of CXL
as well as on an SoC, or even without special hardware between a host
and a DPU (e.g. via the PCIe endpoint framework). Basically anything
that can do DMA and raise IRQs between two OS instances would work.
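As a rough illustration of the point that DMA plus IRQs is enough, the following hypothetical C sketch builds a DMB write purely from two transport primitives, `dma_copy` and `raise_irq`. The names, the `struct ism_transport` layout, and the array-backed toy transport are assumptions for illustration only; a CXL, SoC, or PCIe-endpoint backend would have to supply real implementations of the same two callbacks.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interface: the two primitives a backend would need. */
struct ism_transport {
	void *ctx;
	/* Copy bytes to an address the peer OS instance can see
	 * (DMA engine, CXL memory, SoC shared RAM, PCIe endpoint window...). */
	int (*dma_copy)(void *ctx, uint64_t dst, const void *src, size_t len);
	/* Notify the peer (MSI, doorbell register, inter-processor IRQ...). */
	int (*raise_irq)(void *ctx, uint32_t vector);
};

/* A generic DMB write expressed only in terms of those two primitives. */
static int ism_generic_write(const struct ism_transport *t, uint64_t dmb_base,
			     size_t dmb_len, size_t offset, const void *data,
			     size_t size, int signal, uint32_t vector)
{
	if (offset > dmb_len || size > dmb_len - offset)
		return -1; /* stay inside the peer's DMB */
	if (t->dma_copy(t->ctx, dmb_base + offset, data, size))
		return -1;
	return signal ? t->raise_irq(t->ctx, vector) : 0;
}

/* Toy transport: "peer memory" is a local array, IRQs are just counted. */
#define PEER_MEM_SIZE 256

struct toy_peer {
	uint8_t mem[PEER_MEM_SIZE];
	unsigned irqs;
	uint32_t last_vector;
};

static int toy_dma_copy(void *ctx, uint64_t dst, const void *src, size_t len)
{
	struct toy_peer *p = ctx;

	if (dst > PEER_MEM_SIZE || len > PEER_MEM_SIZE - dst)
		return -1;
	memcpy(p->mem + dst, src, len);
	return 0;
}

static int toy_raise_irq(void *ctx, uint32_t vector)
{
	struct toy_peer *p = ctx;

	p->irqs++;
	p->last_vector = vector;
	return 0;
}
```

For example, with a `struct toy_peer peer` and a transport whose `ctx` points at it, `ism_generic_write(&t, 64, 32, 0, "ping", 5, 1, 7)` copies five bytes into `peer.mem[64..68]` and records one IRQ on vector 7. Keeping the bounds check in the generic layer means each transport only has to move bytes and ring a doorbell.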