Message-ID: <7fc92a63-0017-4d59-bdaf-8976bf8dcee1@linux.ibm.com>
Date: Mon, 20 Jan 2025 13:03:16 +0100
From: Alexandra Winter <wintera@...ux.ibm.com>
To: dust.li@...ux.alibaba.com, Andrew Lunn <andrew@...n.ch>,
        Niklas Schnelle <schnelle@...ux.ibm.com>
Cc: Julian Ruess <julianr@...ux.ibm.com>, Wenjia Zhang <wenjia@...ux.ibm.com>,
        Jan Karcher <jaka@...ux.ibm.com>, Gerd Bayer <gbayer@...ux.ibm.com>,
        Halil Pasic <pasic@...ux.ibm.com>,
        "D. Wythe" <alibuda@...ux.alibaba.com>,
        Tony Lu <tonylu@...ux.alibaba.com>, Wen Gu <guwen@...ux.alibaba.com>,
        Peter Oberparleiter <oberpar@...ux.ibm.com>,
        David Miller <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
        Paolo Abeni <pabeni@...hat.com>, Eric Dumazet <edumazet@...gle.com>,
        Andrew Lunn <andrew+netdev@...n.ch>,
        Thorsten Winkler <twinkler@...ux.ibm.com>, netdev@...r.kernel.org,
        linux-s390@...r.kernel.org, Heiko Carstens <hca@...ux.ibm.com>,
        Vasily Gorbik <gor@...ux.ibm.com>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Christian Borntraeger <borntraeger@...ux.ibm.com>,
        Sven Schnelle <svens@...ux.ibm.com>, Simon Horman <horms@...nel.org>
Subject: Re: [RFC net-next 0/7] Provide an ism layer



On 20.01.25 07:21, Dust Li wrote:
> On 2025-01-17 21:29:09, Andrew Lunn wrote:
>> On Fri, Jan 17, 2025 at 05:57:10PM +0100, Niklas Schnelle wrote:
>>> On Fri, 2025-01-17 at 17:33 +0100, Andrew Lunn wrote:
>>>>> Conceptually kind of, but the existing s390-specific ISM device is a
>>>>> bit special. Let me start with some background. On s390, a.k.a.
>>>>> mainframes, OSs including Linux run in so-called logical partitions
>>>>> (LPARs), which are machine-hypervisor VMs using partitioned, non-paging
>>>>> memory. The fact that memory is partitioned is important because it
>>>>> means LPARs cannot share physical memory by mapping it.
>>>>>
>>>>> Now at a high level an ISM device allows communication between two such
>>>>> Linux LPARs on the same machine. The device is discovered as a PCI
>>>>> device and allows Linux to take a buffer called a DMB, map it in the
>>>>> IOMMU, and generate a token specific to another LPAR which also sees an
>>>>> ISM device sharing the same virtual channel identifier (VCHID). This
>>>>> token can then be transferred out of band (e.g. as part of an extended
>>>>> TCP handshake in SMC-D) to that other system. With the token the other
>>>>> system can use its ISM device to securely (authenticated by the token,
>>>>> LPAR identity and the IOMMU mapping) write into the original system's
>>>>> DMB at throughput and latency similar to doing a memcpy() via a
>>>>> syscall.
>>>>>
>>>>> On the implementation level the ISM device is actually a piece of
>>>>> firmware, and the write to a remote DMB is a special case of our PCI
>>>>> Store Block instruction (there is no real MMIO on s390; instead there
>>>>> are special instructions). Sadly there are a few more quirks, but in
>>>>> principle you can think of it as redirecting writes to a part of the
>>>>> ISM PCI device's BAR to the DMB in the peer system, if that makes sense.
>>>>> There is of course also a mechanism to raise an interrupt on the
>>>>> receiver as the write completes.
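
Very roughly, and with made-up names rather than the real zPCI/ISM
interface, the flow described above could be sketched as:

	/* receiver side: register a buffer as a DMB; this maps it in the
	 * IOMMU and yields a token, which is handed to the peer out of
	 * band (e.g. during the SMC-D handshake) */
	err = register_dmb(ism_dev, buf, len, &dmb_token);	/* hypothetical */

	/* sender side: write into the peer's DMB using that token; the
	 * firmware turns this into a PCI Store Block targeting the peer's
	 * memory and raises an interrupt there when the write completes */
	err = write_remote_dmb(ism_dev, dmb_token, offset, data, len);	/* hypothetical */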
>>>>
>>>> So the s390 details are interesting, but as you say, it is
>>>> special. Ideally, all the special should be hidden away inside the
>>>> driver.
>>>
>>> Yes, and it will be. There are some exceptions, e.g. for vfio-pci pass-
>>> through, but that's not unusual and is why the concept of a vfio-pci
>>> extension module already exists.
>>>
>>>>
>>>> So please take a step back. What is the abstract model?
>>>
>>> I think my high-level description may be a good start. The abstract
>>> model is the ability to share a memory buffer (DMB) for writing by a
>>> communication partner, authenticated by a DMB token. Plus things like
>>> triggering an interrupt on write or on an explicit trigger. Then Alibaba
>>> added optional support for what they call attaching the buffer, which
>>> means it becomes truly shared between the peers, but which IBM's ISM
>>> can't support. Plus a few more optional pieces such as VLANs and PNETIDs
>>> (don't ask). The idea for the new layer then is to define this interface
>>> with operations and documentation.
>>>
>>>>
>>>> Can the abstract model be mapped onto CXL? Could it be used with GPU
>>>> vRAM? Or an SoC with real shared memory between a pool of CPUs?
>>>>
>>>> 	Andrew
>>>
>>> I'd think that yes, one could implement such a mechanism on top of CXL
>>> as well as on an SoC. Or even with no special hardware between a host
>>> and a DPU (e.g. via the PCIe endpoint framework). Basically anything
>>> that can do DMA and IRQs between two OS instances.
>>
>> Is DMA part of the abstract model? That would suggest a true shared
>> memory system is excluded, since that would not require DMA.
>>
>> Maybe take a look at subsystems like USB, I2C.
>>
>> usb_submit_urb(struct urb *urb, gfp_t mem_flags)
>>
>> A URB is a data structure with a block of memory associated with it; it
>> contains the details to pass to the USB device.
>>
>> i2c_transfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
>>
>> *msgs points to num messages which get transferred to/from the I2C
>> device.
>>
>> Could the high-level API look like this? No DMA, no IRQ, no concept of
>> somewhat-shared memory. Just an API which asks for a message to be
>> sent to the other end? struct urb has some USB concepts in it, struct
>> i2c_msg has some I2C concepts in it. A struct ism_msg would follow the
>> same pattern, but does it need to care about the DMA, the IRQ, or the
>> memory which is semi-shared?
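
To make Andrew's analogy concrete, a purely illustrative sketch (nothing
like this exists today; all names and fields are made up):

	struct ism_msg {
		u64		dmb_token;	/* which peer buffer to write into */
		unsigned int	offset;		/* offset within that DMB */
		void		*buf;		/* data to transfer */
		size_t		len;
		void		(*complete)(struct ism_msg *msg);	/* completion callback */
	};

	int ism_submit_msg(struct ism_dev *dev, struct ism_msg *msg);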
> 
> I don’t have a clear picture of what the API should look like yet, but I
> believe it’s possible to avoid DMA and IRQ. In fact, the current data
> transfer API, ops->move_data() in include/linux/ism.h, already abstracts
> away the DMA and IRQ details.
> 

What is central to ISM is the DMB (Direct Memory Buffer): the concept
that there is a DMB dedicated to one writer and one reader. It is owned
by the reader, and only that writer can write at any offset into the DMB
(fabric controlled). (The reader can technically read/write as well.)

So for the client API I think the core functions are:
- move_data(*data, target_dmb_token, offset) - called by the sending
client to move data to some offset within a DMB.
- receive_signal(dmb_token, some_signal_info) - called by the ism layer
to signal the client that this DMB needs handling (currently called
handle_irq).
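
As a very rough sketch (names and signatures purely illustrative, not a
concrete proposal):

	/* provided by the ism layer / device driver, called by the sending
	 * client to write data at an offset into the peer's DMB, identified
	 * by its token */
	int ism_move_data(struct ism_dev *dev, u64 target_dmb_token,
			  unsigned int offset, const void *data, size_t len);

	/* provided by the client, called by the ism layer when a DMB owned
	 * by this client needs handling (today this is handle_irq) */
	struct ism_client {
		void (*receive_signal)(struct ism_dev *dev, u64 dmb_token,
				       unsigned int signal_info);
	};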

I would not want to abstract that into a message-based API, because then
we need queues etc. and are almost at a net_device. None of that is
needed for ism, because DMBs are dedicated to a single writer (who has
the responsibility).


> One thing we cannot hide, however, is whether the operation is zero-copy
> or copy. This distinction is important because the data can be reused at
> different times in copy mode versus zero-copy mode.
> 
> Best regards,
> Dust
> 

See my reply on 4/7, as well as Niklas' reply. Currently you can always
re-use the send buffer. So zero-copy could be a property of the DMB
(attach() function, etc.).
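
I.e., roughly (again only a sketch, field and function names made up):

	/* zero-copy expressed as a property of the DMB rather than of each
	 * operation */
	struct ism_dmb_desc {			/* illustrative */
		u64	token;
		u32	len;
		void	*cpu_addr;
		bool	attached;	/* peer attached the buffer (zero-copy);
					 * otherwise data is copied via move_data() */
	};

	/* optional op, only implemented by devices that support truly shared
	 * DMBs (e.g. Alibaba's virtual ISM); absent for IBM's firmware ISM */
	int ism_attach_dmb(struct ism_dev *dev, struct ism_dmb_desc *dmb);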

