lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250120035525.GK89233@linux.alibaba.com>
Date: Mon, 20 Jan 2025 11:55:25 +0800
From: Dust Li <dust.li@...ux.alibaba.com>
To: Alexandra Winter <wintera@...ux.ibm.com>,
	Wenjia Zhang <wenjia@...ux.ibm.com>,
	Jan Karcher <jaka@...ux.ibm.com>, Gerd Bayer <gbayer@...ux.ibm.com>,
	Halil Pasic <pasic@...ux.ibm.com>,
	"D. Wythe" <alibuda@...ux.alibaba.com>,
	Tony Lu <tonylu@...ux.alibaba.com>,
	Wen Gu <guwen@...ux.alibaba.com>,
	Peter Oberparleiter <oberpar@...ux.ibm.com>,
	David Miller <davem@...emloft.net>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Eric Dumazet <edumazet@...gle.com>,
	Andrew Lunn <andrew+netdev@...n.ch>
Cc: Julian Ruess <julianr@...ux.ibm.com>,
	Niklas Schnelle <schnelle@...ux.ibm.com>,
	Thorsten Winkler <twinkler@...ux.ibm.com>, netdev@...r.kernel.org,
	linux-s390@...r.kernel.org, Heiko Carstens <hca@...ux.ibm.com>,
	Vasily Gorbik <gor@...ux.ibm.com>,
	Alexander Gordeev <agordeev@...ux.ibm.com>,
	Christian Borntraeger <borntraeger@...ux.ibm.com>,
	Sven Schnelle <svens@...ux.ibm.com>,
	Simon Horman <horms@...nel.org>
Subject: Re: [RFC net-next 5/7] net/ism: Move ism_loopback to net/ism

On 2025-01-15 20:55:25, Alexandra Winter wrote:
>The first stage of ism_loopback was implemented as part of the
>SMC module [1]. Now that we have the ism layer, provide
>access to the ism_loopback device to all ism clients.
>
>Move ism_loopback.* from net/smc to net/ism.
>The following changes are required to ism_loopback.c:
>- Change ism_lo_move_data() to no longer schedule an smcd receive tasklet,
>but instead call ism_client->handle_irq().
>Note: In this RFC patch ism_loppback is not fully generic.
>  The smc-d client uses attached buffers, for moves without signalling.
>  and not-attached buffers for moves with signalling.
>  ism_lo_move_data() must not rely on that assumption.
>  ism_lo_move_data() must be able to handle more than one ism client.
>
>In addition the following changes are required to unify ism_loopback and
>ism_vp:
>
>In ism layer and ism_vpci:
>ism_loopback is not backed by a pci device, so use dev instead of pdev in
>ism_dev.
>
>In smc-d:
>in smcd_alloc_dev():
>- use kernel memory instead of device memory for smcd_dev and smcd->conn.
>        An alternative would be to ask device to alloc the memory.
>- use different smcd_ops and max_dmbs for ism_vp and ism_loopback.
>    A future patch can change smc-d to directly use ism_ops instead of
>    smcd_ops.
>- use ism dev_name instead of pci dev name for ism_evt_wq name
>- allocate an event workqueue for ism_loopback, although it currently does
>  not generate events.
>
>Link: https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/ [1]
>
>Signed-off-by: Alexandra Winter <wintera@...ux.ibm.com>
>---
> drivers/s390/net/ism.h     |   6 +-
> drivers/s390/net/ism_drv.c |  31 ++-
> include/linux/ism.h        |  59 +++++
> include/net/smc.h          |   4 +-
> net/ism/Kconfig            |  13 ++
> net/ism/Makefile           |   1 +
> net/ism/ism_loopback.c     | 366 +++++++++++++++++++++++++++++++
> net/ism/ism_loopback.h     |  59 +++++
> net/ism/ism_main.c         |  11 +-
> net/smc/Kconfig            |  13 --
> net/smc/Makefile           |   1 -
> net/smc/af_smc.c           |  12 +-
> net/smc/smc_ism.c          | 108 +++++++---
> net/smc/smc_loopback.c     | 427 -------------------------------------
> net/smc/smc_loopback.h     |  60 ------
> 15 files changed, 606 insertions(+), 565 deletions(-)
> create mode 100644 net/ism/ism_loopback.c
> create mode 100644 net/ism/ism_loopback.h
> delete mode 100644 net/smc/smc_loopback.c
> delete mode 100644 net/smc/smc_loopback.h
>
>diff --git a/drivers/s390/net/ism.h b/drivers/s390/net/ism.h
>index 61cf10334170..0deca6d0e328 100644
>--- a/drivers/s390/net/ism.h
>+++ b/drivers/s390/net/ism.h
>@@ -202,7 +202,7 @@ struct ism_sba {
> static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> 				  unsigned long offset, unsigned long len)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, 8);
> 
> 	while (len > 0) {
>@@ -216,7 +216,7 @@ static inline void __ism_read_cmd(struct ism_dev *ism, void *data,
> static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> 				   unsigned long offset, unsigned long len)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 2, len);
> 
> 	if (len)
>@@ -226,7 +226,7 @@ static inline void __ism_write_cmd(struct ism_dev *ism, void *data,
> static inline int __ism_move(struct ism_dev *ism, u64 dmb_req, void *data,
> 			     unsigned int size)
> {
>-	struct zpci_dev *zdev = to_zpci(ism->pdev);
>+	struct zpci_dev *zdev = to_zpci(to_pci_dev(ism->dev.parent));
> 	u64 req = ZPCI_CREATE_REQ(zdev->fh, 0, size);
> 
> 	return __zpci_store_block(data, req, dmb_req);
>diff --git a/drivers/s390/net/ism_drv.c b/drivers/s390/net/ism_drv.c
>index ab70debbdd9d..c0954d6dd9f5 100644
>--- a/drivers/s390/net/ism_drv.c
>+++ b/drivers/s390/net/ism_drv.c
>@@ -88,7 +88,7 @@ static int register_sba(struct ism_dev *ism)
> 	dma_addr_t dma_handle;
> 	struct ism_sba *sba;
> 
>-	sba = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+	sba = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> 				 GFP_KERNEL);
> 	if (!sba)
> 		return -ENOMEM;
>@@ -99,7 +99,7 @@ static int register_sba(struct ism_dev *ism)
> 	cmd.request.sba = dma_handle;
> 
> 	if (ism_cmd(ism, &cmd)) {
>-		dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, sba, dma_handle);
>+		dma_free_coherent(ism->dev.parent, PAGE_SIZE, sba, dma_handle);
> 		return -EIO;
> 	}
> 
>@@ -115,7 +115,7 @@ static int register_ieq(struct ism_dev *ism)
> 	dma_addr_t dma_handle;
> 	struct ism_eq *ieq;
> 
>-	ieq = dma_alloc_coherent(&ism->pdev->dev, PAGE_SIZE, &dma_handle,
>+	ieq = dma_alloc_coherent(ism->dev.parent, PAGE_SIZE, &dma_handle,
> 				 GFP_KERNEL);
> 	if (!ieq)
> 		return -ENOMEM;
>@@ -127,7 +127,7 @@ static int register_ieq(struct ism_dev *ism)
> 	cmd.request.len = sizeof(*ieq);
> 
> 	if (ism_cmd(ism, &cmd)) {
>-		dma_free_coherent(&ism->pdev->dev, PAGE_SIZE, ieq, dma_handle);
>+		dma_free_coherent(ism->dev.parent, PAGE_SIZE, ieq, dma_handle);
> 		return -EIO;
> 	}
> 
>@@ -149,7 +149,7 @@ static int unregister_sba(struct ism_dev *ism)
> 	if (ret && ret != ISM_ERROR)
> 		return -EIO;
> 
>-	dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> 			  ism->sba, ism->sba_dma_addr);
> 
> 	ism->sba = NULL;
>@@ -169,7 +169,7 @@ static int unregister_ieq(struct ism_dev *ism)
> 	if (ret && ret != ISM_ERROR)
> 		return -EIO;
> 
>-	dma_free_coherent(&ism->pdev->dev, PAGE_SIZE,
>+	dma_free_coherent(ism->dev.parent, PAGE_SIZE,
> 			  ism->ieq, ism->ieq_dma_addr);
> 
> 	ism->ieq = NULL;
>@@ -216,7 +216,7 @@ static int ism_query_rgid(struct ism_dev *ism, uuid_t *rgid, u32 vid_valid,
> static void ism_free_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> {
> 	clear_bit(dmb->sba_idx, ism->sba_bitmap);
>-	dma_unmap_page(&ism->pdev->dev, dmb->dma_addr, dmb->dmb_len,
>+	dma_unmap_page(ism->dev.parent, dmb->dma_addr, dmb->dmb_len,
> 		       DMA_FROM_DEVICE);
> 	folio_put(virt_to_folio(dmb->cpu_addr));
> }
>@@ -227,7 +227,7 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> 	unsigned long bit;
> 	int rc;
> 
>-	if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(&ism->pdev->dev))
>+	if (PAGE_ALIGN(dmb->dmb_len) > dma_get_max_seg_size(ism->dev.parent))
> 		return -EINVAL;
> 
> 	if (!dmb->sba_idx) {
>@@ -251,10 +251,10 @@ static int ism_alloc_dmb(struct ism_dev *ism, struct ism_dmb *dmb)
> 	}
> 
> 	dmb->cpu_addr = folio_address(folio);
>-	dmb->dma_addr = dma_map_page(&ism->pdev->dev,
>+	dmb->dma_addr = dma_map_page(ism->dev.parent,
> 				     virt_to_page(dmb->cpu_addr), 0,
> 				     dmb->dmb_len, DMA_FROM_DEVICE);
>-	if (dma_mapping_error(&ism->pdev->dev, dmb->dma_addr)) {
>+	if (dma_mapping_error(ism->dev.parent, dmb->dma_addr)) {
> 		rc = -ENOMEM;
> 		goto out_free;
> 	}
>@@ -419,10 +419,7 @@ static int ism_supports_v2(void)
> 
> static u16 ism_get_chid(struct ism_dev *ism)
> {
>-	if (!ism || !ism->pdev)
>-		return 0;
>-
>-	return to_zpci(ism->pdev)->pchid;
>+	return to_zpci(to_pci_dev(ism->dev.parent))->pchid;
> }
> 
> static void ism_handle_event(struct ism_dev *ism)
>@@ -499,7 +496,7 @@ static const struct ism_ops ism_vp_ops = {
> 
> static int ism_dev_init(struct ism_dev *ism)
> {
>-	struct pci_dev *pdev = ism->pdev;
>+	struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> 	int ret;
> 
> 	ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI);
>@@ -565,7 +562,6 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 
> 	spin_lock_init(&ism->lock);
> 	dev_set_drvdata(&pdev->dev, ism);
>-	ism->pdev = pdev;
> 	ism->dev.parent = &pdev->dev;
> 	device_initialize(&ism->dev);
> 	dev_set_name(&ism->dev, dev_name(&pdev->dev));
>@@ -603,14 +599,13 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> 	device_del(&ism->dev);
> err_dev:
> 	dev_set_drvdata(&pdev->dev, NULL);
>-	kfree(ism);
> 
> 	return ret;
> }
> 
> static void ism_dev_exit(struct ism_dev *ism)
> {
>-	struct pci_dev *pdev = ism->pdev;
>+	struct pci_dev *pdev = to_pci_dev(ism->dev.parent);
> 	unsigned long flags;
> 	int i;
> 
>diff --git a/include/linux/ism.h b/include/linux/ism.h
>index bc165d077071..929a1f275419 100644
>--- a/include/linux/ism.h
>+++ b/include/linux/ism.h
>@@ -144,6 +144,9 @@ int  ism_unregister_client(struct ism_client *client);
>  *	identified by dmb_tok and idx. If signal flag (sf) then signal
>  *	the remote peer that data has arrived in this dmb.
>  *
>+ * int (*unregister_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+ *	Unregister an ism_dmb buffer
>+ *
>  * int (*supports_v2)(void);
>  *
>  * u16 (*get_chid)(struct ism_dev *dev);
>@@ -218,12 +221,63 @@ struct ism_ops {
> 	int (*reset_vlan_required)(struct ism_dev *dev);
> 	int (*signal_event)(struct ism_dev *dev, uuid_t *rgid,
> 			    u32 trigger_irq, u32 event_code, u64 info);
>+/* no copy option
>+ * --------------
>+ */
>+	/**
>+	 * support_dmb_nocopy() - does this device provide no-copy option?
>+	 * @dev: ism device
>+	 *
>+	 * In addition to using move_data(), a sender device can provide a
>+	 * kernel address + length, that represents a target dmb
>+	 * (like MMIO). If a sender writes into such a ghost-send-buffer
>+	 * (= at this kernel address) the data will automatically
>+	 * immediately appear in the target dmb, even without calling
>+	 * move_data().
>+	 * Note that this is NOT related to the MSG_ZEROCOPY socket flag.
>+	 *
>+	 * Either all 3 function pointers for support_dmb_nocopy(),
>+	 * attach_dmb() and detach_dmb() are defined, or all of them must
>+	 * be NULL.
>+	 *
>+	 * Return: non-zero, if no-copy is supported.
>+	 */
>+	int (*support_dmb_nocopy)(struct ism_dev *dev);
>+	/**
>+	 * attach_dmb() - attach local memory to a remote dmb
>+	 * @dev: Local sending ism device
>+	 * @dmb: all other parameters are passed in the form of a
>+	 *	 dmb struct
>+	 *	 TODO: (THIS IS CONFUSING, should be changed)

Agree.

>+	 *  dmb_tok: (in) Token of the remote dmb, we want to attach to
>+	 *  cpu_addr: (out) MMIO address
>+	 *  dma_addr: (out) MMIO address (if applicable, invalid otherwise)
>+	 *  dmb_len: (out) length of local MMIO region,
>+	 *           equal to length of remote DMB.
>+	 *  sba_idx: (out) index of remote dmb (NOT HELPFUL, should be removed)
>+	 *
>+	 * Provides a memory address to the sender that can be used to
>+	 * directly write into the remote dmb.
>+	 *
>+	 * Return: Zero upon success, Error code otherwise
>+	 */
>+	int (*attach_dmb)(struct ism_dev *dev, struct ism_dmb *dmb);
>+	/**
>+	 * detach_dmb() - Detach the ghost buffer from a remote dmb
>+	 * @dev: ism device
>+	 * @token: dmb token of the remote dmb
>+	 * Return: Zero upon success, Error code otherwise
>+	 */
>+	int (*detach_dmb)(struct ism_dev *dev, u64 token);
> };
> 

...

>+
>+static int ism_lo_move_data(struct ism_dev *ism, u64 dmb_tok,
>+			    unsigned int idx, bool sf, unsigned int offset,
>+			    void *data, unsigned int size)
>+{
>+	struct ism_lo_dmb_node *rmb_node = NULL, *tmp_node;
>+	struct ism_lo_dev *ldev;
>+	u16 s_mask;
>+	u8 client_id;
>+	u32 sba_idx;
>+
>+	ldev = container_of(ism, struct ism_lo_dev, ism);
>+
>+	if (!sf)
>+		/* since sndbuf is merged with peer DMB, there is
>+		 * no need to copy data from sndbuf to peer DMB.
>+		 */
>+		return 0;
>+
>+	read_lock_bh(&ldev->dmb_ht_lock);
>+	hash_for_each_possible(ldev->dmb_ht, tmp_node, list, dmb_tok) {
>+		if (tmp_node->token == dmb_tok) {
>+			rmb_node = tmp_node;
>+			break;
>+		}
>+	}
>+	if (!rmb_node) {
>+		read_unlock_bh(&ldev->dmb_ht_lock);
>+		return -EINVAL;
>+	}
>+	// So why copy the data now?? SMC usecase? Data buffer is attached,
>+	// rw-pointer are not attached?

I understand the confusion here. I have the same confusion the first time
I saw this.

This is actually the tricky part: it assumes the CDC will signal, while
the data will not. We need to copy the CDC, so the copy here only to the
CDC.

I think we should refine the move_data() API to make this clearer.

Best regards,
Dust


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ