Message-ID: <CAMArcTV0D1bOED4E+4Yd+U2T88zRLciokP9HpBcoRF7H43d04w@mail.gmail.com>
Date: Thu, 17 Oct 2024 17:58:45 +0900
From: Taehee Yoo <ap420073@...il.com>
To: Stanislav Fomichev <stfomichev@...il.com>
Cc: davem@...emloft.net, kuba@...nel.org, pabeni@...hat.com, 
	edumazet@...gle.com, almasrymina@...gle.com, netdev@...r.kernel.org, 
	linux-doc@...r.kernel.org, donald.hunter@...il.com, corbet@....net, 
	michael.chan@...adcom.com, kory.maincent@...tlin.com, andrew@...n.ch, 
	maxime.chevallier@...tlin.com, danieller@...dia.com, hengqi@...ux.alibaba.com, 
	ecree.xilinx@...il.com, przemyslaw.kitszel@...el.com, hkallweit1@...il.com, 
	ahmed.zaki@...el.com, paul.greenwalt@...el.com, rrameshbabu@...dia.com, 
	idosch@...dia.com, asml.silence@...il.com, kaiyuanz@...gle.com, 
	willemb@...gle.com, aleksander.lobakin@...el.com, dw@...idwei.uk, 
	sridhar.samudrala@...el.com, bcreeley@....com
Subject: Re: [PATCH net-next v3 0/7] bnxt_en: implement device memory TCP for bnxt

On Thu, Oct 17, 2024 at 5:17 AM Stanislav Fomichev <stfomichev@...il.com> wrote:

Hi Stanislav,
Thank you so much for testing and the improvements!

>
> On 10/03, Taehee Yoo wrote:
> > This series implements device memory TCP for bnxt_en driver and
> > necessary ethtool command implementations.
> >
> > NICs that use the bnxt_en driver support a tcp-data-split feature named
> > HDS (header-data-split), but there has been no ethtool implementation to
> > enable or disable it.
> > Only reading the current HDS status is implemented, and HDS is enabled
> > automatically only when LRO, HW-GRO, or JUMBO is enabled.
> > The hds_threshold value follows the rx-copybreak value and was not
> > changeable.
> >
> > Currently, the bnxt_en driver enables tcp-data-split by default, but it
> > does not always take effect.
> > There is an hds_threshold value: when a packet is larger than this
> > value, the packet is split into header and data.
> > hds_threshold has been fixed at 256, which is also the default
> > rx-copybreak value.
> > Since the rx-copybreak value could not be changed, neither could
> > hds_threshold.
> >
> > This patchset first decouples hds_threshold and rx-copybreak,
> > then makes tcp-data-split, rx-copybreak, and
> > tcp-data-split-thresh (hds_threshold) configurable independently.
> >
> > The default configuration stays the same:
> > the default rx-copybreak is 256, and the default
> > tcp-data-split-thresh is also 256.
> >
> > There are several related options:
> > TPA (HW-GRO, LRO), JUMBO, jumbo_thresh (a firmware command), and the
> > aggregation ring.
> >
> > The aggregation ring is fundamental to all of these features.
> > When GRO/LRO/jumbo packets are received, the NIC receives the first
> > packet from the normal ring,
> > and the following packets come from the aggregation ring.
> >
> > These features work regardless of HDS.
> > When TPA is enabled and HDS is disabled, the first packet contains
> > both header and payload,
> > and the following packets contain payload only.
> > If HDS is enabled, the first packet contains only the header, and the
> > following packets contain only payload.
> > So HW-GRO/LRO works regardless of HDS.
> >
> > There is another threshold value, jumbo_thresh.
> > It is very similar to hds_thresh, but jumbo_thresh does not split
> > header and data;
> > it simply splits the first and following data based on length.
> > When the NIC receives a 1500-byte packet and jumbo_thresh is 256 (the
> > default, following rx-copybreak),
> > the first buffer holds 256 bytes and the following buffers hold the
> > remaining 1500 - 256 = 1244 bytes.
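The length-based jumbo_thresh split described above can be sketched as a small user-space model. This is an illustration only, not driver code; the function names are invented for this sketch:

```c
#include <assert.h>

/* Model of the length-based jumbo_thresh split: the first buffer
 * (normal ring) holds at most jumbo_thresh bytes, and the remainder
 * is delivered via the aggregation ring. Unlike HDS, this split has
 * no notion of header vs. payload. */
static unsigned int first_buf_len(unsigned int pkt_len, unsigned int jumbo_thresh)
{
	return pkt_len < jumbo_thresh ? pkt_len : jumbo_thresh;
}

static unsigned int agg_ring_len(unsigned int pkt_len, unsigned int jumbo_thresh)
{
	return pkt_len > jumbo_thresh ? pkt_len - jumbo_thresh : 0;
}
```

For a 1500-byte packet with jumbo_thresh 256 this yields 256 + 1244, matching the example above.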
> >
> > Before this patch, the aggregation ring is enabled if at least one of
> > the GRO, LRO, and JUMBO flags is enabled.
> > If the aggregation ring is enabled, both hds_threshold and
> > jumbo_thresh are set to the default rx-copybreak value.
> >
> > So GRO, LRO, and JUMBO frames larger than 256 bytes are split into
> > header and data if the protocol is TCP or UDP;
> > for other protocols, jumbo_thresh applies instead of hds_thresh.
> >
> > This means that tcp-data-split relied on the GRO, LRO, and JUMBO flags.
> > With this patch, tcp-data-split no longer relies on these flags:
> > if tcp-data-split is enabled, the aggregation ring is
> > enabled.
> > Also, hds_threshold no longer follows the rx-copybreak value; it is
> > set to the tcp-data-split-thresh value from user space, with a
> > default of 256.
> >
> > If the protocol is TCP or UDP, HDS is disabled, and the aggregation
> > ring is enabled, a packet is still split into several pieces due to
> > jumbo_thresh.
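Putting the rules above together, the receive-side split decision can be modeled roughly as follows. The enum and helper names are invented for illustration and do not appear in the driver:

```c
#include <assert.h>
#include <stdbool.h>

enum split_mode { SPLIT_NONE, SPLIT_HDS, SPLIT_JUMBO };

/* Rough model of the split decision: with tcp-data-split on, TCP/UDP
 * packets larger than hds_thresh get a header/data split; otherwise,
 * with the aggregation ring enabled, packets larger than jumbo_thresh
 * get the length-based split. */
static enum split_mode rx_split(bool hds_on, bool agg_ring, bool is_tcp_udp,
				unsigned int len, unsigned int hds_thresh,
				unsigned int jumbo_thresh)
{
	if (hds_on && is_tcp_udp && len > hds_thresh)
		return SPLIT_HDS;
	if (agg_ring && len > jumbo_thresh)
		return SPLIT_JUMBO;
	return SPLIT_NONE;
}
```

This captures the cover letter's claim that non-TCP/UDP traffic, and TCP/UDP traffic with HDS disabled, still get split by jumbo_thresh as long as the aggregation ring is enabled.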
> >
> > When XDP is attached, tcp-data-split is automatically disabled.
> >
> > LRO, GRO, and JUMBO were tested with BCM57414 and BCM57504, firmware
> > version 230.0.157.0.
> > I couldn't find any specification of the minimum and maximum values
> > of hds_threshold, but from my test results the working range is about 0 ~ 1023.
> > That is, packets larger than 1023 bytes are split into header and data
> > whenever tcp-data-split is enabled, regardless of the hds_threshold value:
> > when hds_threshold is 1500 and the received packet size is 1400, HDS
> > should not be activated, but it is.
> > The maximum value of hds_threshold (tcp-data-split-thresh) is set to
> > 256 because that value is known to work.
> > This was decided very conservatively.
> >
> > I verified that tcp-data-split (HDS) works independently of GRO, LRO,
> > and JUMBO, testing GRO/LRO and JUMBO with HDS both enabled and disabled.
> > I also verified that tcp-data-split is disabled automatically
> > when XDP is attached and cannot be re-enabled while XDP is
> > attached. I tested the full range of values for
> > tcp-data-split-thresh and rx-copybreak, and they work:
> > tcp-data-split-thresh from 0 to 256, and rx-copybreak from 65 to 256.
> > While testing this patchset, I checked the skb->data, skb->data_len, and
> > nr_frags values.
> >
> > The first patch implements .{set,get}_tunable() in bnxt_en.
> > The bnxt_en driver has supported the rx-copybreak feature, but it was
> > not configurable; only the default rx-copybreak value was in effect.
> > This patch makes the rx-copybreak value
> > configurable.
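For context, rx-copybreak is the usual "copy small packets" heuristic: packets at or below the threshold are copied into a freshly allocated small buffer so the large ring buffer can be recycled immediately. A minimal sketch of that policy, with an invented function name (the cover letter does not spell this out, so this is an assumption based on the common meaning of rx-copybreak):

```c
#include <assert.h>
#include <stdbool.h>

/* rx-copybreak heuristic (assumed semantics): small packets are
 * copied out of the ring buffer; large packets keep the buffer. */
static bool should_copybreak(unsigned int pkt_len, unsigned int copybreak)
{
	return pkt_len <= copybreak;
}
```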
> >
> > The second patch implements the tcp-data-split ethtool
> > command.
> > HDS relies on the aggregation ring, which is automatically enabled
> > when LRO, GRO, or a large MTU is configured.
> > So, when the aggregation ring is enabled, HDS is automatically enabled
> > along with it.
> >
> > The third patch adds the tcp-data-split-thresh command to ethtool.
> > If a received packet is larger than this threshold, the packet's
> > header and payload are split.
> > Example:
> >    # ethtool -G <interface name> tcp-data-split-thresh <value>
> > This option cannot be used when tcp-data-split is disabled or not
> > supported.
> >    # ethtool -G enp14s0f0np0 tcp-data-split on tcp-data-split-thresh 256
> >    # ethtool -g enp14s0f0np0
> >    Ring parameters for enp14s0f0np0:
> >    Pre-set maximums:
> >    ...
> >    Current hardware settings:
> >    ...
> >    TCP data split:         on
> >    TCP data split thresh:  256
> >
> >    # ethtool -G enp14s0f0np0 tcp-data-split off
> >    # ethtool -g enp14s0f0np0
> >    Ring parameters for enp14s0f0np0:
> >    Pre-set maximums:
> >    ...
> >    Current hardware settings:
> >    ...
> >    TCP data split:         off
> >    TCP data split thresh:  n/a
> >
> > The fourth patch implements the tcp-data-split-thresh logic
> > in the bnxt_en driver.
> > The default value is 256, which was the default rx-copybreak
> > value.
> >
> > The fifth and sixth patches add condition checks for devmem and ethtool.
> > If tcp-data-split is disabled or the threshold value is not zero,
> > devmem setup fails.
> > Also, tcp-data-split and tcp-data-split-thresh cannot be changed
> > while devmem is running.
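The devmem precondition described here can be sketched as a simple check. The function name is hypothetical and the sketch only models the stated rule, not the actual core API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Model of the devmem setup precondition: binding device memory
 * requires tcp-data-split enabled with tcp-data-split-thresh == 0,
 * so every payload byte lands in (device) frags and only headers
 * reach host memory. */
static int devmem_bind_check(bool tcp_data_split_on, unsigned int thresh)
{
	if (!tcp_data_split_on)
		return -EINVAL;
	if (thresh != 0)
		return -EINVAL;
	return 0;
}
```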
> >
> > The last patch implements device memory TCP for the bnxt_en driver.
> > It mostly converts the generic page_pool API to the netmem page_pool API.
> >
> > No dependencies exist between device memory TCP and GRO/LRO/MTU.
> > Only tcp-data-split and tcp-data-split-thresh need to be configured
> > for device memory TCP.
> > While devmem TCP is active, tcp-data-split and tcp-data-split-thresh
> > cannot be updated because the core API disallows the change.
> >
> > I tested interface up/down while devmem TCP was running; it works well.
> > Channel-count changes and rx/tx ring-size changes work well too.
> >
> > The devmem TCP test NIC is a BCM57504.
>
> [..]
>
> > All necessary configuration validations exist at the core API level.
> >
> > Note that with this patch, device memory TCP setup will fail,
> > because the tcp-data-split-thresh command is not yet supported by ethtool.
> > tcp-data-split-thresh must be 0 to set up device memory TCP, but the
> > bnxt default is 256.
> > So, for bnxt, it always fails until ethtool supports the
> > tcp-data-split-thresh command.
> >
> > ncdevmem.c will be updated after ethtool supports the
> > tcp-data-split-thresh option.
>
> FYI, I've tested your series with BCM57504 on top of [1] and [2] with
> a couple of patches to make ncdevmem.c and TX work (see below). [1]
> decouples ncdevmem from ethtool so we can flip header split settings
> without requiring recent ethtool. Both RX and TX work perfectly.
> Feel free to carry:
>
> Tested-by: Stanislav Fomichev <sdf@...ichev.me>

Thank you so much for your work!
I will try to test your TX-side patch before sending the v4 patch.

>
> Also feel free to take over the ncdevmem patch if my ncdevmem changes
> get pulled before your series.

Good, Thanks!

>
> 1: https://lore.kernel.org/netdev/20241009171252.2328284-1-sdf@fomichev.me/
> 2: https://lore.kernel.org/netdev/20240913150913.1280238-1-sdf@fomichev.me/
>
> commit 69bc0e247eb4132ef5fd0b118719427d35d462fc
> Author:     Stanislav Fomichev <sdf@...ichev.me>
> AuthorDate: Tue Oct 15 15:56:43 2024 -0700
> Commit:     Stanislav Fomichev <sdf@...ichev.me>
> CommitDate: Wed Oct 16 13:13:42 2024 -0700
>
>     selftests: ncdevmem: Set header split threshold to 0
>
>     Needs to happen on BRCM to allow devmem to be attached.
>
>     Signed-off-by: Stanislav Fomichev <sdf@...ichev.me>
>
> diff --git a/tools/testing/selftests/drivers/net/hw/ncdevmem.c b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
> index 903dac3e61d5..6a94d52a6c43 100644
> --- a/tools/testing/selftests/drivers/net/hw/ncdevmem.c
> +++ b/tools/testing/selftests/drivers/net/hw/ncdevmem.c
> @@ -322,6 +322,8 @@ static int configure_headersplit(bool on)
>         ethtool_rings_set_req_set_header_dev_index(req, ifindex);
>         /* 0 - off, 1 - auto, 2 - on */
>         ethtool_rings_set_req_set_tcp_data_split(req, on ? 2 : 0);
> +       if (on)
> +               ethtool_rings_set_req_set_tcp_data_split_thresh(req, 0);
>         ret = ethtool_rings_set(ys, req);
>         if (ret < 0)
>                 fprintf(stderr, "YNL failed: %s\n", ys->err.msg);
>
>
> commit ef5ba647bc94a19153c2c5cfc64ebe4cb86ac58d
> Author:     Stanislav Fomichev <sdf@...ichev.me>
> AuthorDate: Fri Oct 11 13:52:03 2024 -0700
> Commit:     Stanislav Fomichev <sdf@...ichev.me>
> CommitDate: Wed Oct 16 13:13:42 2024 -0700
>
>     bnxt_en: support tx device memory
>
>     The only change is to not unmap the frags on completions.
>
>     Signed-off-by: Stanislav Fomichev <sdf@...ichev.me>
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 6e422e24750a..cb22707a35aa 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -692,7 +692,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
>                         goto tx_dma_error;
>
>                 tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
> -               dma_unmap_addr_set(tx_buf, mapping, mapping);
> +               if (netmem_is_net_iov(frag->netmem))
> +                       dma_unmap_addr_set(tx_buf, mapping, 0);
> +               else
> +                       dma_unmap_addr_set(tx_buf, mapping, mapping);
>
>                 txbd->tx_bd_haddr = cpu_to_le64(mapping);
>
> @@ -749,9 +752,10 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
>         for (i = 0; i < last_frag; i++) {
>                 prod = NEXT_TX(prod);
>                 tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
> -               dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping),
> -                              skb_frag_size(&skb_shinfo(skb)->frags[i]),
> -                              DMA_TO_DEVICE);
> +               if (dma_unmap_addr(tx_buf, mapping))
> +                       dma_unmap_page(&pdev->dev, dma_unmap_addr(tx_buf, mapping),
> +                                      skb_frag_size(&skb_shinfo(skb)->frags[i]),
> +                                      DMA_TO_DEVICE);
>         }
>
>  tx_free:
> @@ -821,11 +825,12 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
>                 for (j = 0; j < last; j++) {
>                         cons = NEXT_TX(cons);
>                         tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
> -                       dma_unmap_page(
> -                               &pdev->dev,
> -                               dma_unmap_addr(tx_buf, mapping),
> -                               skb_frag_size(&skb_shinfo(skb)->frags[j]),
> -                               DMA_TO_DEVICE);
> +                       if (dma_unmap_addr(tx_buf, mapping))
> +                               dma_unmap_page(
> +                                       &pdev->dev,
> +                                       dma_unmap_addr(tx_buf, mapping),
> +                                       skb_frag_size(&skb_shinfo(skb)->frags[j]),
> +                                       DMA_TO_DEVICE);
>                 }
>                 if (unlikely(is_ts_pkt)) {
>                         if (BNXT_CHIP_P5(bp)) {
> @@ -3296,10 +3301,11 @@ static void bnxt_free_tx_skbs(struct bnxt *bp)
>                                 skb_frag_t *frag = &skb_shinfo(skb)->frags[k];
>
>                                 tx_buf = &txr->tx_buf_ring[ring_idx];
> -                               dma_unmap_page(
> -                                       &pdev->dev,
> -                                       dma_unmap_addr(tx_buf, mapping),
> -                                       skb_frag_size(frag), DMA_TO_DEVICE);
> +                               if (dma_unmap_addr(tx_buf, mapping))
> +                                       dma_unmap_page(
> +                                               &pdev->dev,
> +                                               dma_unmap_addr(tx_buf, mapping),
> +                                               skb_frag_size(frag), DMA_TO_DEVICE);
>                         }
>                         dev_kfree_skb(skb);
>                 }

Thanks a lot!
Taehee Yoo
