[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <yy2prsz3tjqwjwxgsrumt3qt2d62gdvjwqsti3favtfmf7m5qs@eychxx5qz25f>
Date: Tue, 3 Sep 2024 17:17:18 -0500
From: Andrew Halaney <ahalaney@...hat.com>
To: Suraj Jaiswal <quic_jsuraj@...cinc.com>
Cc: Vinod Koul <vkoul@...nel.org>,
Bhupesh Sharma <bhupesh.sharma@...aro.org>, Andy Gross <agross@...nel.org>,
Bjorn Andersson <andersson@...nel.org>, Konrad Dybcio <konrad.dybcio@...aro.org>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Rob Herring <robh+dt@...nel.org>,
Krzysztof Kozlowski <krzysztof.kozlowski+dt@...aro.org>, Conor Dooley <conor+dt@...nel.org>,
Alexandre Torgue <alexandre.torgue@...s.st.com>, Jose Abreu <joabreu@...opsys.com>,
Maxime Coquelin <mcoquelin.stm32@...il.com>, netdev@...r.kernel.org, linux-arm-msm@...r.kernel.org,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-stm32@...md-mailman.stormreply.com, Prasad Sodagudi <psodagud@...cinc.com>,
Rob Herring <robh@...nel.org>, kernel@...cinc.com
Subject: Re: [PATCH net] net: stmmac: Stop using a single dma_map() for
multiple descriptors
On Mon, Sep 02, 2024 at 03:24:36PM GMT, Suraj Jaiswal wrote:
> Currently same page address is shared
> between multiple buffer addresses and causing smmu fault for other
> descriptor if address hold by one descriptor got cleaned.
> Allocate separate buffer address for each descriptor
> for TSO path so that if one descriptor cleared it should not
> clean other descriptor address.
I think maybe you mean something like:
Currently in the TSO case a page is mapped with dma_map_single(), and then
the resulting dma address is referenced (and offset) by multiple
descriptors until the whole region is programmed into the descriptors.
This makes it possible for stmmac_tx_clean() to dma_unmap() the first of the
already processed descriptors, while the rest are still being processed
by the DMA engine. This leads to an iommu fault due to the DMA engine using
unmapped memory as seen below:
<insert splat>
You can reproduce this easily by <reproduction steps>.
To fix this, let's map each descriptor's memory reference individually.
This way there's no risk of unmapping a region that's still being
referenced by the DMA engine in a later descriptor.
That's a bit nitpicky wording wise, but your first sentence is hard
for me to follow (buffer addresses seems to mean descriptor?). I think
showing a splat and mentioning how to reproduce is always a bonus as
well.
>
> Signed-off-by: Suraj Jaiswal <quic_jsuraj@...cinc.com>
Fixes: ?
At a quick glance I think its f748be531d70 ("stmmac: support new GMAC4")
> ---
>
> Changes since v2:
> - Fixed function description
> - Fixed handling of return value.
>
This is v1 as far as netdev is concerned :)
>
> .../net/ethernet/stmicro/stmmac/stmmac_main.c | 63 ++++++++++++-------
> 1 file changed, 42 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 83b654b7a9fd..5948774c403f 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -4136,16 +4136,18 @@ static bool stmmac_vlan_insert(struct stmmac_priv *priv, struct sk_buff *skb,
> /**
> * stmmac_tso_allocator - close entry point of the driver
> * @priv: driver private structure
> - * @des: buffer start address
> + * @addr: Contains either skb frag address or skb->data address
> * @total_len: total length to fill in descriptors
> * @last_segment: condition for the last descriptor
> * @queue: TX queue index
> + * @is_skb_frag: condition to check whether skb data is part of fragment or not
> * Description:
> * This function fills descriptor and request new descriptors according to
> * buffer length to fill
> + * This function returns 0 on success else -ERRNO on fail
> */
> -static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
> - int total_len, bool last_segment, u32 queue)
> +static int stmmac_tso_allocator(struct stmmac_priv *priv, void *addr,
> + int total_len, bool last_segment, u32 queue, bool is_skb_frag)
> {
> struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
> struct dma_desc *desc;
> @@ -4153,6 +4155,8 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
> int tmp_len;
>
> tmp_len = total_len;
> + unsigned int offset = 0;
> + unsigned char *data = addr;
Reverse xmas tree order, offset is always set below so you could just
declare it, and data really doesn't seem necessary to me vs using addr
directly.
https://docs.kernel.org/process/maintainer-netdev.html#local-variable-ordering-reverse-xmas-tree-rcs
>
> while (tmp_len > 0) {
> dma_addr_t curr_addr;
> @@ -4161,20 +4165,44 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
> priv->dma_conf.dma_tx_size);
> WARN_ON(tx_q->tx_skbuff[tx_q->cur_tx]);
>
> + buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ? TSO_MAX_BUFF_SIZE : tmp_len;
> +
> if (tx_q->tbs & STMMAC_TBS_AVAIL)
> desc = &tx_q->dma_entx[tx_q->cur_tx].basic;
> else
> desc = &tx_q->dma_tx[tx_q->cur_tx];
>
> - curr_addr = des + (total_len - tmp_len);
> + offset = total_len - tmp_len;
> + if (!is_skb_frag) {
> + curr_addr = dma_map_single(priv->device, data + offset, buff_size,
> + DMA_TO_DEVICE);
Instead of defining "data" above, can't you just use "addr" directly here?
> +
> + if (dma_mapping_error(priv->device, curr_addr))
> + return -ENOMEM;
> +
> + tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
> + tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
> + tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = false;
> + tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> + } else {
> + curr_addr = skb_frag_dma_map(priv->device, addr, offset,
> + buff_size,
> + DMA_TO_DEVICE);
> +
> + if (dma_mapping_error(priv->device, curr_addr))
> + return -ENOMEM;
> +
> + tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = curr_addr;
> + tx_q->tx_skbuff_dma[tx_q->cur_tx].len = buff_size;
> + tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = true;
> + tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> + }
> +
> if (priv->dma_cap.addr64 <= 32)
> desc->des0 = cpu_to_le32(curr_addr);
> else
> stmmac_set_desc_addr(priv, desc, curr_addr);
>
> - buff_size = tmp_len >= TSO_MAX_BUFF_SIZE ?
> - TSO_MAX_BUFF_SIZE : tmp_len;
> -
> stmmac_prepare_tso_tx_desc(priv, desc, 0, buff_size,
> 0, 1,
> (last_segment) && (tmp_len <= TSO_MAX_BUFF_SIZE),
> @@ -4182,6 +4210,7 @@ static void stmmac_tso_allocator(struct stmmac_priv *priv, dma_addr_t des,
>
> tmp_len -= TSO_MAX_BUFF_SIZE;
> }
> + return 0;
nit: add a newline before return 0
> }
>
> static void stmmac_flush_tx_descriptors(struct stmmac_priv *priv, int queue)
> @@ -4351,25 +4380,17 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev)
> pay_len = 0;
> }
>
> - stmmac_tso_allocator(priv, des, tmp_pay_len, (nfrags == 0), queue);
> + if (stmmac_tso_allocator(priv, (skb->data + proto_hdr_len),
> + tmp_pay_len, nfrags == 0, queue, false))
> + goto dma_map_err;
Changing the second argument here is subtly changing the dma_cap.addr64 <= 32
case right before this. Is that intentional?
i.e., prior, pretend des = 0 (side note but des is a very confusing
variable name for "dma address" when there's also mentions of desc meaning
"descriptor" in the DMA ring). In the <= 32 case, we'd call
stmmac_tso_allocator(priv, 0) and in the else case we'd call
stmmac_tso_allocator(priv, 0 + proto_hdr_len).
With this change in both cases its called with the (not-yet-dma-mapped)
skb->data + proto_hdr_len always (i.e. like the else case).
Honestly, the <= 32 case reads weird to me without this patch. It
seems some of the buffer is filled but des is not properly incremented?
I don't know how this hardware is supposed to be programmed (no databook
access) but that seems fishy (and like a separate bug, which would be
nice to squash if so in its own patch). Would you be able to explain the
logic there to me if it does make sense to you?
>
> /* Prepare fragments */
> for (i = 0; i < nfrags; i++) {
> - const skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
> + skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
>
> - des = skb_frag_dma_map(priv->device, frag, 0,
> - skb_frag_size(frag),
> - DMA_TO_DEVICE);
> - if (dma_mapping_error(priv->device, des))
> + if (stmmac_tso_allocator(priv, frag, skb_frag_size(frag),
> + (i == nfrags - 1), queue, true))
Personally I think it would be nice to change stmmac_tso_allocator() so
you can keep the frag const above... i.e. something like
stmmac_tso_allocator(..., void *addr, ..., const skb_frag_t *frag)
and just check if frag is NULL to determine if you're dealing with a
frag or not (instead of passing the boolean in to indicate that).
I'm curious if someone else can think of a cleaner API than that for
that function, even that's not super pretty...
> goto dma_map_err;
> -
> - stmmac_tso_allocator(priv, des, skb_frag_size(frag),
> - (i == nfrags - 1), queue);
> -
> - tx_q->tx_skbuff_dma[tx_q->cur_tx].buf = des;
> - tx_q->tx_skbuff_dma[tx_q->cur_tx].len = skb_frag_size(frag);
> - tx_q->tx_skbuff_dma[tx_q->cur_tx].map_as_page = true;
> - tx_q->tx_skbuff_dma[tx_q->cur_tx].buf_type = STMMAC_TXBUF_T_SKB;
> }
>
> tx_q->tx_skbuff_dma[tx_q->cur_tx].last_segment = true;
> --
> 2.25.1
>
Powered by blists - more mailing lists