Message-ID: <CALzJLG846VMsOWuAvT=6Kcdu_p+ixUX=tTyV-DgAEXYpXszBgw@mail.gmail.com>
Date: Mon, 11 Jul 2016 14:48:17 +0300
From: Saeed Mahameed <saeedm@....mellanox.co.il>
To: Brenden Blanco <bblanco@...mgrid.com>
Cc: Tariq Toukan <ttoukan.linux@...il.com>,
"David S. Miller" <davem@...emloft.net>,
Linux Netdev List <netdev@...r.kernel.org>,
Martin KaFai Lau <kafai@...com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Ari Saha <as754m@....com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>,
Or Gerlitz <gerlitz.or@...il.com>,
john fastabend <john.fastabend@...il.com>,
hannes@...essinduktion.org, Thomas Graf <tgraf@...g.ch>,
Tom Herbert <tom@...bertland.com>,
Daniel Borkmann <daniel@...earbox.net>
Subject: Re: [PATCH v6 04/12] net/mlx4_en: add support for fast rx drop bpf program
On Sun, Jul 10, 2016 at 7:05 PM, Brenden Blanco <bblanco@...mgrid.com> wrote:
> On Sun, Jul 10, 2016 at 06:25:40PM +0300, Tariq Toukan wrote:
>>
>> On 09/07/2016 10:58 PM, Saeed Mahameed wrote:
>> >On Fri, Jul 8, 2016 at 5:15 AM, Brenden Blanco <bblanco@...mgrid.com> wrote:
>> >>+ /* A bpf program gets first chance to drop the packet. It may
>> >>+ * read bytes but not past the end of the frag.
>> >>+ */
>> >>+ if (prog) {
>> >>+ struct xdp_buff xdp;
>> >>+ dma_addr_t dma;
>> >>+ u32 act;
>> >>+
>> >>+ dma = be64_to_cpu(rx_desc->data[0].addr);
>> >>+ dma_sync_single_for_cpu(priv->ddev, dma,
>> >>+ priv->frag_info[0].frag_size,
>> >>+ DMA_FROM_DEVICE);
>> >In case of XDP_PASS we will dma_sync again in the normal path, this
>> >can be improved by doing the dma_sync as soon as we can and once and
>> >for all, regardless of the path the packet is going to take
>> >(XDP_DROP/mlx4_en_complete_rx_desc/mlx4_en_rx_skb).
>> I agree with Saeed, dma_sync is a heavy operation that is now done
>> twice for all packets with XDP_PASS.
>> We should try our best to avoid performance degradation in the flow
>> of unfiltered packets.
> Makes sense, do folks here see a way to do this cleanly?
Yes, we need something like:
+static inline void
+mlx4_en_sync_dma(struct mlx4_en_priv *priv,
+		 struct mlx4_en_rx_desc *rx_desc,
+		 int length)
+{
+	dma_addr_t dma;
+	int nr;
+
+	/* Sync dma addresses from HW descriptor */
+	for (nr = 0; nr < priv->num_frags; nr++) {
+		struct mlx4_en_frag_info *frag_info = &priv->frag_info[nr];
+
+		/* Packet ends before this frag starts, nothing more to sync */
+		if (length <= frag_info->frag_prefix_size)
+			break;
+
+		dma = be64_to_cpu(rx_desc->data[nr].addr);
+		dma_sync_single_for_cpu(priv->ddev, dma, frag_info->frag_size,
+					DMA_FROM_DEVICE);
+	}
+}
@@ -790,6 +808,10 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			goto next;
 		}
+		length = be32_to_cpu(cqe->byte_cnt);
+		length -= ring->fcs_del;
+
+		mlx4_en_sync_dma(priv, rx_desc, length);
 		/* data is available continue processing the packet */
and make sure to remove all the explicit dma_sync_single_for_cpu calls
further down the rx path, so that each fragment is synced only once.
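
To make the early-exit condition in the loop easier to check, here is a rough
user-space model of it. It is only a sketch: struct frag_info_model, the
function name frags_to_sync and any fragment sizes fed to it are made up for
illustration and are not taken from the driver. It counts the fragments that
would be synced instead of calling dma_sync_single_for_cpu.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mlx4_en_frag_info; only the two
 * fields the loop reads are modelled here.
 */
struct frag_info_model {
	int frag_size;        /* bytes covered by this fragment */
	int frag_prefix_size; /* bytes covered by all earlier fragments */
};

/* Return how many fragments the loop would sync for a packet of
 * `length` bytes. The real helper would issue a DMA sync for each
 * fragment where this model only counts them.
 */
static int frags_to_sync(const struct frag_info_model *frags,
			 int num_frags, int length)
{
	int nr;

	assert(frags != NULL);

	for (nr = 0; nr < num_frags; nr++) {
		/* Packet ends before this fragment starts: stop here. */
		if (length <= frags[nr].frag_prefix_size)
			break;
	}

	return nr;
}
```

With an assumed layout of three fragments of 1536, 4096 and 4096 bytes
(prefix sizes 0, 1536 and 5632), a 64-byte packet touches one fragment, a
1537-byte packet touches two, and a 9000-byte packet touches all three.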