Message-ID: <65f1a941a3013250e2a768a31f5e521dc21f73e8.camel@mellanox.com>
Date: Wed, 20 Jun 2018 23:41:45 +0000
From: Saeed Mahameed <saeedm@...lanox.com>
To: "eric.dumazet@...il.com" <eric.dumazet@...il.com>,
"kafai@...com" <kafai@...com>, Tariq Toukan <tariqt@...lanox.com>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"edumazet@...gle.com" <edumazet@...gle.com>
Subject: Re: [net RFC] net/mlx4_en: Use frag stride in crossing page boundary condition
On Tue, 2018-06-19 at 17:25 -0700, Eric Dumazet wrote:
>
> On 06/19/2018 11:05 AM, Saeed Mahameed wrote:
>
> > this is only true for XDP setup, for non XDP max stride_size can
> > only
> > be around ~3k and only for mtu > ~6k
> >
> > For XDP setup you suggested:
> > - priv->frag_info[0].frag_size = eff_mtu;
> > + priv->frag_info[0].frag_size = PAGE_SIZE;
> >
> > currently the condition is:
> >
> > release = frags->page_offset + frag_info->frag_size > PAGE_SIZE;
> >
> > so my solution and yours have the same problem you described above.
> >
> > the problem is not with the initial values or with the stride/frag
> > size math, it is just that in XDP we shouldn't reuse at ALL. I agree
> > with you that we need to optimize, and maybe for PAGE_SIZE > 8k we
> > need to allow the XDP setup to reuse. But for now there is a data
> > corruption to handle.
>
>
> Sure, we all agree there is a bug to fix.
>
> The way you are fixing it is kind of illogical.
>
> The NIC can use a frag if its _size_ is big enough to receive the
> frame.
>
> The _stride_ is an abstraction created by the driver to report an
> estimation of the _truesize_,
> or memory consumption, so that linux can better track overall memory
> usage.
>
> For example, if MTU=1500, the size of the fragment is 1536 bytes, but
> since we can put only
> 2 fragments per 4KB page (on x86), we declare the _stride_ to be 2048
> bytes.
>
> Declaring that a final blob of a page, being 1600 bytes, is not able
> to receive a frame because _stride_ is 2048 is illogical and wastes
> resources.
>
>
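The arithmetic in the MTU=1500 example above can be checked with a small standalone sketch. This is not driver code: `EX_PAGE_SIZE`, `ex_fits()` and `ex_check_tail()` are hypothetical names used only for illustration, and a 4KB page is assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_PAGE_SIZE 4096u /* assumption: 4KB page, as on x86 */

/* Hypothetical helper: does a region of `need` bytes fit at `offset`? */
static bool ex_fits(unsigned int offset, unsigned int need)
{
	return offset + need <= EX_PAGE_SIZE;
}

/* Walks through the MTU=1500 numbers from the quoted message. */
static void ex_check_tail(void)
{
	const unsigned int frag_size = 1536;             /* what the NIC may write */
	const unsigned int stride = 2048;                /* truesize estimate */
	const unsigned int offset = EX_PAGE_SIZE - 1600; /* 1600-byte tail */

	assert(ex_fits(offset, frag_size)); /* 2496 + 1536 = 4032, fits */
	assert(!ex_fits(offset, stride));   /* 2496 + 2048 = 4544, rejected */
}
```

Checking against the size accepts the 1600-byte tail, while checking against the stride rejects it, which is the waste described above.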
I see. I wanted to use _stride_ as a guarantee of how much a page frag
can grow. For example, in mlx5 we need the whole stride to build_skb
around the frag, since we always need the trailer. It is different here,
though, and we can avoid the resource waste.
So how about this (as suggested by Martin):
Currently mlx4_en_complete_rx_desc assumes that priv->rx_headroom
is always 0 in the non-XDP setup, hence:
frags->page_offset += sz_align;
where it really should be:
frags->page_offset += sz_align + priv->rx_headroom;
We can use a non-zero rx_headroom as a hint not to reuse, as below.
What do you think?
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9f54ccbddea7..f14c7a574cc8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -474,10 +474,10 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
{
const struct mlx4_en_frag_info *frag_info = priv->frag_info;
unsigned int truesize = 0;
+ bool release = true;
int nr, frag_size;
struct page *page;
dma_addr_t dma;
- bool release;
/* Collect used fragments while replacing them in the HW descriptors */
for (nr = 0;; frags++) {
@@ -500,7 +500,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
release = page_count(page) != 1 ||
page_is_pfmemalloc(page) ||
page_to_nid(page) != numa_mem_id();
- } else {
+ } else if (!priv->rx_headroom) {
u32 sz_align = ALIGN(frag_size, SMP_CACHE_BYTES);
frags->page_offset += sz_align;
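For readers following the logic outside the kernel tree, here is a minimal userspace sketch of the decision the patch is aiming for: release defaults to true, and the offset is only advanced (making reuse possible) when there is no headroom. It models only the branch touched above and assumes the size-based check quoted earlier in the thread. All `ex_`/`EX_` names are hypothetical, and a 4KB page with 64-byte cache lines is assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_PAGE_SIZE   4096u /* assumption: 4KB page */
#define EX_CACHE_BYTES 64u   /* assumption: stand-in for SMP_CACHE_BYTES */
#define EX_ALIGN(x, a) (((x) + (a) - 1u) & ~((a) - 1u))

struct ex_frag {
	unsigned int page_offset;
};

/* Returns true when the page must be released rather than reused. */
static bool ex_should_release(struct ex_frag *frag, unsigned int used,
			      unsigned int frag_size, unsigned int rx_headroom)
{
	bool release = true; /* default: never reuse (the XDP case) */

	if (!rx_headroom) {
		/* Advance past the bytes just consumed, cache-line aligned. */
		frag->page_offset += EX_ALIGN(used, EX_CACHE_BYTES);
		/* Reuse only if another full-size frame still fits. */
		release = frag->page_offset + frag_size > EX_PAGE_SIZE;
	}
	return release;
}

/* Example walk-through with a 1500-byte frame and frag_size 1536. */
static void ex_demo(void)
{
	struct ex_frag f = { .page_offset = 0 };

	assert(!ex_should_release(&f, 1500, 1536, 0)); /* offset 1536, fits */
	assert(ex_should_release(&f, 1500, 1536, 0));  /* offset 3072, full */
}
```

With a non-zero headroom the helper leaves `page_offset` untouched and always asks for a release, which mirrors the intent of initialising `release` to true in the diff.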