Message-ID: <CAKgT0UeGRUytaA9MVF_YitYBq_49uXBgAPrDHbbaykYCXpJj6A@mail.gmail.com>
Date: Mon, 13 Feb 2017 17:32:37 -0800
From: Alexander Duyck <alexander.duyck@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>,
Martin KaFai Lau <kafai@...com>,
Saeed Mahameed <saeedm@...lanox.com>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH v3 net-next 08/14] mlx4: use order-0 pages for RX
On Mon, Feb 13, 2017 at 4:57 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>
>> Alex, be assured that I implemented the full thing, of course.
>
> Patch was :
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index aa074e57ce06fb2842fa1faabd156c3cd2fe10f5..0ae1b544668d26c24044dbdefdd9b12253596ff9 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -68,6 +68,7 @@ static int mlx4_alloc_page(struct mlx4_en_priv *priv,
> frag->page = page;
> frag->dma = dma;
> frag->page_offset = priv->rx_headroom;
> + frag->pagecnt_bias = 1;
> return 0;
> }
>
> @@ -97,7 +98,7 @@ static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
> if (frag->page) {
> dma_unmap_page(priv->ddev, frag->dma,
> PAGE_SIZE, priv->dma_dir);
> - __free_page(frag->page);
> + __page_frag_cache_drain(frag->page, frag->pagecnt_bias);
> }
> /* We need to clear all fields, otherwise a change of priv->log_rx_info
> * could lead to see garbage later in frag->page.
> @@ -470,6 +471,7 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
> {
> const struct mlx4_en_frag_info *frag_info = priv->frag_info;
> unsigned int truesize = 0;
> + unsigned int pagecnt_bias;
> int nr, frag_size;
> struct page *page;
> dma_addr_t dma;
> @@ -491,9 +493,10 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
> frag_size);
>
> truesize += frag_info->frag_stride;
> + pagecnt_bias = frags->pagecnt_bias--;
> if (frag_info->frag_stride == PAGE_SIZE / 2) {
> frags->page_offset ^= PAGE_SIZE / 2;
> - release = page_count(page) != 1 ||
> + release = page_count(page) != pagecnt_bias ||
> page_is_pfmemalloc(page) ||
> page_to_nid(page) != numa_mem_id();
> } else {
> @@ -504,9 +507,13 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
> }
> if (release) {
> dma_unmap_page(priv->ddev, dma, PAGE_SIZE, priv->dma_dir);
> + __page_frag_cache_drain(page, --pagecnt_bias);
> frags->page = NULL;
> } else {
> - page_ref_inc(page);
> + if (pagecnt_bias == 1) {
> + page_ref_add(page, USHRT_MAX);
> + frags->pagecnt_bias = USHRT_MAX;
> + }
> }
>
> nr++;
You might want to profile the code with perf. What you should see is
the page_ref_inc here going from eating a significant amount of time
before the patch to something negligible after it. If the
page_ref_inc wasn't costing much to begin with, that may be why the
change didn't provide any significant gain on mlx4. It's also possible
that the mlx4 code is simply running under different conditions: for
example, there might not be enough MMIO traffic to put any serious
pressure on the atomic op, so it gets processed more quickly.
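
For anyone following along, the bias bookkeeping in the quoted patch can
be modelled in userspace roughly as below. This is only a sketch under
stated assumptions: struct fake_page, struct fake_frag, frag_init(),
frag_hand_off(), consumer_put() and simulate() are hypothetical names,
a C11 atomic stands in for the page refcount, and the pfmemalloc and
NUMA checks from the patch are left out. It only counts how many atomic
read-modify-write ops the "driver" side issues; it says nothing about
real mlx4 performance.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the shared refcount. */
struct fake_page {
	atomic_uint refcount;
};

/* Hypothetical stand-in for the per-fragment RX state. */
struct fake_frag {
	struct fake_page *page;
	unsigned int pagecnt_bias;  /* references the driver still owns */
	unsigned long atomic_ops;   /* atomic RMW ops issued by the driver */
};

static void frag_init(struct fake_frag *frag, struct fake_page *page)
{
	atomic_init(&page->refcount, 1);
	frag->page = page;
	frag->pagecnt_bias = 1;
	frag->atomic_ops = 0;
}

/* Hand one reference to the consumer. Returns true if the page is kept
 * for reuse, false if the driver gave it up (frag->page becomes NULL). */
static bool frag_hand_off(struct fake_frag *frag)
{
	unsigned int bias = frag->pagecnt_bias--;  /* value before decrement */

	if (atomic_load(&frag->page->refcount) != bias) {
		/* Consumer still holds older references: drop every
		 * reference the driver has left and forget the page. */
		atomic_fetch_sub(&frag->page->refcount, bias - 1);
		frag->atomic_ops++;
		frag->page = NULL;
		return false;
	}
	if (bias == 1) {
		/* Local pool is empty: refill it with a single atomic add. */
		atomic_fetch_add(&frag->page->refcount, USHRT_MAX);
		frag->atomic_ops++;
		frag->pagecnt_bias = USHRT_MAX;
	}
	return true;
}

/* The consumer (the stack, in the real driver) drops its reference. */
static void consumer_put(struct fake_page *page)
{
	atomic_fetch_sub(&page->refcount, 1);
}

/* Receive `packets` fragments with a consumer that frees each one right
 * away; returns the number of atomic ops the driver side needed. */
static unsigned long simulate(unsigned long packets)
{
	struct fake_page page;
	struct fake_frag frag;
	unsigned long i;

	frag_init(&frag, &page);
	for (i = 0; i < packets; i++) {
		bool reused = frag_hand_off(&frag);

		assert(reused);
		(void)reused;
		consumer_put(&page);
	}
	return frag.atomic_ops;
}
```

In this model the driver side issues one atomic add per USHRT_MAX
fragments when the consumer frees promptly, instead of one
page_ref_inc per fragment. The consumer's own decrement is still there,
so how much this buys in practice depends on how expensive the atomic
op is in the first place.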
Also, back when I was hammering on this I was mostly focused on
routing and micro-benchmarks. Odds are this is one of those things
that won't show up unless you are really looking for it, so there is
no need to worry about addressing it now.
- Alex