Message-ID: <20170207190546.GA51444@ast-mbp.thefacebook.com>
Date: Tue, 7 Feb 2017 11:05:49 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Tariq Toukan <ttoukan.linux@...il.com>,
Eric Dumazet <edumazet@...gle.com>,
"David S . Miller" <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>,
Tariq Toukan <tariqt@...lanox.com>,
Martin KaFai Lau <kafai@...com>,
Willem de Bruijn <willemb@...gle.com>,
Jesper Dangaard Brouer <brouer@...hat.com>,
Brenden Blanco <bblanco@...mgrid.com>,
Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
On Tue, Feb 07, 2017 at 08:26:23AM -0800, Eric Dumazet wrote:
> On Tue, 2017-02-07 at 08:06 -0800, Eric Dumazet wrote:
>
Awesome that you've started working on this. I think it's the correct approach
and mlx5 should be cleaned up in a similar way.
Long term we should be able to move all page alloc/free out of the drivers
completely.
> > /*
> > * make sure we read the CQE after we read the ownership bit
> > */
> > dma_rmb();
> > + prefetch(frags[0].page);
>
> Note that I would like to instead do a prefetch(frags[1].page)
yeah, these two look weird:
+ prefetch(frags[0].page);
+ va = page_address(frags[0].page) + frags[0].page_offset;
on most archs page_address() is just math (not a load from memory),
but the result != frags[0].page, so I'm missing what you are trying to prefetch?
prefetch(frags[1].page)
is even more confusing. what will it prefetch?
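To illustrate the page_address() point, here is a minimal user-space sketch, assuming a flat memory model where page descriptors sit in one array and page data in another. All demo_* names and sizes are made up for illustration and are not kernel API. It only shows that the data address is computed from the descriptor's position without reading the descriptor, so prefetching the descriptor pointer warms a different cache line than the packet data.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u
#define DEMO_NR_PAGES  4u

/* Hypothetical, stripped-down page descriptor: metadata only, no data. */
struct demo_page {
	unsigned long flags;
};

/* Descriptors and page data live in separate regions (assumption). */
static struct demo_page demo_mem_map[DEMO_NR_PAGES];
static char demo_memory[DEMO_NR_PAGES * DEMO_PAGE_SIZE];

/*
 * Pure arithmetic: the descriptor's index in demo_mem_map is scaled by
 * the page size. Nothing is loaded from *page.
 */
static void *demo_page_address(const struct demo_page *page)
{
	size_t pfn = (size_t)(page - demo_mem_map);

	assert(pfn < DEMO_NR_PAGES);
	return demo_memory + pfn * DEMO_PAGE_SIZE;
}
```

In this model prefetch(page) touches demo_mem_map, while the bytes the rx path reads next live in demo_memory.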
btw we had a patch that was doing prefetch of 'va' of next packet
and it was very helpful. Like this:
pref_index = (index + 1) & ring->size_mask;
pref = ring->rx_info + (pref_index << priv->log_rx_info);
prefetch(page_address(pref->page) + pref->page_offset);
but since you're redesigning ring->rx_info... not sure how it will fit.
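A self-contained sketch of the same next-packet prefetch idea, for reference. It is simplified: struct demo_rx_slot and the demo_* helpers are hypothetical stand-ins, each ring slot is modeled as one array element instead of a byte offset shifted by log_rx_info, and __builtin_prefetch (GCC/Clang) stands in for the kernel's prefetch().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one rx_info slot; not the real mlx4 layout. */
struct demo_rx_slot {
	char *page_va;        /* models page_address(page) */
	uint32_t page_offset; /* offset of the packet data in that page */
};

/* Index of the slot after 'index'; size_mask must be (power of two) - 1. */
static inline uint32_t demo_next_index(uint32_t index, uint32_t size_mask)
{
	assert(((size_mask + 1) & size_mask) == 0);
	return (index + 1) & size_mask;
}

/*
 * Prefetch the data of the packet that will be processed next and
 * return the prefetched address so callers can inspect it.
 */
static inline void *demo_prefetch_next(struct demo_rx_slot *ring,
				       uint32_t index, uint32_t size_mask)
{
	struct demo_rx_slot *pref = &ring[demo_next_index(index, size_mask)];
	void *va = pref->page_va + pref->page_offset;

	__builtin_prefetch(va); /* a hint only; never faults */
	return va;
}
```

The mask makes the last slot wrap to slot 0 without a branch, which only works when the ring size is a power of two.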
> So I will probably change how ring->rx_info is allocated
>
> wasting all that space and forcing vmalloc() is silly :
>
> tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
> sizeof(struct mlx4_en_rx_alloc));
I think you'd still need roundup_pow_of_two, otherwise the priv->log_rx_info
optimization won't work.
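A small user-space sketch of why the shift depends on the rounding. The helpers below are simplified models with demo_ prefixes, not the kernel implementations, and the 48-byte per-slot size in the note after the code is a made-up number, not the real MLX4_EN_MAX_RX_FRAGS * sizeof(struct mlx4_en_rx_alloc).

```c
#include <assert.h>
#include <stddef.h>

/* Model of roundup_pow_of_two(); assumes n >= 1 and no overflow. */
static size_t demo_roundup_pow_of_two(size_t n)
{
	size_t p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/* log2 of an exact power of two. */
static unsigned int demo_ilog2(size_t p)
{
	unsigned int log = 0;

	assert(p != 0 && (p & (p - 1)) == 0);
	while (p > 1) {
		p >>= 1;
		log++;
	}
	return log;
}

/* Byte offset of slot 'index' when the per-slot stride is 1 << log_stride. */
static size_t demo_slot_offset(size_t index, unsigned int log_stride)
{
	return index << log_stride;
}
```

With a hypothetical 48-byte slot, the stride rounds up to 64 and log is 6, so slot 5 sits at 5 << 6 = 320 bytes. Without the rounding the real offset would be 5 * 48 = 240, which no shift can produce, so the lookup would have to fall back to a multiply.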