Message-ID: <1372022003.3301.47.camel@edumazet-glaptop>
Date: Sun, 23 Jun 2013 14:13:23 -0700
From: Eric Dumazet <eric.dumazet@...il.com>
To: Or Gerlitz <or.gerlitz@...il.com>
Cc: "David S. Miller" <davem@...emloft.net>, netdev@...r.kernel.org,
Or Gerlitz <ogerlitz@...lanox.com>,
Eugenia Emantayev <eugenia@...lanox.com>,
Saeed Mahameed <saeedm@...lanox.com>
Subject: Re: [PATCH net-next] mlx4: allow order-0 memory allocations in RX
path
On Sun, 2013-06-23 at 23:17 +0300, Or Gerlitz wrote:
> On Sun, Jun 23, 2013 at 6:17 PM, Eric Dumazet <eric.dumazet@...il.com> wrote:
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> >
> > mlx4 exclusively uses order-2 allocations in RX path, which are
> > likely to fail under memory pressure.
> >
> > We therefore drop frames more often than needed.
> >
> > This patch tries order-3, order-2, order-1 and finally order-0
> > allocations to keep good performance, yet allow allocations if/when
> > memory gets fragmented.
> >
> > By using larger pages, and avoiding unnecessary get_page()/put_page()
> > on compound pages, this patch improves performance as well, lowering
> > false sharing on struct page.
>
> Hi Eric, thanks for the patch. Both Amir and Yevgeny are OOO, so it
> will take us a bit more time to conduct the review... but let's start:
> could you explain a little further what exactly you refer to by
> "false sharing" in this context?
Every time mlx4 prepared a page frag into a skb, it did:
- a get_page() in mlx4_en_alloc_frags()
- a get_page() in mlx4_en_complete_rx_desc()
- a put_page() in mlx4_en_free_frag()
-> lots of changes to page->_count

When this skb is consumed, the frag is freed -> put_page()
-> another decrement of page->_count

If the consumer is on a different cpu, this adds false sharing on
"struct page".

After my patch, the mlx4 driver touches this "struct page" only once,
and the consumers do their get_page() without being slowed down by the
mlx4 driver/cpu. This reduces latencies.
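Schematically, the refcount traffic looks like this (a userspace sketch
with a stub refcount; the function names and the exact batching below are
illustrative, not the literal patch code -- the idea is simply that the
driver charges page->_count once for all frags instead of touching it per
frag):

```c
#include <assert.h>
#include <stdatomic.h>

/* Stub standing in for struct page; _count mimics the page refcount. */
struct page_stub {
	atomic_int _count;
};

/* Old scheme: per frag, two get_page() and one put_page() on the driver
 * side.  Returns the number of atomic ops performed on page->_count. */
static int refcount_per_frag(struct page_stub *p, int nfrags)
{
	int ops = 0;

	for (int i = 0; i < nfrags; i++) {
		atomic_fetch_add(&p->_count, 1); ops++; /* mlx4_en_alloc_frags() */
		atomic_fetch_add(&p->_count, 1); ops++; /* mlx4_en_complete_rx_desc() */
		atomic_fetch_sub(&p->_count, 1); ops++; /* mlx4_en_free_frag() */
	}
	return ops;
}

/* New scheme: the driver takes references for all frags it will hand out
 * in one shot; consumers still drop their references one by one, but on
 * their own cpus, without bouncing the cache line back to the driver. */
static int refcount_batched(struct page_stub *p, int nfrags)
{
	atomic_fetch_add(&p->_count, nfrags); /* single driver-side touch */
	return 1;
}
```

With 21 frags per order-3 page, the old scheme performs 63 driver-side
atomic ops on the same cache line; the batched one performs a single op.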
>
> Also, I am not fully sure, but I think the current driver code doesn't
> support splice, and this somehow relates to how RX skbs are spread over
> pages. In that respect, I wonder whether this patch goes in a direction
> that would allow supporting splice, or maybe takes us a bit back, given
> the move to order-3 allocations?
splice is supported by core networking, no worries ;)
It doesn't depend on order-whatever allocations.
BTW, splice() works well for TCP over loopback, and TX already uses
fragments in order-3 pages.
>
> You've mentioned a performance improvement, could you be more specific?
> What's the scheme under which you saw the improvement, and what was
> the improvement?
A cpu might be fully dedicated to softirq handling, with skbs consumed on
other cpus.
My patch removes ~60 atomic operations per allocated page
(21 frags, and for each frag, two get_page() and one put_page()).
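The "~60" figure is just the frag count times the per-frag cost:

```c
#include <assert.h>

/* 21 frags per order-3 page; each frag formerly cost two get_page()
 * and one put_page() in the driver. */
enum { FRAGS_PER_PAGE = 21, OPS_PER_FRAG = 2 + 1 };

static int atomic_ops_removed(void)
{
	return FRAGS_PER_PAGE * OPS_PER_FRAG; /* 63, i.e. roughly 60 */
}
```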
>
> Last, as Amir wrote you, we're looking at re-using skbs on the RX
> path to avoid severe performance hits when IOMMU is enabled. The team
> has not provided me the patch yet, but basically, if you look at the
> ixgbe patch that was made largely for that very same purpose
> (improving perf under IOMMU), f800326dca7bc158f4c886aa92f222de37993c80
> "ixgbe: Replace standard receive path with a page based receive",
> they use order-0 or order-1 allocations there, but not order-2 or
> order-3. Here too I have some more catching up to do, so we'll
> see...
ixgbe does not support a frag_size of 1536 bytes, only 2048 or 4096
bytes, so using order-3 pages is no win for it.
But for mlx4, we gain 5% occupancy using order-3 pages (21 frags per
32K) over order-2 pages (10 frags per 16K), and 30% over order-0 pages
(2 frags per 4K).
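The occupancy numbers fall out of simple division (userspace sketch,
assuming a 4K base page; helper names are mine, not driver code):

```c
#include <assert.h>
#include <stddef.h>

/* Frags of 1536 bytes packed into a page of the given order.
 * Occupancy = bytes handed out as frags / bytes allocated. */
enum { PAGE_SZ = 4096, FRAG_SZ = 1536 };

static int frags_per_page(int order)
{
	return ((size_t)PAGE_SZ << order) / FRAG_SZ;
}

static double occupancy(int order)
{
	size_t total = (size_t)PAGE_SZ << order;

	return (double)(frags_per_page(order) * FRAG_SZ) / (double)total;
}
```

This gives 21 frags at 98.4% occupancy for order-3, 10 frags at 93.75%
for order-2 (the ~5% gap), and 2 frags at 75% for order-0 (roughly 30%
less usable space per allocated byte than order-3, relatively speaking).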
Frankly, the current mlx4 driver is barely usable as is, unless you
make sure the host has enough memory, with plenty of order-2 pages.
And unless you run really specialized applications, there is never
enough memory.
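The try-order-3-then-fall-back strategy from the patch description can
be sketched like this (userspace stand-in: malloc() takes the place of
the kernel's alloc_pages(), and alloc_rx_page() is a hypothetical name,
not the driver's):

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SZ 4096 /* 4K base page assumed */

/* Try order-3 first for best occupancy, then fall back to order-2,
 * order-1 and finally order-0 when memory is fragmented. */
static void *alloc_rx_page(int max_order, int *order_out)
{
	for (int order = max_order; order >= 0; order--) {
		void *page = malloc((size_t)PAGE_SZ << order);

		if (page) {
			*order_out = order; /* caller slices it into 1536-byte frags */
			return page;
		}
	}
	return NULL; /* even an order-0 page was unavailable */
}
```

In the kernel the higher-order attempts would typically add __GFP_NOWARN,
since their failure is expected and handled by the fallback.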