Message-ID: <30CF0FF9-3B1D-48AF-AFB4-73E20C404357@neclab.eu>
Date: Fri, 22 May 2015 10:24:39 +0000
From: Joao Martins <Joao.Martins@...lab.eu>
To: Wei Liu <wei.liu2@...rix.com>
CC: "xen-devel@...ts.xenproject.org" <xen-devel@...ts.xenproject.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"ian.campbell@...rix.com" <ian.campbell@...rix.com>,
"david.vrabel@...rix.com" <david.vrabel@...rix.com>,
"boris.ostrovsky@...cle.com" <boris.ostrovsky@...cle.com>,
"konrad.wilk@...cle.com" <konrad.wilk@...cle.com>
Subject: Re: [RFC PATCH 03/13] xen-netback: implement TX persistent grants
On 19 May 2015, at 17:23, Wei Liu <wei.liu2@...rix.com> wrote:
> On Tue, May 12, 2015 at 07:18:27PM +0200, Joao Martins wrote:
>> Introduces persistent grants for the TX path, which follows a similar code
>> path to grant mapping.
>>
>> It starts by checking if there's a persistent grant available for header
>> and frags grefs and if so setting it in tx_pgrants. If no persistent grant
>> is found in the tree for the header it will resort to grant copy (but
>> preparing the map ops and add them laster). For the frags it will use the
> ^
> later
>
>> tree page pool, and if no pages are available it falls back to grant map/unmap
>> using mmap_pages. When the skb destructor callback gets called, we release the
>> slot and persistent grant within the callback to avoid waking up the
>> dealloc thread. As long as there are no unmaps to be done, the dealloc thread
>> will remain inactive.
>>
>
> This scheme looks complicated. Can we just use one
> scheme at a time? What's the rationale for using this combined scheme?
> Maybe you're thinking about using a max_grants < ring_size to save
> memory?
Yes, my purpose was to allow max_grants < ring_size to reduce the amount of
memory mapped. In a bulk transfer test with iperf, the maximum number of
grants in the tree was <160 TX grants, without affecting peak performance;
using pktgen, though, fills the tree completely.
The second reason is to handle the case of a (malicious?) frontend providing
more grefs than the maximum allowed, in which case I fall back to grant map/unmap.
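To make the fallback concrete, the per-slot decision looks roughly like this
(simplified sketch; persistent_gnt_tree_full() is an illustrative name for the
tree's capacity check, not necessarily what the patch calls it):

	/* Prefer a persistent grant; create one if the gref is new and the
	 * tree still has room (max_grants may be below the ring size).
	 * Otherwise degrade gracefully to the grant map/unmap path, which
	 * also covers a frontend offering more grefs than allowed.
	 */
	persistent_gnt = get_persistent_gnt(tree, ref);
	if (!persistent_gnt && !persistent_gnt_tree_full(tree))
		persistent_gnt = xenvif_pgrant_new(tree, gop);
	if (IS_ERR_OR_NULL(persistent_gnt))
		return false;	/* fall back: queue a regular map op */
	queue->tx_pgrants[pending_idx] = persistent_gnt;
	return true;		/* no map op needed for this slot */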
>
> I've only skimmed the patch; I will do a detailed review once we're sure
> this is the right way to go.
>
>> Results show an improvement of 46% (1.82 vs 1.24 Mpps, 64 byte packets)
>> measured with pktgen and over 48% (21.6 vs 14.5 Gbit/s) measured with
>> iperf (TCP) with 4 parallel flows on a 1-queue vif, DomU to Dom0.
>> Tests ran on an Intel Xeon E5-1650 v2 with HT disabled.
>> […]
>> int xenvif_init_queue(struct xenvif_queue *queue)
>> {
>> int err, i;
>> @@ -496,9 +524,23 @@ int xenvif_init_queue(struct xenvif_queue *queue)
>> .ctx = NULL,
>> .desc = i };
>> queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
>> + queue->tx_pgrants[i] = NULL;
>> + }
>> +
>> + if (queue->vif->persistent_grants) {
>> + err = init_persistent_gnt_tree(&queue->tx_gnts_tree,
>> + queue->tx_gnts_pages,
>> + XEN_NETIF_TX_RING_SIZE);
>> + if (err)
>> + goto err_disable;
>> }
>>
>> return 0;
>> +
>> +err_disable:
>> + netdev_err(queue->vif->dev, "Could not reserve tree pages.");
>> + queue->vif->persistent_grants = 0;
>
> You can just move the above two lines under `if (err)'.
>
> Also see below.
In the next patch this is also a common cleanup path, so I did it this way
to avoid moving this hunk around.
>> + return 0;
>> }
>>
>> void xenvif_carrier_on(struct xenvif *vif)
>> @@ -654,6 +696,10 @@ void xenvif_disconnect(struct xenvif *vif)
>> }
>>
>> xenvif_unmap_frontend_rings(queue);
>> +
>> + if (queue->vif->persistent_grants)
>> + deinit_persistent_gnt_tree(&queue->tx_gnts_tree,
>> + queue->tx_gnts_pages);
>
> If the init function fails on queue N (N>0) you now leak resources.
Correct. I should return -ENOMEM (on init) and not 0.
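Concretely, the fix on the error path is just (sketch):

err_disable:
	netdev_err(queue->vif->dev, "Could not reserve tree pages.");
	queue->vif->persistent_grants = 0;
	return err;	/* was "return 0"; lets the caller unwind queues 0..N-1 */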
>> }
>> }
>>
>> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
>> index 332e489..529d7c3 100644
>> --- a/drivers/net/xen-netback/netback.c
>> +++ b/drivers/net/xen-netback/netback.c
>> @@ -269,6 +269,11 @@ static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,
>> return (unsigned long)pfn_to_kaddr(idx_to_pfn(queue, idx));
>> }
>>
>> +static inline void *page_to_kaddr(struct page *page)
>> +{
>> + return pfn_to_kaddr(page_to_pfn(page));
>> +}
>> +
>> #define callback_param(vif, pending_idx) \
>> (vif->pending_tx_info[pending_idx].callback_struct)
>>
>> @@ -299,6 +304,29 @@ static inline pending_ring_idx_t pending_index(unsigned i)
>> return i & (MAX_PENDING_REQS-1);
>> }
>>
>> +/* Creates a new persistent grant and add it to the tree.
>> + */
>> +static struct persistent_gnt *xenvif_pgrant_new(struct persistent_gnt_tree *tree,
>> + struct gnttab_map_grant_ref *gop)
>> +{
>> + struct persistent_gnt *persistent_gnt;
>> +
>> + persistent_gnt = kmalloc(sizeof(*persistent_gnt), GFP_KERNEL);
>
> xenvif_pgrant_new can be called from NAPI, which runs in softirq context
> and therefore doesn't allow you to sleep.
Silly mistake; I will fix this. The whole point of the tree page pool was
that we aren't allowed to sleep, both here and in ndo_start_xmit (in patch #6).
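The minimal fix for v2 is probably (sketch; GFP_ATOMIC because the struct
allocation happens in softirq context, while the pages themselves still come
from the preallocated pool):

-	persistent_gnt = kmalloc(sizeof(*persistent_gnt), GFP_KERNEL);
+	/* May be called from NAPI poll (softirq): must not sleep. */
+	persistent_gnt = kmalloc(sizeof(*persistent_gnt), GFP_ATOMIC);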
>> […]
>>
>> +/* Checks if there's a persistent grant available for gref and
>> + * if so, set it also in the tx_pgrants array that keeps the ones
>> + * in use.
>> + */
>> +static bool xenvif_tx_pgrant_available(struct xenvif_queue *queue,
>> + grant_ref_t ref, u16 pending_idx,
>> + bool *can_map)
>> +{
>> + struct persistent_gnt_tree *tree = &queue->tx_gnts_tree;
>> + struct persistent_gnt *persistent_gnt;
>> + bool busy;
>> +
>> + if (!queue->vif->persistent_grants)
>> + return false;
>> +
>> + persistent_gnt = get_persistent_gnt(tree, ref);
>> +
>> + /* If gref is already in use we fallback, since it would
>> + * otherwise mean re-adding the same gref to the tree
>> + */
>> + busy = IS_ERR(persistent_gnt);
>> + if (unlikely(busy))
>> + persistent_gnt = NULL;
>> +
>
> Under what circumstances can we retrieve an already-in-use persistent
> grant? You seem to suggest this is a bug in the RX case.
A guest could try to share the same mapped page across multiple frags, in
which case I fall back to map/unmap. I think this is a limitation in the way
we manage the persistent grants, where we can only have a single reference
to a persistent grant in flight.
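For reference, the lookup follows xen-blkback's get_persistent_gnt (sketch;
the PERSISTENT_GNT_ACTIVE flag name is borrowed from blkback, not necessarily
this series):

	if (data->gnt == gref) {
		if (test_bit(PERSISTENT_GNT_ACTIVE, data->flags))
			return ERR_PTR(-EBUSY);	/* in flight: caller falls back */
		set_bit(PERSISTENT_GNT_ACTIVE, data->flags);
		return data;
	}

which is why two frags referencing the same gref cannot both use the
persistent mapping at the same time.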
>> […]
>> MAX_PENDING_REQS);
>> index = pending_index(queue->dealloc_prod);
>> @@ -1691,7 +1939,10 @@ void xenvif_zerocopy_callback(struct ubuf_info *ubuf, bool zerocopy_success)
>> smp_wmb();
>> queue->dealloc_prod++;
>> } while (ubuf);
>> - wake_up(&queue->dealloc_wq);
>> + /* Wake up only when there are grants to unmap */
>> + if (dealloc_prod_save != queue->dealloc_prod)
>> + wake_up(&queue->dealloc_wq);
>> +
>> spin_unlock_irqrestore(&queue->callback_lock, flags);
>>
>> if (likely(zerocopy_success))
>> @@ -1779,10 +2030,13 @@ int xenvif_tx_action(struct xenvif_queue *queue, int budget)
>>
>> xenvif_tx_build_gops(queue, budget, &nr_cops, &nr_mops);
>>
>> - if (nr_cops == 0)
>> + if (!queue->vif->persistent_grants &&
>> + nr_cops == 0)
>
> You can just move nr_cops to previous line.
>
> Wei.