[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5326E23F.8050304@citrix.com>
Date: Mon, 17 Mar 2014 11:53:35 +0000
From: Andrew Bennieston <andrew.bennieston@...rix.com>
To: Ian Campbell <Ian.Campbell@...rix.com>
CC: <xen-devel@...ts.xenproject.org>, <wei.liu2@...rix.com>,
<paul.durrant@...rix.com>, <netdev@...r.kernel.org>,
<david.vrabel@...rix.com>
Subject: Re: [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data
into queue struct.
On 14/03/14 15:55, Ian Campbell wrote:
> On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
>> From: "Andrew J. Bennieston" <andrew.bennieston@...rix.com>
>>
>> In preparation for multi-queue support in xen-netback, move the
>> queue-specific data from struct xenvif into struct xenvif_queue, and
>> update the rest of the code to use this.
>>
>> Also[...]
>>
>> Finally,[...]
>
> This is already quite a big patch, and I don't think the commit log
> covers everything it changes/refactors, does it?
>
> It's always a good idea to break these things apart but in particular
> separating the mechanical stuff (s/vif/queue/g) from the non-mechanical
> stuff, since the mechanical stuff is essentially trivial to review and
> getting it out the way makes the non-mechanical stuff much easier to
> check (or even spot).
>
The vast majority of changes in this patch are s/vif/queue/g. The rest
are related changes, such as inserting loops over queues, and moving
queue-specific initialisation away from the vif-wide initialisation, so
that it can be done once per queue.
I consider these things to be logically related and definitely within
the purview of this single patch. Without doing this, it is difficult to
get a patch that results in something that even compiles, without
putting in a bunch of placeholder code that will be removed in the very
next patch.
When I split this feature into multiple patches, I took care to group
as little as possible into this first patch (and the same for netfront).
It is still a large patch, but by my count most of this is a simple
replacement of vif with queue...
A first-order approximation, searching for line pairs where the first
has 'vif' and the second has 'queue', yields:
➜ xen-netback git:(saturn) git show HEAD~4 | grep -A 1 vif | grep queue
| wc -l
380
i.e. 760 (=380*2) lines out of the 2240 (~ 40%) are trivial replacements
of vif with queue, and this is not counting multi-line replacements, of
which there are many. What remains is mostly adding loops over these
queues. This could, in principle, be done in a second patch, but the
impact of this is small.
>
>>
>> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@...rix.com>
>> Reviewed-by: Paul Durrant <paul.durrant@...rix.com>
>> ---
>> drivers/net/xen-netback/common.h | 85 ++++--
>> drivers/net/xen-netback/interface.c | 329 ++++++++++++++--------
>> drivers/net/xen-netback/netback.c | 530 ++++++++++++++++++-----------------
>> drivers/net/xen-netback/xenbus.c | 87 ++++--
>> 4 files changed, 608 insertions(+), 423 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index ae413a2..4176539 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -108,17 +108,39 @@ struct xenvif_rx_meta {
>> */
>> #define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
>>
>> -struct xenvif {
>> - /* Unique identifier for this interface. */
>> - domid_t domid;
>> - unsigned int handle;
>> +/* Queue name is interface name with "-qNNN" appended */
>> +#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
>
> One more than necessary? Or does IFNAMSIZ not include the NULL? (I can't
> figure out if it does or not!)
interface.c contains the line:
snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);
This suggests that IFNAMSIZ counts the trailing NULL, so I can reduce
this count by 1 on that basis.
>
>> [...]
>> - /* This array is allocated seperately as it is large */
>> - struct gnttab_copy *grant_copy_op;
>> + struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
>
> Is this deliberate? It seems like a retrograde step reverting parts of
> ac3d5ac27735 "xen-netback: fix guest-receive-side array sizes" from Paul
> (at least you are nuking a speeling erorr)
Yes, this was deliberate. These arrays were moved out to avoid problems
with kmalloc for the struct net_device (which contains the struct xenvif
in its netdev_priv space). Since the queues are now allocated via
vzalloc, there is no need to do separate allocations (with the
requirement to also separately free on every error/teardown path) so I
moved these back into the main queue structure.
>
> How does this series interact with Zoltan's foreign mapping one? Badly I
> should imagine, are you going to rebase?
I'm working on the rebase right now.
>
>> + /* First, check if there is only one queue to optimise the
>> + * single-queue or old frontend scenario.
>> + */
>> + if (vif->num_queues == 1) {
>> + queue_index = 0;
>> + } else {
>> + /* Use skb_get_hash to obtain an L4 hash if available */
>> + hash = skb_get_hash(skb);
>> + queue_index = (u16) (((u64)hash * vif->num_queues) >> 32);
>
> No modulo num_queues here?
>
> Is the multiply and shift from some best practice somewhere? Or else
> what is it doing?
It seems to be what a bunch of other net drivers do in this scenario. I
guess the reasoning is it'll be faster than a mod num_queues.
>
>
>> + /* Obtain the queue to be used to transmit this packet */
>> + index = skb_get_queue_mapping(skb);
>> + if (index >= vif->num_queues)
>> + index = 0; /* Fall back to queue 0 if out of range */
>
> Is this actually allowed to happen?
>
> Even if yes, not modulo num_queue so spread it around a bit?
This probably isn't allowed to happen. I figured it didn't hurt to be a
little defensive with the code here, and falling back to queue 0 is a
fairly safe thing to do.
>> static void xenvif_up(struct xenvif *vif)
>> {
>> - napi_enable(&vif->napi);
>> - enable_irq(vif->tx_irq);
>> - if (vif->tx_irq != vif->rx_irq)
>> - enable_irq(vif->rx_irq);
>> - xenvif_check_rx_xenvif(vif);
>> + struct xenvif_queue *queue = NULL;
>> + unsigned int queue_index;
>> +
>> + for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
>
> This vif->num_queues -- is it the same as dev->num_tx_queues? Or areew
> there differing concepts of queue around?
It should be the same as dev->real_num_tx_queues, which may be less than
dev->num_tx_queues.
>> + queue = &vif->queues[queue_index];
>> + napi_enable(&queue->napi);
>> + enable_irq(queue->tx_irq);
>> + if (queue->tx_irq != queue->rx_irq)
>> + enable_irq(queue->rx_irq);
>> + xenvif_check_rx_xenvif(queue);
>> + }
>> }
>>
>> static void xenvif_down(struct xenvif *vif)
>> {
>> - napi_disable(&vif->napi);
>> - disable_irq(vif->tx_irq);
>> - if (vif->tx_irq != vif->rx_irq)
>> - disable_irq(vif->rx_irq);
>> - del_timer_sync(&vif->credit_timeout);
>> + struct xenvif_queue *queue = NULL;
>> + unsigned int queue_index;
>
> Why unsigned?
Why not? You can't have a negative number of queues. Zero indicates "I
don't have any set up yet". I'm not expecting people to have 4 billion
or so queues, but equally I can't see a valid use for negative values
here.
>
>> @@ -496,9 +497,30 @@ static void connect(struct backend_info *be)
>> return;
>> }
>>
>> - xen_net_read_rate(dev, &be->vif->credit_bytes,
>> - &be->vif->credit_usec);
>> - be->vif->remaining_credit = be->vif->credit_bytes;
>> + xen_net_read_rate(dev, &credit_bytes, &credit_usec);
>> + read_xenbus_vif_flags(be);
>> +
>> + be->vif->num_queues = 1;
>> + be->vif->queues = vzalloc(be->vif->num_queues *
>> + sizeof(struct xenvif_queue));
>> +
>> + for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
>> + queue = &be->vif->queues[queue_index];
>> + queue->vif = be->vif;
>> + queue->id = queue_index;
>> + snprintf(queue->name, sizeof(queue->name), "%s-q%u",
>> + be->vif->dev->name, queue->id);
>> +
>> + xenvif_init_queue(queue);
>> +
>> + queue->remaining_credit = credit_bytes;
>> +
>> + err = connect_rings(be, queue);
>> + if (err)
>> + goto err;
>> + }
>> +
>> + xenvif_carrier_on(be->vif);
>>
>> unregister_hotplug_status_watch(be);
>> err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
>> @@ -507,18 +529,24 @@ static void connect(struct backend_info *be)
>> if (!err)
>> be->have_hotplug_status_watch = 1;
>>
>> - netif_wake_queue(be->vif->dev);
>> + netif_tx_wake_all_queues(be->vif->dev);
>> +
>> + return;
>> +
>> +err:
>> + vfree(be->vif->queues);
>> + be->vif->queues = NULL;
>> + be->vif->num_queues = 0;
>> + return;
>
> Do you not need to unwind the setup already done on the previous queues
> before the failure?
Err... yes. I was sure that code existed at some point, but I can't find
it now. Oops!
-Andrew
>
>> }
>>
>>
>> -static int connect_rings(struct backend_info *be)
>> +static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
>> {
>> - struct xenvif *vif = be->vif;
>> struct xenbus_device *dev = be->dev;
>> unsigned long tx_ring_ref, rx_ring_ref;
>> - unsigned int tx_evtchn, rx_evtchn, rx_copy;
>> + unsigned int tx_evtchn, rx_evtchn;
>> int err;
>> - int val;
>>
>> err = xenbus_gather(XBT_NIL, dev->otherend,
>> "tx-ring-ref", "%lu", &tx_ring_ref,
>> @@ -546,6 +574,27 @@ static int connect_rings(struct backend_info *be)
>> rx_evtchn = tx_evtchn;
>> }
>>
>> + /* Map the shared frame, irq etc. */
>> + err = xenvif_connect(queue, tx_ring_ref, rx_ring_ref,
>> + tx_evtchn, rx_evtchn);
>> + if (err) {
>> + xenbus_dev_fatal(dev, err,
>> + "mapping shared-frames %lu/%lu port tx %u rx %u",
>> + tx_ring_ref, rx_ring_ref,
>> + tx_evtchn, rx_evtchn);
>> + return err;
>> + }
>> +
>> + return 0;
>> +}
>> +
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Powered by blists - more mailing lists