Message-ID: <5326E23F.8050304@citrix.com>
Date:	Mon, 17 Mar 2014 11:53:35 +0000
From:	Andrew Bennieston <andrew.bennieston@...rix.com>
To:	Ian Campbell <Ian.Campbell@...rix.com>
CC:	<xen-devel@...ts.xenproject.org>, <wei.liu2@...rix.com>,
	<paul.durrant@...rix.com>, <netdev@...r.kernel.org>,
	<david.vrabel@...rix.com>
Subject: Re: [PATCH V6 net-next 1/5] xen-netback: Factor queue-specific data
 into queue struct.

On 14/03/14 15:55, Ian Campbell wrote:
> On Mon, 2014-03-03 at 11:47 +0000, Andrew J. Bennieston wrote:
>> From: "Andrew J. Bennieston" <andrew.bennieston@...rix.com>
>>
>> In preparation for multi-queue support in xen-netback, move the
>> queue-specific data from struct xenvif into struct xenvif_queue, and
>> update the rest of the code to use this.
>>
>> Also[...]
>>
>> Finally,[...]
>
> This is already quite a big patch, and I don't think the commit log
> covers everything it changes/refactors, does it?
>
> It's always a good idea to break these things apart but in particular
> separating the mechanical stuff (s/vif/queue/g) from the non-mechanical
> stuff, since the mechanical stuff is essentially trivial to review and
> getting it out the way makes the non-mechanical stuff much easier to
> check (or even spot).
>

The vast majority of changes in this patch are s/vif/queue/g. The rest
are related changes, such as inserting loops over queues, and moving
queue-specific initialisation away from the vif-wide initialisation, so
that it can be done once per queue.

I consider these things to be logically related and definitely within
the purview of this single patch. Without doing this, it is difficult to
get a patch that results in something that even compiles, without
putting in a bunch of placeholder code that will be removed in the very
next patch.

When I split this feature into multiple patches, I took care to group
as little as possible into this first patch (and the same for netfront).
It is still a large patch, but by my count most of this is a simple
replacement of vif with queue...

A first-order approximation, searching for line pairs where the first
has 'vif' and the second has 'queue', yields:

➜  xen-netback git:(saturn) git show HEAD~4 | grep -A 1 vif | grep queue | wc -l
380

i.e. 760 (=380*2) lines out of the 2240 (~ 34%) are trivial replacements
of vif with queue, and this is not counting multi-line replacements, of
which there are many. What remains is mostly adding loops over these
queues. This could, in principle, be done in a second patch, but the
impact of this is small.

>
>>
>> Signed-off-by: Andrew J. Bennieston <andrew.bennieston@...rix.com>
>> Reviewed-by: Paul Durrant <paul.durrant@...rix.com>
>> ---
>>   drivers/net/xen-netback/common.h    |   85 ++++--
>>   drivers/net/xen-netback/interface.c |  329 ++++++++++++++--------
>>   drivers/net/xen-netback/netback.c   |  530 ++++++++++++++++++-----------------
>>   drivers/net/xen-netback/xenbus.c    |   87 ++++--
>>   4 files changed, 608 insertions(+), 423 deletions(-)
>>
>> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
>> index ae413a2..4176539 100644
>> --- a/drivers/net/xen-netback/common.h
>> +++ b/drivers/net/xen-netback/common.h
>> @@ -108,17 +108,39 @@ struct xenvif_rx_meta {
>>    */
>>   #define MAX_GRANT_COPY_OPS (MAX_SKB_FRAGS * XEN_NETIF_RX_RING_SIZE)
>>
>> -struct xenvif {
>> -	/* Unique identifier for this interface. */
>> -	domid_t          domid;
>> -	unsigned int     handle;
>> +/* Queue name is interface name with "-qNNN" appended */
>> +#define QUEUE_NAME_SIZE (IFNAMSIZ + 6)
>
> One more than necessary? Or does IFNAMSIZ not include the NULL? (I can't
> figure out if it does or not!)

interface.c contains the line:
snprintf(name, IFNAMSIZ - 1, "vif%u.%u", domid, handle);

This suggests that IFNAMSIZ counts the trailing NUL, so I can reduce
this count by 1 on that basis.
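
As a rough userspace sketch of the sizing argument (not code from the
patch): if IFNAMSIZ already includes room for the terminating NUL, then
only the five characters of "-qNNN" need to be added. The local IFNAMSIZ
definition of 16 mirrors the Linux value, and format_queue_name() is a
helper name invented here for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: mirrors the Linux value, which includes the trailing NUL. */
#ifndef IFNAMSIZ
#define IFNAMSIZ 16
#endif

/* Interface name plus "-qNNN" (5 characters); the NUL is already counted. */
#define QUEUE_NAME_SIZE (IFNAMSIZ + 5)

/* Hypothetical helper: returns 1 if the name fit without truncation,
 * 0 otherwise. snprintf() reports the length it wanted to write, so a
 * result >= len means the output was truncated. */
static int format_queue_name(char *buf, size_t len,
			     const char *ifname, unsigned int id)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%s-q%u", ifname, id);
	return n >= 0 && (size_t)n < len;
}
```

With a maximum-length 15-character interface name, queue ids up to 999
fit in QUEUE_NAME_SIZE bytes; a four-digit id would be truncated.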

>
>> [...]
>> -	/* This array is allocated seperately as it is large */
>> -	struct gnttab_copy *grant_copy_op;
>> +	struct gnttab_copy grant_copy_op[MAX_GRANT_COPY_OPS];
>
> Is this deliberate? It seems like a retrograde step reverting parts of
> ac3d5ac27735 "xen-netback: fix guest-receive-side array sizes" from Paul
> (at least you are nuking a speeling erorr)

Yes, this was deliberate. These arrays were moved out to avoid problems
with kmalloc for the struct net_device (which contains the struct xenvif
in its netdev_priv space). Since the queues are now allocated via
vzalloc, there is no need to do separate allocations (with the
requirement to also separately free on every error/teardown path) so I
moved these back into the main queue structure.
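
To illustrate the point about error paths, here is a minimal userspace
analogy, with calloc() standing in for vzalloc(). All demo_* names and
DEMO_COPY_OPS are made up for this sketch and are not from the patch.
With the array embedded in the queue structure, one allocation and one
free cover everything.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_COPY_OPS 64	/* stand-in for MAX_GRANT_COPY_OPS */

struct demo_copy_op {
	unsigned long src;
	unsigned long dst;
	unsigned int len;
};

struct demo_queue {
	unsigned int id;
	/* Embedded rather than separately allocated. */
	struct demo_copy_op grant_copy_op[DEMO_COPY_OPS];
};

/* One zeroed allocation for all queues, including their arrays. */
static struct demo_queue *demo_alloc_queues(unsigned int n)
{
	struct demo_queue *queues;
	unsigned int i;

	assert(n > 0);
	queues = calloc(n, sizeof(*queues));
	if (!queues)
		return NULL;
	for (i = 0; i < n; ++i)
		queues[i].id = i;
	return queues;
}

/* A single free releases everything; no per-array teardown needed. */
static void demo_free_queues(struct demo_queue *queues)
{
	free(queues);
}
```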

>
> How does this series interact with Zoltan's foreign mapping one? Badly I
> should imagine, are you going to rebase?

I'm working on the rebase right now.

>
>> +	/* First, check if there is only one queue to optimise the
>> +	 * single-queue or old frontend scenario.
>> +	 */
>> +	if (vif->num_queues == 1) {
>> +		queue_index = 0;
>> +	} else {
>> +		/* Use skb_get_hash to obtain an L4 hash if available */
>> +		hash = skb_get_hash(skb);
>> +		queue_index = (u16) (((u64)hash * vif->num_queues) >> 32);
>
> No modulo num_queues here?
>
> Is the multiply and shift from some best practice somewhere? Or else
> what is it doing?

It seems to be what a number of other net drivers do in this scenario.
It scales the 32-bit hash into the range [0, num_queues), and I guess
the reasoning is that a multiply and shift is faster than a mod
num_queues.
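
To spell out what the multiply and shift does, here is a small
standalone sketch. hash_to_queue() is a name chosen for illustration,
not a function in the patch. Treating hash / 2^32 as a fraction in
[0, 1), multiplying by num_queues and dropping the low 32 bits gives an
index that is always below num_queues, so no modulo is needed.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit hash onto [0, num_queues) without a division.
 * The 64-bit product is at most (2^32 - 1) * num_queues, so after
 * shifting right by 32 the result is strictly less than num_queues. */
static uint16_t hash_to_queue(uint32_t hash, unsigned int num_queues)
{
	assert(num_queues > 0 && num_queues <= UINT16_MAX);
	return (uint16_t)(((uint64_t)hash * num_queues) >> 32);
}
```

Unlike a modulo, this selects the queue from the high bits of the hash
rather than the low bits.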

>
>
>> +	/* Obtain the queue to be used to transmit this packet */
>> +	index = skb_get_queue_mapping(skb);
>> +	if (index >= vif->num_queues)
>> +		index = 0; /* Fall back to queue 0 if out of range */
>
> Is this actually allowed to happen?
>
> Even if yes, not modulo num_queue so spread it around a bit?

This probably isn't allowed to happen. I figured it didn't hurt to be a
little defensive with the code here, and falling back to queue 0 is a
fairly safe thing to do.
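
For comparison, the two options under discussion look roughly like this
as standalone functions. Both names are hypothetical and only meant to
show the difference in behaviour for an out-of-range index.

```c
#include <assert.h>

/* What the patch does: fall back to queue 0 if the index is out of range. */
static unsigned int clamp_queue_index(unsigned int index,
				      unsigned int num_queues)
{
	assert(num_queues > 0);
	return index < num_queues ? index : 0;
}

/* The suggested alternative: wrap around, spreading out-of-range
 * indices across all queues instead of piling them onto queue 0. */
static unsigned int wrap_queue_index(unsigned int index,
				     unsigned int num_queues)
{
	assert(num_queues > 0);
	return index % num_queues;
}
```

In-range indices are unchanged by either function; they differ only in
where an out-of-range index ends up.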

>>   static void xenvif_up(struct xenvif *vif)
>>   {
>> -	napi_enable(&vif->napi);
>> -	enable_irq(vif->tx_irq);
>> -	if (vif->tx_irq != vif->rx_irq)
>> -		enable_irq(vif->rx_irq);
>> -	xenvif_check_rx_xenvif(vif);
>> +	struct xenvif_queue *queue = NULL;
>> +	unsigned int queue_index;
>> +
>> +	for (queue_index = 0; queue_index < vif->num_queues; ++queue_index) {
>
> This vif->num_queues -- is it the same as dev->num_tx_queues? Or are
> there differing concepts of queue around?

It should be the same as dev->real_num_tx_queues, which may be less than
dev->num_tx_queues.

>> +		queue = &vif->queues[queue_index];
>> +		napi_enable(&queue->napi);
>> +		enable_irq(queue->tx_irq);
>> +		if (queue->tx_irq != queue->rx_irq)
>> +			enable_irq(queue->rx_irq);
>> +		xenvif_check_rx_xenvif(queue);
>> +	}
>>   }
>>
>>   static void xenvif_down(struct xenvif *vif)
>>   {
>> -	napi_disable(&vif->napi);
>> -	disable_irq(vif->tx_irq);
>> -	if (vif->tx_irq != vif->rx_irq)
>> -		disable_irq(vif->rx_irq);
>> -	del_timer_sync(&vif->credit_timeout);
>> +	struct xenvif_queue *queue = NULL;
>> +	unsigned int queue_index;
>
> Why unsigned?

Why not? You can't have a negative number of queues. Zero indicates "I
don't have any set up yet". I'm not expecting people to have 4 billion
or so queues, but equally I can't see a valid use for negative values
here.

>
>> @@ -496,9 +497,30 @@ static void connect(struct backend_info *be)
>>   		return;
>>   	}
>>
>> -	xen_net_read_rate(dev, &be->vif->credit_bytes,
>> -			  &be->vif->credit_usec);
>> -	be->vif->remaining_credit = be->vif->credit_bytes;
>> +	xen_net_read_rate(dev, &credit_bytes, &credit_usec);
>> +	read_xenbus_vif_flags(be);
>> +
>> +	be->vif->num_queues = 1;
>> +	be->vif->queues = vzalloc(be->vif->num_queues *
>> +			sizeof(struct xenvif_queue));
>> +
>> +	for (queue_index = 0; queue_index < be->vif->num_queues; ++queue_index) {
>> +		queue = &be->vif->queues[queue_index];
>> +		queue->vif = be->vif;
>> +		queue->id = queue_index;
>> +		snprintf(queue->name, sizeof(queue->name), "%s-q%u",
>> +				be->vif->dev->name, queue->id);
>> +
>> +		xenvif_init_queue(queue);
>> +
>> +		queue->remaining_credit = credit_bytes;
>> +
>> +		err = connect_rings(be, queue);
>> +		if (err)
>> +			goto err;
>> +	}
>> +
>> +	xenvif_carrier_on(be->vif);
>>
>>   	unregister_hotplug_status_watch(be);
>>   	err = xenbus_watch_pathfmt(dev, &be->hotplug_status_watch,
>> @@ -507,18 +529,24 @@ static void connect(struct backend_info *be)
>>   	if (!err)
>>   		be->have_hotplug_status_watch = 1;
>>
>> -	netif_wake_queue(be->vif->dev);
>> +	netif_tx_wake_all_queues(be->vif->dev);
>> +
>> +	return;
>> +
>> +err:
>> +	vfree(be->vif->queues);
>> +	be->vif->queues = NULL;
>> +	be->vif->num_queues = 0;
>> +	return;
>
> Do you not need to unwind the setup already done on the previous queues
> before the failure?


Err... yes. I was sure that code existed at some point, but I can't find
it now. Oops!
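
The shape of the missing unwind is roughly as follows. This is a
userspace sketch under stated assumptions, not the fix itself:
demo_connect() and demo_disconnect() are placeholders for whatever
per-queue setup and teardown the driver ends up using, and fail_at
exists only to simulate a failure.

```c
#include <assert.h>
#include <stddef.h>

struct demo_queue {
	int connected;
};

/* Placeholder per-queue setup; fails when idx == fail_at. */
static int demo_connect(struct demo_queue *queue, unsigned int idx,
			unsigned int fail_at)
{
	if (idx == fail_at)
		return -1;
	queue->connected = 1;
	return 0;
}

/* Placeholder per-queue teardown. */
static void demo_disconnect(struct demo_queue *queue)
{
	queue->connected = 0;
}

static int demo_connect_all(struct demo_queue *queues, unsigned int n,
			    unsigned int fail_at)
{
	unsigned int i;
	int err;

	assert(queues != NULL);
	for (i = 0; i < n; ++i) {
		err = demo_connect(&queues[i], i, fail_at);
		if (err)
			goto err_unwind;
	}
	return 0;

err_unwind:
	/* Queue i failed; tear down queues i-1 .. 0 in reverse order. */
	while (i-- > 0)
		demo_disconnect(&queues[i]);
	return err;
}
```

Only after this loop would it be safe to free the queue array and reset
the queue count, as the existing error path does.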


-Andrew
>
>>   }
>>
>>
>> -static int connect_rings(struct backend_info *be)
>> +static int connect_rings(struct backend_info *be, struct xenvif_queue *queue)
>>   {
>> -	struct xenvif *vif = be->vif;
>>   	struct xenbus_device *dev = be->dev;
>>   	unsigned long tx_ring_ref, rx_ring_ref;
>> -	unsigned int tx_evtchn, rx_evtchn, rx_copy;
>> +	unsigned int tx_evtchn, rx_evtchn;
>>   	int err;
>> -	int val;
>>
>>   	err = xenbus_gather(XBT_NIL, dev->otherend,
>>   			    "tx-ring-ref", "%lu", &tx_ring_ref,
>> @@ -546,6 +574,27 @@ static int connect_rings(struct backend_info *be)
>>   		rx_evtchn = tx_evtchn;
>>   	}
>>
>> +	/* Map the shared frame, irq etc. */
>> +	err = xenvif_connect(queue, tx_ring_ref, rx_ring_ref,
>> +			     tx_evtchn, rx_evtchn);
>> +	if (err) {
>> +		xenbus_dev_fatal(dev, err,
>> +				 "mapping shared-frames %lu/%lu port tx %u rx %u",
>> +				 tx_ring_ref, rx_ring_ref,
>> +				 tx_evtchn, rx_evtchn);
>> +		return err;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>
>
