[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <D5C1322C3E673F459512FB59E0DDC329029E454C@orsmsx414.amr.corp.intel.com>
Date: Tue, 10 Apr 2007 09:27:17 -0700
From: "Waskiewicz Jr, Peter P" <peter.p.waskiewicz.jr@...el.com>
To: "Patrick McHardy" <kaber@...sh.net>
Cc: <davem@...emloft.net>, <netdev@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, <jgarzik@...ox.com>,
"cramerj" <cramerj@...el.com>,
"Kok, Auke-jan H" <auke-jan.h.kok@...el.com>,
"Leech, Christopher" <christopher.leech@...el.com>
Subject: RE: [PATCH] NET: [UPDATED] Multiqueue network device support implementation.
> Peter P Waskiewicz Jr wrote:
> > + /* To retrieve statistics per subqueue - FOR FUTURE USE */
> > + struct net_device_stats* (*get_subqueue_stats)(struct
> net_device *dev,
> > + int
> queue_index);
>
>
> Please no future use stuff, just add it when you need it.
Gotcha. I'll remove this.
>
> > diff --git a/net/core/dev.c b/net/core/dev.c index 219a57f..c11c8fa
> > 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3326,12 +3328,23 @@ struct net_device *alloc_netdev(int
> sizeof_priv, const char *name,
> > if (sizeof_priv)
> > dev->priv = netdev_priv(dev);
> >
> > + alloc_size = (sizeof(struct net_device_subqueue) * queue_count);
> > +
> > + p = kzalloc(alloc_size, GFP_KERNEL);
> > + if (!p) {
> > + printk(KERN_ERR "alloc_netdev: Unable to
> allocate queues.\n");
> > + return NULL;
>
>
> This leaks the device. You treat every single-queue device as
> having a single subqueue. If it doesn't get too ugly it would
> be nice to avoid this and only allocate the subqueue states
> for real multiqueue devices.
We went back and forth on this. The reason we allocate a queue in every
case, even on single-queue devices, was to make the stack not have
branching for multiqueue and non-multiqueue devices. If we don't have
at least one queue on a device, then we can't have
netif_subqueue_stopped() in the hotpath unless we check if a device is
multiqueue before. The original patches I released had this branching,
and I was asked to not do that. I'd also like to see all queue-related
stuff be pulled from net_device and put into net_device_subqueue at some
point, even for single-queue devices. Thoughts?
>
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -133,7 +133,8 @@ static inline int qdisc_restart(struct
> net_device *dev)
> > /* And release queue */
> > spin_unlock(&dev->queue_lock);
> >
> > - if (!netif_queue_stopped(dev)) {
> > + if (!netif_queue_stopped(dev) &&
> > + !netif_subqueue_stopped(dev,
> skb->queue_mapping)) {
> > int ret;
> >
> > ret = dev_hard_start_xmit(skb,
> dev); @@ -149,7 +150,6 @@ static
> > inline int qdisc_restart(struct net_device *dev)
> > goto collision;
> > }
> > }
> > -
>
>
> Unrelated whitespace change.
I'll fix that.
>
> > /* NETDEV_TX_BUSY - we need to requeue */
> > /* Release the driver */
> > if (!nolock) {
> > diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c index
> > 5cfe60b..7365621 100644
> > --- a/net/sched/sch_prio.c
> > +++ b/net/sched/sch_prio.c
> > @@ -43,6 +43,7 @@ struct prio_sched_data
> > struct tcf_proto *filter_list;
> > u8 prio2band[TC_PRIO_MAX+1];
> > struct Qdisc *queues[TCQ_PRIO_BANDS];
> > + u16 band2queue[TC_PRIO_MAX + 1];
> > };
> >
> >
> > @@ -63,20 +64,26 @@ prio_classify(struct sk_buff *skb,
> struct Qdisc *sch, int *qerr)
> > case TC_ACT_SHOT:
> > return NULL;
> > };
> > -
>
> Same here
I'll fix that too.
>
> > if (!q->filter_list ) {
> > #else
> > if (!q->filter_list || tc_classify(skb,
> q->filter_list, &res)) {
> > #endif
> > if (TC_H_MAJ(band))
> > band = 0;
> > + skb->queue_mapping =
> > +
> q->prio2band[q->band2queue[band&TC_PRIO_MAX]];
> > +
>
>
> Does this needs to be cleared at some point again? TC actions
> might redirect or mirror packets to other (multiqueue) devices.
If an skb is redirected to another device, the skb should be filtered
through that device's qdisc, yes?
>
> > @@ -242,6 +259,30 @@ static int prio_tune(struct Qdisc
> *sch, struct rtattr *opt)
> > }
> > }
> > }
> > + /* setup queue to band mapping */
> > + if (q->bands < sch->dev->egress_subqueue_count) {
> > + qmapoffset = 1;
> > + mod = 0;
> > + } else {
> > + mod = q->bands % sch->dev->egress_subqueue_count;
> > + qmapoffset = q->bands /
> sch->dev->egress_subqueue_count +
> > + ((mod) ? 1 : 0);
> > + }
> > +
> > + queue = 0;
> > + offset = 0;
> > + for (i = 0; i < q->bands; i++) {
> > + q->band2queue[i] = queue;
> > + if ( ((i + 1) - offset) == qmapoffset) {
> > + queue++;
> > + offset += qmapoffset;
> > + if (mod)
> > + mod--;
> > + qmapoffset = q->bands /
> > + sch->dev->egress_subqueue_count +
> > + ((mod) ? 1 : 0);
> > + }
> > + }
>
>
> Besides being quite ugly, I don't think this does what you want.
> For bands < queues we get band2queue[0] = 0, all others map to 1.
>
I see what's wrong with this. If I removed "mod = 0;", the algorithm
works as intended. We also did go through a number of iterations of the
best way to map queues to bands, using a simple hash, a complex
assigment, etc., and this algorithm gives the best distribution given
any combination of queues / bands (minus that small bug). I'll fix this
and resubmit.
Thanks for the feedback,
-PJ Waskiewicz
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists