Date:	Tue, 12 Jun 2007 02:58:05 +0200
From:	Patrick McHardy <>
CC:	"Waskiewicz Jr, Peter P" <>,
	"Kok, Auke-jan H" <>
Subject: Re: [PATCH] NET: Multiqueue network device support.

jamal wrote:
> On Mon, 2007-11-06 at 17:44 +0200, Patrick McHardy wrote:
>>At this point the qdisc might send new packets. What do you do when a
>>packet for a full ring arrives?
> Hrm... ok, is this a trick question or am I missing the obvious? ;->
> What is wrong with what any driver would do today - which is:
> netif_stop and return BUSY; the core requeues the packet?
>>I see three choices:
>>- drop it, even though it's still within the qdisc's configured limits
>>- requeue it, which does not work because the qdisc is still active
>>  and might just hand you the same packet over and over again in a
>>  busy loop, until the ring has more room (which has the same worst
>>  case, just that we're sitting in a busy loop now).
>>- requeue and stop the queue: we're back to where we started since
>>  now higher priority packets will not get passed to the driver.
> Refer to choice #4 above. 

Replying again so we can hopefully move forward soon. Your choice #4
is exactly what I proposed as my choice #3.

Let me repeat my example of why it doesn't work (well) without multiple
queue states (with typos etc. fixed) and describe the possibilities.
If you still disagree, I hope you can just change my example to show
how it gets fixed. As a thank you, I will actually understand that
your solution works as well :)

We have n empty HW queues served in ascending priority order
with a maximum length of m packets per queue:

[0] empty
[1] empty
[2] empty
...
[n-1] empty

Now we receive m - 1 packets for all priorities >= 1 and < n - 1,
so we have:

[0] empty
[1] m - 1 packets
[2] m - 1 packets
...
[n-2] m - 1 packets
[n-1] empty

Since no HW queue is completely full, the queue is still active.
Now we receive m packets of priority n - 1:

[0] empty
[1] m - 1 packets
[2] m - 1 packets
...
[n-2] m - 1 packets
[n-1] m packets

At this point the queue needs to be stopped, since the highest
priority queue is entirely full. To start it again, at least
one packet of queue n - 1 needs to be sent, which requires
that queues 1 to n - 2 are serviced first. So any prio 0 packet
arriving during this period will sit in the qdisc and will not
reach the device for up to the time of (n - 2) * (m - 1) + 1
full-sized packet transmissions. With multiple queue states
we'd know that queue 0 can still take packets.

You agreed that this is a problem and instead of keeping the
queue stopped until all rings can take at least one packet
again you proposed:

> - let the driver shutdown whenever a ring is full. Remember which
>   ring X shut it down.
> - when you get a tx interrupt or prune tx descriptors, if a
>   ring <= X has transmitted a packet (or threshold of packets),
>   then wake up the driver (i.e open up).

At this point the queue is active, but at least one ring is already
full and the qdisc can still pass packets for it to the driver.
When this happens we can:

- drop it. This makes the qdisc's configured limits meaningless,
  since the qdisc can't anticipate whether the packet will make it
  through or get dropped.

- requeue it: this might result in a busy loop if the qdisc
  decides to hand out the packet again. The loop only
  terminates once the ring has more room available and can
  accept the packet, which has the same worst-case behaviour
  I described above.

- requeue (return BUSY) and stop the queue: that's what you
  proposed as option #4. The question is when to wake the
  queue again. Your suggestion was to wake it when some
  other queue with equal or higher priority got dequeued.
  Correcting my previous statement, you are right that
  this fixes the starvation of higher priority queues,
  because the qdisc gets a chance to hand out a packet of
  the same or higher priority, but at the cost of up to
  (n - 1) * m unnecessary dequeues+requeues in the worst
  case, when only a packet of lowest priority is queued
  and all higher priority HW queues have to be fully
  served before it can actually be dequeued. The other
  possibility would be to activate the queue again only
  once all rings can take packets, but that wouldn't fix
  the problem, as you can easily see if you go back to my
  example and assume we still have a low priority packet
  within the qdisc when the lowest priority ring fills up
  (and the queue is stopped), and that the higher priority
  packet arrives after we tried to wake the queue and
  stopped it again.

Considering your proposal in combination with RR, you can see
the same problem of unnecessary dequeues+requeues. Since there
is no priority for waking the queue when an equal or higher
priority ring got dequeued, as in the prio case, I presume you
would wake the queue whenever any packet was sent. For the RR
qdisc, a dequeue after a requeue should hand out the same packet,
independently of newly enqueued packets (which doesn't happen
and is a bug in Peter's RR version), so in the worst case the
HW has to make the entire round before a packet can get
dequeued, in case the corresponding HW queue is full. This is
a bit better than prio, but still up to n - 1 unnecessary
requeues+dequeues. I think it can happen more often than
for prio though.

> The patches are trivial - really; as soon as Peter posts the e1000
> change for his version i should be able to cutnpaste and produce one
> that will work with what i am saying.

Forgetting about things like multiple qdisc locks and just
looking at queueing behaviour, the question seems to come
down to whether the unnecessary dequeues/requeues are acceptable
(which I don't think they are, since they are easily avoidable).
OTOH you could turn it around and argue that the patches won't do
much harm, since ripping them out again (modulo queue mapping)
should result in the same behaviour with just more overhead.
