[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180803185241.4ac0d1e5@epycfail>
Date: Fri, 3 Aug 2018 18:52:41 +0200
From: Stefano Brivio <sbrivio@...hat.com>
To: Ben Pfaff <blp@....org>
Cc: Matteo Croce <mcroce@...hat.com>,
Pravin B Shelar <pshelar@....org>, jpettit@...are.com,
gvrose8192@...il.com, netdev <netdev@...r.kernel.org>,
dev@...nvswitch.org, Jiri Benc <jbenc@...hat.com>,
Aaron Conole <aconole@...hat.com>
Subject: Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in
per-port round-robin order
Hi Ben,
On Tue, 31 Jul 2018 15:06:57 -0700
Ben Pfaff <blp@....org> wrote:
> This is an awkward problem to try to solve with sockets because of the
> nature of sockets, which are strictly first-in first-out. What you
> really want is something closer to the algorithm that we use in
> ovs-vswitchd to send packets to an OpenFlow controller. When the
> channel becomes congested, then for each packet to be sent to the
> controller, OVS appends it to a queue associated with its input port.
> (This could be done on a more granular basis than just port.) If the
> maximum amount of queued packets is reached, then OVS discards a packet
> from the longest queue. When space becomes available in the channel,
> OVS round-robins through the queues to send a packet. This achieves
> pretty good fairness but it can't be done with sockets because you can't
> drop a packet that is already queued to one.
Thanks for your feedback. What you describe is, though, functionally
equivalent to what this patch does, minus the per-port queueing limit.
However, instead of having one explicit queue for each port, and
then fetching packets in a round-robin fashion from all the queues, we
implemented this with a single queue and choose insertion points while
queueing in such a way that the result is equivalent. This way, we
avoid the massive overhead associated with having one queue per each
port (we can have up to 2^16 ports), and cycling over them.
Let's say we have two ports, A and B, and three upcalls are sent for
each port. If we implement one queue for each port as you described, we
end up with this:
.---------------- - - -
| A1 | A2 | A3 |
'---------------- - - -
.---------------- - - -
| B1 | B2 | B3 |
'---------------- - - -
and then send upcalls in this order: A1, B1, A2, B2, A3, B3.
What we are doing here with a single queue is inserting the upcalls
directly in this order:
.------------------------------- - - -
| A1 | B1 | A2 | B2 | A3 | B3 |
'------------------------------- - - -
and dequeueing from the head.
About the per-port queueing limit: we currently have a global one
(UPCALL_QUEUE_MAX_LEN), while the per-port limit is simply given by
implementation constraints in our case:
if (dp->upcalls.count[pos->port_no] == U8_MAX - 1) {
err = -ENOSPC;
goto out_clear;
}
but we can easily swap that U8_MAX - 1 with another macro or a
configurable value, if there's any value in doing that.
> My current thought is that any fairness scheme we implement directly in
> the kernel is going to need to evolve over time. Maybe we could do
> something flexible with BPF and maps, instead of hard-coding it.
Honestly, I fail to see what else we might want to do here, other than
adding a simple mechanism for fairness, to solve the specific issue at
hand. Flexibility would probably come at a higher cost. We could easily
make limits configurable if needed. Do you have anything else in mind?
--
Stefano
Powered by blists - more mailing lists