netdev - Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in per-port round-robin order

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Fri, 3 Aug 2018 18:52:41 +0200
From:   Stefano Brivio <sbrivio@...hat.com>
To:     Ben Pfaff <blp@....org>
Cc:     Matteo Croce <mcroce@...hat.com>,
        Pravin B Shelar <pshelar@....org>, jpettit@...are.com,
        gvrose8192@...il.com, netdev <netdev@...r.kernel.org>,
        dev@...nvswitch.org, Jiri Benc <jbenc@...hat.com>,
        Aaron Conole <aconole@...hat.com>
Subject: Re: [PATCH RFC net-next] openvswitch: Queue upcalls to userspace in
 per-port round-robin order

Hi Ben,

On Tue, 31 Jul 2018 15:06:57 -0700
Ben Pfaff <blp@....org> wrote:

> This is an awkward problem to try to solve with sockets because of the
> nature of sockets, which are strictly first-in first-out.  What you
> really want is something closer to the algorithm that we use in
> ovs-vswitchd to send packets to an OpenFlow controller.  When the
> channel becomes congested, then for each packet to be sent to the
> controller, OVS appends it to a queue associated with its input port.
> (This could be done on a more granular basis than just port.)  If the
> maximum amount of queued packets is reached, then OVS discards a packet
> from the longest queue.  When space becomes available in the channel,
> OVS round-robins through the queues to send a packet.  This achieves
> pretty good fairness but it can't be done with sockets because you can't
> drop a packet that is already queued to one.

Thanks for your feedback. What you describe is, though, functionally
equivalent to what this patch does, minus the per-port queueing limit.

However, instead of having one explicit queue for each port, and
then fetching packets in a round-robin fashion from all the queues, we
implemented this with a single queue and choose insertion points while
queueing in such a way that the result is equivalent. This way, we
avoid the massive overhead associated with having one queue per each
port (we can have up to 2^16 ports), and cycling over them.

Let's say we have two ports, A and B, and three upcalls are sent for
each port. If we implement one queue for each port as you described, we
end up with this:

.---------------- - - -
| A1 | A2 | A3 |
'---------------- - - -

.---------------- - - -
| B1 | B2 | B3 |
'---------------- - - -

and then send upcalls in this order: A1, B1, A2, B2, A3, B3.

What we are doing here with a single queue is inserting the upcalls
directly in this order:

.------------------------------- - - -
| A1 | B1 | A2 | B2 | A3 | B3 |
'------------------------------- - - -

and dequeueing from the head.

About the per-port queueing limit: we currently have a global one
(UPCALL_QUEUE_MAX_LEN), while the per-port limit is simply given by
implementation constraints in our case:

		if (dp->upcalls.count[pos->port_no] == U8_MAX - 1) {
			err = -ENOSPC;
			goto out_clear;
		}

but we can easily swap that U8_MAX - 1 with another macro or a
configurable value, if there's any value in doing that.

> My current thought is that any fairness scheme we implement directly in
> the kernel is going to need to evolve over time.  Maybe we could do
> something flexible with BPF and maps, instead of hard-coding it.

Honestly, I fail to see what else we might want to do here, other than
adding a simple mechanism for fairness, to solve the specific issue at
hand. Flexibility would probably come at a higher cost. We could easily
make limits configurable if needed. Do you have anything else in mind?

-- 
Stefano