Message-ID: <20200817163222.opf576vyvapk4bqm@skbuf>
Date: Mon, 17 Aug 2020 19:32:22 +0300
From: Vladimir Oltean <olteanv@...il.com>
To: Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
UNGLinuxDriver@...rochip.com, Jakub Kicinski <kuba@...nel.org>
Subject: Re: devlink-sb on ocelot switches

So after some more fiddling, it looks like I got the diagram wrong.
Here's how the switch really consumes resources. It does 4 lookups in
parallel; their results are ORed in 2 pairs (the ingress and egress
checks for the same resource form a pair), and the two results are
ANDed. The consumptions for ingress and egress are really completely
independent.

                       Frame forwarding decision taken
                                      |
                                      |
                                      v
       +--------------------+--------------------+--------------------+
       |                    |                    |                    |
       v                    v                    v                    v
 Ingress memory       Egress memory        Ingress frame         Egress frame
     check                check           reference check      reference check
       |                    |                    |                    |
       v                    v                    v                    v
  BUF_Q_RSRV_I  ok     BUF_Q_RSRV_E  ok     REF_Q_RSRV_I  ok     REF_Q_RSRV_E  ok
(src port, prio) -+  (dst port, prio) -+  (src port, prio) -+  (dst port, prio) -+
       |          |         |          |         |          |         |          |
       | exceeded |         | exceeded |         | exceeded |         | exceeded |
       |          |         |          |         |          |         |          |
       v          |         v          |         v          |         v          |
  BUF_P_RSRV_I  ok|    BUF_P_RSRV_E  ok|    REF_P_RSRV_I  ok|    REF_P_RSRV_E  ok|
   (src port) ----+     (dst port) ----+     (src port) ----+     (dst port) ----+
       |          |         |          |         |          |         |          |
       | exceeded |         | exceeded |         | exceeded |         | exceeded |
       |          |         |          |         |          |         |          |
       v          |         v          |         v          |         v          |
 BUF_PRIO_SHR_I ok|   BUF_PRIO_SHR_E ok|   REF_PRIO_SHR_I ok|   REF_PRIO_SHR_E ok|
     (prio) ------+       (prio) ------+       (prio) ------+       (prio) ------+
       |          |         |          |         |          |         |          |
       | exceeded |         | exceeded |         | exceeded |         | exceeded |
       |          |         |          |         |          |         |          |
       v          |         v          |         v          |         v          |
 BUF_COL_SHR_I  ok|   BUF_COL_SHR_E  ok|   REF_COL_SHR_I  ok|   REF_COL_SHR_E  ok|
      (dp) -------+        (dp) -------+        (dp) -------+        (dp) -------+
       |          |         |          |         |          |         |          |
       | exceeded |         | exceeded |         | exceeded |         | exceeded |
       |          |         |          |         |          |         |          |
       v          v         v          v         v          v         v          v
      fail     success     fail     success     fail     success     fail     success
       |          |         |          |         |          |         |          |
       v          v         v          v         v          v         v          v
       +-----+----+         +-----+----+         +-----+----+         +-----+----+
             |                    |                    |                    |
             +-------> OR <-------+                    +-------> OR <-------+
                        |                                        |
                        v                                        v
                        +----------------> AND <-----------------+
                                            |
                                            v
                                   FIFO drop / accept

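
To make the combining logic concrete, here is a rough sketch in Python of
how I read the diagram. The level names come from the register names
above; the function names, the dict layout and the exact meaning of
"exceeded" (taken here as in_use > watermark) are assumptions made for
illustration, not something taken from the hardware documentation.

```python
# Sketch only: level names follow the diagram, everything else is assumed.
LEVELS = ("Q_RSRV", "P_RSRV", "PRIO_SHR", "COL_SHR")


def lookup_ok(in_use, watermark):
    """One column of the diagram.

    The lookup succeeds at the first level whose watermark is not
    exceeded, and fails only if all four levels are exceeded.
    Assumption: "exceeded" means in_use > watermark.
    """
    return any(in_use[lvl] <= watermark[lvl] for lvl in LEVELS)


def frame_accepted(buf_i, buf_e, ref_i, ref_e):
    """Combine the 4 lookup results (True means success).

    Assumption: the value being ORed is "success", so one side of a pair
    having room is enough, but both memory and frame references are needed.
    """
    return (buf_i or buf_e) and (ref_i or ref_e)
```

Under these assumptions, a frame that passes only the ingress memory
check and the egress frame reference check is still accepted, while one
that passes both memory checks but neither reference check is dropped.
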
Something which isn't explicitly said in devlink-sb is whether a pool
bound to a port-TC is allowed to spill over into the port pool, and
whether the port pool, in turn, is allowed to spill over into something
else (a shared pool).

If they are, then I could expose BUF_P_RSRV_I (buffer reservation per
ingress port) as the threshold of the port pool, BUF_Q_RSRV_I and
BUF_Q_RSRV_E (buffer reservations per QoS class of ingress, and egress,
ports) as port-TC pools, and I could implicitly configure the remaining
sharing watermarks to consume the rest of the memory available in the
pool.
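
As a rough illustration of what "consume the rest of the memory" could
mean, here is one possible reading in Python: the sharing watermark is
whatever is left after all per-port and per-port-TC reservations are
subtracted from the pool size. The function name, the arguments and the
numbers are hypothetical, and the units are arbitrary.

```python
def remaining_shared(pool_size, port_rsrv, port_tc_rsrv):
    """Memory left for the sharing watermarks (hypothetical helper).

    pool_size:    total size of the pool
    port_rsrv:    per-port reservations (e.g. BUF_P_RSRV_I values)
    port_tc_rsrv: per-port-TC reservations (e.g. BUF_Q_RSRV_I values)
    """
    reserved = sum(port_rsrv) + sum(port_tc_rsrv)
    if reserved > pool_size:
        raise ValueError("reservations exceed the pool size")
    return pool_size - reserved


# Made-up numbers: 2 ports and 3 port-TCs carved out of a 1000-unit pool.
shared = remaining_shared(1000, [100, 100], [50, 50, 50])
```

With these made-up numbers, 650 units would be left for the sharing
watermarks.
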
But by looking at some of the selftests, I don't see any clear
indication of a test where the occupancy of the port-TC exceeds the size
of that pool, and what should happen in that case. Just a vague hint,
in tools/testing/selftests/drivers/net/mlxsw/sch_ets.sh, that once the
port-TC pool threshold has been exceeded, the excess should be simply
dropped:

    # Set the ingress quota high and use the three egress TCs to limit the
    # amount of traffic that is admitted to the shared buffers. This makes
    # sure that there is always enough traffic of all types to select from
    # for the DWRR process.
    devlink_port_pool_th_set $swp1 0 12
    devlink_tc_bind_pool_th_set $swp1 0 ingress 0 12
    devlink_port_pool_th_set $swp2 4 12
    devlink_tc_bind_pool_th_set $swp2 7 egress 4 5
    devlink_tc_bind_pool_th_set $swp2 6 egress 4 5
    devlink_tc_bind_pool_th_set $swp2 5 egress 4 5

So I'm guessing that this is not the same behavior as in ocelot. But,
truth be told, it doesn't really help either that nfp and mlxsw are
simply passing these parameters to firmware, not really giving any
insight into how they are interpreted.
Would it be simpler if I just exposed these watermarks as generic
devlink resources? Although in a way that would be a wasted opportunity
for devlink-sb. I also don't think I can monitor occupancy if I model
them as generic resources.

Am I missing something?

Thanks,
-Vladimir