Message-ID: <20200817163222.opf576vyvapk4bqm@skbuf>
Date:   Mon, 17 Aug 2020 19:32:22 +0300
From:   Vladimir Oltean <olteanv@...il.com>
To:     Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
        UNGLinuxDriver@...rochip.com, Jakub Kicinski <kuba@...nel.org>
Subject: Re: devlink-sb on ocelot switches

So after some more fiddling, it looks like I got the diagram wrong.
Here's how the switch really consumes resources. There are 4 lookups
done in parallel. They are ORed in 2 pairs (ingress with egress forms a
pair: one pair for memory, one for frame references), and the results of
the 2 pairs are ANDed. The consumptions for ingress and egress are
really completely independent.

                          Frame forwarding decision taken
                                       |
                                       |
                                       v
       +--------------------+--------------------+--------------------+
       |                    |                    |                    |
       v                    v                    v                    v
 Ingress memory       Egress memory        Ingress frame        Egress frame
     check                check           reference check      reference check
       |                    |                    |                    |
       v                    v                    v                    v
  BUF_Q_RSRV_I   ok    BUF_Q_RSRV_E   ok    REF_Q_RSRV_I   ok     REF_Q_RSRV_E   ok
(src port, prio) -+  (dst port, prio) -+  (src port, prio) -+   (dst port, prio) -+
       |          |         |          |         |          |         |           |
       | exceeded |         | exceeded |         | exceeded |         | exceeded  |
       |          |         |          |         |          |         |           |
       v          |         v          |         v          |         v           |
  BUF_P_RSRV_I  ok|    BUF_P_RSRV_E  ok|    REF_P_RSRV_I  ok|    REF_P_RSRV_E   ok|
   (src port) ----+     (dst port) ----+     (src port) ----+     (dst port) -----+
       |          |         |          |         |          |         |           |
       | exceeded |         | exceeded |         | exceeded |         | exceeded  |
       |          |         |          |         |          |         |           |
       v          |         v          |         v          |         v           |
 BUF_PRIO_SHR_I ok|   BUF_PRIO_SHR_E ok|   REF_PRIO_SHR_I ok|   REF_PRIO_SHR_E  ok|
     (prio) ------+       (prio) ------+       (prio) ------+       (prio) -------+
       |          |         |          |         |          |         |           |
       | exceeded |         | exceeded |         | exceeded |         | exceeded  |
       |          |         |          |         |          |         |           |
       v          |         v          |         v          |         v           |
 BUF_COL_SHR_I  ok|   BUF_COL_SHR_E  ok|   REF_COL_SHR_I  ok|   REF_COL_SHR_E   ok|
      (dp) -------+        (dp) -------+        (dp) -------+        (dp) --------+
       |          |         |          |         |          |         |           |
       | exceeded |         | exceeded |         | exceeded |         | exceeded  |
       |          |         |          |         |          |         |           |
       v          v         v          v         v          v         v           v
      fail     success     fail     success     fail     success     fail      success
       |          |         |          |         |          |         |           |
       v          v         v          v         v          v         v           v
       +-----+----+         +-----+----+         +-----+----+         +-----+-----+
             |                    |                    |                    |
             +-------> OR <-------+                    +-------> OR <-------+
                        |                                        |
                        v                                        v
                        +----------------> AND <-----------------+
                                            |
                                            v
                                    FIFO drop / accept
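
For readers who prefer code to ASCII art, here is a minimal sketch of the
decision in the diagram above. It is only a model, not driver code: the
Lookup class, the stage names without the BUF_/REF_ prefix, and the
(consumption, watermark) tuples are hypothetical, and treating "ok" as
"consumption strictly below the watermark" is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# The four watermark stages, in the order the diagram walks them.
STAGES = ("Q_RSRV", "P_RSRV", "PRIO_SHR", "COL_SHR")


@dataclass
class Lookup:
    """One of the 4 parallel lookups (BUF or REF, ingress or egress).

    Hypothetical model: maps each stage name to (consumption, watermark).
    """
    stages: Dict[str, Tuple[int, int]]

    def passes(self) -> bool:
        for name in STAGES:
            consumption, watermark = self.stages[name]
            # Assumption: "ok" means strictly below the watermark.
            if consumption < watermark:
                return True   # first stage that is ok -> success
        return False          # all four stages exceeded -> fail


def admit(buf_i: Lookup, buf_e: Lookup,
          ref_i: Lookup, ref_e: Lookup) -> bool:
    memory_ok = buf_i.passes() or buf_e.passes()  # ingress OR egress memory
    refs_ok = ref_i.passes() or ref_e.passes()    # ingress OR egress refs
    return memory_ok and refs_ok                  # AND of the two pairs
```

In this model a frame is accepted as long as at least one side of each
pair still has room at any of its four stages, which is what makes the
ingress and egress consumptions independent of each other.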

Something which isn't explicitly said in devlink-sb is whether a pool
bound to a port-TC is allowed to spill over into the port pool, and
whether the port pool, in turn, is allowed to spill over into something
else (a shared pool).

If they are, then I could expose BUF_P_RSRV_I (buffer reservation per
ingress port) as the threshold of the port pool, BUF_Q_RSRV_I and
BUF_Q_RSRV_E (buffer reservations per QoS class of the ingress and
egress ports, respectively) as port-TC pools, and I could implicitly
configure the remaining sharing watermarks to consume the rest of the
memory available in the pool.
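
To make "the rest of the memory available in the pool" concrete, the
arithmetic could look roughly like the sketch below. This assumes the
sharing watermark is simply the pool size minus the sum of all
reservations carved out of it; the function name, the dictionary layout
and the units are hypothetical, not something the hardware or devlink-sb
defines.

```python
from typing import Dict, Tuple


def remaining_shared(pool_size: int,
                     port_rsrv: Dict[int, int],
                     port_tc_rsrv: Dict[Tuple[int, int], int]) -> int:
    """Return what is left of the pool for the sharing watermarks.

    port_rsrv:    port -> per-port reservation
    port_tc_rsrv: (port, tc) -> per-port-TC reservation
    All values are in the same (unspecified) unit as pool_size.
    """
    reserved = sum(port_rsrv.values()) + sum(port_tc_rsrv.values())
    if reserved > pool_size:
        # Assumption: over-committing reservations is treated as an error.
        raise ValueError("reservations exceed pool size")
    return pool_size - reserved
```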

But by looking at some of the selftests, I don't see any clear
indication of a test where the occupancy of the port-TC exceeds the size
of that pool, or of what should happen in that case. There is just a
vague hint, in tools/testing/selftests/drivers/net/mlxsw/sch_ets.sh,
that once the port-TC pool threshold has been exceeded, the excess
should be simply dropped:

	# Set the ingress quota high and use the three egress TCs to limit the
	# amount of traffic that is admitted to the shared buffers. This makes
	# sure that there is always enough traffic of all types to select from
	# for the DWRR process.
	devlink_port_pool_th_set $swp1 0 12
	devlink_tc_bind_pool_th_set $swp1 0 ingress 0 12
	devlink_port_pool_th_set $swp2 4 12
	devlink_tc_bind_pool_th_set $swp2 7 egress 4 5
	devlink_tc_bind_pool_th_set $swp2 6 egress 4 5
	devlink_tc_bind_pool_th_set $swp2 5 egress 4 5

So I'm guessing that this is not the same behavior as in ocelot. But,
truth be told, it doesn't really help either that nfp and mlxsw are
simply passing these parameters to firmware, not really giving any
insight into how they are interpreted.

Would it be simpler if I just exposed these watermarks as generic
devlink resources? Although in a way that would be a wasted opportunity
for devlink-sb. I also don't think I can monitor occupancy if I model
them as generic resources.

Am I missing something?

Thanks,
-Vladimir
