Message-ID: <20240708195401.5ef3f016@kernel.org>
Date: Mon, 8 Jul 2024 19:54:01 -0700
From: Jakub Kicinski <kuba@...nel.org>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>, Madhu Chittim
 <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>,
 Simon Horman <horms@...nel.org>, John Fastabend <john.fastabend@...il.com>,
 Sunil Kovvuri Goutham <sgoutham@...vell.com>, Jamal Hadi Salim
 <jhs@...atatu.com>
Subject: Re: [PATCH net-next 1/5] netlink: spec: add shaper YAML spec

On Mon, 08 Jul 2024 21:42:00 +0200 Paolo Abeni wrote:
> > To judge whether it's an over-design we'd need to know what the user
> > scenarios are, those are not included in the series AFAICS.  
> 
> My bad, in the cover I referred to the prior discussions without
> explicitly quoting the contents.
> 
> The initial goal here was to allow the user-space to configure per-
> queue, H/W offloaded, TX shaping.

Per-queue Tx shaping is already supported.

I mean "user scenarios" in the agile programming sense as in something
that gets closer to production use cases. "Set rate limit on a queue"
is a suggested solution, not a statement of a problem.

> That later evolved in introducing an in-kernel H/W offload TX shaping
> API capable of replacing the many existing similar in-kernel APIs and
> supporting the foreseeable H/W capabilities.
> 
> > To be blunt - what I'm getting at is that the API mirrors Intel's FW
> > API with an extra kludge to support the DSA H/W - in the end matching
> > neither what the DSA wants nor what Intel can do.  
> 
> The API is similar to Intel’s FW API because to my understanding the
> underlying design - an arbitrary tree - is the most complete
> representation possible for shaping H/W. AFAICT is also similar to what
> other NIC vendors’ offer.
> 
> I don’t see why the APIs don’t match what Intel can do, could you
> please elaborate on that?

That's not the main point I was making; I was complaining about how
the "extension" to support DSA HW was bolted onto this API.

Undeniably the implementation must be stored as a tree (with some
max depth). That doesn't imply that, for example, arbitrary re-parenting
of non-leaf nodes is an operation that makes sense for all devices.
From memory, devlink rate doesn't allow mixing some node types under one
parent, either.

> > IOW I'm trying to explore whether we can find a language of
> > transformations which will be more complex than single micro-operations
> > on the tree, but sufficiently expressive to provide atomic
> > transformations without transactions of micro-ops.  
> 
> I personally find it straight-forward describing the scenario you
> proposed in terms of the simple operations as allowed by this series. I
> also think it’s easier to build arbitrarily complex scenarios in terms
> of simple operations instead of trying to put enough complexity in the
> language to describe everything. It will easily lose flexibility or
> increase complexity for unclear gain. Why do you think that would be a
> better approach?

I strongly dislike the operation grouping/batching as it stands in v1.
Do you think that part of the design is clean?

I also disagree with the assertion that having a language of
transformations more advanced than "add/move/delete" increases
complexity. Language gives you properties you can reason about.
That's why rbtree, B-tree and other algos define a language of
transformations. Naive ops make arbitrary trees easy but I put
it to you that allowing arbitrary trees without any enforced 
invariants will breed far more complexity and bugs than a properly
designed language :)


The primary operation of interest is, in fact: given a set of resources
(queues or netdevs) which currently feed one mux node - build 
a sub-hierarchy.

Use case 1 - container b/w sharing. Container manager will want to
group a set of queues, and feed a higher layer RR node (so that number
of queues doesn't impact load sharing).

Say we have two containers (c1, c2 represent queues assigned to them);
before container 3 starts:

c1 - \
c1 -  >RR
c1 - /    > RR(root)
c2 - \ RR
c2 - /

allocate 2 queues:

c1 - \
c1 - ->RR
c1 - /    > RR(root)
c2 - \ RR
c2 - /     '
       qX /
       qY /

hierarchize (group([qX, qY], type="rr")):

c1 - \
c1 -  >RR
c1 - /    \ 
c2 - \ RR -> RR(root)
c2 - /    /
c3 - \ RR 
c3 - /

The container manager just wants to say "take the new queues (X, Y),
put them under an RR node". If the language is built around creating
mux nodes - that's a single call.
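
A minimal sketch of that single call, modelling the hierarchy as a plain
in-memory tree. The Node class, the names() helper and the name= argument
are hypothetical and exist only for illustration; nothing here is the
netlink API from the series. The sketch only shows that the output node can
be inferred from the members' current parent.

```python
class Node:
    """One scheduling node; a node with no children is a leaf (a queue)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def group(members, type="rr", name=None):
    """Insert a new mux node between `members` and the node they all feed.

    The caller never names the output: it is implied by the members'
    current parent. `type` mirrors the keyword used in the mail.
    """
    if not members:
        raise ValueError("need at least one member")
    parent = members[0].parent
    if parent is None or any(m.parent is not parent for m in members):
        raise ValueError("members must currently feed the same mux node")
    mux = Node(name or type.upper(), parent)
    mux.type = type
    for m in members:
        parent.children.remove(m)
        m.parent = mux
        mux.children.append(m)
    return mux


def names(nodes):
    return [n.name for n in nodes]


# State after "allocate 2 queues": qX and qY feed RR(root) directly.
root = Node("RR(root)")
rr1 = Node("RR1", root)
c1 = [Node(f"c1.{i}", rr1) for i in range(3)]
rr2 = Node("RR2", root)
c2 = [Node(f"c2.{i}", rr2) for i in range(2)]
qx, qy = Node("qX", root), Node("qY", root)

rr3 = group([qx, qy], type="rr", name="RR3")
print(names(root.children))  # ['RR1', 'RR2', 'RR3']
print(names(rr3.children))   # ['qX', 'qY']
```

Passing queues that do not share a parent raises ValueError in this sketch,
which is one way to keep the operation well-defined.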

Note that the RR(root) node is implicit (in your API it's not visible
but it is implied).

Use case 2 - delegation - the neat thing about using such a construct is
that as you can see we never referred to output, i.e. RR(root).
The output can be implied by whatever node the queues already output to.
So say container 1 (c1) wants to set a b/w limit on two of its queues
(let's call its queues A B C):

 rr_id = group([B, C], type="rr")
 rate_limit(rr_id, 1Gbps)

and end up with:

cA ----- \
cB - \RR*/ RR
cC - /        \ 
    c2 - \ RR -> RR(root)
    c2 - /    /
    c3 - \ RR 
    c3 - /

* new node, also has rate limit set

You don't have to worry about parentage permissions. Container can only
add nodes (which is always safe) or delete nodes (and we can trivially
enforce it only deletes node it has created itself).
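
The delegation rule can be sketched the same way, assuming each node simply
records which agent created it. The owner field, the agent strings and the
delete() semantics (children fall back to the deleted node's parent) are
assumptions made for this illustration, not something defined by the series.

```python
class Node:
    def __init__(self, name, parent=None, owner=None):
        self.name = name
        self.owner = owner        # agent that created the node (assumed field)
        self.parent = parent
        self.children = []
        self.rate_bps = None      # None means no rate limit
        if parent is not None:
            parent.children.append(self)


def group(agent, members, name):
    """Add a mux node above `members`; the output is their current parent."""
    parent = members[0].parent
    if parent is None or any(m.parent is not parent for m in members):
        raise ValueError("members must currently feed the same mux node")
    mux = Node(name, parent, owner=agent)
    for m in members:
        parent.children.remove(m)
        m.parent = mux
        mux.children.append(m)
    return mux


def rate_limit(node, bps):
    node.rate_bps = bps


def delete(agent, node):
    """Remove a mux node; only the agent that created it may do so."""
    if node.owner != agent:
        raise PermissionError(f"{agent} did not create {node.name}")
    parent = node.parent
    idx = parent.children.index(node)
    parent.children[idx:idx + 1] = node.children  # children take its place
    for child in node.children:
        child.parent = parent
    node.children, node.parent = [], None


root = Node("RR(root)", owner="manager")
rr_c1 = Node("RR", root, owner="manager")
cA, cB, cC = (Node(n, rr_c1) for n in ("cA", "cB", "cC"))

rr_new = group("c1", [cB, cC], "RR*")
rate_limit(rr_new, 1_000_000_000)              # 1Gbps
print([n.name for n in rr_c1.children])        # ['cA', 'RR*']

try:
    delete("c1", rr_c1)                        # created by the manager
except PermissionError as err:
    print("refused:", err)

delete("c1", rr_new)                           # c1's own node: allowed
print([n.name for n in rr_c1.children])        # ['cA', 'cB', 'cC']
```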

> Also the simple building blocks approach is IMHO closer to the original
> use-case.
> 
> Are there any other reasons for atomic operations, beyond addressing
> low-end H/W?

I think atomic changes are convenient to match what the user wants to
do. And the second use case is that I do believe there's a real need
to allow uncoordinated agents to modify sections of the hierarchy.

Neither of those is a hard requirement, but I think any application-driven
requirement should come before "that's the FW API for vendor X".
I hope we can agree on this.


Thinking about it (for longer than I care to admit), one concern I have
about the "mux creation" API I described above is that it forces
the existence of leaf and non-leaf nodes at the same parent, at least
transiently.
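
To make the concern concrete, here is a small check, assuming the hierarchy
is written as a dict that maps each node name to its list of child names
(a representation chosen only for this sketch). It reports parents that
have both leaf and non-leaf children, applied to the diagrams from use
case 1 and use case 2 above.

```python
def mixed_parents(tree, root):
    """Return names of nodes that have both leaf and non-leaf children."""
    mixed = []
    stack = [root]
    while stack:
        name = stack.pop()
        kids = tree.get(name, [])
        # A child is a leaf if it has no children of its own.
        leaf_flags = {not tree.get(k) for k in kids}
        if len(leaf_flags) == 2:
            mixed.append(name)
        stack.extend(kids)
    return mixed


# Use case 1, after "allocate 2 queues" but before group().
allocated = {
    "root": ["RR1", "RR2", "qX", "qY"],
    "RR1": ["c1.0", "c1.1", "c1.2"],
    "RR2": ["c2.0", "c2.1"],
}
# Use case 1, after group([qX, qY], type="rr").
grouped = {
    "root": ["RR1", "RR2", "RR3"],
    "RR1": ["c1.0", "c1.1", "c1.2"],
    "RR2": ["c2.0", "c2.1"],
    "RR3": ["qX", "qY"],
}
# Use case 2 end state: cA sits next to the new RR* node.
delegated = {
    "root": ["RR1"],
    "RR1": ["cA", "RR*"],
    "RR*": ["cB", "cC"],
}

print(mixed_parents(allocated, "root"))   # ['root']
print(mixed_parents(grouped, "root"))     # []
print(mixed_parents(delegated, "root"))   # ['RR1']
```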

Can we go back to an API with an explicit create/modify/delete?
All we need for Andrew's use case, I believe, is to be able to
"somewhat atomically" move leaf nodes.
