Message-ID: <2e4bf0dcffe51a7bc4d427e33f132a99ceac8d8a.camel@redhat.com>
Date: Wed, 03 Jul 2024 16:53:38 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, Jiri Pirko <jiri@...nulli.us>, Madhu Chittim
 <madhu.chittim@...el.com>, Sridhar Samudrala <sridhar.samudrala@...el.com>,
  Simon Horman <horms@...nel.org>, John Fastabend
 <john.fastabend@...il.com>, Sunil Kovvuri Goutham <sgoutham@...vell.com>,
 Jamal Hadi Salim <jhs@...atatu.com>
Subject: Re: [PATCH net-next 1/5] netlink: spec: add shaper YAML spec

On Tue, 2024-07-02 at 14:08 -0700, Jakub Kicinski wrote:
> Not sure if dropping CCs was on purpose ?

Nope, just PEBKAC here. Re-adding them and including your reply
verbatim, to avoid losing track on the ML.

> On Tue, 2 Jul 2024 21:49:09 +0200 Paolo Abeni wrote:
> > On Tue, Jul 2, 2024 at 5:05 PM Jakub Kicinski <kuba@...nel.org> wrote:
> > > On Tue, 02 Jul 2024 16:21:38 +0200 Paolo Abeni wrote:  
> > > > Yes, it does not cover deletion _and_ update/add/move within the same
> > > > atomic operation.
> > > > 
> > > > Still any configuration could be reached from default/initial state
> > > > with set(<possibly many shapers>). Additionally, given any arbitrary
> > > > configuration, the default/initial state could be restored with a
> > > > single delete(<possibly many handlers>).  
> > > 
> > > From:
> > > 
> > > q0 -. RR \
> > > q1 /      > SP
> > > q2 -. RR /
> > > q3 /  
> > 
> > Call this C1
> > 
> > > To:
> > > 
> > > q0 ------\
> > > q1 -------> SP
> > > q2 -. RR /
> > > q3 /  
> > 
> > Call this C2
> > 
> > > You have to both delete an RR node, and set SP params on Q0 and Q1.  
> > 
> > default -> C1:
> > 
> > ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
> >    --do set --json '{ "ifindex": 2, "shapers": [
> >                          { "parent": { "scope": "netdev" }, "handle": {
> > "scope": "detached", "id": 1 }, "priority": 1 },
> >                          { "parent": { "scope": "netdev" }, "handle": {
> > "scope": "detached", "id": 2 }, "priority": 2 },
> >                          { "parent": { "scope": "detached", "id": 1 },
> > "handle": { "scope": "queue", "id": 1 }, "weight": 1 },
> >                          { "parent": { "scope": "detached", "id": 1 },
> > "handle": { "scope": "queue", "id": 2 }, "weight": 2 },
> >                          { "parent": { "scope": "detached", "id": 2 },
> > "handle": { "scope": "queue", "id": 3 }, "weight": 1 },
> >                          { "parent": { "scope": "detached", "id": 2 },
> > "handle": { "scope": "queue", "id": 4 }, "weight": 2 }]}'
> > 
> > C1 -> C2:
> > 
> > ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
> >    --do delete --json '{ "ifindex": 2, "handles": [
> >                          { "scope": "queue", "id": 1 },
> >                          { "scope": "queue", "id": 2 },
> >                          { "scope": "queue", "id": 3 },
> >                          { "scope": "queue", "id": 4 },
> >                          { "scope": "detached", "id": 1 },
> >                          { "scope": "detached", "id": 2 }]}'
> > 
> > ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
> >    --do set --json '{ "ifindex": 2, "shapers": [
> >                          { "parent": { "scope": "netdev" }, "handle": {
> > "scope": "detached", "id": 1 }, "priority": 1 },
> >                          { "parent": { "scope": "netdev" }, "handle": {
> > "scope": "queue", "id": 1 }, "priority": 2 },
> >                          { "parent": { "scope": "netdev" }, "handle": {
> > "scope": "queue", "id": 2 }, "priority": 3 },
> >                          { "parent": { "scope": "detached", "id": 1 },
> > "handle": { "scope": "queue", "id": 3 }, "weight": 1 },
> >                          { "parent": { "scope": "detached", "id": 1 },
> > "handle": { "scope": "queue", "id": 4 }, "weight": 2 }]}'
> > 
> > The goal is to allow the system to reach the final configuration, even
> > with the assumption the H/W does not support any configuration except
> > the starting one and the final one.
> > But the infra requires that the system _must_ support at least a 3rd
> > configuration, the default one.
> 
> This assumes there is a single daemon responsible for control of all of
> the shaping, which I think the endless BPF multi-program back and forth
> proves to be incorrect.
> 
> We'd also lose stats each time, and make reconfiguration disruptive to
> running workloads.

Note that there is no stats support for the shapers, none is planned,
and none of the H/W I know of has any support for it.

The destructive operations will be needed only when the configuration
change is inherently destructive. Nothing prevents user space from
pushing a direct configuration change when possible - which should be
the most frequent case in practice.

Regarding the entity responsible for control, I had a single one in
mind, yes. I read the above as saying you expect e.g. different
applications to configure their own shaping by accessing the NL
interface directly, is that correct? Why can't such applications talk
to that daemon instead?

Anyway, different applications must touch disjoint resources (e.g.
disjoint queue sets), right? In that case even multiple destructive
configuration changes (on disjoint resource sets) will not be
problematic.

Still, if we want to allow somewhat consistent, concurrent, destructive
configuration changes on a shared resource (why? it sounds a bit like
overdesign at this point), we could extend the set() operation to
additionally support shaper deletion, e.g. by adding a 'delete' flag
attribute to the 'handle' attribute set.
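
To illustrate, such a combined operation could look roughly as below.
This is only a sketch: the 'delete' flag does not exist in the posted
spec, and the handles/ids are taken from the earlier examples for
illustration.

```shell
# Hypothetical extension: drop RR node "detached 1" and re-parent its
# queues directly under the netdev scheduler in a single set() call.
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
   --do set --json '{ "ifindex": 2, "shapers": [
       { "handle": { "scope": "detached", "id": 1, "delete": 1 } },
       { "parent": { "scope": "netdev" }, "handle": {
         "scope": "queue", "id": 1 }, "priority": 2 },
       { "parent": { "scope": "netdev" }, "handle": {
         "scope": "queue", "id": 2 }, "priority": 3 }]}'
```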

> > > > The above covers any possible limitation enforced by the H/W, not just
> > > > the DSA use-case.
> > > > 
> > > > Do you have a strong feeling for atomic transactions from any arbitrary
> > > > state towards any other? If so, I’d like to understand why?  
> > > 
> > > I don't believe this covers all cases.  
> > 
> > Given any pair of configurations C1, C2 we can reach C2 via C1 ->
> > default, default -> C2, respecting any H/W constraint.
> > 
> > > > Dealing with transactions allowing arbitrary any state <> any state
> > > > atomic changes will involve some complex logic that seems better
> > > > assigned to user-space.  
> > > 
> > > Complex logic in which part of the code?  
> > 
> > IIRC in a past iteration you pointed out that the complexity of
> > computing the delta between 2 arbitrary configurations is
> > significantly higher than going through the empty/default
> > configuration.
> 
> For a fixed-layout scheduler where HW blocks have a fixed mapping 
> with user's hierarchy - it's easier to program the shapers from 
> the hierarchy directly. Since node maps to the same set of registers
> reprogramming will be writing to a register a value it already has
> - a noop. That's different than doing a full reset and reprogram 
> each time, as two separate calls from user space.
> 
> For Intel's cases OTOH, when each command is a separate FW call
> we can't actually enforce the atomic transitions, right?
> Your code seems to handle return values smaller than the number of
> commands, from which I infer that we may execute half of the modification?

Yes, the code and the NL API allow the NIC to do the update
incrementally, and AFAICS Intel ICE has no support for complex
transactions.
Enforcing atomic transitions would be quite complex at best and is not
needed to accomplish the stated goal - allowing reconfiguration even
when the H/W does not support intermediate states.

Do we need to enforce atomicity? Why? NFT has proven that implementing
a transactional model is hard, and it should be avoided if possible.
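
Under the incremental model, a user-space daemon can still recover from
a partially applied set() by falling back to the default state and
replaying the full target configuration. A rough sketch, reusing the
handles from the examples above (the ids are illustrative):

```shell
# If a set() reports fewer processed shapers than requested, restore
# the default state by deleting whatever was created so far...
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
   --do delete --json '{ "ifindex": 2, "handles": [
                         { "scope": "queue", "id": 1 },
                         { "scope": "detached", "id": 1 }]}'
# ...then re-issue the desired set() from the clean default state.
```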

> IOW for Andrew's HW - he'd probably prefer to look at the resulting
> tree, no matter what previous state we were in. For Intel we _can't_
> support atomic commands, if they span multiple cycles of FW exchanges?

My understanding is that we can't have atomic updates on Intel, from
the firmware perspective. As said, I don't think it's necessary to
support them.

WRT the DSA H/W, do you mean the core should always set() the whole
known tree to the driver, regardless of the specific changes asked for
by user space? If so, what about letting the driver expose some
capability (or private flag) asking the core for such behavior? That
way the driver would receive the largish set() only on H/W requiring it.

Anyway, I'm not sure the mentioned DSA H/W would benefit from always
receiving the whole configuration; e.g. changing the weight of a single
queue shaper would not need the whole data set.
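
For example, tweaking a single leaf should only need the minimal delta,
using the same spec syntax as the examples above (the weight value is
arbitrary):

```shell
# Update only the weight of queue 3 under the existing RR node; no
# other part of the hierarchy needs to be re-sent.
./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
   --do set --json '{ "ifindex": 2, "shapers": [
                         { "parent": { "scope": "detached", "id": 1 },
                           "handle": { "scope": "queue", "id": 3 },
                           "weight": 5 }]}'
```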

> > In any case I think the larger complexity lies in implementing a
> > full transactional model. nft has proven that to be very hard and
> > bug prone. I really would avoid that option, if possible.
> 
> Maybe instead of discussing the user space API it'd be more beneficial
> to figure out a solid way of translating the existing APIs into the new
> model?

Could you please rephrase? I think all the arguments discussed here are
related to the model - and at some point that impacts the user-space
API, too.

Thanks,

Paolo

