netdev - RE: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA

Date:   Wed, 11 Sep 2019 00:45:11 +0000
From:   "Gomes, Vinicius" <vinicius.gomes@...el.com>
To:     Vladimir Oltean <olteanv@...il.com>
CC:     David Miller <davem@...emloft.net>,
        "f.fainelli@...il.com" <f.fainelli@...il.com>,
        "vivien.didelot@...il.com" <vivien.didelot@...il.com>,
        "andrew@...n.ch" <andrew@...n.ch>,
        "Patel, Vedang" <vedang.patel@...el.com>,
        "richardcochran@...il.com" <richardcochran@...il.com>,
        "Voon, Weifeng" <weifeng.voon@...el.com>,
        "jiri@...lanox.com" <jiri@...lanox.com>,
        "m-karicheri2@...com" <m-karicheri2@...com>,
        "Jose.Abreu@...opsys.com" <Jose.Abreu@...opsys.com>,
        "ilias.apalodimas@...aro.org" <ilias.apalodimas@...aro.org>,
        "jhs@...atatu.com" <jhs@...atatu.com>,
        "xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>,
        "kurt.kanzenbach@...utronix.de" <kurt.kanzenbach@...utronix.de>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH v1 net-next 00/15] tc-taprio offload for SJA1105 DSA

Hi Vladimir,

[...]

> 
> I'll make sure this subtlety is more clearly formulated in the next version of the
> patch.
> 

Ack.

> Actually let me ask you a few questions as well:
> 
> - I'm trying to understand what is the correct use of the tc-mqprio "queues"
> argument. I've only tested it with "1@0 1@1 1@2 1@3 1@4 1@5
> 1@6 1@7", which I believe is equivalent to not specifying it at all? I believe it
> should be interpreted as: "allocate this many netdev queues for each traffic
> class", where "traffic class" means a group of queues having the same priority
> (equal to the traffic class's number), but engaged in a strict priority scheme with
> other groups of queues (traffic classes). Right?

Specifying "queues" is mandatory, IIRC. And yes, your reading of those arguments
for your example matches mine.

So you mean that you only tested situations where only one queue is "open" at a time?
I think this is another good thing to test.
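
To make the "count@offset" reading above concrete, here is a minimal user-space sketch in Python. It is only an illustration of the interpretation discussed here, not the tc or kernel parser; the helper names parse_queues and queues_of_tc are made up for this example.

```python
def parse_queues(spec):
    """Parse an mqprio-style "count@offset" list.

    Returns one (count, offset) pair per traffic class, in order.
    Hypothetical helper for illustration, not the real tc parser.
    """
    pairs = []
    for token in spec.split():
        count, offset = token.split("@")
        pairs.append((int(count), int(offset)))
    return pairs


def queues_of_tc(pairs, tc):
    """List the netdev queue indices that belong to traffic class `tc`."""
    count, offset = pairs[tc]
    return list(range(offset, offset + count))


# The trivial mapping from the question: one queue per traffic class.
trivial = parse_queues("1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7")
print(queues_of_tc(trivial, 3))   # [3]

# A grouped mapping: three traffic classes spread over eight queues.
grouped = parse_queues("2@0 2@2 4@4")
print(queues_of_tc(grouped, 2))   # [4, 5, 6, 7]
```

With the trivial mapping, traffic class N is exactly queue N, which is why it is the easiest case to test first.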

> 
> - DSA can only formally support multi-queue, because its connection to the Linux
> host is through an Ethernet MAC (FIFO). Even if the DSA master netdevice may
> be multi-queue, allocating and separating those queues for each front-panel
> switch port is a task best left to the user/administrator. This means that DSA
> should reject all other "queues" mappings except the trivial one I pointed to
> above?
> 
> - I'm looking at the "tc_mask_to_queue_mask" function that I'm carrying along
> from your initial offload RFC. Are you sure this is the right approach? I don't feel
> a need to translate from traffic class to netdev queues, considering that in the
> general case, a traffic class is a group of queues, and 802.1Qbv doesn't really
> specify that you can gate individual queues from a traffic class. In the software
> implementation you are only looking at netdev_get_prio_tc_map, which is not
> equivalent as far as my understanding goes, but saner.
> Actually 802.1Q-2018 does not really clarify this either. It looks to me like they
> use the term "queue" and "traffic class" interchangeably.
> See two examples below (emphasis mine):

I spent quite a long time thinking about this, and I am still not sure that I got it right. Let me begin
with the objective of that "translation". Scheduled traffic only makes sense when
the whole network shares the same schedule, so I wanted a way to minimize the
amount of information in each schedule that is controller dependent. Linux already
does most of this with the separation of traffic classes and queues (you are right that
802.1Q is confusing on this). The idea is that the only thing that needs to change from
one node to another in the network is the "queues" parameter, because each node might
have a different number of queues, or assign different priorities to different queues.

So, that's the idea of doing that intermediate "transformation" step: taprio knows about
traffic classes and HW queues, but the driver only knows about HW queues. And unless I made
a mistake, tc_mask_to_queue_mask() should be equivalent to:  

netdev_get_prio_tc_map() + scanning the gatemask for BIT(tc).

(Thinking more about this, I am having a few ideas about ways to simplify software mode :-)
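
As a rough illustration of the translation step described above, here is a user-space Python model. It is a sketch under assumptions, not the code from the patch: the `counts` and `offsets` lists stand in for the per-traffic-class count[] and offset[] arrays of the mqprio configuration, and the function body is my reading of what the translation has to do.

```python
def tc_mask_to_queue_mask(counts, offsets, tc_mask):
    """Expand a per-traffic-class gate mask into a per-queue gate mask.

    counts[tc] and offsets[tc] describe the contiguous range of queues
    assigned to traffic class `tc` (assumed stand-ins for the mqprio
    count[]/offset[] arrays). Bit `tc` of tc_mask says whether that
    traffic class's gate is open.
    """
    queue_mask = 0
    for tc in range(len(counts)):
        if not tc_mask & (1 << tc):
            continue  # gate closed for this traffic class
        # Set bits offset .. offset + count - 1.
        queue_mask |= ((1 << counts[tc]) - 1) << offsets[tc]
    return queue_mask


# "queues 2@0 2@2 4@4": TC0 -> queues 0-1, TC1 -> queues 2-3, TC2 -> queues 4-7.
counts, offsets = [2, 2, 4], [0, 2, 4]

# Gates open for TC0 and TC2 only.
print(hex(tc_mask_to_queue_mask(counts, offsets, 0b101)))  # 0xf3
```

With the trivial one-queue-per-class mapping the function is the identity, so the same schedule can be reused on another node by changing only the "queues" parameter.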

> 
> Q.2 Using gate operations to create protected windows
>
> The enhancements for
> scheduled traffic described in 8.6.8.4 allow transmission to be switched on and
> off on a timed basis for each _traffic class_ that is implemented on a port. This
> switching is achieved by means of individual on/off transmission gates
> associated with each _traffic class_ and a list of gate operations that control the
> gates; an individual SetGateStates operation has a time delay parameter that
> indicates the delay after the gate operation is executed until the next operation
> is to occur, and a GateState parameter that defines a vector of up to eight state
> values (open or closed) that is to be applied to each gate when the operation
> is executed. The
> gate operations allow any combination of open/closed states to be defined, and
> the mechanism makes no assumptions about which _traffic classes_ are being
> “protected” and which are “unprotected”; any such assumptions are left to the
> designer of the sequence of gate operations.
> 
> Table 8-7—Gate operations
> The GateState parameter indicates a value, open or closed, for each of the
> Port’s _queues_.
> 
> - What happens with the "clockid" argument now that hardware offload is
> possible? Do we allow "/dev/ptp0" to be specified as input?
> Actually this question is relevant to your txtime-assist mode as well:
> doesn't it assume that there is an implicit phc2sys instance running to keep the
> system time in sync with the PHC?

That's a very interesting question. I think that, for now, specifying /dev/ptp* clocks
won't "always" work: if the driver (or anything else) needs to arm a timer in order to run
the schedule, it won't be able to use /dev/ptp* clocks (hrtimers and PTP clocks don't mix).
But for "full" offloads, it should work.

So, you are right: taprio and txtime-assist mode (and ETF) require the system clock and the PHC
to be synchronized, via something like phc2sys.

Hope I got all your questions.

Cheers,
--
Vinicius