netdev - Re: [RFC net] Fixes: b63c5478e9cb ("ipv6: ioam: Support for Queue depth data field")

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <1328d117-70b5-b03c-c0be-cd046d728d53@uliege.be>
Date:   Wed, 7 Dec 2022 13:07:18 +0100
From:   Justin Iurman <justin.iurman@...ege.be>
To:     Jakub Kicinski <kuba@...nel.org>
Cc:     Eric Dumazet <edumazet@...gle.com>, netdev@...r.kernel.org,
        davem@...emloft.net, yoshfuji@...ux-ipv6.org, dsahern@...nel.org,
        pabeni@...hat.com, stable@...r.kernel.org
Subject: Re: [RFC net] Fixes: b63c5478e9cb ("ipv6: ioam: Support for Queue
 depth data field")

On 12/6/22 21:43, Jakub Kicinski wrote:
> On Mon, 5 Dec 2022 21:44:09 +0100 Justin Iurman wrote:
>>> Please revert this patch.
>>>
>>> Many people use FQ qdisc, where packets are waiting for their Earliest
>>> Departure Time to be released.
>>
>> The IOAM queue depth is a very important value and is already used.
> 
> Can you say more about the use? What signal do you derive from it?
> I do track qlen on Meta's servers but haven't found a strong use
> for it yet (I did for backlog drops but not the qlen itself).

The specification goal of the queue depth was initially to be able to 
track the entire path with a detailed view for packets or flows (kind of 
a zoom on the interface to have details about its queues). With the 
current definition/implementation of the queue depth, if only one queue 
is congested, you're able to know it. Which doesn't necessarily mean 
that all queues are full, but this one is and there might be something 
going on. And this is something operators might want to be able to 
detect precisely, for a lot of use cases depending on the situation. On 
the contrary, if all queues are full, then you could deduce that as well 
for each queue separately, as soon as a packet is assigned to it. So I 
think that with "queue depth = sum(queues)", you don't have details and 
you're not able to detect a single queue congestion, while with "queue 
depth = queue" you could detect both. One might argue that it's fine to 
only have the aggregation in some situation. I'd say that we might need 
both, actually. Which is technically possible (even though expensive, as 
Eric mentioned) thanks to the way it is specified by the RFC, where some 
freedom was intentionally given. I could come up with a solution for that.

>>> Also, the draft says:
>>>
>>> 5.4.2.7.  queue depth
>>>
>>>      The "queue depth" field is a 4-octet unsigned integer field.  This
>>>      field indicates the current length of the egress interface queue of
>>>      the interface from where the packet is forwarded out.  The queue
>>>      depth is expressed as the current amount of memory buffers used by
>>>      the queue (a packet could consume one or more memory buffers,
>>>      depending on its size).
>>>
>>>       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
>>>      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>      |                       queue depth                             |
>>>      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>>>
>>>
>>> It is relatively clear that the egress interface is the aggregate
>>> egress interface,
>>> not a subset of the interface.
>>
>> Correct, even though the definition of an interface in RFC 9197 is quite
>> abstract (see the end of section 4.4.2.2: "[...] could represent a
>> physical interface, a virtual or logical interface, or even a queue").
>>
>>> If you have 32 TX queues on a NIC, all of them being backlogged (line rate),
>>> sensing the queue length of one of the queues would give a 97% error
>>> on the measure.
>>
>> Why would it? Not sure I get your idea based on that example.
> 
> Because it measures the length of a single queue not the device.

Yep, I figured that out after the off-list discussion we've had with Eric.

So my plan would be, if you all agree with, to correct and repost this 
patch to fix the NULL qdisc issue. Then, I'd come with a solution to 
allow both (with and without aggregation of queues) and post it on 
net-next. But again, if the consensus is to revert this patch (which I 
think would bring no benefit IMHO), then so be it. Thoughts?