Message-ID: <6be17a08-f8aa-4f91-9bd0-d9e1f0a92d90@kernel.org>
Date: Fri, 30 Jan 2026 17:54:05 +0100
From: Jesper Dangaard Brouer <hawk@...nel.org>
To: Toke Høiland-Jørgensen <toke@...hat.com>,
netdev@...r.kernel.org, Eric Dumazet <eric.dumazet@...il.com>,
"David S. Miller" <davem@...emloft.net>, Paolo Abeni <pabeni@...hat.com>
Cc: bpf@...r.kernel.org, Jakub Kicinski <kuba@...nel.org>, horms@...nel.org,
jiri@...nulli.us, edumazet@...gle.com, xiyou.wangcong@...il.com,
jhs@...atatu.com, carges@...udflare.com, kernel-team@...udflare.com
Subject: Re: [PATCH net-next v3] net: sched: sfq: add detailed drop reasons
for monitoring
On 30/01/2026 09.40, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <hawk@...nel.org> writes:
>
>> Add specific drop reasons to SFQ qdisc to improve packet drop
>> observability and monitoring capabilities. This change replaces
>> generic qdisc_drop() calls with qdisc_drop_reason() to provide
>> granular metrics about different drop scenarios in production
>> environments.
>>
>> Two new drop reasons are introduced:
>>
>> - SKB_DROP_REASON_QDISC_SFQ_MAXFLOWS: Used when a new flow cannot
>> be created because the maximum number of flows (flows parameter)
>> has been reached and no free flow slots are available.
>>
>> - SKB_DROP_REASON_QDISC_SFQ_MAXDEPTH: Used when a flow's queue
>> length exceeds the per-flow depth limit (depth parameter),
>> triggering either tail drop or head drop depending on headdrop
>> configuration.
>>
>> The existing SKB_DROP_REASON_QDISC_OVERLIMIT is used in sfq_drop()
>> when the overall qdisc limit is exceeded and packets are dropped
>> from the longest queue.
>>
>> The naming uses qdisc-specific drop reasons tied to SFQ tunables,
>> following the pattern established by Eric's FQ commit 5765c7f6e317
>> ("net_sched: sch_fq: add three drop_reason") which introduced
>> FQ_BAND_LIMIT, FQ_HORIZON_LIMIT, and FQ_FLOW_LIMIT.
>>
>> The new drop reasons are inserted in the middle of the enum after
>> SKB_DROP_REASON_QDISC_CONGESTED to group all qdisc-related reasons
>> together. While drop reason enum values are not UAPI, the enum
>> names implicitly become UAPI as userspace tools rely on BTF to
>> resolve names to values. This makes middle-insertion safe as long
>> as names remain stable.
>>
>> These detailed drop reasons enable production monitoring systems
>> to distinguish between different SFQ drop scenarios and generate
>> specific metrics for:
>> - Flow table exhaustion (flows exceeded)
>> - Per-flow congestion (depth limit exceeded)
>> - Global qdisc congestion (overall limit exceeded)
>>
>> This granular visibility allows operators to identify capacity
>> planning needs, detect traffic patterns, and optimize SFQ
>> configuration based on real-world drop patterns.
>>
>> Signed-off-by: Jesper Dangaard Brouer <hawk@...nel.org>
>
> Okay, so I went looking for ways to get the information (which qdisc did
> the drop come from) with a generic drop reason and... it's not exactly
> trivial?
>
(Correct, it is actually not possible to get the qdisc from the call
location (see below), because we don't have the txq.)
> My thought was that you could get it from the location, but that ends up
> being in the generic __dev_queue_xmit:
>
> ping-117101 [017] b.s2. 155651.231372: kfree_skb: skbaddr=0000000019495eb6 rx_sk=0000000000000000 protocol=34525 location=__dev_queue_xmit+0xcbd/0xee0 reason: QDISC_DROP
>
> So you can't actually use the kfree_skb trace point to go backwards to
> the qdisc.
Thank you for providing this example; it shows what info the
tracepoint provides to existing tools. (The location=__dev_queue_xmit
is key, but realize this is actually __dev_xmit_skb being inlined.)
> Given that we have a sometimes-NULL socket parameter to the kfree_skb
> tracepoint, I think it would make sense to augment it with a 'dev'
> parameter as well, but that would require some restructuring of the drop
> code, I guess.
>
Adding a 'dev' parameter to the drop tracepoint is IMHO a can of worms.
I.e. do you mean the ingress or egress device?
The skb->dev pointer changes meaning (ingress vs. egress) depending on
where in the network stack we are. And sometimes it is not even a
net_device pointer; see the double union[1] that skb->dev belongs to.
Besides, having the 'dev' doesn't help my use-case; I need the qdisc.
[1]
https://elixir.bootlin.com/linux/v6.18.6/source/include/linux/skbuff.h#L886-L900
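For reference, an abridged paraphrase of the unions skb->dev sits in
(see [1] for the authoritative definition; most members are trimmed
here for brevity):

struct sk_buff {
	union {
		struct {
			/* must be first, to match sk_buff_head */
			struct sk_buff		*next;
			struct sk_buff		*prev;
			union {
				struct net_device	*dev;
				/* some protocols store private data
				 * here while the device pointer is NULL
				 */
				unsigned long		dev_scratch;
			};
		};
		struct rb_node		rbnode;	/* netem, ip4 defrag, tcp */
		struct list_head	list;
	};
	/* ... */
};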
I've been looking over the code multiple times. A possible alternative
is a new tracepoint for qdisc drops (meaning it no longer calls
trace_kfree_skb / skb:kfree_skb). We could argue that qdisc drops have
their own meaning, like "consume". Would this be acceptable?
This would allow the tracepoint to have its own enum set of
qdisc_drop_reasons. This new tracepoint should have the same arguments
as __dev_xmit_skb, and we know the 'dev' is egress:
static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
struct net_device *dev,
struct netdev_queue *txq)
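To make this concrete, a rough sketch of what such a tracepoint could
look like. Purely illustrative: the event name (qdisc_skb_drop), the
field selection and the separate qdisc_drop_reason enum are my
assumptions, not a finished proposal:

/* Hypothetical sketch only: mirrors the __dev_xmit_skb() arguments so
 * the egress dev, txq and qdisc are visible to tracing/BPF consumers.
 */
TRACE_EVENT(qdisc_skb_drop,

	TP_PROTO(struct sk_buff *skb, struct Qdisc *q,
		 struct net_device *dev, struct netdev_queue *txq,
		 enum qdisc_drop_reason reason), /* hypothetical enum */

	TP_ARGS(skb, q, dev, txq, reason),

	TP_STRUCT__entry(
		__field(void *,		skbaddr)
		__field(void *,		qdisc)
		__field(int,		ifindex)
		__field(unsigned int,	reason)
	),

	TP_fast_assign(
		__entry->skbaddr = skb;
		__entry->qdisc   = q;
		__entry->ifindex = dev->ifindex;
		__entry->reason  = reason;
	),

	TP_printk("ifindex=%d qdisc=%p skbaddr=%p reason=%u",
		  __entry->ifindex, __entry->qdisc,
		  __entry->skbaddr, __entry->reason)
);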
Doing this work involves removing all the qdisc-specific drop reasons,
except SKB_DROP_REASON_QDISC_DROP. Is that acceptable?
I need some buy-in from maintainers Paolo/Jakub/Eric if you want me to
go down this road. It is a lot of churn and restructuring for something
that might be overkill. The current patch solves my use-case, but it
would be cool/powerful to have the qdisc pointer readable by BPF,
although I don't have a use-case in mind.
> Another option would be to add new stats members to sfq for each of
> these drop reasons? Assuming you're only after the counts, that would
> allow you to get those without adding new drop reasons to the core code?
>
I'm fine with someone adding more stats members to SFQ, but it doesn't
solve my use-case. As mentioned before, we also sample packets and have
different sample rates based on drop_reasons. We also use this as a
cmdline troubleshooting tool when debugging a prod issue, e.g. just
using perf record.
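For the ad-hoc case, something along these lines (illustrative only;
the exact options depend on the situation):

  perf record -e skb:kfree_skb -a -- sleep 10
  perf script | grep QDISC

The per-qdisc reasons make that output far more specific than the
generic QDISC_DROP.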
> OTOH, since we already have per-qdisc specific drop reasons for FQ and
> CAKE, and adding more enums is cheap, I am also OK with adding these; so
> with that in mind:
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@...hat.com>
Thanks a lot for your review.
--Jesper