Message-ID: <Xd_A9IO0dh4NAVigE2yIDk9ZbCEz4XRcUO1PBNl2G6kEZF6TEAeXtDR85R_P-zIMdSL17cULM_GdmijrKs84RdMewdZswMDCBu5G7oBrajY=@willsroot.io>
Date: Mon, 11 Aug 2025 17:51:39 +0000
From: William Liu <will@...lsroot.io>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, jhs@...atatu.com, xiyou.wangcong@...il.com, pabeni@...hat.com, jiri@...nulli.us, davem@...emloft.net, edumazet@...gle.com, horms@...nel.org, savy@...t3mfailure.io, victor@...atatu.com
Subject: Re: [PATCH net v4 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal

On Monday, August 11th, 2025 at 5:24 PM, Jakub Kicinski <kuba@...nel.org> wrote:

> On Mon, 11 Aug 2025 16:52:51 +0000 William Liu wrote:
> 
> > > > Can you elaborate on this?
> > > > 
> > > > I just moved the reset of two cstats fields from the dequeue handler
> > > > epilogue to the prologue. Those specific cstats fields are not used
> > > > elsewhere so they should be fine,
> > > 
> > > That's the disconnect. AFAICT they are passed to codel_dequeue(),
> > > and will be used during normal dequeue, as part of normal active
> > > queue management under traffic.
> > 
> > Yes, that is the only place those values are used. From my
> > understanding, codel_dequeue is only called in fq_codel_dequeue. So
> > moving the reset from the dequeue epilogue to the dequeue prologue
> > should be fine as the same behavior is kept - the same values should
> > always be used by codel_dequeue.
> > 
> > Is there a case I am not seeing? If so, I can just add additional
> > fields to the fq_codel_sched_data, but wanted to avoid doing that for
> > this one edge case.
>
> This sort of separation of logic is very error prone in general.
> If you're asking for a specific bug that would exist with your
> patch - I believe that two subsequent fq_codel_change() calls,
> the first one reducing the limit and the second one not (and
> therefore never invoking dequeue), will adjust the backlog twice.
In that case, I think the code in the limit adjustment while loop never runs, so the backlog reduction would only happen with arguments of 0. But yes, I agree that this approach is not ideal.
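
For context, and to make the "arguments of 0" case concrete, the limit adjustment loop in fq_codel_change() currently looks roughly like this (paraphrased from net/sched/sch_fq_codel.c, not verbatim):

	while (sch->q.qlen > sch->limit ||
	       q->memory_usage > q->memory_limit) {
		struct sk_buff *skb = qdisc_dequeue_internal(sch, false);

		q->cstats.drop_len += qdisc_pkt_len(skb);
		rtnl_kfree_skbs(skb, skb);
		q->cstats.drop_count++;
	}
	qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
				  q->cstats.drop_len);
	q->cstats.drop_count = 0;
	q->cstats.drop_len = 0;

If the while body never executes and the drop fields were already reset, the trailing qdisc_tree_reduce_backlog() call is effectively a no-op.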

> As I commented in the previous message - wouldn't counting the
> packets we actually dequeue solve this problem? smth like:
> 
> pkts = 0;
> bytes = 0;
> while (sch->q.qlen > sch->limit ||
>        q->memory_usage > q->memory_limit) {
> 	struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
>
> 	pkts++;
> 	bytes += qdisc_pkt_len(skb);
> 	rtnl_kfree_skbs(skb, skb);
> }
> qdisc_tree_reduce_backlog(sch, pkts, bytes);
> 
> ? "Conceptually" we are only responsible for adjusting the backlog
> for skbs we actually gave to kfree_skb().

I think the issue here is that qdisc_dequeue_internal can call the actual dequeue handler, and fq_codel_dequeue would have already made a qdisc_tree_reduce_backlog call [1] when cstats.drop_count is non-zero. Wouldn't we be double counting packets and bytes in the qdisc_tree_reduce_backlog call after the limit adjustment loop with this approach?

[1] https://elixir.bootlin.com/linux/v6.16/source/net/sched/sch_fq_codel.c#L320
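
For reference, the deferred accounting at [1] looks roughly like this (paraphrased, not verbatim):

	/* We can't call qdisc_tree_reduce_backlog() if our qlen is 0,
	 * or HTB crashes. Defer it for next round.
	 */
	if (q->cstats.drop_count && sch->q.qlen) {
		qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
					  q->cstats.drop_len);
		q->cstats.drop_count = 0;
		q->cstats.drop_len = 0;
	}

So packets dropped inside fq_codel_dequeue() are already subtracted from the tree backlog before the limit adjustment loop's own qdisc_tree_reduce_backlog() call runs.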
