Message-ID: <7guebhjv734hjkgtnmloyj7lwaaxj6nz5as4bjruo24t3vs72r@54ryrobt6tdo>
Date: Fri, 21 Nov 2025 13:33:25 -0700
From: Xiang Mei <xmei5@....edu>
To: Toke Høiland-Jørgensen <toke@...e.dk>
Cc: security@...nel.org, netdev@...r.kernel.org, 
	cake@...ts.bufferbloat.net, bestswngs@...il.com
Subject: Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction
 in cake_drop

On Mon, Nov 17, 2025 at 02:23:45PM +0100, Toke Høiland-Jørgensen wrote:
> > will not run because the parent scheduler stops enqueueing after seeing
> > NET_XMIT_CN. For normal packets (non-GSO), it's easy to fix: just do
> > qdisc_tree_reduce_backlog(sch, 1, len). However, GSO splitting makes this
> > difficult because we may have already added multiple segments into the
> > flow, and we don’t know how many of them were dequeued.
> 
> Huh, dequeued? This is all running under the qdisc lock, nothing gets
> dequeued in the meantime.

Sorry for using the wrong word. "Dequeued" should have been "dropped".

> 
> Besides, the ACK thinning is irrelevant to the drop compensation. Here's
> an example:
> 
> Without ACK splitting - we enqueue 1 packet of 100 bytes, then drop 1
> packet of 200 bytes, so we should end up with the same qlen, but 100
> fewer bytes in the queue:
> 
> start: parent qlen = X, parent backlog = Y
> 
> len = 100;
> cake_drop() drops 1 pkt / 200 bytes
> 
> if (same_flow) {
>   qdisc_reduce_backlog(0, 100) // parent qlen == X, parent backlog == Y - 100
>   return NET_XMIT_CN;
>   // no change in parent, so parent qlen == X, parent backlog == Y - 100
> } else {
>   qdisc_reduce_backlog(1, 200); // parent qlen == X - 1, backlog == Y - 200
>   return NET_XMIT_SUCCESS;
>   // parent does qlen += 1, backlog += 100, so parent qlen == X, parent backlog == Y - 100
> }
> 
> With ACK splitting - we enqueue 10 segments totalling 110 bytes, then
> drop 1 packet of 200 bytes, so we should end up with 9 packets more in
> the queue, but 90 bytes less:
> 
> start: parent qlen = X, parent backlog = Y
> 
> len = 100;
> /* split ack: slen == 110, numsegs == 10 */
> qdisc_tree_reduce_backlog(-9, -10); // parent qlen == X + 9, backlog == Y + 10
> 
> cake_drop() drops 1 pkt / 200 bytes
> 
> if (same_flow) {
>   qdisc_reduce_backlog(0, 100)   // parent qlen == X + 9, backlog == Y - 90
>   return NET_XMIT_CN;
>   // no change in parent, so parent qlen == X + 9, backlog == Y - 90
> 
> } else {
>   qdisc_reduce_backlog(1, 200); // parent qlen == X + 8, backlog == Y - 190
>   return NET_XMIT_SUCCESS;
>   // parent does qlen +=1, backlog += 100, so parent qlen == X + 9, backlog == Y - 90
> }
> 
> 
> In both cases, what happens is that we drop one or more packets, reduce
> the backlog by the number of packets/bytes dropped *while compensating
> for what the parent qdisc does on return*. So the adjustments made by
> the segmentation makes no difference to the final outcome.

Thanks for the detailed explanation. You are right: the current patch
logic handles these cases correctly.

> 
> However, we do have one problem with the ACK thinning code: in the 'if
> (ack)' branch, we currently adjust 'len' if we drop an ACK. Meaning that
> if we use that value later to adjust for what the parent qdisc, the
> value will no longer agree with what the parent does. So we'll have to
> introduce a new variable for the length used in the ACK thinning
> calculation.
> 

I see the issue. It will be resolved in the new patch.
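Concretely, I'm thinking of something along these lines (`thin_len` is a
placeholder name I'm inventing here; a sketch only, not the tested patch):

```
/* in cake_enqueue(): keep `len` equal to qdisc_pkt_len(skb) for the
 * parent-compensation math, and do the ACK-thinning arithmetic on a
 * separate variable instead of mutating `len` */
len = qdisc_pkt_len(skb);
thin_len = len;			/* placeholder name */

if (ack) {
	/* adjust thin_len (not len) for the dropped ACK */
	...
}

/* later: all qdisc_tree_reduce_backlog() compensation keeps using
 * the untouched `len` */
```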

> > The number of dequeued segments can be anywhere in [0, numsegs], and the
> > dequeued length in [0, slen]. We cannot know the exact number without 
> > checking the tin/flow index of each dropped packet. Therefore, we should
> > check inside the loop (as v1 did):
> >
> > ```
> > cake_drop(...)
> > {
> >     ...
> >     if (likely(current_flow != idx + (tin << 16)))
> >         qdisc_tree_reduce_backlog(sch, 1, len);
> >     ...
> > }
> > ```
> 
> No, this is not needed - the calculation involving prev_qlen and
> prev_backlog will correctly give us the total number of packets/bytes
> dropped.
> >
> > This solution also has a problem, as you mentioned:
> > if the flow already contains packets, dropping those packets should
> > trigger backlog reduction, but our check would incorrectly skip that. One
> > possible solution is to track the number of packets/segments enqueued
> > in the current cake_enqueue (numsegs or 1), and then avoid calling
> > `qdisc_tree_reduce_backlog(sch, 1, len)` for the 1 or numsegs dropped
> > packets. If that makes sense, I'll make the patch and test it.
> 
> It does not - see above.
> 
> > -----
> >
> > Besides, I have a question about the condition for returning NET_XMIT_CN.
> > Do we return NET_XMIT_CN when:
> >
> > The incoming packet itself is dropped? (makes more sense to me)
> > or
> > The same flow dequeued once? (This is the current logic)
> 
> The same flow. The current logic is correct.
> 
> -Toke


Thanks for the explanations. I had misunderstood NET_XMIT_CN. A patch
(removing the `dropped` variable and handling the ACK branch correctly)
will be tested and sent.
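For the record, my understanding of the prev_qlen/prev_backlog calculation
you described is the following (a sketch in the style of the existing code,
not the exact patch):

```
/* in cake_enqueue(), before the overlimit drop loop */
prev_qlen = sch->q.qlen;
prev_backlog = sch->qstats.backlog;

while (q->buffer_used > q->buffer_limit)
	cake_drop(sch, to_free);

/* totals dropped, regardless of which tin/flow each packet was in */
n_dropped   = prev_qlen - sch->q.qlen;
len_dropped = prev_backlog - sch->qstats.backlog;
```

So no per-packet flow check is needed; the deltas already cover every
dropped packet, including ones that were in the flow before this enqueue.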

Thanks,
Xiang
