Message-ID: <87a50kokri.fsf@toke.dk>
Date: Mon, 17 Nov 2025 14:23:45 +0100
From: Toke Høiland-Jørgensen <toke@...e.dk>
To: Xiang Mei <xmei5@....edu>
Cc: security@...nel.org, netdev@...r.kernel.org, cake@...ts.bufferbloat.net,
bestswngs@...il.com
Subject: Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen
reduction in cake_drop
> will not run because the parent scheduler stops enqueueing after seeing
> NET_XMIT_CN. For normal packets (non-GSO), it's easy to fix: just do
> qdisc_tree_reduce_backlog(sch, 1, len). However, GSO splitting makes this
> difficult because we may have already added multiple segments into the
> flow, and we don’t know how many of them were dequeued.
Huh, dequeued? This is all running under the qdisc lock, nothing gets
dequeued in the meantime.
Besides, the ACK thinning is irrelevant to the drop compensation. Here's
an example:
Without GSO splitting - we enqueue 1 packet of 100 bytes, then drop 1
packet of 200 bytes, so we should end up with the same qlen, but 100
fewer bytes in the queue:

start: parent qlen = X, parent backlog = Y

len = 100;
cake_drop() drops 1 pkt / 200 bytes
if (same_flow) {
	qdisc_tree_reduce_backlog(0, 100); // parent qlen == X, parent backlog == Y - 100
	return NET_XMIT_CN;
	// no change in parent, so parent qlen == X, parent backlog == Y - 100
} else {
	qdisc_tree_reduce_backlog(1, 200); // parent qlen == X - 1, backlog == Y - 200
	return NET_XMIT_SUCCESS;
	// parent does qlen += 1, backlog += 100, so parent qlen == X, parent backlog == Y - 100
}
With GSO splitting - we enqueue 10 segments totalling 110 bytes, then
drop 1 packet of 200 bytes, so we should end up with 9 packets more in
the queue, but 90 bytes less:

start: parent qlen = X, parent backlog = Y

len = 100;
/* GSO split: slen == 110, numsegs == 10 */
qdisc_tree_reduce_backlog(-9, -10); // parent qlen == X + 9, backlog == Y + 10
cake_drop() drops 1 pkt / 200 bytes
if (same_flow) {
	qdisc_tree_reduce_backlog(0, 100); // parent qlen == X + 9, backlog == Y - 90
	return NET_XMIT_CN;
	// no change in parent, so parent qlen == X + 9, backlog == Y - 90
} else {
	qdisc_tree_reduce_backlog(1, 200); // parent qlen == X + 8, backlog == Y - 190
	return NET_XMIT_SUCCESS;
	// parent does qlen += 1, backlog += 100, so parent qlen == X + 9, backlog == Y - 90
}
In both cases, what happens is that we drop one or more packets and
reduce the backlog by the number of packets/bytes dropped, *while
compensating for what the parent qdisc does on return*. So the
adjustments made by the segmentation make no difference to the final
outcome.
However, we do have one problem with the ACK thinning code: in the 'if
(ack)' branch, we currently adjust 'len' if we drop an ACK. Meaning that
if we later use that value to compensate for what the parent qdisc does,
it will no longer agree with what the parent actually adds. So we'll
have to introduce a new variable for the length used in the ACK thinning
calculation.
> The number of dequeued segments can be anywhere in [0, numsegs], and the
> dequeued length in [0, slen]. We cannot know the exact number without
> checking the tin/flow index of each dropped packet. Therefore, we should
> check inside the loop (as v1 did):
>
> ```
> cake_drop(...)
> {
> ...
> if (likely(current_flow != idx + (tin << 16)))
> qdisc_tree_reduce_backlog(sch, 1, len);
> ...
> }
> ```
No, this is not needed - the calculation involving prev_qlen and
prev_backlog will correctly give us the total number of packets/bytes
dropped.
>
> This solution also has a problem, as you mentioned:
> if the flow already contains packets, dropping those packets should
> trigger backlog reduction, but our check would incorrectly skip that. One
> possible solution is to track the number of packets/segments enqueued
> in the current cake_enqueue (numsegs or 1), and then avoid calling
> `qdisc_tree_reduce_backlog(sch, 1, len)` for the 1 or numsegs dropped
> packets. If that makes sense, I'll make the patch and test it.
It does not - see above.
> -----
>
> Besides, I have a question about the condition for returning NET_XMIT_CN.
> Do we return NET_XMIT_CN when:
>
> The incoming packet itself is dropped? (makes more sense to me)
> or
> The same flow dequeued once? (This is the current logic)
The same flow. The current logic is correct.
-Toke