Message-ID: <aRhUsbR6DT1F0bqc@p1>
Date: Sat, 15 Nov 2025 03:23:45 -0700
From: Xiang Mei <xmei5@....edu>
To: Toke Høiland-Jørgensen <toke@...e.dk>
Cc: security@...nel.org, netdev@...r.kernel.org, cake@...ts.bufferbloat.net,
	bestswngs@...il.com
Subject: Re: [PATCH net v3] net/sched: sch_cake: Fix incorrect qlen reduction
 in cake_drop

On Thu, Nov 13, 2025 at 02:35:18PM +0100, Toke Høiland-Jørgensen wrote:
> Xiang Mei <xmei5@....edu> writes:
> 
> > There is still one problem I am not sure about, since I am not very
> > experienced with cake and GSO. It's about the GSO branch [1]. The slen
> > is the length added to the cake sch, and that branch uses
> > `qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);` to inform the
> > parent sched. However, when we drop the packet, it could be problematic
> > since we should reduce slen instead of len. Is this a potential
> > problem?
> 
> Hmm, no I think it's fine? The qdisc_tree_reduce_backlog(sch, 1-numsegs,
> len-slen) *increases* the backlog with the difference between the
> original length and the number of new segments. And then we *decrease*
> the backlog with the number of bytes we dropped.
> 
> The compensation we're doing is for the backlog update of the parent,
> which is still using the original packet length regardless of any
> splitting, so that doesn't change the compensation value.
> 
> -Toke

I still think the current method of reducing the backlog may be problematic.
What you described holds for the GSO branch when cake_enqueue returns
NET_XMIT_SUCCESS, but it may lead to issues when it returns NET_XMIT_CN.
For the normal case where no drop happens, the current implementation
is correct. We can see how qlen and backlog change as follows:

backlog:
    -(len - slen)   Reason: qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen);
    +len            Reason: parent enqueue
    Total: slen
qlen:
    -(1 - numsegs)  Reason: qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen);
    +1              Reason: parent enqueue
    Total: numsegs

This makes sense because we split one packet into numsegs packets of total
length slen and enqueue them all. When a drop happens, we must fix both 
qlen and backlog.
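
To make the arithmetic above concrete, here is a tiny stand-alone model of the
accounting (plain user-space C, not kernel code; the packet sizes and segment
count are made up for illustration): it applies the
qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen) adjustment plus the
parent's +1/+len and checks that the net effect is +numsegs packets and
+slen bytes.

```
#include <assert.h>
#include <stdio.h>

int main(void)
{
    int len = 6000;            /* original super-packet length (made up) */
    int numsegs = 4;           /* segments produced by GSO splitting (made up) */
    int slen = 6200;           /* total length of those segments (made up) */

    int qlen = 0, backlog = 0; /* net change seen by the ancestors */

    /* qdisc_tree_reduce_backlog(sch, 1 - numsegs, len - slen) */
    qlen    -= 1 - numsegs;
    backlog -= len - slen;

    /* parent enqueue on NET_XMIT_SUCCESS */
    qlen    += 1;
    backlog += len;

    assert(qlen == numsegs && backlog == slen);
    printf("net: +%d packets, +%d bytes\n", qlen, backlog);
    return 0;
}
```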

In the unpatched code, cake_drop() calls qdisc_tree_reduce_backlog() for
dropped packets. This works in most cases, but it ignores the scenario where
we drop (parts of) the incoming packet, meaning that the expected:

```
backlog += len
qlen += 1
```

will not run because the parent scheduler stops enqueueing after seeing
NET_XMIT_CN. For normal packets (non-GSO), it's easy to fix: just do
qdisc_tree_reduce_backlog(sch, 1, len). However, GSO splitting makes this
difficult because we may have already added multiple segments into the
flow, and we don't know how many of them were dequeued.

The number of dequeued segments can be anywhere in [0, numsegs], and the
dequeued length in [0, slen]. We cannot know the exact number without 
checking the tin/flow index of each dropped packet. Therefore, we should
check inside the loop (as v1 did):

```
cake_drop(...)
{
    ...
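    /* skip the qlen/backlog reduction when the dropped packet belongs to
     * the flow we are currently enqueuing into
     */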
    if (likely(current_flow != idx + (tin << 16)))
        qdisc_tree_reduce_backlog(sch, 1, len);
    ...
}
```

This solution also has a problem, as you mentioned:
if the flow already contains packets, dropping those packets should
trigger backlog reduction, but our check would incorrectly skip that. One
possible solution is to track the number of packets/segments enqueued
in the current cake_enqueue (numsegs or 1), and then avoid calling
`qdisc_tree_reduce_backlog(sch, 1, len)` for the 1 or numsegs dropped
packets. If that makes sense, I'll make the patch and test it.
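
As a rough illustration of that idea, here is a user-space model only (the
names drop_ctx, fresh_pkts and should_reduce are made up, and it leaves aside
the head-vs-tail question raised below): cake_enqueue would record how many
packets it just added for the current tin/flow, and the drop path would skip
the reduction for at most that many drops from that same flow.

```
#include <stdbool.h>
#include <stdio.h>

struct drop_ctx {
    unsigned int current_flow; /* idx + (tin << 16) of this enqueue */
    unsigned int fresh_pkts;   /* packets added by this cake_enqueue: 1 or numsegs */
};

/* decide whether the ancestors' counters should be reduced for one drop */
static bool should_reduce(struct drop_ctx *ctx, unsigned int dropped_flow)
{
    if (dropped_flow == ctx->current_flow && ctx->fresh_pkts > 0) {
        ctx->fresh_pkts--; /* one of "our" packets; the parent never counted it */
        return false;
    }
    return true;           /* an older packet: reduce as usual */
}

int main(void)
{
    struct drop_ctx ctx = { .current_flow = 42, .fresh_pkts = 3 };

    /* five drops from the same flow: only the first three are skipped */
    for (int i = 0; i < 5; i++)
        printf("drop %d from flow 42 -> reduce backlog: %s\n",
               i, should_reduce(&ctx, 42) ? "yes" : "no");
    return 0;
}
```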

-----

Besides, I have a question about the condition for returning NET_XMIT_CN.
Do we return NET_XMIT_CN when:

The incoming packet itself is dropped? (makes more sense to me)
or
A packet was dequeued (dropped) from the same flow? (This is the current logic.)

If we keep the current logic, the above patch approach works. If not, we 
need additional checks because we append the incoming packet to the tail
but drops occur at the head.
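
Just to spell out the two readings, here is a small sketch (user-space C with
made-up names: enqueue_result, own_skb_dropped, same_flow_dropped,
cn_option_a/b; not actual sch_cake code):

```
#include <stdbool.h>
#include <stdio.h>

/* made-up flags describing what happened during one enqueue */
struct enqueue_result {
    bool own_skb_dropped;   /* the packet we just enqueued got dropped */
    bool same_flow_dropped; /* some packet from the same tin/flow got dropped */
};

/* (a) congestion only if our own packet was dropped */
static bool cn_option_a(const struct enqueue_result *r)
{
    return r->own_skb_dropped;
}

/* (b) current logic: congestion if the same flow was hit at all */
static bool cn_option_b(const struct enqueue_result *r)
{
    return r->same_flow_dropped;
}

int main(void)
{
    /* drops hit the head of the flow while we appended at the tail */
    struct enqueue_result r = { .own_skb_dropped = false,
                                .same_flow_dropped = true };

    printf("option (a): %d, option (b): %d\n", cn_option_a(&r), cn_option_b(&r));
    return 0;
}
```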

Thanks,
Xiang
