Message-ID: <8eae6bfd-c641-4fd4-9642-7fc038a974a2@redhat.com>
Date: Tue, 20 May 2025 15:33:58 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: chia-yu.chang@...ia-bell-labs.com, horms@...nel.org,
donald.hunter@...il.com, xandfury@...il.com, netdev@...r.kernel.org,
dave.taht@...il.com, jhs@...atatu.com, kuba@...nel.org,
stephen@...workplumber.org, xiyou.wangcong@...il.com, jiri@...nulli.us,
davem@...emloft.net, edumazet@...gle.com, andrew+netdev@...n.ch,
ast@...erby.net, liuhangbin@...il.com, shuah@...nel.org,
linux-kselftest@...r.kernel.org, ij@...nel.org, ncardwell@...gle.com,
koen.de_schepper@...ia-bell-labs.com, g.white@...lelabs.com,
ingemar.s.johansson@...csson.com, mirja.kuehlewind@...csson.com,
cheshire@...le.com, rs.ietf@....at, Jason_Livingood@...cast.com,
vidhi_goel@...le.com
Cc: Olga Albisser <olga@...isser.org>,
Olivier Tilmans <olivier.tilmans@...ia.com>,
Henrik Steen <henrist@...rist.net>, Bob Briscoe <research@...briscoe.net>
Subject: Re: [PATCH v16 net-next 3/5] sched: Add enqueue/dequeue of dualpi2
qdisc
On 5/16/25 2:01 AM, chia-yu.chang@...ia-bell-labs.com wrote:
> From: Koen De Schepper <koen.de_schepper@...ia-bell-labs.com>
>
> DualPI2 provides L4S-type low latency and low loss to traffic that
> uses a scalable congestion controller (e.g. TCP Prague, DCTCP)
> without degrading the performance of 'classic' traffic (e.g. Reno,
> Cubic). It is intended to be the reference implementation of the
> IETF RFC9332 DualQ Coupled AQM
> (https://datatracker.ietf.org/doc/html/rfc9332).
>
> Note that creating two independent queues cannot meet the goal of
> DualPI2 mentioned in RFC9332: "...to preserve fairness between
> ECN-capable and non-ECN-capable traffic." Further, it could even
> lead to starvation of Classic traffic, which is also inconsistent
> with the requirements in RFC9332: "...although priority MUST be
> bounded in order not to starve Classic traffic." DualPI2 is
> designed to maintain approximate per-flow fairness between the
> L-queue and the C-queue by forming a single qdisc that couples the
> two queues through the coupling factor and the scheduler.
>
> The qdisc provides two queues called low latency and classic. It
> classifies packets based on the ECN field in the IP headers. By
> default it directs non-ECN and ECT(0) into the classic queue and
> ECT(1) and CE into the low latency queue, as per the IETF spec.
>
> Each queue runs its own AQM:
> * The classic AQM is called PI2, which is similar to the PIE AQM but
> more responsive and simpler. Classic traffic requires a decent
> target queue (default 15ms for Internet deployment) to fully
> utilize the link and to avoid high drop rates.
> * The low latency AQM is, by default, a very shallow ECN marking
> threshold (1ms) similar to that used for DCTCP.
>
> The DualQ isolates the low queuing delay of the Low Latency queue
> from the larger delay of the 'Classic' queue. However, from a
> bandwidth perspective, flows in either queue will share out the link
> capacity as if there was just a single queue. This bandwidth pooling
> effect is achieved by coupling together the drop and ECN-marking
> probabilities of the two AQMs.
>
> The PI2 AQM has two main parameters in addition to its target delay.
> The integral gain factor alpha is used to slowly correct any persistent
> standing queue error from the target delay, while the proportional gain
> factor beta is used to quickly compensate for queue changes (growth or
> shrinkage). Alpha and beta can either be given directly as
> parameters, or be calculated by tc from alternative typical and
> maximum RTT parameters.
>
> Internally, the output of a linear Proportional Integral (PI)
> controller is used for both queues. This output is squared to
> calculate the drop or ECN-marking probability of the classic queue.
> This counterbalances the square-root rate equation of Reno/Cubic,
> which is the trick that balances flow rates across the queues. For
> the ECN-marking probability of the low latency queue, the output of
> the base AQM is multiplied by a coupling factor. This determines the
> balance between the flow rates in each queue. The default setting
> makes the flow rates roughly equal, which should be generally
> applicable.
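
As a side note, ignoring the fixed-point scaling and the overload
handling in the code below, the coupling described above boils down to
roughly:

	/* p is the output of the linear PI controller */
	p += alpha * (qdelay - target) + beta * (qdelay - qdelay_old);

	p_C  = p * p;               /* C-queue drop/mark probability */
	p_CL = coupling_factor * p; /* coupled L-queue mark probability */

The L-queue additionally applies its own shallow step threshold, so its
effective marking probability is roughly the maximum of the step AQM
and p_CL.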
>
> If DUALPI2 AQM has detected overload (due to excessive non-responsive
> traffic in either queue), it will switch to signaling congestion
> solely using drop, irrespective of the ECN field. Alternatively, it
> can be configured to limit the drop probability and let the queue
> grow and eventually overflow (like tail-drop).
>
> GSO splitting in DUALPI2 is configurable from userspace, and the
> default behavior is to split GSO. When running DUALPI2 unshaped at
> 10GigE with a 4-download-stream test, splitting GSO apart halves the
> latency with no loss in throughput:
>
> Summary of tcp_4down run 'no_split_gso':
> avg median # data pts
> Ping (ms) ICMP : 0.53 0.30 ms 350
> TCP download avg : 2326.86 N/A Mbits/s 350
> TCP download sum : 9307.42 N/A Mbits/s 350
> TCP download::1 : 2672.99 2568.73 Mbits/s 350
> TCP download::2 : 2586.96 2570.51 Mbits/s 350
> TCP download::3 : 1786.26 1798.82 Mbits/s 350
> TCP download::4 : 2261.21 2309.49 Mbits/s 350
>
> Summary of tcp_4down run 'split_gso':
> avg median # data pts
> Ping (ms) ICMP : 0.22 0.23 ms 350
> TCP download avg : 2335.02 N/A Mbits/s 350
> TCP download sum : 9340.09 N/A Mbits/s 350
> TCP download::1 : 2335.30 2334.22 Mbits/s 350
> TCP download::2 : 2334.72 2334.20 Mbits/s 350
> TCP download::3 : 2335.28 2334.58 Mbits/s 350
> TCP download::4 : 2334.79 2334.39 Mbits/s 350
>
> A similar result is observed when running DUALPI2 unshaped at 1GigE
> with a single download stream:
>
> Summary of tcp_1down run 'no_split_gso':
> avg median # data pts
> Ping (ms) ICMP : 1.13 1.25 ms 350
> TCP download : 941.41 941.46 Mbits/s 350
>
> Summary of tcp_1down run 'split_gso':
> avg median # data pts
> Ping (ms) ICMP : 0.51 0.55 ms 350
> TCP download : 941.41 941.45 Mbits/s 350
>
> Additional details can be found in the RFC:
> https://datatracker.ietf.org/doc/html/rfc9332
>
> Signed-off-by: Koen De Schepper <koen.de_schepper@...ia-bell-labs.com>
> Co-developed-by: Olga Albisser <olga@...isser.org>
> Signed-off-by: Olga Albisser <olga@...isser.org>
> Co-developed-by: Olivier Tilmans <olivier.tilmans@...ia.com>
> Signed-off-by: Olivier Tilmans <olivier.tilmans@...ia.com>
> Co-developed-by: Henrik Steen <henrist@...rist.net>
> Signed-off-by: Henrik Steen <henrist@...rist.net>
> Signed-off-by: Bob Briscoe <research@...briscoe.net>
> Signed-off-by: Ilpo Järvinen <ij@...nel.org>
> Co-developed-by: Chia-Yu Chang <chia-yu.chang@...ia-bell-labs.com>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@...ia-bell-labs.com>
> Acked-by: Dave Taht <dave.taht@...il.com>
> ---
> include/net/dropreason-core.h | 6 +
> net/sched/Kconfig | 12 +
> net/sched/Makefile | 1 +
> net/sched/sch_dualpi2.c | 449 ++++++++++++++++++++++++++++++++++
> 4 files changed, 468 insertions(+)
>
> diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
> index bea77934a235..faae9f416e54 100644
> --- a/include/net/dropreason-core.h
> +++ b/include/net/dropreason-core.h
> @@ -120,6 +120,7 @@
> FN(ARP_PVLAN_DISABLE) \
> FN(MAC_IEEE_MAC_CONTROL) \
> FN(BRIDGE_INGRESS_STP_STATE) \
> + FN(DUALPI2_STEP_DROP) \
> FNe(MAX)
>
> /**
> @@ -570,6 +571,11 @@ enum skb_drop_reason {
> * ingress bridge port does not allow frames to be forwarded.
> */
> SKB_DROP_REASON_BRIDGE_INGRESS_STP_STATE,
> + /**
> + * @SKB_DROP_REASON_DUALPI2_STEP_DROP: dropped by the step drop
> + * threshold of DualPI2 qdisc.
> + */
> + SKB_DROP_REASON_DUALPI2_STEP_DROP,
> /**
> * @SKB_DROP_REASON_MAX: the maximum of core drop reasons, which
> * shouldn't be used as a real 'reason' - only for tracing code gen
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index 9f0b3f943fca..dda66a3590d8 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -415,6 +415,18 @@ config NET_SCH_BPF
>
> If unsure, say N.
>
> +config NET_SCH_DUALPI2
> + tristate "Dual Queue PI Square (DUALPI2) scheduler"
> + help
> + Say Y here if you want to use the Dual Queue Proportional Integral
> + Controller Improved with a Square scheduling algorithm.
> + For more information, please see https://tools.ietf.org/html/rfc9332
> +
> + To compile this driver as a module, choose M here: the module
> + will be called sch_dualpi2.
> +
> + If unsure, say N.
> +
> menuconfig NET_SCH_DEFAULT
> bool "Allow override default queue discipline"
> help
> diff --git a/net/sched/Makefile b/net/sched/Makefile
> index 904d784902d1..5078ea84e6ad 100644
> --- a/net/sched/Makefile
> +++ b/net/sched/Makefile
> @@ -63,6 +63,7 @@ obj-$(CONFIG_NET_SCH_CBS) += sch_cbs.o
> obj-$(CONFIG_NET_SCH_ETF) += sch_etf.o
> obj-$(CONFIG_NET_SCH_TAPRIO) += sch_taprio.o
> obj-$(CONFIG_NET_SCH_BPF) += bpf_qdisc.o
> +obj-$(CONFIG_NET_SCH_DUALPI2) += sch_dualpi2.o
>
> obj-$(CONFIG_NET_CLS_U32) += cls_u32.o
> obj-$(CONFIG_NET_CLS_ROUTE4) += cls_route.o
> diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
> index 97986c754e47..7ecd7502332c 100644
> --- a/net/sched/sch_dualpi2.c
> +++ b/net/sched/sch_dualpi2.c
> @@ -113,8 +113,44 @@ struct dualpi2_sched_data {
> u32 step_marks; /* ECN mark pkt counter due to step AQM */
> u32 memory_used; /* Memory used of both queues */
> u32 max_memory_used;/* Maximum used memory */
> +
> + /* Deferred drop statistics */
> + u32 deferred_drops_cnt; /* Packets dropped */
> + u32 deferred_drops_len; /* Bytes dropped */
> +};
> +
> +struct dualpi2_skb_cb {
> + u64 ts; /* Timestamp at enqueue */
> + u8 apply_step:1, /* Can we apply the step threshold */
> + classified:2, /* Packet classification results */
> + ect:2; /* Packet ECT codepoint */
> +};
> +
> +enum dualpi2_classification_results {
> + DUALPI2_C_CLASSIC = 0, /* C-queue */
> + DUALPI2_C_L4S = 1, /* L-queue (scale mark/classic drop) */
> + DUALPI2_C_LLLL = 2, /* L-queue (no drops/marks) */
> + __DUALPI2_C_MAX /* Keep last*/
> };
>
> +static struct dualpi2_skb_cb *dualpi2_skb_cb(struct sk_buff *skb)
> +{
> + qdisc_cb_private_validate(skb, sizeof(struct dualpi2_skb_cb));
> + return (struct dualpi2_skb_cb *)qdisc_skb_cb(skb)->data;
> +}
> +
> +static u64 dualpi2_sojourn_time(struct sk_buff *skb, u64 reference)
> +{
> + return reference - dualpi2_skb_cb(skb)->ts;
> +}
> +
> +static u64 head_enqueue_time(struct Qdisc *q)
> +{
> + struct sk_buff *skb = qdisc_peek_head(q);
> +
> + return skb ? dualpi2_skb_cb(skb)->ts : 0;
> +}
> +
> static u32 dualpi2_scale_alpha_beta(u32 param)
> {
> u64 tmp = ((u64)param * MAX_PROB >> ALPHA_BETA_SCALING);
> @@ -136,6 +172,25 @@ static ktime_t next_pi2_timeout(struct dualpi2_sched_data *q)
> return ktime_add_ns(ktime_get_ns(), q->pi2_tupdate);
> }
>
> +static bool skb_is_l4s(struct sk_buff *skb)
> +{
> + return dualpi2_skb_cb(skb)->classified == DUALPI2_C_L4S;
> +}
> +
> +static bool skb_in_l_queue(struct sk_buff *skb)
> +{
> + return dualpi2_skb_cb(skb)->classified != DUALPI2_C_CLASSIC;
> +}
> +
> +static bool dualpi2_mark(struct dualpi2_sched_data *q, struct sk_buff *skb)
> +{
> + if (INET_ECN_set_ce(skb)) {
> + q->ecn_mark++;
> + return true;
> + }
> + return false;
> +}
> +
> static void dualpi2_reset_c_protection(struct dualpi2_sched_data *q)
> {
> q->c_protection_credit = q->c_protection_init;
> @@ -155,6 +210,398 @@ static void dualpi2_calculate_c_protection(struct Qdisc *sch,
> dualpi2_reset_c_protection(q);
> }
>
> +static bool dualpi2_roll(u32 prob)
> +{
> + return get_random_u32() <= prob;
> +}
> +
> +/* Packets in the C-queue are subject to a marking probability pC, which is the
> + * square of the internal PI probability (i.e., have an overall lower mark/drop
> + * probability). If the qdisc is overloaded, ignore ECT values and only drop.
> + *
> + * Note that this marking scheme is also applied to L4S packets during overload.
> + * Return true if packet dropping is required in C queue
> + */
> +static bool dualpi2_classic_marking(struct dualpi2_sched_data *q,
> + struct sk_buff *skb, u32 prob,
> + bool overload)
> +{
> + if (dualpi2_roll(prob) && dualpi2_roll(prob)) {
> + if (overload || dualpi2_skb_cb(skb)->ect == INET_ECN_NOT_ECT)
> + return true;
> + dualpi2_mark(q, skb);
> + }
> + return false;
> +}
> +
> +/* Packets in the L-queue are subject to a marking probability pL given by the
> + * internal PI probability scaled by the coupling factor.
> + *
> + * On overload (i.e., @local_l_prob is >= 100%):
> + * - if the qdisc is configured to trade losses to preserve latency (i.e.,
> + * @q->drop_overload), apply classic drops first before marking.
> + * - otherwise, preserve the "no loss" property of ECN at the cost of queueing
> + * delay, eventually resulting in taildrop behavior once sch->limit is
> + * reached.
> + * Return true if packet dropping is required in L queue
> + */
> +static bool dualpi2_scalable_marking(struct dualpi2_sched_data *q,
> + struct sk_buff *skb,
> + u64 local_l_prob, u32 prob,
> + bool overload)
> +{
> + if (overload) {
> + /* Apply classic drop */
> + if (!q->drop_overload ||
> + !(dualpi2_roll(prob) && dualpi2_roll(prob)))
> + goto mark;
> + return true;
> + }
> +
> + /* We can safely cut the upper 32b as overload==false */
> + if (dualpi2_roll(local_l_prob)) {
> + /* Non-ECT packets could have classified as L4S by filters. */
> + if (dualpi2_skb_cb(skb)->ect == INET_ECN_NOT_ECT)
> + return true;
> +mark:
> + dualpi2_mark(q, skb);
> + }
> + return false;
> +}
> +
> +/* Decide whether a given packet must be dropped (or marked if ECT), according
> + * to the PI2 probability.
> + *
> + * Never mark/drop if we have a standing queue of less than 2 MTUs.
> + */
> +static bool must_drop(struct Qdisc *sch, struct dualpi2_sched_data *q,
> + struct sk_buff *skb)
> +{
> + u64 local_l_prob;
> + bool overload;
> + u32 prob;
> +
> + if (sch->qstats.backlog < 2 * psched_mtu(qdisc_dev(sch)))
> + return false;
> +
> + prob = READ_ONCE(q->pi2_prob);
> + local_l_prob = (u64)prob * q->coupling_factor;
> + overload = local_l_prob > MAX_PROB;
> +
> + switch (dualpi2_skb_cb(skb)->classified) {
> + case DUALPI2_C_CLASSIC:
> + return dualpi2_classic_marking(q, skb, prob, overload);
> + case DUALPI2_C_L4S:
> + return dualpi2_scalable_marking(q, skb, local_l_prob, prob,
> + overload);
> + default: /* DUALPI2_C_LLLL */
> + return false;
> + }
> +}
> +
> +static void dualpi2_read_ect(struct sk_buff *skb)
> +{
> + struct dualpi2_skb_cb *cb = dualpi2_skb_cb(skb);
> + int wlen = skb_network_offset(skb);
> +
> + switch (skb_protocol(skb, true)) {
> + case htons(ETH_P_IP):
> + wlen += sizeof(struct iphdr);
> + if (!pskb_may_pull(skb, wlen) ||
> + skb_try_make_writable(skb, wlen))
> + goto not_ecn;
> +
> + cb->ect = ipv4_get_dsfield(ip_hdr(skb)) & INET_ECN_MASK;
> + break;
> + case htons(ETH_P_IPV6):
> + wlen += sizeof(struct ipv6hdr);
> + if (!pskb_may_pull(skb, wlen) ||
> + skb_try_make_writable(skb, wlen))
> + goto not_ecn;
> +
> + cb->ect = ipv6_get_dsfield(ipv6_hdr(skb)) & INET_ECN_MASK;
> + break;
> + default:
> + goto not_ecn;
> + }
> + return;
> +
> +not_ecn:
> + /* Non pullable/writable packets can only be dropped hence are
> + * classified as not ECT.
> + */
> + cb->ect = INET_ECN_NOT_ECT;
> +}
> +
> +static int dualpi2_skb_classify(struct dualpi2_sched_data *q,
> + struct sk_buff *skb)
> +{
> + struct dualpi2_skb_cb *cb = dualpi2_skb_cb(skb);
> + struct tcf_result res;
> + struct tcf_proto *fl;
> + int result;
> +
> + dualpi2_read_ect(skb);
> + if (cb->ect & q->ecn_mask) {
> + cb->classified = DUALPI2_C_L4S;
> + return NET_XMIT_SUCCESS;
> + }
> +
> + if (TC_H_MAJ(skb->priority) == q->sch->handle &&
> + TC_H_MIN(skb->priority) < __DUALPI2_C_MAX) {
> + cb->classified = TC_H_MIN(skb->priority);
> + return NET_XMIT_SUCCESS;
> + }
> +
> + fl = rcu_dereference_bh(q->tcf_filters);
> + if (!fl) {
> + cb->classified = DUALPI2_C_CLASSIC;
> + return NET_XMIT_SUCCESS;
> + }
> +
> + result = tcf_classify(skb, NULL, fl, &res, false);
> + if (result >= 0) {
> +#ifdef CONFIG_NET_CLS_ACT
> + switch (result) {
> + case TC_ACT_STOLEN:
> + case TC_ACT_QUEUED:
> + case TC_ACT_TRAP:
> + return NET_XMIT_SUCCESS | __NET_XMIT_STOLEN;
> + case TC_ACT_SHOT:
> + return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> + }
> +#endif
> + cb->classified = TC_H_MIN(res.classid) < __DUALPI2_C_MAX ?
> + TC_H_MIN(res.classid) : DUALPI2_C_CLASSIC;
> + }
> + return NET_XMIT_SUCCESS;
> +}
> +
> +static int dualpi2_enqueue_skb(struct sk_buff *skb, struct Qdisc *sch,
> + struct sk_buff **to_free)
> +{
> + struct dualpi2_sched_data *q = qdisc_priv(sch);
> + struct dualpi2_skb_cb *cb;
> +
> + if (unlikely(qdisc_qlen(sch) >= sch->limit) ||
> + unlikely((u64)q->memory_used + skb->truesize > q->memory_limit)) {
> + qdisc_qstats_overlimit(sch);
> + if (skb_in_l_queue(skb))
> + qdisc_qstats_overlimit(q->l_queue);
> + return qdisc_drop_reason(skb, sch, to_free,
> + SKB_DROP_REASON_QDISC_OVERLIMIT);
> + }
> +
> + if (q->drop_early && must_drop(sch, q, skb)) {
> + qdisc_drop_reason(skb, sch, to_free,
> + SKB_DROP_REASON_QDISC_OVERLIMIT);
I think it's better to use a different drop reason here, or it will be
hard to distinguish between packets dropped here and packets dropped
above. Possibly SKB_DROP_REASON_QDISC_CONGESTED.
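
i.e. something like (untested):

	qdisc_drop_reason(skb, sch, to_free,
			  SKB_DROP_REASON_QDISC_CONGESTED);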
> + return NET_XMIT_SUCCESS | __NET_XMIT_BYPASS;
> + }
> +
> + cb = dualpi2_skb_cb(skb);
> + cb->ts = ktime_get_ns();
> + q->memory_used += skb->truesize;
> + if (q->memory_used > q->max_memory_used)
> + q->max_memory_used = q->memory_used;
> +
> + if (qdisc_qlen(sch) > q->maxq)
> + q->maxq = qdisc_qlen(sch);
> +
> + if (skb_in_l_queue(skb)) {
> + /* Only apply the step if a queue is building up */
> + dualpi2_skb_cb(skb)->apply_step = skb_is_l4s(skb) &&
> + qdisc_qlen(q->l_queue) >= q->min_qlen_step;
> + /* Keep the overall qdisc stats consistent */
> + ++sch->q.qlen;
> + qdisc_qstats_backlog_inc(sch, skb);
> + ++q->packets_in_l;
> + if (!q->l_head_ts)
> + q->l_head_ts = cb->ts;
This chunk is very hard to read. Please insert a blank line before the
2nd comment and consider introducing a helper to compute the
'apply_step' value.
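
Something along these lines would do (untested sketch, the helper name
is just a suggestion):

	/* Only apply the step AQM if the L-queue is building up */
	static bool dualpi2_apply_step(struct dualpi2_sched_data *q,
				       struct sk_buff *skb)
	{
		return skb_is_l4s(skb) &&
		       qdisc_qlen(q->l_queue) >= q->min_qlen_step;
	}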
> + return qdisc_enqueue_tail(skb, q->l_queue);
> + }
> + ++q->packets_in_c;
> + if (!q->c_head_ts)
> + q->c_head_ts = cb->ts;
> + return qdisc_enqueue_tail(skb, sch);
> +}
> +
> +/* By default, dualpi2 will split GSO skbs into independent skbs and enqueue
> + * each of those individually. This yields the following benefits, at the
> + * expense of CPU usage:
> + * - Finer-grained AQM actions as the sub-packets of a burst no longer share the
> + * same fate (e.g., the random mark/drop probability is applied individually)
> + * - Improved precision of the starvation protection/WRR scheduler at dequeue,
> + * as the size of the dequeued packets will be smaller.
> + */
> +static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> + struct sk_buff **to_free)
> +{
> + struct dualpi2_sched_data *q = qdisc_priv(sch);
> + int err;
> +
> + err = dualpi2_skb_classify(q, skb);
> + if (err != NET_XMIT_SUCCESS) {
> + if (err & __NET_XMIT_BYPASS)
> + qdisc_qstats_drop(sch);
> + __qdisc_drop(skb, to_free);
> + return err;
> + }
> +
> + if (q->split_gso && skb_is_gso(skb)) {
> + netdev_features_t features;
> + struct sk_buff *nskb, *next;
> + int cnt, byte_len, orig_len;
> + int err;
> +
> + features = netif_skb_features(skb);
> + nskb = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
> + if (IS_ERR_OR_NULL(nskb))
> + return qdisc_drop(skb, sch, to_free);
> +
> + cnt = 1;
> + byte_len = 0;
> + orig_len = qdisc_pkt_len(skb);
> + skb_list_walk_safe(nskb, nskb, next) {
> + skb_mark_not_on_list(nskb);
> + qdisc_skb_cb(nskb)->pkt_len = nskb->len;
> + dualpi2_skb_cb(nskb)->classified =
> + dualpi2_skb_cb(skb)->classified;
Possibly just:

	*dualpi2_skb_cb(nskb) = *dualpi2_skb_cb(skb);
> + dualpi2_skb_cb(nskb)->ect = dualpi2_skb_cb(skb)->ect;
> + err = dualpi2_enqueue_skb(nskb, sch, to_free);
> + if (err == NET_XMIT_SUCCESS) {
> + /* Compute the backlog adjustment that needs
> + * to be propagated in the qdisc tree to reflect
> + * all new skbs successfully enqueued.
> + */
> + ++cnt;
> + byte_len += nskb->len;
> + }
> + }
> + if (err == NET_XMIT_SUCCESS) {
> + /* The caller will add the original skb stats to its
> + * backlog, compensate this.
> + */
> + --cnt;
> + byte_len -= orig_len;
> + }
> + qdisc_tree_reduce_backlog(sch, -cnt, -byte_len);
> + consume_skb(skb);
> + return err;
Using the return value from the last enqueue operation is IMHO
inaccurate and confusing. Instead you could return NET_XMIT_SUCCESS if
at least one skb is enqueued successfully, or possibly just return
NET_XMIT_SUCCESS unconditionally (which will simplify the code a bit).
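
e.g. with the unconditional variant (untested):

	qdisc_tree_reduce_backlog(sch, -cnt, -byte_len);
	consume_skb(skb);
	return NET_XMIT_SUCCESS;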
> + }
> + return dualpi2_enqueue_skb(skb, sch, to_free);
> +}
> +
> +/* Select the queue from which the next packet can be dequeued, ensuring that
> + * neither queue can starve the other with a WRR scheduler.
> + *
> + * The sign of the WRR credit determines the next queue, while the size of
> + * the dequeued packet determines the magnitude of the WRR credit change. If
> + * either queue is empty, the WRR credit is kept unchanged.
> + *
> + * As the dequeued packet can be dropped later, the caller has to perform the
> + * qdisc_bstats_update() calls.
> + */
> +static struct sk_buff *dequeue_packet(struct Qdisc *sch,
> + struct dualpi2_sched_data *q,
> + int *credit_change,
> + u64 now)
> +{
> + struct sk_buff *skb = NULL;
> + int c_len;
> +
> + *credit_change = 0;
> + c_len = qdisc_qlen(sch) - qdisc_qlen(q->l_queue);
> + if (qdisc_qlen(q->l_queue) && (!c_len || q->c_protection_credit <= 0)) {
> + skb = __qdisc_dequeue_head(&q->l_queue->q);
> + WRITE_ONCE(q->l_head_ts, head_enqueue_time(q->l_queue));
> + if (c_len)
> + *credit_change = q->c_protection_wc;
> + qdisc_qstats_backlog_dec(q->l_queue, skb);
Please add an empty line here.
> + /* Keep the global queue size consistent */
> + --sch->q.qlen;
> + q->memory_used -= skb->truesize;
> + } else if (c_len) {
> + skb = __qdisc_dequeue_head(&sch->q);
> + WRITE_ONCE(q->c_head_ts, head_enqueue_time(sch));
> + if (qdisc_qlen(q->l_queue))
> + *credit_change = ~((s32)q->c_protection_wl) + 1;
> + q->memory_used -= skb->truesize;
> + } else {
> + dualpi2_reset_c_protection(q);
> + return NULL;
> + }
> + *credit_change *= qdisc_pkt_len(skb);
> + qdisc_qstats_backlog_dec(sch, skb);
> + return skb;
> +}
> +
> +static int do_step_aqm(struct dualpi2_sched_data *q, struct sk_buff *skb,
> + u64 now)
> +{
> + u64 qdelay = 0;
> +
> + if (q->step_in_packets)
> + qdelay = qdisc_qlen(q->l_queue);
> + else
> + qdelay = dualpi2_sojourn_time(skb, now);
> +
> + if (dualpi2_skb_cb(skb)->apply_step && qdelay > q->step_thresh) {
> + if (!dualpi2_skb_cb(skb)->ect)
> + /* Drop this non-ECT packet */
> + return 1;
Please add braces and an empty line:

	if (!dualpi2_skb_cb(skb)->ect) {
		/* Drop this non-ECT packet */
		return 1;
	}
if (dualpi2_mark(q, skb))