Message-ID: <CAM0EoMnTTQ-BtS0EBqB-5yNAAmvk9r67oX7n7S0Ywhc23s49EQ@mail.gmail.com>
Date: Thu, 5 Dec 2024 07:40:17 -0500
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Martin Ottens <martin.ottens@....de>
Cc: Stephen Hemminger <stephen@...workplumber.org>, Cong Wang <xiyou.wangcong@...il.com>, 
	Jiri Pirko <jiri@...nulli.us>, "David S. Miller" <davem@...emloft.net>, 
	Eric Dumazet <edumazet@...gle.com>, Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Simon Horman <horms@...nel.org>, netdev@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2] net/sched: netem: account for backlog updates from
 child qdisc

On Wed, Dec 4, 2024 at 7:29 AM Martin Ottens <martin.ottens@....de> wrote:
>
> In general, 'qlen' of any classful qdisc should keep track of the
> number of packets that the qdisc itself and all of its children hold.
> In the case of netem, 'qlen' only accounts for the packets in its
> internal tfifo. When netem is used with a child qdisc, the child qdisc
> can use 'qdisc_tree_reduce_backlog' to inform its parent, netem, about
> created or dropped SKBs. This function updates netem's 'qlen' and
> backlog statistics, but netem's own enqueue/dequeue logic does not
> account for these external changes, so 'qlen' then indicates the wrong
> number of packets in the tfifo.
> If a child qdisc creates new SKBs during enqueue and informs its parent
> about this, netem's 'qlen' value is increased. When netem dequeues the
> newly created SKBs from the child, netem's 'qlen' is not updated. If
> 'qlen' reaches the configured sch->limit, the enqueue function stops
> accepting packets, even though the tfifo is not full.
>
> Reproduce the bug:
> Ensure that the sender machine has GSO enabled. Configure netem as root
> qdisc and tbf as its child on the outgoing interface of the machine
> as follows:
> $ tc qdisc add dev <oif> root handle 1: netem delay 100ms limit 100
> $ tc qdisc add dev <oif> parent 1:0 tbf rate 50Mbit burst 1542 latency 50ms
>
> Send bulk TCP traffic out via this interface, e.g., by running an iPerf3
> client on the machine. Check the qdisc statistics:
> $ tc -s qdisc show dev <oif>
>
> tbf segments the GSO SKBs (tbf_segment) and updates netem's 'qlen'.
> The interface eventually stops transferring packets entirely and
> "locks up". In this state, the child qdisc and the tfifo are empty,
> but 'qlen' indicates that the tfifo is at its limit and no more
> packets are accepted.
>
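The relevant tbf path, for context: tbf_segment() splits one GSO skb
into nb segments, enqueues them into its own child, and then informs
its parent that 1 skb became nb. A trimmed sketch of the function in
net/sched/sch_tbf.c (error handling shortened, shown with the seg_len
fix from the companion tbf patch applied):

	static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
			       struct sk_buff **to_free)
	{
		struct tbf_sched_data *q = qdisc_priv(sch);
		netdev_features_t features = netif_skb_features(skb);
		unsigned int len = 0, prev_len = qdisc_pkt_len(skb);
		unsigned int seg_len;
		struct sk_buff *segs, *nskb;
		int ret, nb;

		segs = skb_gso_segment(skb, features & ~NETIF_F_GSO_MASK);
		if (IS_ERR_OR_NULL(segs))
			return qdisc_drop(skb, sch, to_free);

		nb = 0;
		skb_list_walk_safe(segs, segs, nskb) {
			skb_mark_not_on_list(segs);
			seg_len = segs->len;
			qdisc_skb_cb(segs)->pkt_len = seg_len;
			ret = qdisc_enqueue(segs, q->qdisc, to_free);
			if (ret != NET_XMIT_SUCCESS) {
				if (net_xmit_drop_count(ret))
					qdisc_qstats_drop(sch);
			} else {
				nb++;
				len += seg_len;
			}
		}
		sch->q.qlen += nb;
		if (nb > 1)
			/* 1 - nb is negative for nb > 1: the parent's
			 * (netem's) qlen is *increased* by nb - 1.
			 */
			qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
		consume_skb(skb);
		return nb > 0 ? NET_XMIT_SUCCESS : NET_XMIT_DROP;
	}
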

It would be nice to see the before-and-after (i.e., without and with
your change) output of the stats to illustrate the problem.

> This patch adds a counter for the entries in the tfifo. Netem's 'qlen' is
> only decreased when a packet is returned by its dequeue function, and not
> during enqueuing into the child qdisc. External updates to 'qlen' are thus
> accounted for and only the behavior of the backlog statistics changes. As
> in other qdiscs, 'qlen' then keeps track of how many packets are held in
> netem and all of its children. As before, sch->limit remains as the
> maximum number of packets in the tfifo. The same applies to netem's
> backlog statistics.
> (Note: the 'backlog' byte-statistic of netem is incorrect in the example
> above even after the patch is applied due to a bug in tbf. See my
> previous patch ([PATCH] net/sched: tbf: correct backlog statistic for
> GSO packets)).
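
To make the accounting concrete, here is a hypothetical trace derived
from the diff below, assuming one GSO skb that tbf splits into 3
segments (the numbers are illustrative):

	netem_enqueue:     t_len = 1, netem qlen = 1   (GSO skb in tfifo)
	netem_dequeue:     tfifo pop, t_len = 0; skb enqueued into tbf
	tbf_segment:       qdisc_tree_reduce_backlog(netem, 1 - 3, ...)
	                   -> netem qlen = 1 + 2 = 3 (netem + child total)
	3x child dequeue:  netem qlen-- on each delivered segment -> 0

Without the patch, only the tfifo pop decremented qlen, leaving qlen
inflated by 2 per GSO skb until sch->limit was reached; with it,
sch->limit is checked against t_len, so an empty tfifo can never
appear full.
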
>
> Signed-off-by: Martin Ottens <martin.ottens@....de>

A Fixes: tag is missing - this has probably been broken since 2.6.x. Stephen?

Your fix seems reasonable, but I am curious: does this only happen with
TCP? If yes, perhaps the GSO handling may be contributing?
Can you run iperf with UDP and see if the issue shows up again? Or
ping -f with size 1024? You can slow down tbf, for example with
something like: tbf rate 20kbit buffer 1600 limit 3000
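
Something along these lines (hypothetical commands - adjust <oif>,
<server> and <dst> to your setup; the rates and sizes are just
examples):

$ tc qdisc change dev <oif> parent 1:0 tbf rate 20kbit buffer 1600 limit 3000
$ iperf3 -c <server> -u -b 10M    # UDP instead of TCP
$ ping -f -s 1024 <dst>           # flood ping, 1024-byte payload
$ tc -s qdisc show dev <oif>      # check whether qlen still runs away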

cheers,
jamal


> ---
>  net/sched/sch_netem.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fe6fed291a7b..71ec9986ed37 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -79,6 +79,8 @@ struct netem_sched_data {
>         struct sk_buff  *t_head;
>         struct sk_buff  *t_tail;
>
> +       u32 t_len;
> +
>         /* optional qdisc for classful handling (NULL at netem init) */
>         struct Qdisc    *qdisc;
>
> @@ -383,6 +385,7 @@ static void tfifo_reset(struct Qdisc *sch)
>         rtnl_kfree_skbs(q->t_head, q->t_tail);
>         q->t_head = NULL;
>         q->t_tail = NULL;
> +       q->t_len = 0;
>  }
>
>  static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
> @@ -412,6 +415,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
>                 rb_link_node(&nskb->rbnode, parent, p);
>                 rb_insert_color(&nskb->rbnode, &q->t_root);
>         }
> +       q->t_len++;
>         sch->q.qlen++;
>  }
>
> @@ -518,7 +522,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>                         1<<get_random_u32_below(8);
>         }
>
> -       if (unlikely(sch->q.qlen >= sch->limit)) {
> +       if (unlikely(q->t_len >= sch->limit)) {
>                 /* re-link segs, so that qdisc_drop_all() frees them all */
>                 skb->next = segs;
>                 qdisc_drop_all(skb, sch, to_free);
> @@ -702,8 +706,8 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
>  tfifo_dequeue:
>         skb = __qdisc_dequeue_head(&sch->q);
>         if (skb) {
> -               qdisc_qstats_backlog_dec(sch, skb);
>  deliver:
> +               qdisc_qstats_backlog_dec(sch, skb);
>                 qdisc_bstats_update(sch, skb);
>                 return skb;
>         }
> @@ -719,8 +723,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
>
>                 if (time_to_send <= now && q->slot.slot_next <= now) {
>                         netem_erase_head(q, skb);
> -                       sch->q.qlen--;
> -                       qdisc_qstats_backlog_dec(sch, skb);
> +                       q->t_len--;
>                         skb->next = NULL;
>                         skb->prev = NULL;
>                         /* skb->dev shares skb->rbnode area,
> @@ -747,16 +750,21 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
>                                         if (net_xmit_drop_count(err))
>                                                 qdisc_qstats_drop(sch);
>                                         qdisc_tree_reduce_backlog(sch, 1, pkt_len);
> +                                       sch->qstats.backlog -= pkt_len;
> +                                       sch->q.qlen--;
>                                 }
>                                 goto tfifo_dequeue;
>                         }
> +                       sch->q.qlen--;
>                         goto deliver;
>                 }
>
>                 if (q->qdisc) {
>                         skb = q->qdisc->ops->dequeue(q->qdisc);
> -                       if (skb)
> +                       if (skb) {
> +                               sch->q.qlen--;
>                                 goto deliver;
> +                       }
>                 }
>
>                 qdisc_watchdog_schedule_ns(&q->watchdog,
> @@ -766,8 +774,10 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
>
>         if (q->qdisc) {
>                 skb = q->qdisc->ops->dequeue(q->qdisc);
> -               if (skb)
> +               if (skb) {
> +                       sch->q.qlen--;
>                         goto deliver;
> +               }
>         }
>         return NULL;
>  }
> --
> 2.39.5
>
