Message-ID: <6AADFAC011213A4C87B956458587ADB401315127@dggemi507-mbx.china.huawei.com>
Date:   Wed, 27 Dec 2017 08:02:59 +0000
From:   "weiyongjun (A)" <weiyongjun1@...wei.com>
To:     John Fastabend <john.fastabend@...il.com>,
        Jamal Hadi Salim <jhs@...atatu.com>,
        Cong Wang <xiyou.wangcong@...il.com>,
        Jiri Pirko <jiri@...nulli.us>
CC:     "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: RE: [PATCH net-next v2] net: sched: fix skb leak in
 dev_requeue_skb()

On 12/27/2017 1:24 PM, John Fastabend wrote:
> On 12/24/2017 07:49 PM, Wei Yongjun wrote:
> > When dev_requeue_skb() is called with a bulked skb list, only the
> > first skb of the list is requeued to the qdisc layer; the others are
> > leaked without being freed.
> >
> > TCP is broken by this skb leak: since the leaked skbs are never
> > freed, they are considered still queued in the host and are never
> > retransmitted. This happens when dev_requeue_skb() is called from
> > qdisc_restart().
> >   qdisc_restart
> >   |-- dequeue_skb
> >   |-- sch_direct_xmit()
> >       |-- dev_requeue_skb() <-- skb may be a bulked list
> >
> > Fix dev_requeue_skb() to requeue the full bulked list.
> >
> > Fixes: a53851e2c321 ("net: sched: explicit locking in gso_cpu fallback")
> > Signed-off-by: Wei Yongjun <weiyongjun1@...wei.com>
> > ---
> > v1 -> v2: add net-next prefix
> > ---
> 
> First, thanks for tracking this down.
> 
> > diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> > index 981c08f..0df2dbf 100644
> > --- a/net/sched/sch_generic.c
> > +++ b/net/sched/sch_generic.c
> > @@ -111,10 +111,16 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
> >
> >  static inline int __dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
> >  {
> > -	__skb_queue_head(&q->gso_skb, skb);
> > -	q->qstats.requeues++;
> > -	qdisc_qstats_backlog_inc(q, skb);
> > -	q->q.qlen++;	/* it's still part of the queue */
> > +	while (skb) {
> > +		struct sk_buff *next = skb->next;
> > +
> > +		__skb_queue_tail(&q->gso_skb, skb);
> 
> Was the change from __skb_queue_head to __skb_queue_tail here
> intentional? We should re-queue packets to the head of the list.

skbs are dequeued from gso_skb in order, so if the packets of the bulked
list are re-queued to the head one by one, they will be sent in reverse
order; that is why I think __skb_queue_tail() is better here. Tcpdump
showed those skbs going out as out-of-order packets.
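
To illustrate with a made-up walk-through (not part of the patch): assume
gso_skb is empty and sch_direct_xmit() hands back the bulked list
skb1 -> skb2 -> skb3. Since dequeue_skb() pulls from the head of gso_skb:

	per-skb __skb_queue_head(): gso_skb = skb3, skb2, skb1  -> sent in reverse order
	per-skb __skb_queue_tail(): gso_skb = skb1, skb2, skb3  -> original order kept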

> 
> > +		q->qstats.requeues++;
> > +		qdisc_qstats_backlog_inc(q, skb);
> > +		q->q.qlen++;	/* it's still part of the queue */
> > +
> > +		skb = next;
> > +	}
> >  	__netif_schedule(q);
> >
> >  	return 0;
> > @@ -124,13 +130,20 @@ static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
> >  {
> >  	spinlock_t *lock = qdisc_lock(q);
> >
> > -	spin_lock(lock);
> > -	__skb_queue_tail(&q->gso_skb, skb);
> > -	spin_unlock(lock);
> > +	while (skb) {
> > +		struct sk_buff *next = skb->next;
> > +
> > +		spin_lock(lock);
> 
> In this case I suspect it's better to move the lock around the
> while loop rather than grab and drop it repeatedly. I don't have
> any data at this point, so OK either way, assuming the other
> head/tail comment is addressed.

I will fix it in v3.
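
Something like this, I suppose (rough, untested sketch only, not the actual
v3 patch; still using __skb_queue_tail() pending the head/tail question above):

	static inline int dev_requeue_skb_locked(struct sk_buff *skb, struct Qdisc *q)
	{
		spinlock_t *lock = qdisc_lock(q);

		spin_lock(lock);
		while (skb) {
			struct sk_buff *next = skb->next;

			/* the whole bulked list goes in under one lock/unlock */
			__skb_queue_tail(&q->gso_skb, skb);

			/* per-cpu stats, same as in v2, just kept inside the loop */
			qdisc_qstats_cpu_requeues_inc(q);
			qdisc_qstats_cpu_backlog_inc(q, skb);
			qdisc_qstats_cpu_qlen_inc(q);

			skb = next;
		}
		spin_unlock(lock);

		__netif_schedule(q);

		return 0;
	}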

> 
> > +		__skb_queue_tail(&q->gso_skb, skb);
> 
> Same here *_tail should be *_head?
> 
> > +		spin_unlock(lock);
> > +
> > +		qdisc_qstats_cpu_requeues_inc(q);
> > +		qdisc_qstats_cpu_backlog_inc(q, skb);
> > +		qdisc_qstats_cpu_qlen_inc(q);
> > +
> > +		skb = next;
> > +	}
> >
> > -	qdisc_qstats_cpu_requeues_inc(q);
> > -	qdisc_qstats_cpu_backlog_inc(q, skb);
> > -	qdisc_qstats_cpu_qlen_inc(q);
> >  	__netif_schedule(q);
> >
> >  	return 0;
> >
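For context, the bulked list here comes from dequeue_skb() chaining extra
packets via skb->next before handing them to sch_direct_xmit(); roughly
(paraphrased from try_bulk_dequeue_skb() in sch_generic.c, not the exact code):

	bytelimit = qdisc_avail_bulklimit(txq) - skb->len;
	while (bytelimit > 0) {
		struct sk_buff *nskb = q->dequeue(q);

		if (!nskb)
			break;

		bytelimit -= nskb->len;
		skb->next = nskb;	/* chain the bulked list via skb->next */
		skb = nskb;
	}
	skb->next = NULL;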
