Message-ID: <20080806215258.GA3306@ami.dom.local>
Date: Wed, 6 Aug 2008 23:52:59 +0200
From: Jarek Poplawski <jarkao2@...il.com>
To: Jussi Kivilinna <jussi.kivilinna@...et.fi>
Cc: David Miller <davem@...emloft.net>, kaber@...sh.net,
netdev@...r.kernel.org
Subject: Re: qdisc_enqueue, NET_XMIT_SUCCESS and kfree_skb (Was: Re: [PATCH
take 2] net_sched: Add qdisc __NET_XMIT_BYPASS flag)
On Wed, Aug 06, 2008 at 10:42:48PM +0300, Jussi Kivilinna wrote:
...
> Ok, I went through all enqueue (and requeue) functions looking for any case
> of freeing the skb and returning full NET_XMIT_SUCCESS without new flags, and
> found only one, in sch_blackhole (qdisc_drop + return NET_XMIT_SUCCESS).
Very interesting observation. Probably mostly theoretical (I wonder
how many people use this). The question is whether this code may be
returned in such a case at all. noop returns NET_XMIT_CN, which looks
safer, but maybe this is an exception? I don't know. Anyway, if it
happens e.g. with a forwarded skb, it looks like a read after kfree.
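To make the hazard concrete, here is a minimal userspace sketch, not kernel code. It assumes a toy packet type and hypothetical names (struct pkt, free_pending, enqueue_root) standing in for sk_buff, the proposed delayed_enqueue_free bit and qdisc_enqueue_root. The blackhole-like enqueue only marks the packet, so the caller can still read its length after a "success" return, and the free happens once at the root.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative return codes; the names and values are assumptions. */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1, XMIT_CN = 2 };

/* Toy stand-in for sk_buff plus the per-packet control block. */
struct pkt {
	unsigned int len;
	bool free_pending;	/* hypothetical: "free me at the root" mark */
};

static unsigned int freed_count;	/* counts frees, for checking only */

static void pkt_free(struct pkt *p)
{
	free(p);
	freed_count++;
}

/* Blackhole-like enqueue: drops the packet but reports success.
 * Instead of freeing immediately, it only marks the packet. */
static int blackhole_enqueue_deferred(struct pkt *p, unsigned int *drops)
{
	p->free_pending = true;
	(*drops)++;
	return XMIT_SUCCESS;
}

/* Root wrapper: the packet stays valid until this function is done,
 * so reading p->len after a successful inner enqueue is safe here. */
static int enqueue_root(struct pkt *p, unsigned int *drops,
			unsigned int *bytes)
{
	int ret;

	p->free_pending = false;
	ret = blackhole_enqueue_deferred(p, drops);

	if (ret == XMIT_SUCCESS)
		*bytes += p->len;	/* would be a use-after-free with an eager free */

	if (p->free_pending)
		pkt_free(p);		/* the single, delayed free */

	return ret;
}
```

With an eager free inside the inner enqueue, the p->len read above is exactly the read after kfree in question.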
> This could be fixed by delaying kfree_skb until qdisc_enqueue_root exits;
> here's a (completely untested) patch:
> ---
> diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> index a7abfda..ca083c6 100644
> --- a/include/net/sch_generic.h
> +++ b/include/net/sch_generic.h
> @@ -175,6 +175,7 @@ struct tcf_proto
>
> struct qdisc_skb_cb {
> unsigned int pkt_len;
> + __u8 delayed_enqueue_free:1;
> char data[];
> };
>
> @@ -364,10 +365,23 @@ static inline int qdisc_enqueue(struct sk_buff
> *skb, struct Qdisc *sch)
> return sch->enqueue(skb, sch);
> }
>
> +static inline void qdisc_delayed_kfree_skb(struct sk_buff *skb)
> +{
> + qdisc_skb_cb(skb)->delayed_enqueue_free = 1;
> +}
> +
> static inline int qdisc_enqueue_root(struct sk_buff *skb, struct Qdisc *sch)
> {
> + int ret;
> +
> + qdisc_skb_cb(skb)->delayed_enqueue_free = 0;
> qdisc_skb_cb(skb)->pkt_len = skb->len;
> - return qdisc_enqueue(skb, sch) & NET_XMIT_MASK;
> + ret = qdisc_enqueue(skb, sch);
> +
> + if (ret == NET_XMIT_SUCCESS &&
> qdisc_skb_cb(skb)->delayed_enqueue_free)
> + kfree_skb(skb);
> +
> + return ret & NET_XMIT_MASK;
> }
>
> static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct
> Qdisc *sch,
> diff --git a/net/sched/sch_blackhole.c b/net/sched/sch_blackhole.c
> index 507fb48..13230bd 100644
> --- a/net/sched/sch_blackhole.c
> +++ b/net/sched/sch_blackhole.c
> @@ -19,7 +19,8 @@
>
> static int blackhole_enqueue(struct sk_buff *skb, struct Qdisc *sch)
> {
> - qdisc_drop(skb, sch);
> + qdisc_delayed_kfree_skb(skb);
> + sch->qstats.drops++;
> return NET_XMIT_SUCCESS;
> }
> ---
>
> If this isn't a good way to solve this, the qdisc_pkt_len use for stats
> could be fixed either by passing a packet length pointer through the qdisc
> tree, or by adding a new qdisc_pkt_len_diff and adding the difference in at
> dequeue, as you said (but here the inner dequeue could return NULL and the
> difference wouldn't be added after all, but well, it is just stats).
I doubt that such a rare case should change the way all packets are
treated, but if so, one of these new __NET_XMIT flags could probably
be used for this.
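A rough sketch of what signalling this through the return value could look like, in userspace C. The flag name XMIT_FLAG_FREED and all values here are hypothetical, chosen only to illustrate the idea: inner qdiscs see the flag and know not to touch the skb, while the root masks it off before returning to the caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative codes; names and values are assumptions, not kernel ones. */
enum {
	XMIT_SUCCESS    = 0x0,
	XMIT_DROP       = 0x1,
	XMIT_MASK       = 0xffff,	/* low bits: code visible outside the qdisc tree */
	XMIT_FLAG_FREED = 0x10000	/* hypothetical: skb was already freed */
};

/* Blackhole-like inner enqueue: reports success, but flags the free. */
static int inner_enqueue_blackhole(void)
{
	return XMIT_SUCCESS | XMIT_FLAG_FREED;
}

/* A parent qdisc may touch the skb only on full, unflagged success. */
static bool skb_still_valid(int ret)
{
	return ret == XMIT_SUCCESS;
}

/* The root strips internal flags before returning to the caller. */
static int root_result(int ret)
{
	return ret & XMIT_MASK;
}
```

This way the common path stays untouched: only a qdisc that frees the skb while reporting success pays for the extra flag.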
> As I went through the code I found two cases where the skb pointer is used
> after an inner enqueue with full NET_XMIT_SUCCESS (other than qdisc_pkt_len
> for stats): HTB uses skb_is_gso(), HFSC uses the packet length for
> set_active(). HTB is trivial (for me) to fix while HFSC isn't. Because of the
> HFSC part it would be easier for me to declare full NET_XMIT_SUCCESS as a
> safe zone for the skb pointer.
I guess some wiser guys should decide how serious a problem it is.
>
> - Jussi
>
> PS. I noticed something fishy in HTB; HTB always returns NET_XMIT_DROP if
> qdisc_enqueue doesn't return full NET_XMIT_SUCCESS. Shouldn't it return the
> return value from qdisc_enqueue? Same in HTB requeue. That can't be right,
> right?
>
Yes, very good point, and a quite hard-to-diagnose bug, happily solved
already (but not fixed yet) by David Miller himself.
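For illustration only, the difference between the two behaviours can be sketched in userspace C as below. This is not the HTB code and not necessarily the actual fix; the function names and code values are hypothetical.

```c
#include <assert.h>

/* Illustrative return codes; names and values are assumptions. */
enum { XMIT_SUCCESS = 0, XMIT_DROP = 1, XMIT_CN = 2 };

/* Behaviour described in the PS: any non-success result from the
 * child is reported upwards as a plain drop, losing the real code. */
static int enqueue_masking(int child_ret, unsigned int *drops)
{
	if (child_ret != XMIT_SUCCESS) {
		(*drops)++;
		return XMIT_DROP;
	}
	return XMIT_SUCCESS;
}

/* Suggested alternative: pass the child's result through unchanged. */
static int enqueue_propagating(int child_ret, unsigned int *drops)
{
	if (child_ret != XMIT_SUCCESS)
		(*drops)++;
	return child_ret;
}
```

With a child that returns XMIT_CN, the first variant reports XMIT_DROP while the second keeps XMIT_CN, so the caller can tell congestion from a definite drop.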
Jarek P.
PS: it seems your mailer wrapped some lines of the above patch.