Message-ID: <87ipvlosvz.fsf_-_@cruithne.co.teklibre.org>
Date: Mon, 14 Mar 2011 23:27:28 -0600
From: d@...t.net (Dave Täht)
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Jonathan Morton <chromatix99@...il.com>,
David Miller <davem@...emloft.net>,
netdev <netdev@...r.kernel.org>
Subject: ECN + pfifo_fast borked? (Was Re: [Bloat] shaper team forming up)

Eric Dumazet <eric.dumazet@...il.com> writes:
> On Monday 14 March 2011 at 21:24 +0100, Eric Dumazet wrote:
>
> remove CC to bloat lists for now, adding David Miller to thread.
>
>> On Monday 14 March 2011 at 21:55 +0200, Jonathan Morton wrote:
>> > On 14 Mar, 2011, at 9:26 pm, Dave Täht wrote:
>> >
>> > > Over the weekend, Dan Siemon uncovered a possible bad interaction
>> > > between ECN and the default pfifo_fast qdisc in Linux.
>> > >
>> > > http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/
>> >
Totally irrelevant bit elided
>> CC netdev, where linux network dev can take a look.
>>
>> I would say that this is a wrong analysis :
>>
>> 1) ECN uses two low order bits of TOS byte
>>
>> 2) pfifo_fast uses skb->priority
>>
>>
>> skb->priority = rt_tos2priority(iph->tos);
>>
>> #define IPTOS_TOS_MASK 0x1E
>> #define IPTOS_TOS(tos) ((tos)&IPTOS_TOS_MASK)
>>
>> static inline char rt_tos2priority(u8 tos)
>> {
>> 	return ip_tos2prio[IPTOS_TOS(tos)>>1];
>> }
>>
>> No interference between two mechanisms, unless sysadmin messed up things
>> (skb_edit)
>>
>>
>
> David, it seems ip_tos2prio is wrong on its 2nd entry :
>
> #define TC_PRIO_BESTEFFORT 0
> #define TC_PRIO_FILLER 1
> #define TC_PRIO_BULK 2
> #define TC_PRIO_INTERACTIVE_BULK 4
> #define TC_PRIO_INTERACTIVE 6
> #define TC_PRIO_CONTROL 7
>
> #define TC_PRIO_MAX 15
>
> net/ipv4/route.c:170:#define ECN_OR_COST(class) TC_PRIO_##class
>
> const __u8 ip_tos2prio[16] = {
> 	TC_PRIO_BESTEFFORT,	/* 0 : for flow without ECN */
> 	ECN_OR_COST(FILLER),	/* 1 : flow with ECN */
> 	...
> };
>
>
>
>
> This means ECN enabled flows got TC_PRIO_FILLER (what the hell is
> that ?)
>
> pfifo_fast has :
>
> static const u8 prio2band[TC_PRIO_MAX+1] =
> { 1, 2, 2, 2, 1, 2, 0, 0 , 1, 1, 1, 1, 1, 1, 1, 1 };
>
> So a non-ECN-enabled flow goes to band 1, while an ECN-enabled one is in
> band 2 (!). Thus, ECN-enabled flows have a chance of being dropped more
> often than non-ECN flows. That's not fair...
>
> What do you think ?

Well, that makes three of us who think it's wrong. Can we get more?
(I'll run through the math again in the morning.)

Most often it's not actually "enablement" but "assertion": when, for
example, an ECN bit is set on an ACK packet (by an application or a
qdisc), that ACK drops into band 2, leaving all the other packets in
that flow without an ECN bit asserted to flow out ahead of it.

Or so Dan Siemon, I, and now you think. It's late, and I really want to
recheck the math and the shifts in the morning. However, if true, this
would explain much ECN-related weirdness precisely where it has been
hard to measure: on heavily loaded systems.
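
To make the shifts easier to recheck, here is a minimal userspace sketch
of the lookup chain, using only the constants quoted in this thread. The
helper name band_for_tos() and the two-entry table tos2prio_head are made
up for illustration; only the first two ip_tos2prio entries are
reproduced, so the sketch covers TOS bytes whose upper bits are clear.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied from the kernel snippets quoted in this thread. */
#define IPTOS_TOS_MASK		0x1E
#define IPTOS_TOS(tos)		((tos) & IPTOS_TOS_MASK)

#define TC_PRIO_BESTEFFORT	0
#define TC_PRIO_FILLER		1
#define TC_PRIO_MAX		15

/* Assumption: only entries 0 and 1 are modelled; the rest of
 * ip_tos2prio is elided in the quote and is not reproduced here. */
static const uint8_t tos2prio_head[2] = {
	TC_PRIO_BESTEFFORT,	/* index 0 */
	TC_PRIO_FILLER,		/* index 1: ECN_OR_COST(FILLER) */
};

/* pfifo_fast priority-to-band map, as quoted. */
static const uint8_t prio2band[TC_PRIO_MAX + 1] =
	{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 };

/* Hypothetical helper: pfifo_fast band for a TOS byte in which only
 * the two low-order (ECN) bits may be set. */
static int band_for_tos(uint8_t tos)
{
	unsigned int idx = IPTOS_TOS(tos) >> 1;

	assert(idx < 2);	/* outside the part of the table modelled here */
	return prio2band[tos2prio_head[idx]];
}
```

The mask 0x1E keeps bit 1 and discards bit 0, so the two ECN bits are
not treated alike: 0x00 (Not-ECT) and 0x01 (ECT(1)) give index 0 and
band 1, while 0x02 (ECT(0)) and 0x03 (CE) give index 1 and band 2.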
>
> Thanks
>
> diff --git a/net/ipv4/route.c b/net/ipv4/route.c
> index 6ed6603..fabfe81 100644
> --- a/net/ipv4/route.c
> +++ b/net/ipv4/route.c
> @@ -171,7 +171,7 @@ static struct dst_ops ipv4_dst_ops = {
>
> const __u8 ip_tos2prio[16] = {
> TC_PRIO_BESTEFFORT,
> - ECN_OR_COST(FILLER),
> + ECN_OR_COST(BESTEFFORT),
> TC_PRIO_BESTEFFORT,
> ECN_OR_COST(BESTEFFORT),
> TC_PRIO_BULK,
>

I think this is a good short-term fix, but it will mildly upset people
who actually still use minimum cost and don't use ECN. That said,
RFC 1349 has been obsolete for a decade now, and ECN-enabled servers are
at 12% penetration according to MIT.
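
The trade-off can be checked with a small before/after sketch. It is
only an illustration: head_before, head_after and band_with() are
hypothetical names, and just the first two table entries are modelled.
IPTOS_MINCOST (0x02) is the RFC 1349 "minimize monetary cost" value,
which occupies the same bit as ECT(0).

```c
#include <assert.h>
#include <stdint.h>

#define IPTOS_TOS_MASK		0x1E
#define IPTOS_MINCOST		0x02	/* RFC 1349; same bit as ECT(0) */

#define TC_PRIO_BESTEFFORT	0
#define TC_PRIO_FILLER		1

/* pfifo_fast priority-to-band map, as quoted earlier in the thread. */
static const uint8_t prio2band[16] =
	{ 1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1 };

/* First two ip_tos2prio entries before and after the proposed patch. */
static const uint8_t head_before[2] = { TC_PRIO_BESTEFFORT, TC_PRIO_FILLER };
static const uint8_t head_after[2]  = { TC_PRIO_BESTEFFORT, TC_PRIO_BESTEFFORT };

/* Hypothetical helper: band for a TOS byte under a given table head. */
static int band_with(const uint8_t head[2], uint8_t tos)
{
	unsigned int idx = (tos & IPTOS_TOS_MASK) >> 1;

	assert(idx < 2);	/* only the first two entries are modelled */
	return prio2band[head[idx]];
}
```

With head_before, a packet marked 0x02 lands in band 2; with head_after
it lands in band 1 alongside unmarked traffic. That fixes the ECT(0)
case, and it is also why a sender that really meant "minimum cost"
loses its lower-priority treatment.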

Still, long term, writing a sch_pfifo_dscp that is fully compliant with
the relevant modern RFCs, and eventually making that the standard,
would be good.
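
As a very rough sketch of the idea (no such qdisc exists; the function
name dscp_band() and the band assignments are purely my assumptions, not
a proposal for the final mapping), classification would look only at the
six DSCP bits and never at the two ECN bits:

```c
#include <assert.h>
#include <stdint.h>

/* Standard DSCP codepoints (RFC 2474, RFC 3246). */
enum {
	DSCP_CS1 = 8,	/* often used for lower-effort traffic */
	DSCP_EF  = 46,	/* expedited forwarding */
	DSCP_CS6 = 48,	/* network control */
	DSCP_CS7 = 56,
};

/* Hypothetical classifier: the band assignment is an assumption made
 * for illustration. The ECN bits (tos & 0x03) are shifted out and so
 * cannot influence the result. */
static int dscp_band(uint8_t tos)
{
	unsigned int dscp = tos >> 2;	/* upper six bits of the byte */

	assert(dscp < 64);		/* six-bit field */

	switch (dscp) {
	case DSCP_EF:
	case DSCP_CS6:
	case DSCP_CS7:
		return 0;		/* highest-priority band */
	case DSCP_CS1:
		return 2;		/* lowest-priority band */
	default:
		return 1;		/* best effort */
	}
}
```

Because the low two bits are discarded before the lookup, a packet keeps
its band whether or not ECT or CE is set, which is the property the
current table lacks.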
--
Dave Taht
http://nex-6.taht.net