Message-ID: <20091102125345.3c39c42e@nehalam>
Date: Mon, 2 Nov 2009 12:53:45 -0800
From: Stephen Hemminger <shemminger@...tta.com>
To: Patrick McHardy <kaber@...sh.net>
Cc: Ryousei Takano <ryousei@...il.com>,
Linux Netdev List <netdev@...r.kernel.org>,
takano-ryousei@...t.go.jp
Subject: Re: HTB accuracy on 10GbE
On Mon, 02 Nov 2009 16:43:42 +0100
Patrick McHardy <kaber@...sh.net> wrote:
> Ryousei Takano wrote:
> > Hi Stephen and all,
> >
> > I have observed an HTB accuracy problem with Linux kernel 2.6.30 and
> > the Myri-10G 10 GbE NIC.
> > HTB can control the transmission rate at Gigabit speed, but it does
> > not work well at 10 Gigabit speed.
> >
> > I asked Stephen about this problem at the Japan Linux Symposium. He
> > mentioned an HTB bug related to the timer granularity.
> > I want to know what is happening and what should be done to fix it.
> >
> > Any comments and suggestions will be welcome.
> >
> > For more detail, please see the following page:
> > http://code.google.com/p/pspacer/wiki/HTBon10GbE
>
> This is not an easy problem to fix. Userspace, the kernel and the
> netlink API use 32-bit types for timing-related values, which is too
> small for anything finer than microsecond resolution. All of them need
> to be converted to use bigger types. Additionally, some kind of
> compatibility handling is required to deal with old iproute versions
> still using microsecond resolution.
The existing API is a legacy mish-mash. The field is limited to 32 bits,
but it might be possible to use a finer scale.

Maybe if the kernel advertised a finer resolution through /proc/net/psched,
then the rate table could be finer grained. This would maintain
compatibility between kernel and user space. You would need both a new
kernel and a new iproute to get nanosecond resolution, but older
combinations would still work.
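
To make the /proc/net/psched idea concrete, here is a rough sketch of how userspace could derive its tick scale from that file. The field meanings (t2us, us2t, clock_res) and the formula follow my understanding of how iproute2 reads the file, so treat them as assumptions. The sample line is made up rather than taken from a real system, and parse_psched and tick_in_usec are names chosen for this illustration.

```python
def parse_psched(text):
    """Parse the whitespace-separated hexadecimal words of a psched line."""
    return [int(word, 16) for word in text.split()]


def tick_in_usec(t2us, us2t, clock_res, time_units_per_sec=1_000_000):
    """Scheduler ticks per microsecond (assumed iproute2-style formula)."""
    clock_factor = clock_res / time_units_per_sec
    return t2us / us2t * clock_factor


# Hypothetical file contents; a real program would read /proc/net/psched.
sample = "000003e8 00000040 000f4240 3b9aca00"
t2us, us2t, clock_res = parse_psched(sample)[:3]
ticks = tick_in_usec(t2us, us2t, clock_res)
print(t2us, us2t, clock_res, ticks)
```

A kernel that advertised a different clock_res would change the scale that userspace computes. An older tool that ignores the new value would keep using its microsecond assumption.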
The downside is that with nanosecond resolution a 32-bit value can hold
at most about 4.29 seconds per packet, which puts a floor on the slowest
rate that can be expressed.
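
The arithmetic behind both limits can be checked with a short sketch. The 1500-byte packet size is an assumption chosen for illustration, and wire_time_ns is a made-up helper name, not something from the kernel or iproute.

```python
def wire_time_ns(packet_bytes, rate_bits_per_sec):
    """Time needed to put one packet on the wire, in nanoseconds."""
    return packet_bytes * 8 * 1_000_000_000 / rate_bits_per_sec


U32_MAX = 2**32 - 1

# Assumed 1500-byte packets at 1 Gbit/s and 10 Gbit/s.
gbe_ns = wire_time_ns(1500, 1_000_000_000)
ten_gbe_ns = wire_time_ns(1500, 10_000_000_000)

# The same times truncated to whole microseconds, as a microsecond-scaled
# table entry would store them.
gbe_us = int(gbe_ns // 1000)
ten_gbe_us = int(ten_gbe_ns // 1000)

# Longest interval a 32-bit field can hold, in seconds, for each unit.
max_seconds_us_units = U32_MAX / 1_000_000
max_seconds_ns_units = U32_MAX / 1_000_000_000

print(gbe_ns, ten_gbe_ns, gbe_us, ten_gbe_us)
print(max_seconds_us_units, max_seconds_ns_units)
```

At 10 Gbit/s the packet time is close to a single microsecond, so truncating it to whole microseconds loses a large fraction of the value, while at 1 Gbit/s the same truncation is harmless. Switching the 32-bit field to nanoseconds fixes the precision but shrinks the longest representable interval from roughly 71 minutes to roughly 4.29 seconds.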