Message-ID: <456ED79E.2070707@trash.net>
Date: Thu, 30 Nov 2006 14:07:42 +0100
From: Patrick McHardy <kaber@...sh.net>
To: Russell Stuart <russell-tcatm@...art.id.au>
CC: hadi@...erus.ca, netdev@...r.kernel.org,
David Miller <davem@...emloft.net>,
Jesper Dangaard Brouer <hawk@...u.dk>
Subject: Re: [PATCH REPOST 1/2] NET: Accurate packet scheduling for ATM/ADSL (kernel)
First, sorry for keeping you waiting so long ..
Russell Stuart wrote:
> On Tue, 2006-10-24 at 18:19 +0200, Patrick McHardy wrote:
>
>>No, my patch works for qdiscs with and without RTABs, this
>>is where they overlap.
>
>
> Could you explain how this works? I didn't see how
> qdiscs that used RTAB to measure rates of transmission
> could use your STAB to do the same thing. At least not
> without substantial modifications to your patch.
Qdiscs don't use RTABs to measure rates but to calculate
transmission times. Transmission time is always related
to the packet length; the difference between our patches
is that you modify the RTABs in advance to include the
overhead in the calculation, while my patch changes the
length used to look up the transmission time, which works
with or without RTABs.
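Roughly, the difference looks like this (an illustrative
sketch only, not the actual kernel or patch code; the helper
names and the overhead/cell_size parameters are made up for
the example):

	/* Classic lookup: the RTAB maps a length bucket to a
	 * transmission time for the configured rate. */
	static unsigned int l2t(const unsigned int *rtab_data,
				int cell_log, unsigned int len)
	{
		return rtab_data[len >> cell_log];
	}

	/* Your approach: build rtab_data so that each bucket already
	 * contains the time for the ATM cells needed, i.e. fold the
	 * overhead into the table in advance.
	 *
	 * My approach: leave the table alone and adjust the length
	 * before the lookup, roughly: */
	static unsigned int l2t_adjusted(const unsigned int *rtab_data,
					 int cell_log, unsigned int len,
					 unsigned int overhead,
					 unsigned int cell_size)
	{
		len += overhead;	/* per-packet link overhead */
		if (cell_size)		/* round up to whole cells */
			len = ((len + cell_size - 1) / cell_size) * cell_size;
		return rtab_data[len >> cell_log];
	}

The same adjusted length can then be fed to anything that needs
the on-wire size, whether or not an RTAB is involved.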
>>No, as we already discussed, SFQ uses the packet size for
>>calculating remaining quanta, and fairness would increase
>>if the real transmission size (and time) were used. RED
>>uses the backlog size to calculate the drop probability
>>(and supports attaching inner qdiscs nowadays), so keeping
>>accurate backlog statistics seems to be a win as well
>>(besides their use for estimators). It is also possible
>>to specify the maximum latency for TBF instead of a byte
>>limit (which is passed as max. backlog value to the inner
>>bfifo qdisc), this would also need accurate backlog statistics.
>
>
> This is all beside the point if you can show how
> your patch gets rid of RTAB - currently I am acting
> under the assumption it doesn't. If it does you
> get all you describe for free.
Why?
> Otherwise - yes, you are correct. The ATM patch does
> not introduce accurate packet lengths into the kernel,
> which is what is required to give the benefits you
> describe. But that was never the ATM patch's goal.
> The ATM patch gives accurate rate calculations for ATM
> links, nothing more. Accurate packet length calculation
> is apparently the goal of your patch, and I wish you
> luck with it.
Again, it's not rate calculations but transmission time
calculations, which _are a function of the length_.
>>Ethernet, VLAN, Tunnels, ... it's especially useful for tunnels
>>if you also shape on the underlying device since the qdisc
>>on the tunnel device and the qdisc on the underlying device
>>should ideally be in sync (otherwise no accurate bandwidth
>>reservation is possible).
>
>
> Hmmm - not as far as I am aware. In all those cases
> the IP layer breaks up the data into MTU sized packets
> before they get to the scheduler. ATM is the only
> technology I know of where setting the MTU to be
> bigger than the end-to-end link can support is normal.
That's not the point. If I want to do scheduling on the
ipip device and on the underlying device at the same
time, I need to reserve the amount of bandwidth given to
the ipip device plus the bandwidth used for encapsulation
on the underlying device. The easy way to do this is
to use the same amount of bandwidth on both devices
and make the scheduler on the ipip device aware that
some overhead is going to be added. The hard way is
to calculate the worst case (bandwidth / minimum packet
size * overhead per packet) and add that on the
underlying device.
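To put rough numbers on the hard way (the figures below are
only an illustration, not taken from either patch):

	/* Illustrative numbers only: 1 Mbit/s reserved on the ipip
	 * device, 40 byte minimum packets, 20 bytes of IPIP
	 * encapsulation per packet. */
	unsigned int rate_Bps  = 1000000 / 8;		/* 125000 bytes/s */
	unsigned int min_pkt   = 40;			/* smallest packet */
	unsigned int overhead  = 20;			/* encap per packet */
	unsigned int max_pps   = rate_Bps / min_pkt;	/* 3125 pkt/s */
	unsigned int extra_Bps = max_pps * overhead;	/* 62500 bytes/s */
	/* => reserve 125000 + 62500 bytes/s (about 1.5 Mbit/s, i.e.
	 *    50% more) on the underlying device for the worst case. */

With the easy way you simply configure the same rate on both
devices and tell the scheduler on the ipip device about the
per-packet overhead.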
>>Either you or Jesper pointed to this code in iproute:
>>
>>	for (i=0; i<256; i++) {
>>		unsigned sz = (i<<cell_log);
>>...
>>		rtab[i] = tc_core_usec2tick(1000000*((double)sz/bps));
>>
>>which tends to underestimate the transmission time by using
>>the smallest possible size for each cell.
>
>
> Firstly, yes you are correct. It will under some
> circumstances underestimate the number of cells it
> takes to send a packet. The reason is that the
> whole aim of the ATM patch was to make maximum use
> of the ATM link, while at the same time keeping
> control of scheduling decisions. To keep control of
> scheduling decisions, we must _never_ overestimate
> the speed of the link. If we do, the ISP will take
> control of the scheduling.
Underestimating the transmission time is equivalent to
overestimating the rate.
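To make the equivalence concrete (illustrative figures only):

	/* One RTAB slot covering both the 2 and 3 cell cases, and a
	 * packet that really needs 3 cells. */
	unsigned int cell        = 53;		/* ATM cell on the wire */
	unsigned int real_len    = 3 * cell;	/* 159 bytes actually sent */
	unsigned int charged_len = 2 * cell;	/* 106 bytes used for timing */
	/* charged_time/real_time = 106/159 = 2/3: the computed time is
	 * a third too short, which is the same as assuming the link is
	 * 159/106 ~ 1.5 times as fast for this packet. */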
> At first sight this seems a minor issue. It's not, because
> the error can be large. An example of overestimating the
> link speed would be where one RTAB entry covers both the
> 2 and 3 cell cases. If we say the IP packet is going to
> use 2 cells, and in fact it uses 3, then the error is 50%.
> This is a huge error, and in fact eliminating this error
> is the whole point of the ATM patch.
>
> As an example of its impact, I was trying to make VOIP
> work over a shared link. If the ISP starts making the
> scheduling decisions then VOIP packets start being
> dropped or delayed, rendering VOIP unusable. So in
> order to use VOIP on the link I have to understate the
> link capacity by 50%. As it happens, VOIP generates a
> stream of packets in the 2-3 cell size range, the actual
> size depending on the codec negotiated by the end points.
>
> Jesper in his thesis gives perhaps a more important
> example of what happens if you overestimate the link speed.
> It turns out it interacts with TCP's flow control badly,
> slowing down all TCP flows over the link. The reasons
> are subtle so I won't go into them here. But the end
> result is that if you overestimate the link speed and let the
> ISP do the scheduling, you end up under-utilising the
> ATM link.
>
> So in the ATM patch there is a deliberate design decision -
> we always assign an RTAB entry the smallest cell size it
> covers. Originally Jesper and I wrote our own versions
> of the ATM patch independently, and we both made the same
> design decision - I presume for the same reason.
>
> Secondly, and possibly more importantly, the ATM patch is
> designed so that a single RTAB entry always covers exactly
> one cell size. So on a patched kernel the underestimate
> never occurs - the rate returned by the RTAB is always
> exactly correct. In fact, that aspect of it seems to cause
> you the most trouble - the off-by-one error and so on. The
> code you point out is only there so the new version of "tc"
> also works as well as it can for non-patched kernels.
I'm not really convinced, but I mostly lost interest in this
in the meantime, so let me retract my NACK and let others
decide.