Message-ID: <cb00fa210911161209v384f883chd1b821ca40dde06a@mail.gmail.com>
Date:	Mon, 16 Nov 2009 17:09:42 -0300
From:	Ivo Calado <ivocalado@...edded.ufcg.edu.br>
To:	Gerrit Renker <gerrit@....abdn.ac.uk>, dccp <dccp@...r.kernel.org>,
	netdev <netdev@...r.kernel.org>
Subject: Re: Doubt in implementations of mean loss interval at sender side

On Mon, Nov 9, 2009 at 4:09 AM, Gerrit Renker <gerrit@....abdn.ac.uk> wrote:
> | > To sum up, here is what I think is minimally required to satisfy the union
> | > of RFC 4340, 4342, 4828, 5348, and 5622:
> | >
> | >        struct tfrc_tx_packet_info {
> | >                u64     seqno:48,
> | >                        is_ect0:1,
> | >                        is_data_packet:1,
> | >                        is_in_loss_interval:1;
> | >                u32     send_time;
> | >                u32     rtt_estimate;
> | >                struct tfrc_tx_packet_info *next; /* FIFO */
> | >        };
> | >
> | > That would be a per-packet storage cost of about 16 bytes, plus the pointer
> | > (8 bytes on 64-bit architectures). One could avoid the pointer by defining a
> | >        u64     base_seqno;
> | > and then
> | >        struct tfrc_tx_packet_info[some constant here];
> | > and then index the array relative to the base_seqno.
> | >
> |
> | Yes, I believe that struct is enough too. But how long would the struct
> | array need to be?
> |
> The problem is the same as with Ack Vectors - the array (or list) can grow
> arbitrarily large. You made a good reply, since all the questions are
> inter-related. The first two I see here are
>
>  1) the choice of data structure (array or list)
>  2) the design of a garbage-collector
>
> This includes your point from above, about the maximum size. To draw the
> analogy to Ack Vectors, at the moment they use a fixed size. On certain
> media (WiFi) there are situations where even that fixed limit is
> reached, causing an overflow with Ack Vectors that have reached a size
> of 2 * 253 = 506 bytes.
>
> Looking after old data of sent packets is similar; the main difference I
> see is that at some stage "unused" old entries need to be collected, to avoid
> the overflow problem which occurs when using a fixed-size structure.
>
> I find that 'Acknowledgments of Acknowledgments' is a bad idea, since it
> means implementing reliable delivery over unreliable transport; on the
> one hand DCCP is designed to be unreliable, but here suddenly is a break
> with that design decision.
>
> So point (2) will probably mean coming up with some guessed heuristics
> that will work for most scenarios, but may fail in others.
>
>
> This is why I am not a big fan of the sender-based solution: to solve (2) and
> your question from above requires a lot of testing of the garbage-collection
> and book-keeping algorithms, rather than of the actual congestion control.
>
> One can spend a lot of time going over these issues, but of what use is the
> most ingenious data structure if the overall protocol behavior does not
> deliver a useful performance to users of the protocol?

Yes, a sender-based implementation really does seem complicated, even just in
principle.
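
Just so I am sure I understand the base_seqno-indexed array you sketch above,
I wrote it down for myself as a fixed-size ring buffer. This is only a
user-space illustration with made-up names and sizes (TFRC_TX_HIST_SIZE and
the 'len' field are my assumptions; the size field is what you mention
further down would be needed for X_recv anyway), not proposed code:

	#include <stddef.h>
	#include <stdint.h>

	#define TFRC_TX_HIST_SIZE 256	/* assumed; must cover one window of data */

	struct tfrc_tx_packet_info {
		uint64_t seqno:48,
			 is_ect0:1,
			 is_data_packet:1,
			 is_in_loss_interval:1;
		uint32_t send_time;
		uint32_t rtt_estimate;
		uint16_t len;	/* packet size, only needed to recompute X_recv */
	};

	struct tfrc_tx_hist {
		uint64_t base_seqno;	/* oldest sequence number still stored */
		struct tfrc_tx_packet_info entry[TFRC_TX_HIST_SIZE];
	};

	/* Look up the entry for 'seqno'; NULL if it is older than base_seqno
	 * or otherwise outside the stored window (the implicit "garbage
	 * collection": new packets simply overwrite the oldest slots). */
	static struct tfrc_tx_packet_info *
	tfrc_tx_hist_lookup(struct tfrc_tx_hist *h, uint64_t seqno)
	{
		if (seqno - h->base_seqno >= TFRC_TX_HIST_SIZE)
			return NULL;
		return &h->entry[seqno % TFRC_TX_HIST_SIZE];
	}

(48-bit sequence number wrap-around is ignored here.) Storing a new packet
would overwrite entry[seqno % TFRC_TX_HIST_SIZE] and advance base_seqno once
the array wraps, so old entries disappear by themselves; the open question is
still how large that window has to be.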

>
> | > IIb) Further remarks
> | > --------------------
> | > At first sight it would seem that storing the RTT also solves the problem
> | > of inaccurate RTTs used at the receiver. Unfortunately, this is not the
> | > case. X_recv is sampled over intervals of varying length which may or may
> | > not equal the RTT.  To factor out the effect of window counters, the sender
> | > would need to store the packet size as well and would need to use rather
> | > complicated computations - an ugly workaround.
> |
> | I didn't understand how the packet size would help and what
> | computations are needed.
> |
> The above still refers to the earlier posting about letting the sender
> supply the RTT estimate R_i in packet `i' as defined in RFC 5348, 3.2.1.
>
> Though the same section later suggests that a coarse-grained timestamp
> is sufficient, in practice the inaccuracy of the RTT means inaccurate
> X_recv, and as a consequence sub-optimal protocol performance.
>
> The problem is that the algorithm from RFC 4342, 8.1 assumes that the
> rate of change of the window counter also relates to the change of
> packet spacing (the difference between the T_i packet arrival times).
>
> However, especially when using high-speed (1/10 Gbit/s) networking, this
> assumption often does not hold in practice. Packets are sent by the network
> card in bunches, or intervening switches/routers cause a compression of
> packet inter-arrival times. Hence it is perfectly possible that a bundle of
> packets with different Window Counter CCVal values arrive at virtually
> the same time. For instance, on 100 Mbit/s Ethernet I have seen spikes of
> X_recv of up to 2-3 Gbit/s, well over an order of magnitude above the
> real packet speed (not to mention that such a value is unrealistic).
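
(If I picture this correctly: say three 1500-byte packets, queued behind each
other, arrive within 10 us; the receiver would then measure
X_recv = 3 * 1500 * 8 bits / 10 us = 3.6 Gbit/s, no matter how slow the link
actually is. That matches the spikes you describe.)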
>
> So the question above was asking whether there is a way for the sender
> to "compute away" the inaccuracies reported by the receiver. Your reply
> confirms my doubts that doing this is probably not possible.
>
> To clarify, I was asking whether it would be possible for the sender to
> perform step (2) of RFC 5348, 6.2; to compensate for the fact that the
> receiver does not have a reliable RTT estimate.

I understand the issue now, thanks. Isn't it better to just have the sender
supply its RTT estimate, as the RFC says?

>
> For example, when receiving feedback for packet `i', it would iterate
> through the list/array, going back over as many packets as are covered
> by the send time T_i of packet `i' minus the RTT estimate R_i at that
> time, sum their packet sizes, and from that value recompute X_recv.
>
> This is a bit complicated if the garbage-collector has already purged
> older entries, so part of the garbage collector would probably have
> to watch over acknowledged packets. I add this as item
>  3) validate X_recv
> to the above running list of book-keeping items done at the sender.
>
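
To check whether I follow the X_recv validation idea, I tried to write it
down, building on the ring-buffer sketch above. It assumes that send_time and
rtt_estimate are kept in microseconds and uses the hypothetical 'len' field;
again just an illustration of my reading of step (2) of RFC 5348, 6.2 done at
the sender, not proposed code:

	#define USEC_PER_SEC 1000000u

	/* Recompute X_recv (bytes/second) for feedback acknowledging
	 * 'acked_seqno': sum the sizes of all packets sent within one RTT
	 * before the send time T_i of that packet, then divide by R_i. */
	static uint32_t tfrc_tx_recompute_x_recv(const struct tfrc_tx_hist *h,
						 uint64_t acked_seqno)
	{
		const struct tfrc_tx_packet_info *p;
		uint64_t bytes = 0, seq;
		uint32_t t_i, r_i;

		if (acked_seqno - h->base_seqno >= TFRC_TX_HIST_SIZE)
			return 0;		/* entry already overwritten */
		p = &h->entry[acked_seqno % TFRC_TX_HIST_SIZE];
		if (p->rtt_estimate == 0)
			return 0;		/* no RTT sample stored */
		t_i = p->send_time;
		r_i = p->rtt_estimate;

		/* Walk backwards over all packets sent in [T_i - R_i, T_i]. */
		for (seq = acked_seqno;
		     seq - h->base_seqno < TFRC_TX_HIST_SIZE; seq--) {
			const struct tfrc_tx_packet_info *q =
				&h->entry[seq % TFRC_TX_HIST_SIZE];

			if ((uint32_t)(t_i - q->send_time) > r_i)
				break;		/* sent more than one RTT earlier */
			bytes += q->len;
			if (seq == h->base_seqno)
				break;		/* oldest stored entry reached */
		}
		return (uint32_t)(bytes * USEC_PER_SEC / r_i);
	}

If that is roughly what you mean, it also shows your point about the garbage
collector: the history has to keep at least one RTT worth of
already-acknowledged packets around, otherwise the sum gets cut short.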
>
> | > One thing I stumbled across while reading your code was the fact that RFC 4342
> | > leaves it open as to how many Loss Intervals to send: on the one hand it follows
> | > the suggestion of RFC 5348 to use 1+NINTERVAL=9, but on the other hand it does
> | > not restrict the number of loss intervals. Also RFC 5622 does not limit the
> | > number of Loss Intervals / Data Dropped options.
> | >
> | > If receiving n > 9 Loss Intervals, what does the sender do with the n-9 older
> | > intervals? There must be some mechanism to stop these options from growing
> | > beyond bounds, so it needs to store also which loss intervals have been
> | > acknowledged, introducing the "Acknowledgment of Acknowledgments"
> | > problem.
> |
> | In RFC 4342 section 8.6 it says that the limit of loss interval data
> | to send is 28, and RFC 5622 8.7 says 84 for dropped packets option.
> | But I don't see why we would send so much data in these options.
> | Yes, the most recent 9 loss intervals are required to be reported,
> | except if the sender has acknowledged previously sent loss intervals; in
> | that case only one is required, the open interval.
> | And we can avoid the "Acknowledgment of Acknowledgments" if we always send
> | the required 9 loss intervals, I think.
> |
> | > A second point is how to compute the loss event rate when n > 9. It seems
> | > that this would mean grinding through all loss intervals using a window
> | > of 9. If that is the case, the per-packet-computation costs become very
> | > expensive.
> |
> | RFC 4342 section 8.6 suggests that only 9 loss intervals are required
> | anyway. And I believe that's enough for the computation of the current
> | mean loss interval. What do you think?
> |
> Yes, absolutely, I am completely in favour of this very sensible suggestion.
>
> If people really must experiment with such outdated data, that could be
> done in truly experimental patches. Especially since RFC 5348 normatively
> recommends a value of n = 8 in section 5.4. And we are saving further
> headaches about the book-keeping/garbage collection of old data.
>
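
To check my own understanding of what the sender would do with those 9
intervals (and since the thread is about the mean loss interval anyway), this
is the calculation of RFC 5348, 5.4 as I read it, sketched with the weights
scaled by 10 to stay in integer arithmetic; the names are mine:

	#include <stdint.h>

	#define NINTERVAL 8

	/* w_0..w_7 = 1, 1, 1, 1, 0.8, 0.6, 0.4, 0.2 -- scaled by 10 */
	static const uint32_t tfrc_li_weight[NINTERVAL] = {
		10, 10, 10, 10, 8, 6, 4, 2
	};

	/*
	 * li[0] is the open (most recent) loss interval, li[1..8] are the
	 * eight closed ones, each measured in packets. Returns the mean
	 * loss interval I_mean; the loss event rate is then p = 1/I_mean.
	 */
	static uint32_t tfrc_mean_loss_interval(const uint32_t li[NINTERVAL + 1])
	{
		uint64_t i_tot0 = 0, i_tot1 = 0, w_tot = 0;
		int i;

		for (i = 0; i < NINTERVAL; i++) {
			i_tot0 += (uint64_t)li[i]     * tfrc_li_weight[i];
			i_tot1 += (uint64_t)li[i + 1] * tfrc_li_weight[i];
			w_tot  += tfrc_li_weight[i];
		}
		/* The larger sum is used, so a long open interval (no recent
		 * losses) immediately raises I_mean and lowers p. */
		return (uint32_t)((i_tot0 > i_tot1 ? i_tot0 : i_tot1) / w_tot);
	}

So the 9 most recent intervals are indeed all the calculation itself can ever
use, which supports limiting what the receiver reports.
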
> | > II) Computational part of the implementation
> | > --------------------------------------------
> | > If only Loss Intervals alone are used, only these need to be verified
> | > before being used to alter the sender behaviour.
> | >
> | > But when one or more other DCCP options also appear, the verification is
> | >  * intra: make sure each received option is in itself consistent,
> | >  * inter: make sure options are mutually consistent.
> | >
> | > The second has a combinatorial effect, i.e. on the order of n^2 pairwise
> | > verifications for n options.
> | >
> <snip>
> |
> | Yes, there's a combinatorial problem in checking the options for consistency.
> | But, what if we find out that some option doesn't match against others?
> | What action would be taken?
> I add this as
>  4) define policy for dealing with options that are not mutually consistent
>
> | First, what can cause the receiver to send inconsistent options?
> | A bad implementation only?
> Yes I think that a bad implementation (whether on purpose or not) would be
> the main cause, since header options are protected even if partial
> checksums are used (RFC 4340, 9.2).
>
> But there is also the benign case mentioned at the end of RFC 4342, 9.2,
> where a receiver collapses multiple losses into a single loss event, i.e.
>  5) validate received Loss Intervals and regroup the receiver-based
>    information if necessary, without interpreting this as attempted
>    receiver misbehaviour.
>
> | According to the ECN nonce echo sum algorithm, if a receiver is found to be
> | lying about loss or to be badly implemented, the sender adjusts the send rate
> | as if loss were perceived.
> | Can we do the same in this situation? If so, can we skip checking the options
> | against each other and only check the ECN nonce sum?
> This is difficult since Ack Vectors and Loss Intervals use different
> definitions of ECN Nonce sum (last paragraph in RFC 4342, 9.1), i.e. we have
>  6) separate algorithms to compute Ack Vector/Loss Intervals ECN Nonce sum.
>
> With regard to (5) above, your suggestion gives
>  7) validate options, on mismatch other than (5) only validate ECN nonce.
>
> | If some option is wrong, it either shows more loss (or some situation worse
> | for the receiver) or it conceals loss. In the first case, I don't believe we
> | need to care, and in the second, the ECN nonce sum can reveal the receiver's
> | misbehaviour.
> Yes, you are right, we need not worry if a receiver reports a higher loss rate
> than the sender's own verification (which recomputes the data that the
> receiver has already computed) arrives at.
>
> But for the second case, there is no guarantee to catch a misbehaving
> receiver, only a 50% chance at the end of many computations.

Isn't it a 50% chance at each ECN nonce verification? So, assuming the checks
are independent, after k checks the probability of catching a misbehaving
receiver would be 1 - (1/2)^k, i.e. approaching 100%?

>
> RFC 4342, 9 suggests one way of verifying Loss Intervals / Ack Vectors:
>  8) occasionally do not send a packet, or send a packet out of order.
>
> This increases complexity of book-keeping, the sender needs to keep track
> which of the sent packets was a fake send/drop. It also requires an algorithm
> to iterate over the sender data structures in order to find out whether the
> reasoning of the receiver is sane. I have doubts whether this can be done
> without sacrificing the performance of the in-kernel sender side.
>

I have those doubts too. This seems too complicated and not very useful.

>
> | > III) Closing remarks in favour of receiver-based implementation
> | > ---------------------------------------------------------------
> | > Finally, both RFC 4342 and RFC 5622 do not explicitly discard the
> | > possibility of using a receiver-based implementation. Quoting
> | > RFC 4342, 3.2: "If it prefers, the sender can also use a loss event
> | >                rate calculated and reported by the receiver."
> | > Furthermore, the revised TFRC specification points out in section 7
> | > the advantages that a receiver-based implementation has:
> | >  * it does not mandate reliable delivery of packet loss data;
> | >  * it is robust against the loss of feedback packets;
> | >  * better suited for scalable server design.
> | >
> | > Quite likely, if the server does not have to store and validate a mass
> | > of data, it is also less prone to be toppled by DoS attacks.
> |
> | You're right. But what the RFCs say about it is almost exactly the
> | opposite, isn't it? What can we do about it? I like the receiver-based design,
> | but I believe that loss intervals are interesting, mostly because of
> | receiver behavior verification.
> |
> While writing the above reply, I was amazed to see how much of the computation
> that has already been done at the receiver needs to be done again at the sender,
> just in order to be able to verify the data.
>
> To me this seems very inefficient.
>
> Moreover, the biggest danger I see here is spending a lot of time with the
> details of sender book-keeping and verification, just to then see that the
> performance of CCID-3/4 in practice turns out to be below the standards
> acceptable to even modest users.
>
> I think it is clearly better to prefer the simplest possible implementation
> in such cases, to better debug live protocol performance.
>
> In particular, since CCID-4 is designed to be an experimental protocol
> (i.e. if it works, RFC 5622 may mature into a Proposed Standard, if not,
> it might be superseded by a different specification).
>
> And I think that testing the actual user performance has the highest priority.

Yes, we can work with a simpler implementation at the receiver side and focus
on performance, testing, and features too. Once we have a stable version that
is good enough in performance terms, we can continue improving the sender side.

>
> The literature on the subject is based almost exclusively on experience with
> ns-2 simulations. Almost no Internet experiments at all have been done with DCCP.
>
> This is because the IPPROTO = 33 identifier needs to be specially
> allowed in a firewall, which opens holes that firewall
> administrators don't like to open (unless the firewall is based on
> a recent Linux kernel, opening all ports for IP protocol identifier
> 33 is the only way of allowing DCCP traffic in/out of a firewall).
>
> In addition, most people use NAT at home, putting another obstacle
> on experiments. The result is then that tests are done in a lab testbed
> or in virtualisation - emulated networks.
>
> To conclude, I still think that the simpler, receiver-based implementation
> gives a better start. A 'perfect' receiver implementation is also a good
> reference point to start protocol evaluation: if the performance is bad
> despite getting things right at the receiver, then other parts of the
> protocol need investigation/improvement.

I agree. It's a risk to work on the sender side at the moment, implementing all
these features and algorithms, only to end up with a CCID that doesn't reach
the expected performance.
Can you list the pending tasks, in both code and tests, that remain to be done?


Cheers,

Ivo

--
Ivo Augusto Andrade Rocha Calado
MSc. Candidate
Embedded Systems and Pervasive Computing Lab - http://embedded.ufcg.edu.br
Systems and Computing Department - http://www.dsc.ufcg.edu.br
Electrical Engineering and Informatics Center - http://www.ceei.ufcg.edu.br
Federal University of Campina Grande - http://www.ufcg.edu.br

PGP: 0x03422935
Putt's Law:
      Technology is dominated by two types of people:
              Those who understand what they do not manage.
              Those who manage what they do not understand.
