lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20091109060906.GA5126@gerrit.erg.abdn.ac.uk>
Date:	Mon, 9 Nov 2009 07:09:06 +0100
From:	Gerrit Renker <gerrit@....abdn.ac.uk>
To:	Ivo Calado <ivocalado@...edded.ufcg.edu.br>
Cc:	dccp <dccp@...r.kernel.org>, netdev <netdev@...r.kernel.org>
Subject: Re: Doubt in implementations of mean loss interval at sender side

| > To sum up, here is whay I think is minimally required to satisfy the union
| > of RFC 4340, 4342, 4828, 5348, and 5622:
| >
| >        struct tfrc_tx_packet_info {
| >                u64     seqno:48,
| >                        is_ect0:1,
| >                        is_data_packet:1,
| >                        is_in_loss_interval:1;
| >                u32     send_time;
| >                u32     rtt_estimate;
| >                struct tfrc_tx_packet_info *next; /* FIFO */
| >        };
| >
| > That would be a per-packet storage cost of about 16 bytes, plus the pointer
| > (8 bytes on 64-bit architectures). One could avoid the pointer by defining a
| >        u64     base_seqno;
| > and then
| >        struct tfrc_tx_packet_info[some constant here];
| > and then index the array relative to the base_seqno.
| >
| 
| Yes, I believe that struct is enough too. But how long would be necessary
| the struct array to be?
| 
The problem is the same as with Ack Vectors - the array (or list) can grow
arbitrarily large. You made a good reply, since all the questions are 
inter-related. The first two I see here are

 1) the choice of data structure (array or list)
 2) the design of a garbage-collector

This includes your point from above, about the maximum size. To draw the
analogy to Ack Vectors, at the moment they use a fixed size. On certain
mediums (WiFi) there exist situations where even that fixed limit is
reached, causing an overflow with Ack Vectors that have reached a size
of 2 * 253 = 506 bytes.

Looking after old data of sent packets is similar, the main difference I
see that at some stage "unused" old entries need to be collected, to avoid
the overflow problem which occurs when using a fixed-size structure.

I find that 'Acknowledgments of Acknowledgments' is a bad idea, since it
means implementing reliable delivery over unreliable transport; on the
one hand DCCP is designed to be unreliable, but here suddenly is a break
with that design decision.

So point (2) will probably mean coming up with some guessed heuristics
that will work for most scenarios, but may fail in others.


This is why I am not a big fan of the sender-based solution: to solve (2) and
your question from above requires a lot of testing of the garbage-collection
and book-keeping algorithms, rather than on the actual congestion control.

One can spend a lot of time going over these issues, but of what use is the
most ingenious data structure if the overall protocol behaviour does not
deliver a useful performance to users of the protocol?

| > IIb) Further remarks
| > --------------------
| > At first sight it would seem that storing the RTT also solves the problem
| > of inaccurate RTTs used at the receiver. Unfortunately, this is not the
| > case. X_recv is sampled over intervals of varying length which may or may
| > not equal the RTT.  To factor out the effect of window counters, the sender
| > would need to store the packet size as well and would need to use rather
| > complicated computations - an ugly workaround.
| 
| I didn't understand how the packet size would help and what
| computations are needed.
| 
The above still refers to the earlier posting about letting the sender
supply the RTT estimate R_i in packet `i' as defined in RFC 5348, 3.2.1.

Though the same section later suggests that a coarse-grained timestamp 
is sufficient, in practice the inaccuracy of the RTT means inaccurate
X_recv, and as a consequence sub-optimal protocol performance.

The problem is that the algorithm from RFC 4342, 8.1 assumes that the
rate of change of the window counter also relates to the change of
packet spacing (the difference between the T_i packe arrivel times).

Howver, especially when using high-speed (1/10 Gbit) networking, this
assumption often does not hold in practice. Packets are sent by the network
card in bunches, or intervening switches/routers cause a compression of
packet inter-arrival times. Hence it is perfectly possible that a bundle of
packets with different Window Counter CCVal values arrive at virtually
the same time. For instance, on a 100Mbs ethernet I have seen spikes of
X_recv of up to 2-3 Gbits/sec. Several orders of magnitude from the
real packet speed (not to mention the unrealistic value).

So the question above was asking whether there is a way for the sender
to "compute away" the inaccuracies reported by the receiver. Your reply
confirms my doubts that doing this is probably not possible.

To clarify, I was asking whether it would be possible for the sender to
perform step (2) of RFC 5348, 6.2; to compensate for the fact that the
receiver does not have a reliable RTT estimate.

For example, when receiving feedback for packet `i', it would iterate
through the list/array, going back over as many packets as are covered
by the send time T_i of packet `i' minus the RTT estimate R_i at that
time, sum their packet sizes, and from that value recompute X_recv.

This is a bit complicated if the garbage-collector has already purged
older entries, so part of the garbage collector would probably have
to watch over acknowledged packets. I add this as item
 3) validate X_recv 
to the above running list of book-keeping items done at the sender.


| > One thing I stumbled across while reading your code was the fact that RFC 4342
| > leaves it open as to how many Loss Intervals to send: on the one hand it follows
| > the suggestion of RFC 5348 to use 1+NINTERVAL=9, but on the other hand it does
| > not restrict the number of loss intervals. Also RFC 5622 does not limit the
| > number of Loss Intervals / Data Dropped options.
| >
| > If receiving n > 9 Loss Intervals, what does the sender do with the n-9 older
| > intervals? There must be some mechanism to stop these options from growing
| > beyond bounds, so it needs to store also which loss intervals have been
| > acknowledged, introducing the "Acknowledgment of Acknowledgments"
| > problem.
| 
| In RFC 4342 section 8.6 it says that the limit of loss interval data
| to send is 28, and RFC 5622 8.7 says 84 for dropped packets option.
| But I don't see why to send so many data in these options.
| Yes, the most recent 9 loss intervals are required to be reported,
| except if the sender acknowledged previous sent loss intervals, so in
| that case only one is required, the open interval.
| And we can avoid the "Acknowledgment of Acknowledgments" if we always send
| the required 9 loss intervals, I think.
| 
| > A second point is how to compute the loss event rate when n > 9. It seems
| > that this would mean grinding through all loss intervals using a window
| > of 9. If that is the case, the per-packet-computation costs become very
| > expensive.
| 
| RFC 4342 section 8.6 suggests that only 9 loss intervals are required
| anyway. And I believe that's enough for the computation of current
| mean loss interval. What do you think?
| 
Yes, absolutely, I am completely in favour of this very sensible suggestion.

If people really must experiment with such outdated data, that could be
done in truly experimental patches. Especially since RFC 5348 normatively
recommends a value of n = 8 in section 5.4. And we are saving further
headaches about the book-keeping/garbage collection of old data.

| > II) Computational part of the implementation
| > --------------------------------------------
| > If only Loss Intervals alone are used, only these need to be verified
| > before being used to alter the sender behaviour.
| >
| > But when one or more other DCCP options also appear, the verification is
| >  * intra: make sure each received option is in itself consistent,
| >  * inter: make sure options are mutually consistent.
| >
| > The second has a combinatorial effect, i.e. n! verifications for n options.
| >
<snip>
| 
| Yes, there's a combinatorial problem in checking the options for consistence.
| But, what if we find out that some option doesn't match against others?
| What action would be taken?
I add this as
 4) define policy for dealing with options that are not mutually consistent

| First, what can cause the receiver to send inconsistent options?
| A bad implementation only?
Yes I think that a bad implementation (whether on purpose or not) would be
the main cause, since header options are protected even if partial
checksums are used (RFC 4340, 9.2).

But there is also the benign case mentioned at the end of RFC 4342, 9.2,
where a receiver collapses multiple losses into a single loss event, i.e.
 5) validate received Loss Intervals and regroup the receiver-based 
    information if necessary, without interpreting this as attempted
    receiver misbehaviour.

| Accordingly to ecn nonce echo sum algorithm, if a receiver is found to be 
| lying about loss or to be bad implemented, the sender adjusts the send rate
| as if loss were perceived.
| Can we do the same in this situation? If so, can we skip checking options
| between them and only check ecn nonce sum?
This is difficult since Ack Vectors and Loss Intervals use different
definitions of ECN Nonce sum (last paragraph in RFC 4342, 9.1), i.e. we have
 6) separate algorithms to compute Ack Vector/Loss Intervals ECN Nonce sum.

With regard to (5) above, your suggestion gives
 7) validate options, on mismatch other than (5) only validate ECN nonce.

| If some option is wrong it show more loss (or any worse situation for the
| receiver) or conceals loss. In the first case, I don't believe we need to care,
| and in the second, the ecn nonce sum can reveal the bad acting of the receiver.
Yes you are right, we need not worry if a receiver reports a higher loss rate
than the verification done by the sender (which recomputes the data that the
receiver already has computed) calculates.

But for the second case, there is no guarantee to catch a misbehaving
receiver, only a 50% chance at the end of many computations.

RFC 4342, 9 suggests one way of verifying Loss Intervals / Ack Vectors:
 5) occasionally do not send a packet, or send packet out of order.

This increases complexity of book-keeping, the sender needs to keep track
which of the sent packets was a fake send/drop. It also requires an algorithm
to iterate over the sender data structures in order to find out whether the
reasoning of the receiver is sane. I have doubts whether this can be done
without sacrificing the performance of the in-kernel sender side.


| > III) Closing remarks in favour of receiver-based implementation
| > ---------------------------------------------------------------
| > Finally, both RFC 4342 and RFC 5622 do not explicitly discard the
| > possibility of using a receiver-based implementation. Quoting
| > RFC 4342, 3.2: "If it prefers, the sender can also use a loss event
| >                rate calculated and reported by the receiver."
| > Furthermore, the revised TFRC specification points out in section 7
| > the advantages that a receiver-based implementation has:
| >  * it does not mandate reliable delivery of packet loss data;
| >  * it is robust against the loss of feedback packets;
| >  * better suited for scalable server design.
| >
| > Quite likely, if the server does not have to store and validate a mass
| > of data, it is also less prone to be toppled by DoS attacks.
| 
| You're right. But what the RFC's says about it is almost exactly the
| opposite, isn't? What can we do about it? I like the receiver-based design,
| but I believe that loss intervals are interesting, mostly because  of
| receiver behavior verification.
| 
While writing the above reply, I was amazed to see how much of the computation
that has already been done at the receiver needs to be done again at the sender, 
ust in order to be able to verify the data.

To me this seems very inefficient.

Moreover, the biggest danger I see here is spending a lot of time with the
details of sender book-keeping and verification, just to then see that the
performance of CCID-3/4 in practice turns out to be below the standards
acceptable to even modest users.

I think it is clearly better to prefer the simplest possible implementation
in such cases, to better debug live protocol performance.

In particular, since CCID-4 is designed to be an experimental protocol
(i.e. if it works, RFC 5622 may mature into a Proposed Standard, if not,
it might be superseded by a different specification).

And I think that testing the actual user performance has the highest priority.

The literature on the subject is almost exclusively done on experiences in
ns-2 userland. Almost no Internet experiments at all have been done with DCCP.

This is because the IPPROTO = 33 identifier needs to be entered
especially into a firewall, which opens holes that firewall
administrators don't like to open (unless the firewall is based on
a recent Linux kernel, opening all ports for IP protocol identifier
33 is the only way of allowing DCCP traffic in/out of a firewall).
		
In addition, most people use NAT at home, putting another obstacle
on experiments. The result is then that tests are done in a lab testbed
or in virtualisation - emulated networks.

To conclude, I still think that the simpler, receiver-based implementation
gives a better start. A 'perfect' receiver implementation is also a good
reference point to start protocol evaluation: if the performance is bad
despite getting things right at the receiver, then other parts of the
protocol need investigation/improvement.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ