linux-kernel - Re: [PATCH v8 09/14] iio: afe: rescale: fix accuracy for small

Message-ID: <YT05wgzDT1r2KdpO@shaak>
Date:   Sat, 11 Sep 2021 19:20:34 -0400
From:   Liam Beguin <liambeguin@...il.com>
To:     Peter Rosin <peda@...ntia.se>
Cc:     jic23@...nel.org, lars@...afoo.de, linux-kernel@...r.kernel.org,
        linux-iio@...r.kernel.org, devicetree@...r.kernel.org,
        robh+dt@...nel.org
Subject: Re: [PATCH v8 09/14] iio: afe: rescale: fix accuracy for small

On Mon, Aug 30, 2021 at 03:03:52PM +0200, Peter Rosin wrote:
> On 2021-08-29 06:41, Liam Beguin wrote:
> > On Thu, Aug 26, 2021 at 11:53:14AM +0200, Peter Rosin wrote:
> >> On 2021-08-24 22:28, Liam Beguin wrote:
> >>> On Mon Aug 23, 2021 at 00:18:55 +0200, Peter Rosin wrote:
> >>>> [I started to write an answer to your plans in the v7 thread, but didn't
> >>>> have time to finish before v8 appeared...]
> >>>>
> >>>> On 2021-08-20 21:17, Liam Beguin wrote:
> >>>>> From: Liam Beguin <lvb@...hos.com>
> >>>>>
> >>>>> The approximation caused by integer divisions can be costly on smaller
> >>>>> scale values since the decimal part is significant compared to the
> >>>>> integer part. Switch to an IIO_VAL_INT_PLUS_NANO scale type in such
> >>>>> cases to maintain accuracy.
> >>>>
> >>>
> >>> Hi Peter,
> >>>
> >>> Thanks for taking time to look at this in detail again. I really
> >>> appreciate all the feedback you've provided.
> >>>
> >>>> The conversion to int-plus-nano may also carry a cost of accuracy.
> >>>>
> >>>> 90/1373754273 scaled by 261/509 is 3.359e-8, the old code returns 3.348e-8,
> >>>> but the new one gets you 3.3e-8 (0.000000033, it simply cannot provide more
> >>>> digits). So, in this case you lose precision with the new code.
> >>>>
> >>>> Similar problem with 100 / 2^30 scaled by 3782/7000. It is 5.032e-8, the old
> >>>> code returns 5.029e-8, but the new one gets you the inferior 5.0e-8.
> >>>>
> >>>
> >>> I see what you mean here.
> >>> I added test cases with these values to see exactly what we get.
> >>
> >> Excellent!
> >>
> >>>
> >>> Expected rel_ppm < 0, but
> >>>     rel_ppm == 1000000
> >>>
> >>>      real=0.000000000
> >>>  expected=0.000000033594
> >>> # iio_rescale_test_scale: not ok 42 - v8 - 90/1373754273 scaled by 261/509
> >>> Expected rel_ppm < 0, but
> >>>     rel_ppm == 1000000
> >>>
> >>>      real=0.000000000
> >>>  expected=0.000000050318
> >>> # iio_rescale_test_scale: not ok 43 - v8 - 100/1073741824 scaled by 3782/7000
> >>>
> >>>
> >>> The main issue is that the first two examples return 0, which might be worse
> >>> than losing a little precision.
> >>
> >> They shouldn't return zero?
> >>
> >> Here's the new code quoted from the test robot (and assuming
> >> a 64-bit machine, thus ignoring the 32-bit problem on line 56).
> >>
> >>     36		case IIO_VAL_FRACTIONAL:
> >>     37		case IIO_VAL_FRACTIONAL_LOG2:
> >>     38			tmp = (s64)*val * 1000000000LL;
> >>     39			tmp = div_s64(tmp, rescale->denominator);
> >>     40			tmp *= rescale->numerator;
> >>     41	
> >>     42			tmp = div_s64_rem(tmp, 1000000000LL, &rem);
> >>     43			*val = tmp;
> >>     44	
> >>     45			/*
> >>     46			 * For small values, the approximation can be costly,
> >>     47			 * change scale type to maintain accuracy.
> >>     48			 *
> >>     49			 * 100 vs. 10000000 NANO caps the error to about 100 ppm.
> >>     50			 */
> >>     51			if (scale_type == IIO_VAL_FRACTIONAL)
> >>     52				tmp = *val2;
> >>     53			else
> >>     54				tmp = 1 << *val2;
> >>     55	
> >>   > 56			 if (abs(rem) > 10000000 && abs(*val / tmp) < 100) {
> >>     57				 *val = div_s64_rem(*val, tmp, &rem2);
> >>     58	
> >>     59				 *val2 = div_s64(rem, tmp);
> >>     60				 if (rem2)
> >>     61					 *val2 += div_s64(rem2 * 1000000000LL, tmp);
> >>     62	
> >>     63				 return IIO_VAL_INT_PLUS_NANO;
> >>     64			 }
> >>     65	
> >>     66			return scale_type;
> >>
> >> When I go through the above manually, I get:
> >>
> >> line 
> >> 38: tmp = 90000000000    ; 90 * 1000000000
> >> 39: tmp = 176817288      ; 90000000000 / 509
> >> 40: tmp = 46149312168    ; 176817288 * 261
> >> 42: rem = 149312168      ; 46149312168 % 1000000000
> >>     tmp = 46             ; 46149312168 / 1000000000
> >> 43: *val = 46
> >> 51: if (<fractional>) [yes]
> >> 52: tmp = 1373754273
> >> 56: if (149312168 > 10000000 && 46/1373754273 < 100) [yes && yes]
> >> 57: rem2 = 46            ; 46 % 1373754273
> >>     *val = 0             ; 46 / 1373754273
> >> 59: *val2 = 0            ; 149312168 / 1373754273
> >> 60: if 46 [yes]
> >> 61: *val2 = 33           ; 0 + 46 * 1000000000 / 1373754273
> >> 63: return <int-plus-nano> [0.000000033]
> >>
> >> and
> >>
> >> line 
> >> 38: tmp = 100000000000   ; 100 * 1000000000
> >> 39: tmp = 14285714       ; 100000000000 / 7000
> >> 40: tmp = 54028570348    ; 14285714 * 3782
> >> 42: rem = 28570348       ; 54028570348 % 1000000000
> >>     tmp = 54             ; 54028570348 / 1000000000
> >> 43: *val = 54
> >> 51: if (<fractional>) [no]
> >> 54: tmp = 1073741824     ; 1 << 30
> >> 56: if (28570348 > 10000000 && 54/1073741824 < 100) [yes && yes]
> >> 57: rem2 = 54            ; 54 % 1073741824
> >>     *val = 0             ; 54 / 1073741824
> >> 59: *val2 = 0            ; 28570348 / 1073741824
> >> 60: if 54 [yes]
> >> 61: *val2 = 50           ; 0 + 54 * 1000000000 / 1073741824
> >> 63: return <int-plus-nano> [0.000000050]
> >>
> >> Why do you get zero, what am I missing?
> > 
> > So... It turns out, I swapped schan and rescaler values which gives us:
> 
> Ahh, good. The explanation is simple!
> 
> > 
> > numerator = 90
> > denominator = 1373754273
> > schan_val = 261
> > schan_val2 = 509
> > 
> > line
> > 38: tmp = 261000000000   ; 261 * 1000000000
> > 39: tmp = 189            ; 261000000000 / 1373754273
> > 40: tmp = 17010          ; 189 * 90
> > 42: rem = 17010          ; 17010 % 1000000000
> >     tmp = 0              ; 17010 / 1000000000
> > 43: *val = 0
> > 51: if (<fractional>) [yes]
> > 52: tmp = 509
> > 56: if (17010 > 10000000 && 0/509 < 100) [no && yes]
> > 66: *val = 0
> >     *val2 = 509
> >     return <fractional> [0.000000000]
> > 
> > Swapping back the values, I get the same results as you!
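
The walkthroughs above can be checked with a small user-space sketch of the
quoted v8 logic. This is only an approximation under stated assumptions:
rescale_v8() and the VAL_* constants are hypothetical names, and plain 64-bit
C division stands in for div_s64() and div_s64_rem().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel's IIO_VAL_* constants. */
enum scale_type { VAL_FRACTIONAL, VAL_FRACTIONAL_LOG2, VAL_INT_PLUS_NANO };

/*
 * User-space sketch of the quoted v8 logic (lines 38-66 of the snippet).
 * num/den are the rescaler parameters, *val/*val2 the source channel scale.
 */
static enum scale_type rescale_v8(enum scale_type type, int64_t num,
				  int64_t den, int *val, int *val2)
{
	int64_t tmp, rem, rem2, div;

	assert(den != 0);
	tmp = (int64_t)*val * 1000000000LL;		/* line 38 */
	tmp = tmp / den;				/* line 39 */
	tmp *= num;					/* line 40 */
	rem = tmp % 1000000000LL;			/* line 42 */
	*val = (int)(tmp / 1000000000LL);		/* lines 42-43 */

	/* lines 51-54: divisor taken from the source scale */
	div = (type == VAL_FRACTIONAL) ? *val2 : (1LL << *val2);

	if (llabs(rem) > 10000000 && llabs(*val / div) < 100) {	/* line 56 */
		rem2 = *val % div;			/* line 57 */
		*val = (int)(*val / div);
		*val2 = (int)(rem / div);		/* line 59 */
		if (rem2)				/* lines 60-61 */
			*val2 += (int)(rem2 * 1000000000LL / div);
		return VAL_INT_PLUS_NANO;
	}

	return type;					/* line 66 */
}
```

With a source scale of 90/1373754273 and a rescaler of 261/509, this takes the
int-plus-nano branch and yields 0.000000033. With the two pairs swapped, it
falls through to the fractional return with *val == 0.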
> > 
> > Also, replacing line 56 from the snippet above with
> > 
> > 	- if (abs(rem) > 10000000 && abs(div64_s64(*val, tmp)) < 100) {
> > 	+ if (abs(rem)) {
> > 
> > Fixes these precision errors. It also prevents us from returning
> > different scales if we swap the two divisions (schan and rescaler
> > parameters).
> 
> No, it doesn't fix the precision problems. Not really, it only reduces
> them. See below.
> 
> *snip*
> 
> >>> Considering these null values and the possible issue of not always having the
> >>> same scale type, would it be better to always return an IIO_VAL_INT_PLUS_NANO
> >>> scale?
> >>
> >> No, that absolutely kills the precision for small values that are much
> >> better off as-is. The closer you get to zero, the more the conversion
> >> to int-plus-nano hurts, relatively speaking.
> > 
> > I'm not sure I understand what you mean. The point of switching to
> > IIO_VAL_INT_PLUS_NANO at the moment is to get more precision on small
> > values. Am I missing something?

Hi Peter,

Apologies for the late reply.

> Yes, apparently :-) We always sacrifice accuracy by going to
> IIO_VAL_INT_PLUS_NANO. More is lost with smaller numbers, relatively.
> That is an inherent property of fixed-point style representations such
> as IIO_VAL_INT_PLUS_NANO. Unless we get lucky and just happen to be
> able to represent the desired number exactly of course, but that tends
> to be both non-interesting and the exception.

I think I see where our misunderstanding comes from :-)

I understand that mathematically, IIO_VAL_FRACTIONAL is more accurate
than IIO_VAL_INT_PLUS_NANO for rational numbers given that it provides
an exact value to the IIO consumer.

My point is that the IIO core will represent IIO_VAL_FRACTIONAL scales
as fixed point when using iio_format_value().
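
To illustrate what I mean, here is a rough user-space sketch of formatting a
fractional scale with nine decimals. It is not the kernel function:
format_fractional() is a hypothetical helper, and I'm assuming the core
truncates to nano resolution in roughly this way.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: print val/val2 as a fixed-point number with nine
 * decimals. The integer division truncates, so digits beyond 1e-9 are lost.
 */
static int format_fractional(char *buf, size_t len, int val, int val2)
{
	int64_t nano, ipart, frac;

	assert(val2 != 0);
	nano = (int64_t)val * 1000000000LL / val2;
	ipart = nano / 1000000000LL;
	frac = nano % 1000000000LL;
	if (frac < 0)
		frac = -frac;

	/* A negative value whose integer part is 0 needs an explicit sign. */
	if (nano < 0 && ipart == 0)
		return snprintf(buf, len, "-0.%09" PRId64, frac);

	return snprintf(buf, len, "%" PRId64 ".%09" PRId64, ipart, frac);
}
```

For a scale such as 15/327680000, this produces the string 0.000000045, the
same digits an int-plus-nano scale would carry.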

Also, my current setup uses drivers/hwmon/iio_hwmon.c, which is worse
since it relies on iio_read_channel_processed() to get an integer scale.

(I wonder if it would make sense at some point to update iio_hwmon.c to
use iio_format_value() instead).

> Let's go back to the example from my response to the 8/14 patch, i.e.
> 5/32768 scaled by 3/10000. With the current code this yields
> 
> 15 / 327680000 (0.0000000457763671875)
> 
> Note, the above is /exact/. With IIO_VAL_INT_PLUS_NANO we instead get
> the truncated 0.000000045
> 
> The relative error introduced by the IIO_VAL_INT_PLUS_NANO conversion
> is almost 1.7% in this case. Sure, rounding instead of truncating
> would reduce that to 0.5%, but that's not really a significant
> improvement if you compare to /no/ error. Besides, there are many
> smaller numbers with even bigger relative conversion "noise".
> 
> And remember, this function is used to rescale the scale of the
> raw values. We are going to multiply the scale and the raw values
> at some point. If we have rounding errors in the scale, they will
> multiply with the raw values. It wouldn't look too good if something
> that should be able to reach 3V with a lot of accuracy (ca 26 bits)
> instead caps out at 2.94912V (or hits 3.014656V) because of accuracy
> issues with the scaling (1.7% too low or 0.5% too high).
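
The percentages quoted above can be reproduced with a short sketch.
rel_err_ppm() is a hypothetical helper, and it uses floating point only
because this is a user-space check, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: relative error, in ppm, of approximating the exact
 * fraction num/den by an int-plus-nano value of 0.<nano> (nano * 1e-9).
 * Negative means the approximation is too low.
 */
static double rel_err_ppm(int64_t num, int64_t den, int64_t nano)
{
	double exact, approx;

	assert(num != 0 && den != 0);
	exact = (double)num / (double)den;
	approx = (double)nano * 1e-9;

	return (approx - exact) / exact * 1e6;
}
```

For 15/327680000, truncating to 45 nano gives about -16960 ppm (1.7% too
low), and rounding to 46 nano gives about +4885 ppm (0.5% too high). Applied
to a 3V full-scale reading, those are the 2.94912V and 3.014656V figures.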

I understand your point, but a device that has an IIO_VAL_FRACTIONAL
scale with *val=15 and *val2=327680000 will also show a scale of
0.000000045 in the sysfs interface.

Since other parts of the core already behave like this, I'm inclined to
say that this is a more general "issue", and that this kind of precision
loss would only affect consumers making direct use of the scaling
values. With all this, I wonder how careful we really have to be with
these extra digits.

> It's impossible to do better than exact, which is what we have now for
> IIO_VAL_FRACTIONAL and IIO_VAL_INT (for IIO_VAL_FRACTIONAL_LOG2, not
> so much...). At least as long as there's no overflow.

Right, but like I said above, depending on which path you take, that
value might not be exact in the end.

Thanks,
Liam

> 
> Cheers,
> Peter