[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87muzt5usg.fsf@linux.intel.com>
Date: Wed, 28 Feb 2018 11:26:55 +0200
From: Felipe Balbi <felipe.balbi@...ux.intel.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Alessandro Zummo <a.zummo@...ertech.it>,
Alexandre Belloni <alexandre.belloni@...e-electrons.com>,
linux-kernel@...r.kernel.org, linux-rtc@...r.kernel.org
Subject: Re: HRTimer causing rtctest to fail
Hi,
Thomas Gleixner <tglx@...utronix.de> writes:
> On Tue, 27 Feb 2018, Felipe Balbi wrote:
>> I'm facing an odd problem with v4.16-rc2 (also reproducible on v4.15
>> final) where rtctest fails sometimes which PIE coming too late with
>> frequencies >= 1024 Hz.
>>
>> I've modified rtctest.c a bit so that it continues running even after
>> first failure just so I could get a glimpse of how many times it fails.
>>
>> Here are the results:
>>
>> Error 1024Hz i 17/20 diff 1089/976
>>
>> Error 2048Hz i 2/20 diff 538/488
>> Error 2048Hz i 9/20 diff 558/488
>> Error 2048Hz i 16/20 diff 560/488
>>
>> Error 4096Hz i 2/20 diff 305/244
>> Error 4096Hz i 7/20 diff 288/244
>> Error 4096Hz i 9/20 diff 285/244
>> Error 4096Hz i 11/20 diff 310/244
>> Error 4096Hz i 16/20 diff 284/244
>> Error 4096Hz i 20/20 diff 317/244
>>
>>
>> I added a few trace_printk() calls around rtc_dev_read() (and some other
>> functions) and captured the time for entry and exit of that function. I
>> noticed that HRTimer fires way too late and also way too early
>> sometimes. On the order of 100+ us.
>
> Power management, deep idle states?
>
>> For example with iterations 17 of 1024 Hz (seen above) and 18 (also of
>> 1024 Hz) I captured the following time stamps (all in uS):
>>
>> | Start | End | Expected End | Diff |
>> |----------+----------+--------------+------|
>> | 35900626 | 35901711 | 35901603 | -108 | (too late)
>> | 35901826 | 35902628 | 35902803 | 174 | (too early)
>
> Too early? hrtimers never fire too early. The measurement in that test is
> bogus:
>
> gettimeofday(&start, NULL);
> /* This blocks */
> retval = read(fd, &data, sizeof(unsigned long));
> if (retval == -1) {
> perror("read");
> exit(errno);
> }
> gettimeofday(&end, NULL);
>
> It's measuring relative time between going to sleep and wakeup, so if an
> event comes late, then the next one will claim to be early which is
> nonsense because periodic mode runs on an absolute time line.
>
> 1 2 3 4 5 6 7
> Expected: | | | | | | |
> Real: | | | | | | |
>
> ^ late event
> So 2->3 would give you (period + latency) and 3->4 (period - latency) ....
>
> And if the task gets scheduled out before the taking the start time stamp
> then this is also giving bogus results.
>
> What the test really should figure out is the deviation of the absolute
> time line. That's hard because the RTC code starts the hrtimer with:
>
> hrtimer_start(&rtc->pie_timer, period, HRTIMER_MODE_REL);
>
> which means relative to 'now'. So there is no good way for user space to
> determine the expected timeline.
>
>> I can't come up with a proper explanation as to why HRTimer is firing at
>> such wildly odd extremes. On most of the measurements, HRTimer executes
>> within 5 uS of scheduled time and I can't convince myself that
>> scheduling latency would get so high all of a sudden. Specially
>> considering I'm testing with a 4 core machine and there's nothing else
>> running on it. It's mostly idle.
>
> Enable the hrtimer and scheduling tracepoints. That should give you a hint
> what's going on.
Thanks, that does give me a lot more information. So here you go:
rtctest-1348 [003] d..2 313.766141: hrtimer_start: hrtimer=00000000667ce595 function=rtc_pie_update_irq expires=313573983010 softexpires=313573983010 mode=REL
<idle>-0 [003] d.h1 313.767189: hrtimer_expire_entry: hrtimer=00000000667ce595 function=rtc_pie_update_irq now=313574053764
We still have a 70754 nS deviation. After changing to absolute time,
the deviation remains:
<idle>-0 [000] dNh2 29.303251: hrtimer_start: hrtimer=000000006858b496 function=rtc_pie_update_irq expires=28765551360 softexpires=28765551360 mode=ABS
<idle>-0 [000] d.h1 29.303565: hrtimer_expire_entry: hrtimer=000000006858b496 function=rtc_pie_update_irq now=28765621916
I took a few measurements (76 to be exact) and plotted the deviation
from expected expiry time (now - expires from the tracepoint data) and
it's looking rather odd. PNG attached (all numbers in nS). Could this be
caused by FW stealing CPU behind Linux's back? I can take many more
measurements, if necessary. Just let me know.
cheers
Download attachment "chart.png" of type "image/png" (42201 bytes)
--
balbi
Download attachment "signature.asc" of type "application/pgp-signature" (833 bytes)
Powered by blists - more mailing lists