[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <77f07e94-82a2-5dc4-2483-e2ecff151e66@redhat.com>
Date: Wed, 4 Jun 2025 23:17:15 +0200 (CEST)
From: Sebastian Ott <sebott@...hat.com>
To: Zenghui Yu <yuzenghui@...wei.com>
cc: Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>,
Colton Lewis <coltonlewis@...gle.com>,
Ricardo Koller <ricarkol@...gle.com>, Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>, Shuah Khan <shuah@...nel.org>,
linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [PATCH v2 0/3] KVM: arm64: selftests: arch_timer_edge_cases
fixes
On Wed, 4 Jun 2025, Sebastian Ott wrote:
> On Tue, 3 Jun 2025, Zenghui Yu wrote:
>> On 2025/5/27 22:24, Sebastian Ott wrote:
>>> Some small fixes for arch_timer_edge_cases that I stumbled upon
>>> while debugging failures for this selftest on ampere-one.
>>>
>>> Changes since v1: modified patch 3 based on suggestions from Marc.
>>>
>>> I've done some tests with this on various machines - seems to be all
>>> good, however on ampere-one I now hit this in 10% of the runs:
>>> ==== Test Assertion Failure ====
>>> arm64/arch_timer_edge_cases.c:481: timer_get_cntct(timer) >= DEF_CNT +
>>> (timer_get_cntfrq() * (uint64_t)(delta_2_ms) / 1000)
>>> pid=166657 tid=166657 errno=4 - Interrupted system call
>>> 1 0x0000000000404db3: test_run at arch_timer_edge_cases.c:933
>>> 2 0x0000000000401f9f: main at arch_timer_edge_cases.c:1062
>>> 3 0x0000ffffaedd625b: ?? ??:0
>>> 4 0x0000ffffaedd633b: ?? ??:0
>>> 5 0x00000000004020af: _start at ??:?
>>> timer_get_cntct(timer) >= DEF_CNT + msec_to_cycles(delta_2_ms)
>>>
>>> This is not new, it was just hidden behind the other failure. I'll
>>> try to figure out what this is about (seems to be independent of
>>> the wait time)..
>>
>> Not sure if you have figured it out. I can easily reproduce it on my box
>> and I *guess* it is that we have some random XVAL values when we enable
>> the timer..
>
> Yes, I think so, too.
>
>> test_reprogramming_timer()
>> {
>> local_irq_disable();
>> reset_timer_state(timer, DEF_CNT);
>
> My first attempt was to also initialize cval here
Forgot to mention that I did this because my tests have shown
that the interrupt didn't only trigger early (like before the
reprogrammed delta) but instantly. This seemed to work but I think
the order in set_tval_irq() is the actual issue.
>
>>
>> /* Program the timer to DEF_CNT + delta_1_ms. */
>> set_tval_irq(timer, msec_to_cycles(delta_1_ms), CTL_ENABLE);
>>
>> [...]
>> }
>>
>> set_tval_irq()
>> {
>> timer_set_ctl(timer, ctl);
>>
>> // There is a window that we enable the timer with *random* XVAL
>> // values and we may get the unexpected interrupt.. And it's
>> // unlikely that KVM can be aware of TVAL's change (and
>> // re-evaluate the interrupt's pending state) before hitting the
>> // GUEST_ASSERT().
>>
>> timer_set_tval(timer, tval_cycles);
>
> Yes, I stumbled over this as well. I've always assumed that this order is
> becauase of this from the architecture "If CNTV_CTL_EL0.ENABLE is 0, the
> value returned is UNKNOWN." However re-reading that part today I realized
> that this only concerns register reads.
>
> Maybe somone on cc knows why it's in that order?
>
> I'm currently testing this with the above swapped and it's looking good,
> so far.
>
>> }
>>
>> I'm not familiar with the test so I'm not 100% sure that this is the
>> root cause. But I hope this helps with your analysis ;-) .
>
> It did, thanks!
>
> Sebastian
>
Powered by blists - more mailing lists