[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ba2432e4-cfa9-62a8-626e-97aab44c9ffe@arm.com>
Date: Thu, 21 Sep 2023 13:07:29 +0100
From: Robin Murphy <robin.murphy@....com>
To: Mark-PK Tsai <mark-pk.tsai@...iatek.com>
Cc: amit.kachhap@....com, angelogioacchino.delregno@...labora.com,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mediatek@...ts.infradead.org, linux@...linux.org.uk,
matthias.bgg@...il.com, yj.chiang@...iatek.com
Subject: Re: [PATCH] ARM: vfp: Add vudot opcode to VFP undef hook
On 21/09/2023 3:13 am, Mark-PK Tsai wrote:
>> On 2023-09-20 09:39, Mark-PK Tsai wrote:
>>> Add vudot opcode to the VFP undef hook to fix the
>>> potentially undefined instruction error when the
>>> user space executes vudot instruction.
>>
>> Did the kernel expose a hwcap to say that the dot product extension is
>> supported? I'm pretty sure it didn't, so why would userspace expect this
>> to work? ;)
>
> The hwcap for dotprod has been exported since commit:
>
> 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
>
>>
>> IIRC Amit was looking at defining the hwcaps to align with arm64 compat,
>> but I believe that series faltered since most of them weren't actually
>> needed (and I think at that point it was still missing the VFP support
>> code parts). It would be nice if someone could pick up and combine both
>
> Were the mentioned series related to this commit?
>
> 62ea0d873af3 ARM: 9269/1: vfp: Add hwcap for FEAT_DotProd
Oh, that did get merged? My apologies, I grepped for the hwcaps in
arch/arm but somehow failed to spot that some definitions did exist, so
assumed it hadn't been; not sure what went wrong there :(
In that case, we definitely want this tagged as a fix, and to make sure
we double-check for any equivalent fixes still needed for the other
features too. Sorry again for the confusion.
>> efforts and get this done properly; fill in *all* the hwcaps and
>> relevant handling for extensions which Cortex-A55 supports (since
>> there's definitely more than just VUDOT), and then hopefully we're done
>> for good.
>
> Agree.
>
>>
>>> Before this commit, kernel didn't handle the undef exception
>>> caused by vudot and didn't enable VFP in lazy VFP context
>>> switch code like other NEON instructions.
>>> This led to the occurrence of the undefined instruction
>>> error as following:
>>>
>>> [ 250.741238 ] 0904 (26902): undefined instruction: pc=004014ec
>>> ...
>>> [ 250.741287 ] PC is at 0x4014ec
>>> [ 250.741298 ] LR is at 0xb677874f
>>> [ 250.741303 ] pc : [<004014ec>] lr : [<b677874f>] psr: 80070010
>>> [ 250.741309 ] sp : beffedb0 ip : b67d7864 fp : beffee58
>>> [ 250.741314 ] r10: 00000000 r9 : 00000000 r8 : 00000000
>>> [ 250.741319 ] r7 : 00000001 r6 : 00000001 r5 : beffee90 r4 : 00401470
>>> [ 250.741324 ] r3 : beffee20 r2 : beffee30 r1 : beffee40 r0 : 004003a8
>>> [ 250.741331 ] Flags: Nzcv IRQs on FIQs on Mode USER_32 ISA ARM Segment user
>>> [ 250.741339 ] Control: 10c5383d Table: 32d0406a DAC: 00000055
>>> [ 250.741348 ] Code: f4434aef f4610aef f4622aef f4634aef (fc620df4)
>>>
>>> Below is the assembly of the user program:
>>>
>>> 0x4014dc <+108>: vst1.64 {d20, d21}, [r3:128]
>>> 0x4014e0 <+112>: vld1.64 {d16, d17}, [r1:128]
>>> 0x4014e4 <+116>: vld1.64 {d18, d19}, [r2:128]
>>> 0x4014e8 <+120>: vld1.64 {d20, d21}, [r3:128] --> switch out
>>> 0x4014ec <+124>: vudot.u8 q8, q9, q10 <-- switch in, and FPEXC.EN = 0
>>> SIGILL(illegal instruction)
>>>
>>> Link: https://services.arm.com/support/s/case/5004L00000XsOjP
>>
>> Linking to your private support case is not useful to upstream. Even I
>> can't open that link.
>
> I thought that maybe someone in arm need this.
> But it seems a bit noisy so I will remove the link from v2.
Yeah, even within Arm most of us don't have permission to access the
support system.
Cheers,
Robin.
>>
>>> Signed-off-by: Mark-PK Tsai <mark-pk.tsai@...iatek.com>
>>> ---
>>> arch/arm/vfp/vfpmodule.c | 12 ++++++++++++
>>> 1 file changed, 12 insertions(+)
>>>
>>> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
>>> index 7e8773a2d99d..7eab8d1019d2 100644
>>> --- a/arch/arm/vfp/vfpmodule.c
>>> +++ b/arch/arm/vfp/vfpmodule.c
>>> @@ -788,6 +788,12 @@ static struct undef_hook neon_support_hook[] = {{
>>> .cpsr_mask = PSR_T_BIT,
>>> .cpsr_val = 0,
>>> .fn = vfp_support_entry,
>>> +}, {
>>> + .instr_mask = 0xffb00000,
>>> + .instr_val = 0xfc200000,
>>> + .cpsr_mask = PSR_T_BIT,
>>> + .cpsr_val = 0,
>>> + .fn = vfp_support_entry,
>>> }, {
>>> .instr_mask = 0xef000000,
>>> .instr_val = 0xef000000,
>>> @@ -800,6 +806,12 @@ static struct undef_hook neon_support_hook[] = {{
>>> .cpsr_mask = PSR_T_BIT,
>>> .cpsr_val = PSR_T_BIT,
>>> .fn = vfp_support_entry,
>>> +}, {
>>> + .instr_mask = 0xffb00000,
>>> + .instr_val = 0xfc200000,
>>> + .cpsr_mask = PSR_T_BIT,
>>> + .cpsr_val = PSR_T_BIT,
>>> + .fn = vfp_support_entry,
>>
>> Why have two entries conditional on each possible value of one bit for
>> otherwise identical encodings? Surely it suffices to set both cpsr_mask
>> and cpsr_val to 0?
>
> You're right.
> I will set both cpsr_mask and cpsr_val to 0 and use single entry,
> as you suggested, in the v2 patch.
>
> Thanks.
>
>>
>> Thanks,
>> Robin.
>>
>>> }};
>>>
>>> static struct undef_hook vfp_support_hook = {
Powered by blists - more mailing lists