[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CACF_fq=PwSucRBYCu7A7HvcZiGss49UQipERfmk0X84-gj5ZNw@mail.gmail.com>
Date: Tue, 25 Nov 2025 14:54:21 +0100
From: Kornel Dulęba <korneld@...gle.com>
To: Marc Zyngier <maz@...nel.org>
Cc: Oliver Upton <oupton@...nel.org>, Joey Gouly <joey.gouly@....com>,
Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>,
Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Bartlomiej Grzesik <bgrzesik@...gle.com>, Tomasz Nowicki <tnowicki@...gle.com>,
Sebastian Ene <sebastianene@...gle.com>, linux-arm-kernel@...ts.infradead.org,
kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: arm64: Fix error checking for FFA_VERSION
On Mon, Nov 24, 2025 at 2:22 PM Marc Zyngier <maz@...nel.org> wrote:
>
> On Mon, 24 Nov 2025 11:49:08 +0000,
> Kornel Dulęba <korneld@...gle.com> wrote:
> >
> > On Sat, Nov 22, 2025 at 12:36 PM Marc Zyngier <maz@...nel.org> wrote:
> > >
> > > On Fri, 14 Nov 2025 11:11:53 +0000,
> > > "=?utf-8?q?Kornel_Dul=C4=99ba?=" <korneld@...gle.com> wrote:
> > > >
> > > > According to section 13.2 of the DEN0077 FF-A specification, when
> > > > firmware does not support the requested version, it should reply with
> > > > FFA_RET_NOT_SUPPORTED(-1). Table 13.6 specifies the type of the error
> > > > code as int32.
> > > > Currently, the error checking logic compares the unsigned long return
> > > > value it got from the SMC layer, against a "-1" literal. This fails due
> > > > to a type mismatch: the literal is extended to 64 bits, whereas the
> > > > register contains only 32 bits of ones(0x00000000ffffffff).
> > > > Consequently, hyp_ffa_init misinterprets the "-1" return value as an
> > > > invalid FF-A version. This prevents pKVM initialization on devices where
> > > > FF-A is not supported in firmware.
> > >
> > > Is this statement accurate? I regularly boot KVM in protected mode in
> > > environments that really cannot be suspected of implementing FF-A
> > > (there is no EL3 to start with). And yet I don't see any failure of
> > > the sort.
> > >
> > > Please clarify the circumstances this is triggered.
> >
> > I do have EL3 enabled, but the FF-A itself is not implemented.
> >
> > Without this change kvm initialization fails with the following:
> >
> > [ 0.946776][ T1] kvm [1]: nv: 554 coarse grained trap handlers
> > [ 0.952880][ T1] kvm [1]: nv: 669 fine grained trap handlers
> > [ 0.958813][ T1] kvm [1]: IPA Size Limit: 44 bits
> > (...)
> > [ 1.034089][ T1] kvm [1]: Failed to init hyp memory protection
> > [ 1.041213][ T1] kvm [1]: error initializing Hyp mode: -95
>
> [ 0.045492] kvm [1]: nv: 568 coarse grained trap handlers
> [ 0.045860] kvm [1]: nv: 664 fine grained trap handlers
> [ 0.046194] kvm [1]: IPA Size Limit: 42 bits
> [ 0.220184] kvm [1]: GICv3: no GICV resource entry
> [ 0.220525] kvm [1]: disabling GICv2 emulation
> [ 0.220852] kvm [1]: GIC system register CPU interface enabled
> [ 0.221264] kvm [1]: vgic interrupt IRQ9
> [ 0.221565] kvm [1]: Protected hVHE mode initialized successfully
>
> Ergo, it works here without FFA (this is in a nested guest that is not
> exposed anything but PSCI in the FW emulation).
I think this all boils down to how the error code is set in reply to
the FFA_VERSION SMC. In the do_ffa_version function in this file, the
return value is set with no casts: "res->a0 = FFA_RET_NOT_SUPPORTED;".
Since res->a0 is an unsigned long and `FFA_RET_NOT_SUPPORTED` is a
(-1) literal, the latter will get promoted resulting in res->a0 being
set to 0xffs.
At the same time any implementation that returns "0x00000000ffffffff"
will still be compliant with the FF-A spec, while triggering the bug
I'm trying to fix here.
>
> > I managed to narrow this down to the FFA version check by examining
> > all of the places in kvm initialization logic where -EOPNOTSUPP is
> > returned. Since printing anything in this part of the code is somewhat
> > problematic I replaced “return -EOPNOTSUPP” with “return res.s0” to
> > examine the problematic register value:
> >
> > [1.041229][ T1] kvm [1]: error initializing Hyp mode: -1
> >
> > Note that the return code itself is cast to int before being printed.
> > Then a colleague of mine recommended looking into the arm_ffa driver.
> > (“drivers/firmware/arm_ffa/driver.c”) There I found that in the
> > ffa_version_check function, the return value from the SMC call is cast
> > to s32 before being checked for errors.
> > I did the same in the kvm initialization logic, which is how this
> > patch was created. Furthermore I also examined the FF-A
> > specification(DEN0077), where the error code value type is specified
> > as int32.
> > With this change applied I can now see that kvm is up and running:
> >
> > [ 0.946839][ T1] kvm [1]: nv: 554 coarse grained trap handlers
> > [ 0.952940][ T1] kvm [1]: nv: 669 fine grained trap handlers
> > [ 0.958867][ T1] kvm [1]: IPA Size Limit: 44 bits
> > (...)
> > [ 1.061717][ T1] kvm [1]: Protected hVHE mode initialized successfully
> >
> > The /dev/kvm file is also there.
>
> I don't dispute the bug. I dispute the assertion that the absence of
> FFA triggers it. So either your testing environment influences the
> behaviour, or you have extra patches that also change the behaviour,
> or something else.
To be completely transparent I stumbled upon this using the android
fork of the kernel, but I don't think that matters here at all.
On my device all cores start in EL2, so the problematic SMC call
should go straight to the FW. I believe that the FW sets the a0
register to "0x00000000ffffffff", indicating that FF-A is not
available. This behavior is actually what the FF-A spec says should be
done, so this is not a case of broken FW.
At the same time it does trigger a bug in the error checking logic,
which I described above.
Powered by blists - more mailing lists