linux-kernel - Re: [PATCH] KVM: arm64: Fix error checking for FFA

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CACF_fq=PwSucRBYCu7A7HvcZiGss49UQipERfmk0X84-gj5ZNw@mail.gmail.com>
Date: Tue, 25 Nov 2025 14:54:21 +0100
From: Kornel Dulęba <korneld@...gle.com>
To: Marc Zyngier <maz@...nel.org>
Cc: Oliver Upton <oupton@...nel.org>, Joey Gouly <joey.gouly@....com>, 
	Suzuki K Poulose <suzuki.poulose@....com>, Zenghui Yu <yuzenghui@...wei.com>, 
	Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>, 
	Bartlomiej Grzesik <bgrzesik@...gle.com>, Tomasz Nowicki <tnowicki@...gle.com>, 
	Sebastian Ene <sebastianene@...gle.com>, linux-arm-kernel@...ts.infradead.org, 
	kvmarm@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: arm64: Fix error checking for FFA_VERSION

On Mon, Nov 24, 2025 at 2:22 PM Marc Zyngier <maz@...nel.org> wrote:
>
> On Mon, 24 Nov 2025 11:49:08 +0000,
> Kornel Dulęba <korneld@...gle.com> wrote:
> >
> > On Sat, Nov 22, 2025 at 12:36 PM Marc Zyngier <maz@...nel.org> wrote:
> > >
> > > On Fri, 14 Nov 2025 11:11:53 +0000,
> > > "=?utf-8?q?Kornel_Dul=C4=99ba?=" <korneld@...gle.com> wrote:
> > > >
> > > > According to section 13.2 of the DEN0077 FF-A specification, when
> > > > firmware does not support the requested version, it should reply with
> > > > FFA_RET_NOT_SUPPORTED(-1). Table 13.6 specifies the type of the error
> > > > code as int32.
> > > > Currently, the error checking logic compares the unsigned long return
> > > > value it got from the SMC layer, against a "-1" literal. This fails due
> > > > to a type mismatch: the literal is extended to 64 bits, whereas the
> > > > register contains only 32 bits of ones(0x00000000ffffffff).
> > > > Consequently, hyp_ffa_init misinterprets the "-1" return value as an
> > > > invalid FF-A version. This prevents pKVM initialization on devices where
> > > > FF-A is not supported in firmware.
> > >
> > > Is this statement accurate? I regularly boot KVM in protected mode in
> > > environments that really cannot be suspected of implementing FF-A
> > > (there is no EL3 to start with). And yet I don't see any failure of
> > > the sort.
> > >
> > > Please clarify the circumstances this is triggered.
> >
> > I do have EL3 enabled, but the FF-A itself is not implemented.
> >
> > Without this change kvm initialization fails with the following:
> >
> > [    0.946776][    T1] kvm [1]: nv: 554 coarse grained trap handlers
> > [    0.952880][    T1] kvm [1]: nv: 669 fine grained trap handlers
> > [    0.958813][    T1] kvm [1]: IPA Size Limit: 44 bits
> > (...)
> > [    1.034089][    T1] kvm [1]: Failed to init hyp memory protection
> > [    1.041213][    T1] kvm [1]: error initializing Hyp mode: -95
>
> [    0.045492] kvm [1]: nv: 568 coarse grained trap handlers
> [    0.045860] kvm [1]: nv: 664 fine grained trap handlers
> [    0.046194] kvm [1]: IPA Size Limit: 42 bits
> [    0.220184] kvm [1]: GICv3: no GICV resource entry
> [    0.220525] kvm [1]: disabling GICv2 emulation
> [    0.220852] kvm [1]: GIC system register CPU interface enabled
> [    0.221264] kvm [1]: vgic interrupt IRQ9
> [    0.221565] kvm [1]: Protected hVHE mode initialized successfully
>
> Ergo, it works here without FFA (this is in a nested guest that is not
> exposed anything but PSCI in the FW emulation).

I think this all boils down to how the error code is set in reply to
the FFA_VERSION SMC. In the do_ffa_version function in this file, the
return value is set with no casts: "res->a0 = FFA_RET_NOT_SUPPORTED;".
Since res->a0 is an unsigned long and `FFA_RET_NOT_SUPPORTED` is a
(-1) literal, the latter will get promoted resulting in res->a0 being
set to 0xffs.
At the same time any implementation that returns "0x00000000ffffffff"
will still be compliant with the FF-A spec, while triggering the bug
I'm trying to fix here.

>
> > I managed to narrow this down to the FFA version check by examining
> > all of the places in kvm initialization logic where -EOPNOTSUPP is
> > returned. Since printing anything in this part of the code is somewhat
> > problematic I replaced “return -EOPNOTSUPP” with “return res.s0” to
> > examine the problematic register value:
> >
> > [1.041229][    T1] kvm [1]: error initializing Hyp mode: -1
> >
> > Note that the return code itself is cast to int before being printed.
> > Then a colleague of mine recommended looking into the arm_ffa driver.
> > (“drivers/firmware/arm_ffa/driver.c”) There I found that in the
> > ffa_version_check function, the return value from the SMC call is cast
> > to s32 before being checked for errors.
> > I did the same in the kvm initialization logic, which is how this
> > patch was created. Furthermore I also examined the FF-A
> > specification(DEN0077), where the error code value type is specified
> > as int32.
> > With this change applied I can now see that kvm is up and running:
> >
> > [    0.946839][    T1] kvm [1]: nv: 554 coarse grained trap handlers
> > [    0.952940][    T1] kvm [1]: nv: 669 fine grained trap handlers
> > [    0.958867][    T1] kvm [1]: IPA Size Limit: 44 bits
> > (...)
> > [    1.061717][    T1] kvm [1]: Protected hVHE mode initialized successfully
> >
> > The /dev/kvm file is also there.
>
> I don't dispute the bug. I dispute the assertion that the absence of
> FFA triggers it. So either your testing environment influences the
> behaviour, or you have extra patches that also change the behaviour,
> or something else.

To be completely transparent I stumbled upon this using the android
fork of the kernel, but I don't think that matters here at all.
On my device all cores start in EL2, so the problematic SMC call
should go straight to the FW. I believe that the FW sets the a0
register to "0x00000000ffffffff", indicating that FF-A is not
available. This behavior is actually what the FF-A spec says should be
done, so this is not a case of broken FW.
At the same time it does trigger a bug in the error checking logic,
which I described above.