linux-kernel - Re: drm/etnaviv: detecting disabled Vivante GPU?


Message-ID: <20250904113609.18c39d38@donnerap>
Date: Thu, 4 Sep 2025 11:36:09 +0100
From: Andre Przywara <andre.przywara@....com>
To: Christian Gmeiner <christian.gmeiner@...il.com>
Cc: Lucas Stach <l.stach@...gutronix.de>, Russell King
 <linux+etnaviv@...linux.org.uk>, etnaviv@...ts.freedesktop.org,
 dri-devel@...ts.freedesktop.org, linux-kernel@...r.kernel.org, Chen-Yu Tsai
 <wens@...e.org>, linux-sunxi <linux-sunxi@...ts.linux.dev>
Subject: Re: drm/etnaviv: detecting disabled Vivante GPU?

On Thu, 4 Sep 2025 12:10:30 +0200
Christian Gmeiner <christian.gmeiner@...il.com> wrote:

> Hi
> 
> >
> > the Allwinner A523/A527/T527 family of SoCs feature a Vivante
> > "VIP9000"(?) NPU, though it seems to be disabled on many SKUs.
> > See https://linux-sunxi.org/A523#Family_of_sun55iw3 for a table, the
> > row labelled "NPU" indicates which model has the IP. We suspect it's
> > all the same die, with the NPU selectively fused off on some packages.
> >
> > Board vendors seem to use multiple SKUs of the SoC on the same board,
> > so it's hard to say which particular board has the NPU or not. We
> > figured that on unsupported SoCs all the NPU registers read as 0,
> > though, so we were wondering if that could be considered as a bail-out
> > check for the driver?
> > At the moment I get this, on a SoC with a disabled NPU:
> > [    1.677612] etnaviv etnaviv: bound 7122000.npu (ops gpu_ops)
> > [    1.683849] etnaviv-gpu 7122000.npu: model: GC0, revision: 0
> > [    1.690020] etnaviv-gpu 7122000.npu: Unknown GPU model
> > [    1.696145] [drm] Initialized etnaviv 1.4.0 for etnaviv on minor 0
> > [    1.953053] etnaviv-gpu 7122000.npu: GPU not yet idle, mask: 0x00000000
> >
> > Chen-Yu got this on his board featuring the NPU:
> >     etnaviv-gpu 7122000.npu: model: GC9000, revision: 9003
> >
> > If I get the code correctly, then etnaviv_gpu_init() correctly detects
> > the "unsupported" GPU model, and returns -ENXIO, but load_gpu() in
> > etnaviv_drv.c then somewhat ignores this, since it keeps looking for more
> > GPUs, and fails to notice that *none* showed up:
> > /sys/kernel/debug/dri/etnaviv/gpu is empty in my case.
> >  
> 
> Looks fine to me - no wrong behavior.
> 
> > Quick questions:
> > - Is reading 0 from VIVS_HI_CHIP_IDENTITY (or any other of the ID
> >   registers) an invalid ID, so we can use that to detect those disabled
> >   NPUs? If not, can any other register be used to check this? The whole
> >   block seems to be RAZ/WI when the NPU is disabled.
> >
> > - Would it be acceptable to change the logic to error out of the
> >   driver's init or probe routine when no GPU/NPU has been found, at
> >   best with a proper error message? As it stands at the moment, the
> >   driver is loaded, but of course nothing is usable, so it keeps
> >   confusing users.
> >  
> 
> From an application standpoint, it’s not confusing since there is no etnaviv
> device to interact with. The user might wonder about the kernel messages,
> but that’s actually caused by an incorrect device tree. If the SoC doesn’t
> have an NPU, it shouldn’t be enabled in the DTS.

You have a point there, but as I mentioned above, that sounds tricky to
do: I have two boards that look otherwise identical, but one has an A527,
the other a T527. And still neither has the NPU, since only some
T527s feature it. So putting this on the user to pick the right DT (or
U-Boot defconfig) does not sound very nice.

And in contrast to many other devices described in DTs, we *can* safely
detect the existence of this NPU: each of the SoCs has all the clock
gates and resets, and accesses to the MMIO frame do not fault - and the
kernel code apparently can cope with this situation already. So yeah, we
could smear something into U-Boot, to put a status = "disabled"; in there,
but I would like to avoid that, especially if the kernel is almost there
already.
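
To make the idea concrete, here is a rough user-space sketch of the two
checks discussed above: treating all-zero ID registers as "core not
present", and reporting an error when no core at all came up. This is
not etnaviv code: struct fake_core, core_is_absent(), core_init() and
load_cores() are made-up names for illustration, and the sketch assumes
that an all-zero model/revision pair is never a valid ID, which is
exactly the open question. The choice of -ENODEV is also just an
assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the ID registers of one GPU/NPU core. */
struct fake_core {
	uint32_t model;		/* 0x9000 on the board with a working NPU */
	uint32_t revision;	/* 0x9003 on the board with a working NPU */
};

/*
 * A fused-off block behaves as RAZ/WI, so every ID register reads 0.
 * Assumption: no real core ever reports model 0 and revision 0.
 */
int core_is_absent(const struct fake_core *core)
{
	return core->model == 0 && core->revision == 0;
}

/* Per-core init: bail out early instead of printing "Unknown GPU model". */
int core_init(const struct fake_core *core)
{
	if (core_is_absent(core))
		return -ENODEV;

	return 0;
}

/*
 * Driver-level load: keep going over all cores as today, but remember
 * how many were usable and fail if that number is zero.
 */
int load_cores(const struct fake_core *cores, size_t count)
{
	size_t usable = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (core_init(&cores[i]) == 0)
			usable++;
	}

	return usable ? 0 : -ENODEV;
}
```

With only a zeroed core, load_cores() returns -ENODEV, so the caller
could print a proper message and fail the bind; with at least one core
reporting a non-zero ID it returns 0, as it does today.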

Cheers,
Andre