Message-ID: <ad92cf06-636a-417a-b03b-0d90c9243446@linaro.org>
Date: Thu, 15 May 2025 10:17:56 +0100
From: Bryan O'Donoghue <bryan.odonoghue@...aro.org>
To: Dikshita Agarwal <quic_dikshita@...cinc.com>,
 Vikash Garodia <quic_vgarodia@...cinc.com>,
 Mauro Carvalho Chehab <mchehab@...nel.org>,
 Stanimir Varbanov <stanimir.varbanov@...aro.org>,
 Hans Verkuil <hans.verkuil@...co.com>
Cc: linux-media@...r.kernel.org, linux-arm-msm@...r.kernel.org,
 linux-kernel@...r.kernel.org, Vedang Nagar <quic_vnagar@...cinc.com>
Subject: Re: [PATCH v3 1/2] media: venus: fix TOCTOU vulnerability when
 reading packets from shared memory

On 14/05/2025 14:38, Dikshita Agarwal wrote:
> From: Vedang Nagar <quic_vnagar@...cinc.com>
> 
> Currently, Time-Of-Check to Time-Of-Use (TOCTOU) issue happens when
> handling packets from firmware via shared memory.
> 
> The problematic code pattern:
> 
> u32 dwords = *rd_ptr >> 2;
> if (!dwords || (dwords << 2) > IFACEQ_VAR_HUGE_PKT_SIZE)
>     return -EINVAL;
> 
> memcpy(pkt, rd_ptr, dwords << 2);
> 
> Here, *rd_ptr is used to determine the size of the packet and is
> validated. However, since rd_ptr points to firmware-controlled memory,
> the firmware could change the contents (e.g., embedded header fields
> like pkt->hdr.size) after the size was validated but before or during
> the memcpy() call.
> 
> This opens up a race window where a malicious or buggy firmware could
> inject inconsistent or malicious data, potentially leading to
> information leaks, driver crashes, or undefined behavior.
> 
> Fix this by rechecking the packet size field from shared memory
> immediately before the memcpy() to ensure it has not been altered.
> 
> Fixes: d96d3f30c0f2 ("[media] media: venus: hfi: add Venus HFI files")
> Signed-off-by: Vedang Nagar <quic_vnagar@...cinc.com>
> Co-developed-by: Dikshita Agarwal <quic_dikshita@...cinc.com>
> Signed-off-by: Dikshita Agarwal <quic_dikshita@...cinc.com>
> ---
>   drivers/media/platform/qcom/venus/hfi_venus.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c b/drivers/media/platform/qcom/venus/hfi_venus.c
> index b5f2ea8799507f9b83f1529e70061ea89a9cc5c8..163c8d16530bc44a84b2b21076e6189d476fe360 100644
> --- a/drivers/media/platform/qcom/venus/hfi_venus.c
> +++ b/drivers/media/platform/qcom/venus/hfi_venus.c
> @@ -295,6 +295,9 @@ static int venus_read_queue(struct venus_hfi_device *hdev,
>   	new_rd_idx = rd_idx + dwords;
>   	if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) {
>   		if (new_rd_idx < qsize) {
> +			if ((*rd_ptr >> 2) != dwords)
> +				return -EINVAL;
> +
>   			memcpy(pkt, rd_ptr, dwords << 2);
>   		} else {
>   			size_t len;
> 

Here's how this code fragment looks after the change. I've added two "}"
for readability and annotated it:

dwords = *rd_ptr >> 2; // read the value here
if (!dwords)
        return -EINVAL;

new_rd_idx = rd_idx + dwords;

// validate the size against a maximum value
// this step is correct
if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) {
         if (new_rd_idx < qsize) {
                 // Re-read the value because firmware
                 // might have changed the value
                 if ((*rd_ptr >> 2) != dwords)
                         return -EINVAL;

                 // now trust dwords
                 memcpy(pkt, rd_ptr, dwords << 2);
         }
}

But this is all wrong.

There is no atomicity on the APSS side between the first verification of
the dwords size and the memcpy(). The commit log itself shows that the
firmware is free-running with respect to the instruction pipeline of the
APSS; it is an AMP problem.

Adding another check of the dwords size right before the memcpy()
doesn't address the problem which the commit log describes: the
firmware updating the length field of a header in shared memory.

There are perhaps 10 assembly instructions between the first check and
the procedure prologue of the memcpy().

Adding another length check right before the memcpy() simply reduces the
number of CPU instructions in that window. It does not close the window:
the firmware can still update that header.

if ((*rd_ptr >> 2) != dwords) // conditional branch operation

memcpy(pkt, rd_ptr, dwords << 2);

The memcpy() call begins with a procedure prologue - setting up the call
stack - and is then a series of fetches/stores to copy data from here to
there.

The memcpy() itself is by its nature not atomic on the front-side-bus of
the APSS to the DRAM shared with the firmware.

On a CPU and SoC architecture level this fix just doesn't work.

To be honest we are already doing the right thing in this routine.

1. Reading the value from the packet header.
2. Validating the given size against the maximum size
3. Rejecting the memcpy() if the given size _at_the_time_we_read_ is too
    large.
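
The three steps above can be sketched as a small standalone C function. This is a minimal illustration, not the driver code: read_pkt_once() and MAX_PKT_BYTES are hypothetical names, the limit value is arbitrary, and queue wrap-around handling is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for IFACEQ_VAR_HUGE_PKT_SIZE; value is illustrative. */
#define MAX_PKT_BYTES 4096u

/*
 * Read the size word once, validate that single snapshot, and use only
 * the snapshot afterwards. Returns 0 on success, -1 on a bad size.
 */
static int read_pkt_once(const volatile uint32_t *rd_ptr, void *pkt)
{
	uint32_t dwords;

	assert(rd_ptr && pkt);

	/* 1. One load from shared memory; the value now lives in a local. */
	dwords = *rd_ptr >> 2;

	/* 2. Validate the snapshot against the maximum size. */
	if (!dwords || (dwords << 2) > MAX_PKT_BYTES)
		return -1;

	/* 3. Copy using the validated snapshot, never a fresh read of *rd_ptr. */
	memcpy(pkt, (const void *)rd_ptr, (size_t)dwords << 2);
	return 0;
}
```

The length passed to memcpy() comes from the local variable, so whatever the other side writes to the size word later cannot change how many bytes are copied.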

The alternative, to guarantee consistency, would be something like this
(pseudocode):

asm("bus master asserts bus lock to PAGE/PAGES involved");
dwords = *rd_ptr >> 2;
if ((dwords << 2) > MAX_VALUE)
     return -EFIRMWARE_BUG;
memcpy(dst, src, dwords << 2);

asm("bus master unlocks memory");

Let's say we make the change proposed in this patch. Here is how it breaks:

if ((*rd_ptr >> 2) != dwords)
     return -EINVAL;

// now trust dwords
memcpy(pkt, rd_ptr, dwords << 2);


objdump 
qlt-kernel/build/x1e80100-crd_qlt_integration/drivers/media/platform/qcom/venus/venus-core.o 
--disassemble=venus_read_queue.isra.0

5c48:	540000c9 	b.ls	5c60 <venus_read_queue.isra.0+0x110>  // b.plast
5c4c:	2a0303e2 	mov	w2, w3
5c50:	aa0703e0 	mov	x0, x7
5c54:	94000000 	bl	0 <memcpy>
5c58:	52800000 	mov	w0, #0x0

Your conditional jump is at 0x5c48; your call to memcpy is at 0x5c54.

Between 0x5c48 and 0x5c54 the firmware can update the value _again_.
Indeed, the firmware can update the value up until the time we complete
reading the bit of the pkt header in memcpy(), so an additional few
instructions for sure.
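
The window described above can be modelled deterministically in plain C. This is only a single-threaded simulation under stated assumptions: the hook type fw_hook_t, fw_rewrite_size(), copy_with_recheck() and MAX_PKT_BYTES are hypothetical names, and the hook stands in for a firmware write that happens to land after the re-check branch but before memcpy() loads the header.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PKT_BYTES 4096u	/* hypothetical limit, illustrative value */

/* Hypothetical hook standing in for the free-running firmware. */
typedef void (*fw_hook_t)(uint32_t *shared);

/* Simulated firmware write to the size word. */
static void fw_rewrite_size(uint32_t *shared)
{
	shared[0] = 0xffffffffu;
}

/* Models the patched sequence: latch, validate, re-check, then copy. */
static int copy_with_recheck(uint32_t *shared, uint32_t *pkt, fw_hook_t fw)
{
	uint32_t dwords;

	assert(shared && pkt);

	dwords = shared[0] >> 2;
	if (!dwords || (dwords << 2) > MAX_PKT_BYTES)
		return -1;

	/* The re-check added by the patch. */
	if ((shared[0] >> 2) != dwords)
		return -1;

	/* Firmware writes after the branch, before memcpy() loads the header. */
	if (fw)
		fw(shared);

	memcpy(pkt, shared, (size_t)dwords << 2);
	return 0;
}
```

With the hook installed the function still returns success, yet the size word in the copied packet is the rewritten one: the re-check passed and the copied header is inconsistent anyway.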

You could make some type of argument to re-verify the content of the pkt 
_after_ the memcpy()

But the only verification that makes any sense _before_ the memcpy() is
to verify the length at the point you _read_ - subsequent to the
latching operation - where we fetch the length value from DRAM into our
CPU cache, operating stack and/or local registers.

Once that data is fetched within the cache/stack/registers of the 
CPU/APSS that is the relevant value.
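
One way to read this point, sketched in standalone C: once the packet sits in a private buffer, a comparison between the copied header and the latched count is made entirely on CPU-side data, so a later firmware write cannot change its outcome. This is an illustration of that idea only, not something proposed in this thread; private_hdr_matches(), read_pkt_checked() and MAX_PKT_BYTES are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PKT_BYTES 4096u	/* hypothetical limit, illustrative value */

/* Returns 1 if the size word in the private copy agrees with the latched count. */
static int private_hdr_matches(const uint32_t *pkt, uint32_t latched_dwords)
{
	return (pkt[0] >> 2) == latched_dwords;
}

/* Latch the size, validate it, copy, then check only the private copy. */
static int read_pkt_checked(const volatile uint32_t *rd_ptr, uint32_t *pkt)
{
	uint32_t dwords;

	assert(rd_ptr && pkt);

	dwords = *rd_ptr >> 2;	/* latched into a local */
	if (!dwords || (dwords << 2) > MAX_PKT_BYTES)
		return -1;

	memcpy(pkt, (const void *)rd_ptr, (size_t)dwords << 2);

	/* Both operands are private to the CPU side from here on. */
	if (!private_hdr_matches(pkt, dwords))
		return -1;

	return 0;
}
```

Note the difference from re-reading *rd_ptr: the check never goes back to shared memory, so it says nothing about what the firmware does next, only that the private copy is self-consistent.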

For the fix you have here to work you need this

5c48:   MAGICOP         memorybuslock
5c48:	540000c9 	b.ls	5c60 <venus_read_queue.isra.0+0x110>  // b.plast
5c4c:	2a0303e2 	mov	w2, w3
5c50:	aa0703e0 	mov	x0, x7
5c54:	94000000 	bl	0 <memcpy>
5c58:	52800000 	mov	w0, #0x0
5c5c:   MAGICUNOP       memorybusunlock

That is because the firmware is free-running with respect to the
instruction pipeline of the above assembly.

If you really want to verify the data is still valid - it should be done 
_after_ the memcpy();

But even then I'd say to you: why verify _after_ the memcpy()? What
happens on the instruction directly _after_ the verification? Is the
data considered more valid now?

i.e. this:

memcpy(pkt, rd_ptr, dwords << 2);

if ((*rd_ptr >> 2) != dwords)
     return -EINVAL;

doesn't have the above-described architectural race condition, but it
doesn't make the data any more trustworthy, because it doesn't have
atomicity:

memcpy(pkt, rd_ptr, dwords << 2);

if ((*rd_ptr >> 2) != dwords)
     return -EINVAL;

dev_info(dev, "The value of *rd_ptr %u!=%u can be different now\n",
          *rd_ptr >> 2, dwords);

Sorry this patch just can't work. It's a very hard NAK from me.

---
bod
