Message-ID: <ad92cf06-636a-417a-b03b-0d90c9243446@linaro.org>
Date: Thu, 15 May 2025 10:17:56 +0100
From: Bryan O'Donoghue <bryan.odonoghue@...aro.org>
To: Dikshita Agarwal <quic_dikshita@...cinc.com>,
Vikash Garodia <quic_vgarodia@...cinc.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Stanimir Varbanov <stanimir.varbanov@...aro.org>,
Hans Verkuil <hans.verkuil@...co.com>
Cc: linux-media@...r.kernel.org, linux-arm-msm@...r.kernel.org,
linux-kernel@...r.kernel.org, Vedang Nagar <quic_vnagar@...cinc.com>
Subject: Re: [PATCH v3 1/2] media: venus: fix TOCTOU vulnerability when
reading packets from shared memory
On 14/05/2025 14:38, Dikshita Agarwal wrote:
> From: Vedang Nagar <quic_vnagar@...cinc.com>
>
> Currently, Time-Of-Check to Time-Of-Use (TOCTOU) issue happens when
> handling packets from firmware via shared memory.
>
> The problematic code pattern:
>
> u32 dwords = *rd_ptr >> 2;
> if (!dwords || (dwords << 2) > IFACEQ_VAR_HUGE_PKT_SIZE)
> return -EINVAL;
>
> memcpy(pkt, rd_ptr, dwords << 2);
>
> Here, *rd_ptr is used to determine the size of the packet and is
> validated. However, since rd_ptr points to firmware-controlled memory,
> the firmware could change the contents (e.g., embedded header fields
> like pkt->hdr.size) after the size was validated but before or during
> the memcpy() call.
>
> This opens up a race window where a malicious or buggy firmware could
> inject inconsistent or malicious data, potentially leading to
> information leaks, driver crashes, or undefined behavior.
>
> Fix this by rechecking the packet size field from shared memory
> immediately before the memcpy() to ensure it has not been altered.
>
> Fixes: d96d3f30c0f2 ("[media] media: venus: hfi: add Venus HFI files")
> Signed-off-by: Vedang Nagar <quic_vnagar@...cinc.com>
> Co-developed-by: Dikshita Agarwal <quic_dikshita@...cinc.com>
> Signed-off-by: Dikshita Agarwal <quic_dikshita@...cinc.com>
> ---
> drivers/media/platform/qcom/venus/hfi_venus.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c b/drivers/media/platform/qcom/venus/hfi_venus.c
> index b5f2ea8799507f9b83f1529e70061ea89a9cc5c8..163c8d16530bc44a84b2b21076e6189d476fe360 100644
> --- a/drivers/media/platform/qcom/venus/hfi_venus.c
> +++ b/drivers/media/platform/qcom/venus/hfi_venus.c
> @@ -295,6 +295,9 @@ static int venus_read_queue(struct venus_hfi_device *hdev,
> new_rd_idx = rd_idx + dwords;
> if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) {
> if (new_rd_idx < qsize) {
> + if ((*rd_ptr >> 2) != dwords)
> + return -EINVAL;
> +
> memcpy(pkt, rd_ptr, dwords << 2);
> } else {
> size_t len;
>
Here's how this code fragment looks after the change. I'll add two "}"
for readability and annotate:
dwords = *rd_ptr >> 2; // read the value here
if (!dwords)
return -EINVAL;
new_rd_idx = rd_idx + dwords;
// validate the size against a maximum value
// this step is correct
if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) {
if (new_rd_idx < qsize) {
// Re-read the value because firmware
// might have changed the value
if ((*rd_ptr >> 2) != dwords)
return -EINVAL;
// now trust dwords
memcpy(pkt, rd_ptr, dwords << 2);
}
}
But this is all wrong.
There is no atomicity on the APSS side between the first verification of
the dwords size and the memcpy(). The commit log itself shows that the
firmware is free-running with respect to the instruction pipeline of the
APSS; it is an AMP problem.
Adding another check of the dwords size right before the memcpy()
doesn't address the problem, which the commit log describes as the
firmware updating the length field of a header in shared memory.
There are perhaps 10 assembly instructions between the first check and
the procedure prologue of the memcpy().
Adding another length check right before the memcpy() simply reduces the
number of CPU instructions - the effective window - in which the firmware
can still update that header.
if ((*rd_ptr >> 2) != dwords) // conditional branch operation
memcpy(pkt, rd_ptr, dwords << 2);
The memcpy() begins with a procedure prologue - setting up the call
stack - and then is a series of fetches/stores to copy data from here to
there.
The memcpy() itself is by its nature not atomic on the front-side bus of
the APSS to the DRAM shared with the firmware.
On a CPU and SoC architecture level this fix just doesn't work.
To be honest we are already doing the right thing in this routine.
1. Reading the value from the packet header.
2. Validating the given size against the maximum size
3. Rejecting the memcpy() if the given size _at_the_time_we_read_ is too
large.
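As a rough illustration of those three steps, here is a minimal userspace sketch. It is not the driver code: latch_and_copy() and MAX_PKT_BYTES are hypothetical names made up for this example, and the volatile read merely stands in for a single latched load of the size word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for IFACEQ_VAR_HUGE_PKT_SIZE. */
#define MAX_PKT_BYTES 4096u

/*
 * Latch the size word from shared memory exactly once, validate the
 * latched copy, and size the memcpy() from it. No decision after the
 * single read depends on the shared size word again.
 * Returns the number of bytes copied, or -1 if the size is rejected.
 */
static int latch_and_copy(void *dst, size_t dst_len, const uint32_t *shared)
{
	uint32_t dwords;

	assert(dst && shared);

	/* 1. Read the value from the packet header, once. */
	dwords = *(const volatile uint32_t *)shared >> 2;

	/* 2. Validate the latched size against the maximum size. */
	if (!dwords || (dwords << 2) > MAX_PKT_BYTES || (dwords << 2) > dst_len)
		return -1;

	/* 3. Copy exactly the number of bytes that was validated. */
	memcpy(dst, shared, (size_t)dwords << 2);
	return (int)(dwords << 2);
}
```

The firmware can still scribble on the payload while the copy runs, but the number of bytes copied is bounded by the value that was latched and checked.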
The alternative, to guarantee atomicity, would be something like:
asm("bus master asserts bus lock to PAGE/PAGES involved");
dwords = *rd_ptr >> 2;
if ((dwords << 2) > MAX_VALUE)
return -EFIRMWARE_BUG;
memcpy(dst, src, dwords << 2);
asm("bus master unlocks memory");
Let's say we make the change proposed in this patch, here is how it breaks:
if ((*rd_ptr >> 2) != dwords)
return -EINVAL;
// now trust dwords
memcpy(pkt, rd_ptr, dwords << 2);
objdump
qlt-kernel/build/x1e80100-crd_qlt_integration/drivers/media/platform/qcom/venus/venus-core.o
--disassemble=venus_read_queue.isra.0
5c48: 540000c9 b.ls 5c60 <venus_read_queue.isra.0+0x110> // b.plast
5c4c: 2a0303e2 mov w2, w3
5c50: aa0703e0 mov x0, x7
5c54: 94000000 bl 0 <memcpy>
5c58: 52800000 mov w0, #0x0
Your conditional jump is @ 0x5c48 and your call to memcpy() is @ 0x5c54.
Between 0x5c48 and 0x5c54 the firmware can update the value _again_.
Indeed the firmware can update the value up until the time we complete
reading that part of the pkt header inside memcpy(), so for an additional
few instructions for sure.
You could make some type of argument to re-verify the content of the pkt
_after_ the memcpy().
But the only verification that makes any sense _before_ the memcpy() is
to verify the length at the point you _read_ - subsequent to the
latching operation - where we fetch the length value from DRAM into our
CPU cache, operating stack and/or local registers.
Once that data is fetched within the cache/stack/registers of the
CPU/APSS that is the relevant value.
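To make the "latched value" point concrete, here is a small userspace sketch, assuming a packet whose first word is its size as in the quoted snippet. copy_and_check(), struct pkt_hdr and MAX_PKT_BYTES are hypothetical names for this illustration only. Any check after the copy looks at the private copy in pkt, never at rd_ptr, so the firmware cannot move the value underneath the comparison.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical maximum packet size for this sketch. */
#define MAX_PKT_BYTES 64u

/* Hypothetical header layout: first word is the size in bytes. */
struct pkt_hdr {
	uint32_t size;
	uint32_t pkt_type;
};

/*
 * pkt must point to a private buffer of at least MAX_PKT_BYTES.
 * Returns 0 on success, -1 if the size is rejected or the copied
 * header disagrees with the size that was latched and validated.
 */
static int copy_and_check(uint32_t *pkt, const uint32_t *rd_ptr)
{
	struct pkt_hdr hdr;
	uint32_t dwords;

	assert(pkt && rd_ptr);

	/* Latch the size once; this is the value that gets validated. */
	dwords = *(const volatile uint32_t *)rd_ptr >> 2;
	if ((dwords << 2) < sizeof(hdr) || (dwords << 2) > MAX_PKT_BYTES)
		return -1;

	memcpy(pkt, rd_ptr, (size_t)dwords << 2);

	/* Inspect only the private copy; shared memory is not re-read. */
	memcpy(&hdr, pkt, sizeof(hdr));
	if ((hdr.size >> 2) != dwords)
		return -1;

	return 0;
}
```

Once the copy is done, pkt is the only thing later code should parse; whatever the firmware writes to the queue afterwards no longer matters for this packet.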
For the fix you have here to work you need this
5c48: MAGICOP memorybuslock
5c48: 540000c9 b.ls 5c60 <venus_read_queue.isra.0+0x110> // b.plast
5c4c: 2a0303e2 mov w2, w3
5c50: aa0703e0 mov x0, x7
5c54: 94000000 bl 0 <memcpy>
5c58: 52800000 mov w0, #0x0
5c5c: MAGICUNOP memorybusunlock
Because the firmware is free-running - with respect to the instruction
pipeline of the above assembly.
If you really want to verify the data is still valid - it should be done
_after_ the memcpy();
But even then I'd say to you, why verify _after_ the memcpy() - and what
happens on the instruction directly _after_ the verification - is the
data considered more valid now?
i.e. this:
memcpy(pkt, rd_ptr, dwords << 2);
if ((*rd_ptr >> 2) != dwords)
return -EINVAL;
doesn't have the above-described architectural race condition, but it
doesn't make the data any more trustworthy - because it doesn't have
atomicity:
memcpy(pkt, rd_ptr, dwords << 2);
if ((*rd_ptr >> 2) != dwords)
return -EINVAL;
dev_info(dev, "The value of *rd_ptr %lu!=%lu can be different now\n",
*rd_ptr >> 2, dwords);
Sorry this patch just can't work. It's a very hard NAK from me.
---
bod