Message-ID: <0c50c24a-35fa-acfb-a807-b4ed5394506b@quicinc.com>
Date: Thu, 15 May 2025 15:26:38 +0530
From: Vikash Garodia <quic_vgarodia@...cinc.com>
To: Bryan O'Donoghue <bryan.odonoghue@...aro.org>,
Dikshita Agarwal <quic_dikshita@...cinc.com>,
Mauro Carvalho Chehab <mchehab@...nel.org>,
Stanimir Varbanov <stanimir.varbanov@...aro.org>,
Hans Verkuil <hans.verkuil@...co.com>
CC: <linux-media@...r.kernel.org>, <linux-arm-msm@...r.kernel.org>,
<linux-kernel@...r.kernel.org>, Vedang Nagar <quic_vnagar@...cinc.com>
Subject: Re: [PATCH v3 1/2] media: venus: fix TOCTOU vulnerability when
reading packets from shared memory
On 5/15/2025 2:47 PM, Bryan O'Donoghue wrote:
> On 14/05/2025 14:38, Dikshita Agarwal wrote:
>> From: Vedang Nagar <quic_vnagar@...cinc.com>
>>
>> Currently, a Time-Of-Check to Time-Of-Use (TOCTOU) issue occurs when
>> handling packets from firmware via shared memory.
>>
>> The problematic code pattern:
>>
>> u32 dwords = *rd_ptr >> 2;
>> if (!dwords || (dwords << 2) > IFACEQ_VAR_HUGE_PKT_SIZE)
>> return -EINVAL;
>>
>> memcpy(pkt, rd_ptr, dwords << 2);
>>
>> Here, *rd_ptr is used to determine the size of the packet and is
>> validated. However, since rd_ptr points to firmware-controlled memory,
>> the firmware could change the contents (e.g., embedded header fields
>> like pkt->hdr.size) after the size was validated but before or during
>> the memcpy() call.
>>
>> This opens up a race window where a malicious or buggy firmware could
>> inject inconsistent or malicious data, potentially leading to
>> information leaks, driver crashes, or undefined behavior.
>>
>> Fix this by rechecking the packet size field from shared memory
>> immediately before the memcpy() to ensure it has not been altered.
>>
>> Fixes: d96d3f30c0f2 ("[media] media: venus: hfi: add Venus HFI files")
>> Signed-off-by: Vedang Nagar <quic_vnagar@...cinc.com>
>> Co-developed-by: Dikshita Agarwal <quic_dikshita@...cinc.com>
>> Signed-off-by: Dikshita Agarwal <quic_dikshita@...cinc.com>
>> ---
>> drivers/media/platform/qcom/venus/hfi_venus.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c b/drivers/media/platform/qcom/venus/hfi_venus.c
>> index b5f2ea8799507f9b83f1529e70061ea89a9cc5c8..163c8d16530bc44a84b2b21076e6189d476fe360 100644
>> --- a/drivers/media/platform/qcom/venus/hfi_venus.c
>> +++ b/drivers/media/platform/qcom/venus/hfi_venus.c
>> @@ -295,6 +295,9 @@ static int venus_read_queue(struct venus_hfi_device *hdev,
>> new_rd_idx = rd_idx + dwords;
>> if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) {
>> if (new_rd_idx < qsize) {
>> + if ((*rd_ptr >> 2) != dwords)
>> + return -EINVAL;
>> +
>> memcpy(pkt, rd_ptr, dwords << 2);
>> } else {
>> size_t len;
>>
>
> Here's how this code fragment looks after the change. I'll add two closing "}"
> for readability and annotate it:
>
> dwords = *rd_ptr >> 2; // read the value here
> if (!dwords)
> return -EINVAL;
>
> new_rd_idx = rd_idx + dwords;
>
> // validate the size against a maximum value
> // this step is correct
> if (((dwords << 2) <= IFACEQ_VAR_HUGE_PKT_SIZE) && rd_idx <= qsize) {
> if (new_rd_idx < qsize) {
> // Re-read the value because firmware
> // might have changed the value
> if ((*rd_ptr >> 2) != dwords)
> return -EINVAL;
>
> // now trust dwords
> memcpy(pkt, rd_ptr, dwords << 2);
> }
> }
>
> But this is all wrong.
>
> There is no atomicity on the APSS side between the first verification of the
> dwords size and the memcpy(). As the commit log itself shows, the firmware is
> free-running with respect to the instruction pipeline of the APSS; it is an AMP
> problem.
>
> Adding another check of the dwords size right before the memcpy() doesn't
> address the problem the commit log describes, namely the firmware updating the
> length field of a header in shared memory.
>
> There are perhaps 10 assembly instructions between the first check and the
> procedure prologue of the memcpy().
>
> Adding another length check right before the memcpy() simply reduces the number
> of CPU instructions in the window during which the firmware can update that
> header; the window itself still exists.
>
> if ((*rd_ptr >> 2) != dwords) // conditional branch operation
>
> memcpy(pkt, rd_ptr, dwords << 2);
>
> The memcpy() begins with a procedure prologue (setting up the call stack) and
> is then a series of loads and stores that copy the data from here to there.
>
> The memcpy() itself is, by its nature, not atomic on the front-side bus between
> the APSS and the DRAM it shares with the firmware.
>
> On a CPU and SoC architecture level this fix just doesn't work.
>
> To be honest we are already doing the right thing in this routine.
>
> 1. Reading the value from the packet header.
> 2. Validating the given size against the maximum size
> 3. Rejecting the memcpy() if the given size _at_the_time_we_read_ is too
> large.
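A minimal user-space sketch of steps 1-3 above, assuming a simplified queue in which the packet size in bytes is the first 32-bit word of the packet. The names copy_pkt_fetch_once() and MAX_PKT_BYTES (an arbitrary 64-byte stand-in for IFACEQ_VAR_HUGE_PKT_SIZE) are hypothetical, and the volatile read stands in for what kernel code would typically express with READ_ONCE(). It shows the size being fetched once, with only that local value driving both the bounds check and the copy.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for IFACEQ_VAR_HUGE_PKT_SIZE; the value is arbitrary. */
#define MAX_PKT_BYTES 64u

/*
 * Sketch (not driver code): fetch the size field from shared memory exactly
 * once, validate that local value, and use only it as the copy length.
 * Returns the number of dwords copied, or -EINVAL.
 */
static int copy_pkt_fetch_once(const volatile uint32_t *shared, uint32_t *pkt)
{
	/* 1. Single read of the header; later firmware writes cannot change it. */
	uint32_t dwords = shared[0] >> 2;

	/* 2. and 3. Validate the value we read, rejecting it if out of range. */
	if (!dwords || (dwords << 2) > MAX_PKT_BYTES)
		return -EINVAL;

	/* Copy word by word; memcpy() would discard the volatile qualifier. */
	for (uint32_t i = 0; i < dwords; i++)
		pkt[i] = shared[i];

	return (int)dwords;
}
```

Later firmware writes to the size field cannot change how many bytes are copied, but nothing here makes the copied bytes themselves consistent with each other.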
>
> The only way to guarantee it would be something like:
>
> asm("bus master asserts bus lock to PAGE/PAGES involved");
> dwords = *rd_ptr >> 2;
> if ((dwords << 2) > MAX_VALUE)
> return -EFIRMWARE_BUG;
> memcpy(dst, src, dwords << 2);
>
> asm("bus master unlocks memory");
>
> Let's say we make the change proposed in this patch; here is how it breaks:
>
> if ((*rd_ptr >> 2) != dwords)
> return -EINVAL;
>
> // now trust dwords
> memcpy(pkt, rd_ptr, dwords << 2);
>
>
> objdump qlt-kernel/build/x1e80100-crd_qlt_integration/drivers/media/platform/qcom/venus/venus-core.o --disassemble=venus_read_queue.isra.0
>
> 5c48: 540000c9 b.ls 5c60 <venus_read_queue.isra.0+0x110> // b.plast
> 5c4c: 2a0303e2 mov w2, w3
> 5c50: aa0703e0 mov x0, x7
> 5c54: 94000000 bl 0 <memcpy>
> 5c58: 52800000 mov w0, #0x0
>
> Your conditional branch is at 0x5c48 and your call to memcpy() is at 0x5c54.
>
> Between 0x5c48 and 0x5c54 the firmware can update the value _again_.
> Indeed, the firmware can keep updating the value until memcpy() has finished
> reading that part of the pkt header, so for at least a few more instructions.
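To make this window concrete without real threads, a deterministic sketch can model the free-running firmware as a hook that writes shared memory at the one point the re-check cannot cover: after the branch, before the copy. read_pkt(), fw_hook_t, fw_noop(), fw_shrink_header() and MAX_PKT_BYTES are hypothetical, illustrative names; the sketch assumes the size in bytes is the first 32-bit word of the packet.

```c
#include <stdint.h>

/* Hypothetical stand-in for IFACEQ_VAR_HUGE_PKT_SIZE; the value is arbitrary. */
#define MAX_PKT_BYTES 64u

/* Hypothetical hook modelling the free-running firmware writing shared memory. */
typedef void (*fw_hook_t)(volatile uint32_t *shared);

/* Benign firmware: leaves shared memory alone. */
static void fw_noop(volatile uint32_t *shared)
{
	(void)shared;
}

/* Buggy or malicious firmware: rewrites the size field to 8 bytes. */
static void fw_shrink_header(volatile uint32_t *shared)
{
	shared[0] = 8;
}

/* Mirrors the patched flow: first check, re-check, then copy. */
static int read_pkt(volatile uint32_t *shared, uint32_t *pkt, fw_hook_t fw)
{
	uint32_t dwords = shared[0] >> 2;

	if (!dwords || (dwords << 2) > MAX_PKT_BYTES)
		return -1;

	/* The re-check added by the patch. */
	if ((shared[0] >> 2) != dwords)
		return -1;

	/* Models the window between the conditional branch and the copy. */
	fw(shared);

	for (uint32_t i = 0; i < dwords; i++)
		pkt[i] = shared[i];

	return (int)dwords;
}
```

With a shared buffer whose first word is 16, read_pkt() with fw_shrink_header copies 4 dwords, yet the copied header now claims 2: the re-check passed and the copy is still inconsistent.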
>
> You could make some kind of argument for re-verifying the content of the pkt
> _after_ the memcpy().
>
> But the only verification that makes any sense _before_ the memcpy() is to
> verify the length at the point you _read_ it, i.e. right after the latching
> operation where we fetch the length value from DRAM into our CPU cache,
> operating stack and/or local registers.
>
> Once that data is fetched within the cache/stack/registers of the CPU/APSS that
> is the relevant value.
>
> For the fix you have here to work, you would need this:
>
> 5c48: MAGICOP memorybuslock
> 5c48: 540000c9 b.ls 5c60 <venus_read_queue.isra.0+0x110> // b.plast
> 5c4c: 2a0303e2 mov w2, w3
> 5c50: aa0703e0 mov x0, x7
> 5c54: 94000000 bl 0 <memcpy>
> 5c58: 52800000 mov w0, #0x0
> 5c5c: MAGICUNOP memorybusunlock
>
> That is because the firmware is free-running with respect to the instruction
> pipeline of the above assembly.
>
> If you really want to verify that the data is still valid, it should be done
> _after_ the memcpy().
>
> But even then I'd ask: why verify _after_ the memcpy(), and what happens on the
> instruction directly _after_ the verification? Is the data considered more
> valid now?
The patch _only_ reduces the window in which the data in the shared queue can go
wrong. Doing the check after the memcpy() would be better here, since the data
would then not be read again from the shared queue, which avoids the case of the
data being updated later:

memcpy(hfi_dev->pkt_buf, rd_ptr from shared queue, dwords..)
pkt_hdr = (struct hfi_pkt_hdr *) (hfi_dev->pkt_buf);
if ((pkt_hdr->size >> 2) != dwords)
return -EINVAL;
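A self-contained sketch of this copy-then-verify idea, assuming the same simplified layout (packet size in bytes in the first 32-bit word). copy_then_verify(), struct pkt_hdr_sketch, fw_grow_header() and MAX_PKT_BYTES are hypothetical names; the optional hook models a firmware write between the size fetch and the copy.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for IFACEQ_VAR_HUGE_PKT_SIZE; the value is arbitrary. */
#define MAX_PKT_BYTES 64u

/* Hypothetical header layout: just the leading size field, in bytes. */
struct pkt_hdr_sketch {
	uint32_t size;
};

/* Hypothetical firmware write: grows the size field to 32 bytes. */
static void fw_grow_header(volatile uint32_t *shared)
{
	shared[0] = 32;
}

/*
 * Copy into a private buffer first, then validate the header of the copy.
 * fw (may be NULL) models a firmware write between the size fetch and the copy.
 */
static int copy_then_verify(volatile uint32_t *shared, uint32_t *pkt_buf,
			    void (*fw)(volatile uint32_t *))
{
	struct pkt_hdr_sketch hdr;
	uint32_t dwords = shared[0] >> 2;

	if (!dwords || (dwords << 2) > MAX_PKT_BYTES)
		return -EINVAL;

	if (fw)
		fw(shared);

	for (uint32_t i = 0; i < dwords; i++)
		pkt_buf[i] = shared[i];

	/* Check the private copy; the firmware cannot modify pkt_buf. */
	memcpy(&hdr, pkt_buf, sizeof(hdr));
	if ((hdr.size >> 2) != dwords)
		return -EINVAL;

	return (int)dwords;
}
```

This only detects a header that changed before its word was copied; payload words rewritten during the copy go unnoticed. What it does guarantee is that everything parsed afterwards comes from pkt_buf, which the firmware cannot touch.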
Regards,
Vikash
>
> i.e. this:
>
> memcpy(pkt, rd_ptr, dwords << 2);
>
> if ((*rd_ptr >> 2) != dwords)
> return -EINVAL;
>
> doesn't have the architectural race condition described above, but it doesn't
> make the data any more trustworthy either, because it still lacks atomicity:
>
> memcpy(pkt, rd_ptr, dwords << 2);
>
> if ((*rd_ptr >> 2) != dwords)
> return -EINVAL;
>
> dev_info(dev, "The value of *rd_ptr %u != %u can be different now\n",
> *rd_ptr >> 2, dwords);
>
> Sorry, this patch just can't work. It's a very hard NAK from me.
>
> ---
> bod