Message-ID: <4c9e4f5f-e0f3-42aa-852f-064f4024af26@oss.qualcomm.com>
Date: Wed, 4 Feb 2026 09:56:12 +1100
From: Amirreza Zarrabi <amirreza.zarrabi@....qualcomm.com>
To: Jens Wiklander <jens.wiklander@...aro.org>
Cc: Sumit Garg <sumit.garg@...nel.org>, Arnd Bergmann <arnd@...db.de>,
        Michael Wu <michael@...winnertech.com>, linux-arm-msm@...r.kernel.org,
        op-tee@...ts.trustedfirmware.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] tee: optee: prevent use-after-free when the client
 exits before the supplicant

Hi Jens,

On 2/3/2026 5:59 PM, Jens Wiklander wrote:
> Hi,
> 
> On Tue, Feb 3, 2026 at 3:09 AM Amirreza Zarrabi
> <amirreza.zarrabi@....qualcomm.com> wrote:
>>
>> Hi Jens,
>>
>> On 2/2/2026 10:36 PM, Jens Wiklander wrote:
>>> Hi Amir,
>>>
>>> On Thu, Jan 29, 2026 at 4:22 AM Amirreza Zarrabi
>>> <amirreza.zarrabi@....qualcomm.com> wrote:
>>>>
>>>> Commit 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop") made the
>>>> client wait as killable so it can be interrupted during shutdown or
>>>> after a supplicant crash. This changes the original lifetime expectations:
>>>> the client task can now terminate while the supplicant is still processing
>>>> its request.
>>>>
>>>> If the client exits first it removes the request from its queue and
>>>> kfree()s it, while the request ID remains in supp->idr. A subsequent
>>>> lookup on the supplicant path then dereferences freed memory, leading to
>>>> a use-after-free.
>>>>
>>>> Serialise access to the request with supp->mutex:
>>>>
>>>>   * Hold supp->mutex in optee_supp_recv() and optee_supp_send() while
>>>>     looking up and touching the request.
>>>>   * Let optee_supp_thrd_req() notice that the client has terminated and
>>>>     signal optee_supp_send() accordingly.
>>>>
>>>> With these changes the request cannot be freed while the supplicant still
>>>> has a reference, eliminating the race.
>>>>
>>>> Fixes: 70b0d6b0a199 ("tee: optee: Fix supplicant wait loop")
>>>> Signed-off-by: Amirreza Zarrabi <amirreza.zarrabi@....qualcomm.com>
>>>> ---
>>>> Changes in v3:
>>>> - Introduce processed flag instead of -1 for req->id.
>>>> - Update optee_supp_release() as reported by Michael Wu.
>>>> - Use mutex instead of guard.
>>>> - Link to v2: https://lore.kernel.org/r/20250617-fix-use-after-free-v2-1-1fbfafec5917@oss.qualcomm.com
>>>>
>>>> Changes in v2:
>>>> - Replace the static variable with a sentinel value.
>>>> - Fix the issue with returning the popped request to the supplicant.
>>>> - Link to v1: https://lore.kernel.org/r/20250605-fix-use-after-free-v1-1-a70d23bff248@oss.qualcomm.com
>>>> ---
>>>>  drivers/tee/optee/supp.c | 122 +++++++++++++++++++++++++++++++++--------------
>>>>  1 file changed, 86 insertions(+), 36 deletions(-)
>>>
>>> I had forgotten about this. I'd like to prioritize getting this fixed
>>> soon. By the way, how did you test this?
>>>
>>
>> Thanks for the update. I currently don't have access to the setup required to run
>> the tests myself. My plan is to finalize the design and implementation, then
>> ask Michael Wu to run his use case. Based on his earlier feedback, the patch
>> appears to be working as expected.
>>
>> https://lore.kernel.org/all/292653ba-3836-00f1-acd4-a28b1c54388c@allwinnertech.com/
> 
> OK
> 
>>
>>>>
>>>> diff --git a/drivers/tee/optee/supp.c b/drivers/tee/optee/supp.c
>>>> index d0f397c90242..0ec66008df19 100644
>>>> --- a/drivers/tee/optee/supp.c
>>>> +++ b/drivers/tee/optee/supp.c
>>>> @@ -10,7 +10,11 @@
>>>>  struct optee_supp_req {
>>>>         struct list_head link;
>>>>
>>>> +       int id;
>>>> +
>>>>         bool in_queue;
>>>> +       bool processed;
>>>> +
>>>>         u32 func;
>>>>         u32 ret;
>>>>         size_t num_params;
>>>> @@ -19,6 +23,9 @@ struct optee_supp_req {
>>>>         struct completion c;
>>>>  };
>>>>
>>>> +/* Sentinel stored in supp->idr to mark a pending request as invalid. */
>>>> +#define INVALID_REQ_PTR ((struct optee_supp_req *)ERR_PTR(-ENOENT))
>>>> +
>>>>  void optee_supp_init(struct optee_supp *supp)
>>>>  {
>>>>         memset(supp, 0, sizeof(*supp));
>>>> @@ -46,6 +53,10 @@ void optee_supp_release(struct optee_supp *supp)
>>>>         /* Abort all request retrieved by supplicant */
>>>>         idr_for_each_entry(&supp->idr, req, id) {
>>>>                 idr_remove(&supp->idr, id);
>>>> +               /* Skip if request was already marked invalid */
>>>> +               if (IS_ERR(req))
>>>> +                       continue;
>>>> +
>>>>                 req->ret = TEEC_ERROR_COMMUNICATION;
>>>>                 complete(&req->c);
>>>>         }
>>>> @@ -102,6 +113,7 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>>>>         mutex_lock(&supp->mutex);
>>>>         list_add_tail(&req->link, &supp->reqs);
>>>>         req->in_queue = true;
>>>> +       req->processed = false;
>>>>         mutex_unlock(&supp->mutex);
>>>>
>>>>         /* Tell an eventual waiter there's a new request */
>>>> @@ -117,21 +129,40 @@ u32 optee_supp_thrd_req(struct tee_context *ctx, u32 func, size_t num_params,
>>>>         if (wait_for_completion_killable(&req->c)) {
>>>>                 mutex_lock(&supp->mutex);
>>>>                 if (req->in_queue) {
>>>> +                       /* Supplicant has not seen this request yet. */
>>>>                         list_del(&req->link);
>>>>                         req->in_queue = false;
>>>> +
>>>> +                       ret = TEEC_ERROR_COMMUNICATION;
>>>> +               } else if (req->processed) {
>>>> +                       /*
>>>> +                        * Supplicant has processed this request. Ignore the
>>>> +                        * kill signal for now and submit the result.
>>>> +                        */
>>>> +                       ret = req->ret;
>>>> +               } else {
>>>> +                       /*
>>>> +                        * Supplicant is in the middle of processing this
>>>> +                        * request. Replace req with INVALID_REQ_PTR so that
>>>> +                        * the ID remains busy, causing optee_supp_send() to
>>>> +                        * fail on the next call to supp_pop_req() with this ID.
>>>> +                        */
>>>> +                       idr_replace(&supp->idr, INVALID_REQ_PTR, req->id);
>>>> +                       ret = TEEC_ERROR_COMMUNICATION;
>>>>                 }
>>>> +
>>>>                 mutex_unlock(&supp->mutex);
>>>> -               req->ret = TEEC_ERROR_COMMUNICATION;
>>>> +       } else {
>>>> +               ret = req->ret;
>>>>         }
>>>>
>>>> -       ret = req->ret;
>>>>         kfree(req);
>>>>
>>>>         return ret;
>>>>  }
>>>>
>>>>  static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>>>> -                                             int num_params, int *id)
>>>> +                                             int num_params)
>>>>  {
>>>>         struct optee_supp_req *req;
>>>>
>>>> @@ -153,8 +184,8 @@ static struct optee_supp_req  *supp_pop_entry(struct optee_supp *supp,
>>>>                 return ERR_PTR(-EINVAL);
>>>>         }
>>>>
>>>> -       *id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
>>>> -       if (*id < 0)
>>>> +       req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
>>>> +       if (req->id < 0)
>>>>                 return ERR_PTR(-ENOMEM);
>>>
>>> Since we're now storing the supplicant request ID, wouldn't it make
>>> sense to pre-allocate the ID when allocating the request to avoid this
>>> error case?
>>>
>>
>> True, but allocating the ID at this stage has one advantage.
>> If an ID is not available, the request can remain on the request list,
>> allowing the supplicant to retry later when resources become available.
>> If ID allocation fails during request creation, I have no choice but
>> to drop the request and report an error to optee.
> 
> We're allocating in the range 1..INT_MAX, and not more than a handful
> are expected to be active at a time. If we run out of IDs, we have
> bigger problems.
> 

Ack.
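
For reference, roughly what pre-allocating the ID in optee_supp_thrd_req()
could look like (just a sketch, not the final code; the lookup and release
paths would need matching adjustments):

	req = kzalloc(sizeof(*req), GFP_KERNEL);
	if (!req)
		return TEEC_ERROR_OUT_OF_MEMORY;

	init_completion(&req->c);
	req->func = func;
	req->num_params = num_params;
	req->param = param;

	mutex_lock(&supp->mutex);
	/*
	 * Reserve the ID up front; with only a handful of requests in
	 * flight, exhausting 1..INT_MAX is not a realistic concern, so
	 * a failure here is simply fatal for this request.
	 */
	req->id = idr_alloc(&supp->idr, req, 1, 0, GFP_KERNEL);
	if (req->id < 0) {
		mutex_unlock(&supp->mutex);
		kfree(req);
		return TEEC_ERROR_OUT_OF_MEMORY;
	}
	list_add_tail(&req->link, &supp->reqs);
	req->in_queue = true;
	req->processed = false;
	mutex_unlock(&supp->mutex);

supp_pop_entry() would then only take the request off the list, and
optee_supp_release() would have to cope with requests that sit in both
supp->reqs and supp->idr.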

>>
>>>>
>>>>         list_del(&req->link);
>>>> @@ -214,7 +245,6 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>         struct optee *optee = tee_get_drvdata(teedev);
>>>>         struct optee_supp *supp = &optee->supp;
>>>>         struct optee_supp_req *req = NULL;
>>>> -       int id;
>>>>         size_t num_meta;
>>>>         int rc;
>>>>
>>>> @@ -224,15 +254,48 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>
>>>>         while (true) {
>>>>                 mutex_lock(&supp->mutex);
>>>> -               req = supp_pop_entry(supp, *num_params - num_meta, &id);
>>>> -               mutex_unlock(&supp->mutex);
>>>>
>>>> -               if (req) {
>>>> -                       if (IS_ERR(req))
>>>> -                               return PTR_ERR(req);
>>>> -                       break;
>>>> +               req = supp_pop_entry(supp, *num_params - num_meta);
>>>> +               if (!req) {
>>>> +                       mutex_unlock(&supp->mutex);
>>>> +                       goto wait_for_request;
>>>> +               }
>>>> +
>>>> +               if (IS_ERR(req)) {
>>>> +                       rc = PTR_ERR(req);
>>>> +                       mutex_unlock(&supp->mutex);
>>>> +
>>>> +                       return rc;
>>>>                 }
>>>>
>>>> +               /*
>>>> +                * Process the request while holding the lock, so that
>>>> +                * optee_supp_thrd_req() doesn't pull the request from under us.
>>>> +                */
>>>> +
>>>> +               if (num_meta) {
>>>> +                       /*
>>>> +                        * tee-supplicant support meta parameters ->
>>>> +                        * requests can be processed asynchronously.
>>>> +                        */
>>>> +                       param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
>>>> +                                     TEE_IOCTL_PARAM_ATTR_META;
>>>> +                       param->u.value.a = req->id;
>>>> +                       param->u.value.b = 0;
>>>> +                       param->u.value.c = 0;
>>>> +               } else {
>>>> +                       supp->req_id = req->id;
>>>> +               }
>>>> +
>>>> +               *func = req->func;
>>>> +               *num_params = req->num_params + num_meta;
>>>> +               memcpy(param + num_meta, req->param,
>>>> +                      sizeof(struct tee_param) * req->num_params);
>>>> +
>>>> +               mutex_unlock(&supp->mutex);
>>>> +               return 0;
>>>
>>> Do we really need to move this into the loop? The structure of the
>>> function becomes a bit unusual and harder to read.
>>>
>>
>> Ack. I'll reorganize this function.
>>
>>>> +
>>>> +wait_for_request:
>>>>                 /*
>>>>                  * If we didn't get a request we'll block in
>>>>                  * wait_for_completion() to avoid needless spinning.
>>>> @@ -243,29 +306,10 @@ int optee_supp_recv(struct tee_context *ctx, u32 *func, u32 *num_params,
>>>>                  */
>>>>                 if (wait_for_completion_interruptible(&supp->reqs_c))
>>>>                         return -ERESTARTSYS;
>>>> -       }
>>>>
>>>> -       if (num_meta) {
>>>> -               /*
>>>> -                * tee-supplicant support meta parameters -> requsts can be
>>>> -                * processed asynchronously.
>>>> -                */
>>>> -               param->attr = TEE_IOCTL_PARAM_ATTR_TYPE_VALUE_INOUT |
>>>> -                             TEE_IOCTL_PARAM_ATTR_META;
>>>> -               param->u.value.a = id;
>>>> -               param->u.value.b = 0;
>>>> -               param->u.value.c = 0;
>>>> -       } else {
>>>> -               mutex_lock(&supp->mutex);
>>>> -               supp->req_id = id;
>>>> -               mutex_unlock(&supp->mutex);
>>>> +               /* Check for the next request in the queue. */
>>>>         }
>>>>
>>>> -       *func = req->func;
>>>> -       *num_params = req->num_params + num_meta;
>>>> -       memcpy(param + num_meta, req->param,
>>>> -              sizeof(struct tee_param) * req->num_params);
>>>> -
>>>>         return 0;
>>>>  }
>>>>
>>>> @@ -297,12 +341,18 @@ static struct optee_supp_req *supp_pop_req(struct optee_supp *supp,
>>>>         if (!req)
>>>>                 return ERR_PTR(-ENOENT);
>>>>
>>>> +       /* optee_supp_thrd_req() already returned to optee. */
>>>> +       if (IS_ERR(req))
>>>> +               goto failed_req;
>>>> +
>>>>         if ((num_params - nm) != req->num_params)
>>>>                 return ERR_PTR(-EINVAL);
>>>>
>>>> +       *num_meta = nm;
>>>> +failed_req:
>>>>         idr_remove(&supp->idr, id);
>>>>         supp->req_id = -1;
>>>> -       *num_meta = nm;
>>>> +
>>>>
>>>>         return req;
>>>>  }
>>>> @@ -328,9 +378,8 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>>>>
>>>>         mutex_lock(&supp->mutex);
>>>>         req = supp_pop_req(supp, num_params, param, &num_meta);
>>>> -       mutex_unlock(&supp->mutex);
>>>> -
>>>>         if (IS_ERR(req)) {
>>>> +               mutex_unlock(&supp->mutex);
>>>
>>> We need a way to tell the difference between an id not found and an id
>>> removed because of a killed requester.
>>> How about storing NULL for revoked requests instead of an err-pointer?
>>>
>>
>> Not sure I'm following correctly. Are you expecting supp_pop_req()
>> to return NULL instead of an err-pointer when a request has been revoked?
> 
> I was looking at it again, and storing an err-pointer as you do in
> this patch has the advantage that we can tell whether the ID has been
> revoked or was never supplied. In the latter case, it suggests that
> the supplicant is doing something wrong and might as well restart in
> an attempt to recover. So, please keep using an err-pointer as a
> placeholder, but we must be able to distinguish a revoked request from
> other errors to make sure that the supplicant doesn't restart due to a
> revoked request.
> 

Understood. What if I switch the stored err-pointer from ENOENT to EBADF? That
seems more natural, doesn't overlap with the other error codes supp_pop_req()
can return, and lets the supplicant reliably detect a revoked request.
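
Roughly along these lines (a sketch of the idea only; REVOKED_REQ_PTR is just
a tentative name):

/* Placeholder stored in supp->idr for a request whose client was killed. */
#define REVOKED_REQ_PTR	((struct optee_supp_req *)ERR_PTR(-EBADF))

/* In supp_pop_req(), once the ID has been resolved: */
	req = idr_find(&supp->idr, id);
	if (!req)
		return ERR_PTR(-ENOENT);	/* ID was never handed out */

	if (IS_ERR(req)) {
		/* Client was killed; drop the stale ID and report -EBADF. */
		idr_remove(&supp->idr, id);
		supp->req_id = -1;
		return req;
	}

optee_supp_send() can then pass -EBADF up unchanged, so tee-supplicant can
tell a revoked request apart from the errors that warrant a restart.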

Best Regards,
Amir

> Cheers,
> Jens
> 
>>
>> Best Regards,
>> Amir
>>
>>> Cheers,
>>> Jens
>>>
>>>>                 /* Something is wrong, let supplicant restart. */
>>>>                 return PTR_ERR(req);
>>>>         }
>>>> @@ -355,9 +404,10 @@ int optee_supp_send(struct tee_context *ctx, u32 ret, u32 num_params,
>>>>                 }
>>>>         }
>>>>         req->ret = ret;
>>>> -
>>>> +       req->processed = true;
>>>>         /* Let the requesting thread continue */
>>>>         complete(&req->c);
>>>> +       mutex_unlock(&supp->mutex);
>>>>
>>>>         return 0;
>>>>  }
>>>>
>>>> ---
>>>> base-commit: 3f24e4edcd1b8981c6b448ea2680726dedd87279
>>>> change-id: 20250604-fix-use-after-free-8ff1b5d5d774
>>>>
>>>> Best regards,
>>>> --
>>>> Amirreza Zarrabi <amirreza.zarrabi@....qualcomm.com>
>>>>
>>

