Message-ID: <16f43422-91aa-4c6d-b36c-3e9cb52b1ff2@gmail.com>
Date: Mon, 4 Nov 2024 15:34:24 +0000
From: Pavel Begunkov <asml.silence@...il.com>
To: Jens Axboe <axboe@...nel.dk>,
syzbot <syzbot+e333341d3d985e5173b2@...kaller.appspotmail.com>,
io-uring@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-usb@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [io-uring?] [usb?] WARNING in io_get_cqe_overflow (2)
On 11/4/24 15:27, Pavel Begunkov wrote:
> On 11/4/24 15:08, Jens Axboe wrote:
>> On 11/4/24 6:13 AM, Pavel Begunkov wrote:
>>> On 11/4/24 11:31, syzbot wrote:
>>>> syzbot has bisected this issue to:
>>>>
>>>> commit 3f1a546444738b21a8c312a4b49dc168b65c8706
>>>> Author: Jens Axboe <axboe@...nel.dk>
>>>> Date: Sat Oct 26 01:27:39 2024 +0000
>>>>
>>>> io_uring/rsrc: get rid of per-ring io_rsrc_node list
>>>>
>>>> bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15aaa1f7980000
>>>> start commit: c88416ba074a Add linux-next specific files for 20241101
>>>> git tree: linux-next
>>>> final oops: https://syzkaller.appspot.com/x/report.txt?x=17aaa1f7980000
>>>> console output: https://syzkaller.appspot.com/x/log.txt?x=13aaa1f7980000
>>>> kernel config: https://syzkaller.appspot.com/x/.config?x=704b6be2ac2f205f
>>>> dashboard link: https://syzkaller.appspot.com/bug?extid=e333341d3d985e5173b2
>>>> syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16ec06a7980000
>>>> C reproducer: https://syzkaller.appspot.com/x/repro.c?x=12c04740580000
>>>>
>>>> Reported-by: syzbot+e333341d3d985e5173b2@...kaller.appspotmail.com
>>>> Fixes: 3f1a54644473 ("io_uring/rsrc: get rid of per-ring io_rsrc_node list")
>>>>
>>>> For information about bisection process see: https://goo.gl/tpsmEJ#bisection
>>>
>>> Previously, all puts were done by requests, which, in the case of an
>>> exiting ring, fell back to normal tw. Now the unregister path posts CQEs
>>> while the original task is still alive. It should be fine in general because
>>> at this point no requests can be posting in parallel and everything
>>> is synchronised, so it's a false positive, but we need to change the assert
>>> or something else.
>>
>> Maybe something a la the below? It also changes these triggers to be
>> _once(), no point in spamming them.
>>
>> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
>> index 00409505bf07..7792ed91469b 100644
>> --- a/io_uring/io_uring.h
>> +++ b/io_uring/io_uring.h
>> @@ -137,10 +137,11 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
>> * Not from an SQE, as those cannot be submitted, but via
>> * updating tagged resources.
>> */
>> - if (ctx->submitter_task->flags & PF_EXITING)
>> - lockdep_assert(current_work());
>> + if (ctx->submitter_task->flags & PF_EXITING ||
>> + percpu_ref_is_dying(&ctx->refs))
>
> io_move_task_work_from_local() executes requests with a normal
> task_work of a possibly alive task, which will trip the check.
>
> I was thinking to kill the extra step as it doesn't make sense,
> git garbage digging shows the patch below, but I don't remember
> if it has ever been tested.
>
>
> commit 65560732da185c85f472e9c94e6b8ff147fc4b96
> Author: Pavel Begunkov <asml.silence@...il.com>
> Date: Fri Jun 7 13:13:06 2024 +0100
>
>     io_uring: skip normal tw with DEFER_TASKRUN
>
>     DEFER_TASKRUN execution first falls back to normal task_work and only
>     then, when the task is dying, to workers. It's cleaner to remove the
>     middle step and use workers as the only fallback. It also detaches
>     DEFER_TASKRUN and normal task_work handling from each other.
>
>     Signed-off-by: Pavel Begunkov <asml.silence@...il.com>

Regardless, the rule with something like that should be simpler,
i.e. a ctx is getting killed => everything is run from fallback/kthread.
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 9789cf8c68c1..d9e3661ff93d 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -1111,9 +1111,8 @@ static inline struct llist_node *io_llist_xchg(struct llist_head *head,
> return xchg(&head->first, new);
> }
>
> -static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
> +static __cold void __io_fallback_tw(struct llist_node *node, bool sync)
> {
> - struct llist_node *node = llist_del_all(&tctx->task_list);
> struct io_ring_ctx *last_ctx = NULL;
> struct io_kiocb *req;
>
> @@ -1139,6 +1138,13 @@ static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
> }
> }
>
> +static __cold void io_fallback_tw(struct io_uring_task *tctx, bool sync)
> +{
> + struct llist_node *node = llist_del_all(&tctx->task_list);
> +
> + __io_fallback_tw(node, sync);
> +}
> +
> struct llist_node *tctx_task_work_run(struct io_uring_task *tctx,
> unsigned int max_entries,
> unsigned int *count)
> @@ -1287,13 +1293,7 @@ static void __cold io_move_task_work_from_local(struct io_ring_ctx *ctx)
> struct llist_node *node;
>
> node = llist_del_all(&ctx->work_llist);
> - while (node) {
> - struct io_kiocb *req = container_of(node, struct io_kiocb,
> - io_task_work.node);
> -
> - node = node->next;
> - io_req_normal_work_add(req);
> - }
> + __io_fallback_tw(node, false);
> }
>
> static bool io_run_local_work_continue(struct io_ring_ctx *ctx, int events,
> diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h
> index e46d13e8a215..bc0a800b5ae7 100644
> --- a/io_uring/io_uring.h
> +++ b/io_uring/io_uring.h
> @@ -128,7 +128,7 @@ static inline void io_lockdep_assert_cq_locked(struct io_ring_ctx *ctx)
> * Not from an SQE, as those cannot be submitted, but via
> * updating tagged resources.
> */
> - if (ctx->submitter_task->flags & PF_EXITING)
> + if (percpu_ref_is_dying(&ctx->refs))
> lockdep_assert(current_work());
> else
> lockdep_assert(current == ctx->submitter_task);
>
--
Pavel Begunkov