[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <73b49753-da23-eef3-890b-92e85278d882@gmail.com>
Date: Thu, 31 Dec 2020 08:47:28 +0300
From: Fatih USTA <fatihusta86@...il.com>
To: "Jason A. Donenfeld" <Jason@...c4.com>
Cc: WireGuard mailing list <wireguard@...ts.zx2c4.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: wg-crypt-wg0 process
Hi Jason,
Thanks for the detailed research and explanation.That's ok for me.
Regards.
Fatih USTA
On 30.12.2020 15:39, Jason A. Donenfeld wrote:
> Hi Fatih,
>
> Thanks for the report and the detailed test case. From what I can see,
> this behavior presents itself both with the explicit ip link del and
> without. When running with debugging enabled, I can see this in dmesg:
>
> [558758.361056] wireguard: wg0: Keypair 244 destroyed for peer 21
> [558758.546649] wireguard: wg0: Peer 21 (10.150.150.2:51820) destroyed
> [558758.563317] wireguard: wg0: Interface destroyed
> [558758.567803] wireguard: wg0: Keypair 243 destroyed for peer 22
> [558758.733287] wireguard: wg0: Peer 22 (10.150.150.1:51820) destroyed
> [558758.749991] wireguard: wg0: Interface destroyed
>
> The fact that I see "Interface destroyed" for both interfaces means
> that wg_destruct() is being called, which includes these calls:
>
> destroy_workqueue(wg->handshake_receive_wq);
> destroy_workqueue(wg->handshake_send_wq);
> destroy_workqueue(wg->packet_crypt_wq);
>
> In doing so, the output of ps changes from:
>
> $ ps aux|grep wg0
> root 200479 0.0 0.0 0 0 ? I 13:06 0:00
> [kworker/0:2-wg-crypt-wg0]
> root 201226 0.0 0.0 0 0 ? I 13:08 0:00
> [kworker/1:4-wg-crypt-wg0]
> root 201476 0.0 0.0 0 0 ? I< 13:11 0:00
> [wg-crypt-wg0]
> root 201484 0.0 0.0 0 0 ? I< 13:11 0:00
> [wg-crypt-wg0]
>
> to:
>
> $ ps aux|grep wg0
> root 200479 0.0 0.0 0 0 ? I 13:06 0:00
> [kworker/0:2-wg-crypt-wg0]
> root 201226 0.0 0.0 0 0 ? I 13:08 0:00
> [kworker/1:4-wg-crypt-wg0]
>
> What I suspect is happening is that destroying the workqueue does not
> actually destroy the kthreads that they were using, so that they can
> be reused (and eventually relabeled) by other drivers. Looking at the
> stack of those indicates this is probably the case:
>
> $ cat /proc/200479/stack
> [<0>] worker_thread+0xba/0x3c0
> [<0>] kthread+0x114/0x130
> [<0>] ret_from_fork+0x1f/0x30
>
> So it's just hanging out there idle waiting to be scheduled by
> something new. In fact, while I was writing this email, that worker
> already seems to have been reclaimed by another driver:
>
> $ cat /proc/200479/comm
> kworker/0:2-events
>
> Now it's called "events".
>
> This is happening because the kthread isn't actually destroyed, and
> task->comm is being hijacked. In proc_task_name we have:
>
> if (p->flags & PF_WQ_WORKER)
> wq_worker_comm(tcomm, sizeof(tcomm), p);
> else
> __get_task_comm(tcomm, sizeof(tcomm), p);
>
> That top condition holds for workqueue workers, and wq_worker_comm
> winds up scnprintf'ing the current worker description in there:
>
> /*
> * ->desc tracks information (wq name or
> * set_worker_desc()) for the latest execution. If
> * current, prepend '+', otherwise '-'.
> */
> if (worker->desc[0] != '\0') {
> if (worker->current_work)
> scnprintf(buf + off, size - off, "+%s",
> worker->desc);
> else
> scnprintf(buf + off, size - off, "-%s",
> worker->desc);
>
> But worker->desc isn't set until process_one_work is called:
>
> /*
> * Record wq name for cmdline and debug reporting, may get
> * overridden through set_worker_desc().
> */
> strscpy(worker->desc, pwq->wq->name, WORKER_DESC_LEN);
>
> And it's never unset after the work is done and it's waiting idle in
> worker_thread for the scheduler to reschedule it and eventually call
> process_one_work on a new work unit.
>
> It would be easy to just null out that string after the work is done
> with something like:
>
> worker->current_func(work);
> worker->desc[0] = '\0';
>
> But I guess this has never sufficiently bothered anyone before. I
> suppose I could submit a patch and see how it's received. But it also
> looks like the scnprintf above in wq_worker_comm distinguishes these
> cases anyway. If there's a + it means that the work is active and if
> there's a - it means that it's an old leftover thread. So maybe this
> is fine as-is?
>
> Jason
Powered by blists - more mailing lists