linux-kernel - destroy_workqueue can livelock

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <46951A98.5060107@redhat.com>
Date:	Wed, 11 Jul 2007 19:59:52 +0200
From:	Michal Schmidt <mschmidt@...hat.com>
To:	linux-kernel@...r.kernel.org
Subject: destroy_workqueue can livelock

Hi,

While using SystemTap I noticed an interesting situation. When my stap
probe was exiting, there was a several seconds long delay, during which
the CPU was 100% loaded. I narrowed the problem down to destroy_workqueue.

The attached module is a minimized testcase. To reproduce it, load the
module and then try to rmmod it from a higher priority process:
 nice -n -10 rmmod wqtest.ko  # that's how SystemTap's staprun behaves
or:
 chrt -f  90 rmmod wqtest.ko  # this may be more reliably reproducible

I tested it (with "nice") on Linux 2.6.22. The rmmod process took about
55% CPU, the workqueue thread consumed the rest. This situation can last
for minutes. As soon as the rmmod process is reniced to 0, the workqueue
is destroyed successfully and the module is unloaded.

Here's what happens in detail:

When rmmod executes cancel_rearming_delayed_workqueue() ->
wait_on_work() -> wait_on_cpu_work(), the work is the current_work on
the workqueue (it's in ssleep(1)). So wait_on_cpu_work() inserts a
wq_barrier on the workqueue and waits for the completion. As soon as
wq_barrier_func signals the completion, it is most likely preempted by
the rmmod process. At this moment, the worklist is already empty, but
cwq->current_work still points to the barrier. run_workqueue() didn't
get to reset it to NULL yet.

Now rmmod calls destroy_workqueue() -> cleanup_workqueue_thread() ->
flush_cpu_workqueue(). Because cwq->current_work!=NULL it decides to
insert another wq_barrier and wait for it to complete. But
cwq->current_work will never be reset to NULL, so
cleanup_workqueue_thread() keeps trying flush_cpu_workqueue()
indefinitely, inserting wq_barriers and waiting for them.

If rmmod's priority is lowered, run_workqueue() will not be preempted by
it and manages to reset cwq->current_work. This ends the livelock.

Can this be fixed? Or is it just a case of "Don't do that then!"?
("that" meaning destroying workqueues from negatively reniced processes)

Michal

View attachment "wqtest.c" of type "text/x-csrc" (979 bytes)