Message-ID: <alpine.LRH.2.02.1310040923320.8473@file01.intranet.prod.int.rdu2.redhat.com>
Date: Fri, 4 Oct 2013 09:38:50 -0400 (EDT)
From: Mikulas Patocka <mpatocka@...hat.com>
To: Akira Hayakawa <ruby.wktk@...il.com>
cc: dm-devel@...hat.com, devel@...verdev.osuosl.org,
thornber@...hat.com, snitzer@...hat.com,
gregkh@...uxfoundation.org, david@...morbit.com,
linux-kernel@...r.kernel.org, dan.carpenter@...cle.com,
joe@...ches.com, akpm@...ux-foundation.org, m.chehab@...sung.com,
ejt@...hat.com, agk@...hat.com, cesarb@...arb.net, tj@...nel.org
Subject: Re: [dm-devel] dm-writeboost testing
On Fri, 4 Oct 2013, Akira Hayakawa wrote:
> Hi, Mikulas,
>
> I am sorry to say that
> I don't have such machines to reproduce the problem.
>
> But I agree that I am dealing with the workqueue subsystem
> in a somewhat odd way.
> I should clean that up.
>
> For example,
> the free_cache() routine below is
> the destructor of the cache metadata,
> including all the workqueues.
>
> void free_cache(struct wb_cache *cache)
> {
> 	cache->on_terminate = true;
>
> 	/* Kill in-kernel daemons */
> 	cancel_work_sync(&cache->sync_work);
> 	cancel_work_sync(&cache->recorder_work);
> 	cancel_work_sync(&cache->modulator_work);
>
> 	cancel_work_sync(&cache->flush_work);
> 	destroy_workqueue(cache->flush_wq);
>
> 	cancel_work_sync(&cache->barrier_deadline_work);
>
> 	cancel_work_sync(&cache->migrate_work);
> 	destroy_workqueue(cache->migrate_wq);
> 	free_migration_buffer(cache);
>
> 	/* Destroy in-core structures */
> 	free_ht(cache);
> 	free_segment_header_array(cache);
>
> 	free_rambuf_pool(cache);
> }
>
> The cancel_work_sync() calls before destroy_workqueue()
> can probably be removed, because destroy_workqueue()
> first flushes all pending work items.
>
> Although I prepare an independent workqueue
> for each of flush_work and migrate_work,
> the other four works are queued onto system_wq
> through the schedule_work() routine.
> This asymmetry is not welcome in
> architecture-portable code;
> dependencies on the workqueue subsystem should be minimized.
> In particular, the workqueue subsystem keeps changing
> its concurrency support, so
> trusting only a single-threaded workqueue
> would be a good idea for stability.

The problem is that you are using workqueues the wrong way. You submit a
work item to a workqueue and the work item stays active until the device
is unloaded.

If you submit a work item to a workqueue, it is required that the work
item finishes in finite time. Otherwise, it may stall other tasks.

The deadlock when I terminate the X server is caused by this - the nvidia
driver tries to flush the system workqueue and it waits for all work items
to terminate - but your work items don't terminate.
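
To illustrate (a minimal sketch with hypothetical names, not your actual
code): a work function that loops until a terminate flag is set never
returns, so anything that later flushes the system workqueue has to wait
until the device is unloaded.

/* Hypothetical sketch of the problematic pattern, not dm-writeboost code. */
#include <linux/workqueue.h>
#include <linux/delay.h>

static bool on_terminate;		/* set only by the destructor */
static struct work_struct modulator_work;

static void modulator_fn(struct work_struct *work)
{
	/* This work item returns only when the device is torn down ... */
	while (!on_terminate)
		msleep(1000);		/* ... so it pins a worker indefinitely. */
}

/*
 * schedule_work(&modulator_work);
 *
 * A later flush_scheduled_work() - here, from the nvidia driver -
 * now blocks until the device is unloaded.
 */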

If you need a thread that runs for a long time, you should use
kthread_create(), not workqueues (see this
http://people.redhat.com/~mpatocka/patches/kernel/dm-crypt-paralelizace/old-3/dm-crypt-encryption-threads.patch
or this
http://people.redhat.com/~mpatocka/patches/kernel/dm-crypt-paralelizace/old-3/dm-crypt-offload-writes-to-thread.patch
as examples of how to use kthreads).
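
Roughly, each daemon becomes a kthread that loops until kthread_stop() is
called from the destructor. A minimal sketch with hypothetical names (see
the patches above for real examples):

/* Hypothetical sketch of the kthread approach; names are made up. */
#include <linux/kthread.h>
#include <linux/delay.h>
#include <linux/err.h>

static int daemon_fn(void *data)
{
	/* struct wb_cache *cache = data; */

	while (!kthread_should_stop()) {
		/* do one round of the periodic work here */
		msleep_interruptible(1000);
	}
	return 0;
}

/*
 * Creation, e.g. in resume_cache():
 *
 *	cache->daemon = kthread_create(daemon_fn, cache, "wbdaemon");
 *	if (IS_ERR(cache->daemon))
 *		return PTR_ERR(cache->daemon);
 *	wake_up_process(cache->daemon);
 *
 * Teardown, in free_cache():
 *
 *	kthread_stop(cache->daemon);
 */

The msleep_interruptible() could also be an interruptible wait on a
waitqueue, so the thread reacts to kthread_stop() immediately instead of
after up to a second.
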
Mikulas
> To begin with,
> these works are never taken out of the queue
> until the destructor is called;
> they just keep running and sleeping.
> Queuing this kind of work to system_wq
> may be unsupported.
>
> So,
> my strategy is to clean them up in the following way:
> 1. every daemon has its own workqueue, and
> 2. never call cancel_work_sync(); only call destroy_workqueue()
> in the destructor free_cache() and in the error handling of resume_cache().
>
> Could you please run the same test again
> once I have fixed these points,
> to see whether the problem is still reproducible?
>
>
> > On 3.11.3 on PA-RISC without preemption, the device unloads (although it
> > takes many seconds and vmstat shows that the machine is idle during this
> > time)
> This behavior is benign but should probably be improved.
> In said free_cache(), it first sets the `on_terminate` flag to true
> to notify all the daemons that we are shutting down.
> Since `update_interval` and `sync_interval` are 60 seconds by default,
> we must wait a while for them to finish.
>
> Akira
>