linux-kernel - Re: [dm-devel] DM-CRYPT: Scale to multiple CPUs v3

Date:	Mon, 11 Oct 2010 11:32:09 +0200
From:	Milan Broz <mbroz@...hat.com>
To:	Andi Kleen <andi@...stfloor.org>
CC:	Mike Snitzer <snitzer@...hat.com>, Andi Kleen <ak@...ux.intel.com>,
	device-mapper development <dm-devel@...hat.com>,
	pedrib@...il.com, linux-kernel@...r.kernel.org
Subject: Re: [dm-devel] DM-CRYPT: Scale to multiple CPUs v3

On 10/10/2010 10:20 PM, Andi Kleen wrote:
>> But previously, there were threads per device, so if one IO thread blocked,
>> other stacked mappings could continue.
>> Now I see a possibility of deadlock there, because we now have one io thread
>> (assuming the 1 CPU situation Alasdair mentioned).
> 
> That path calls the crypto worker thread, not the IO worker thread?
> The crypto worker should be fine here; only the IO worker would be a problem,
> I think, because crypto doesn't really block on nested IO.

Well, the crypt thread can block, which is surely not what we want in an async
callback, and the IO thread cannot call generic_make_request() from this context
either...

But reads are not a problem: the system queues the IO, and after completion
it calls async crypt. When async crypt is done, kcrypt_crypt_read_done()
is called, and that is safe, because it only calls bio_endio().

The problem is the write path: it allocates a bio clone, runs async crypto on it,
and the final callback queues the cloned bio to the underlying device.
Because the code cannot call generic_make_request() directly here (assuming
it may still run in interrupt context), it submits new work to the io thread.
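A minimal userspace sketch of the two completion paths described above, assuming a simplified model: `struct io_req`, `struct io_queue`, `crypt_done()` and `io_thread_run()` are hypothetical stand-ins, not dm-crypt code. The `ended` flag stands in for bio_endio() and `submitted` for generic_make_request() on the clone. The read completion finishes the request directly, while the write completion only queues it; the actual submission happens later on the io thread, in process context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names come from dm-crypt. */
enum io_dir { IO_READ, IO_WRITE };

struct io_req {
    enum io_dir dir;
    bool ended;      /* stands in for bio_endio() having been called */
    bool submitted;  /* stands in for generic_make_request() on the clone */
};

#define QUEUE_LEN 16

/* A single FIFO standing in for the io workqueue. */
struct io_queue {
    struct io_req *items[QUEUE_LEN];
    size_t head, tail;
};

static void queue_push(struct io_queue *q, struct io_req *r)
{
    assert(q->tail - q->head < QUEUE_LEN); /* model has a fixed capacity */
    q->items[q->tail++ % QUEUE_LEN] = r;
}

/* Async crypto completion callback. It may run in interrupt context,
 * so it must not submit IO itself. */
static void crypt_done(struct io_queue *ioq, struct io_req *r)
{
    if (r->dir == IO_READ)
        r->ended = true;      /* read: just complete the original request */
    else
        queue_push(ioq, r);   /* write: defer submission to the io thread */
}

/* Body of the io thread: runs in process context, where submitting is fine. */
static void io_thread_run(struct io_queue *q)
{
    while (q->head != q->tail) {
        struct io_req *r = q->items[q->head++ % QUEUE_LEN];
        r->submitted = true;
    }
}
```

After crypt_done() the read is already ended, while the write is only submitted once io_thread_run() drains the queue.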

So here the code behaves the same as before, except that instead of queueing into
a separate per-device io workqueue, we now have just one common queue...
So the question is whether this is safe in all stacked configurations and cannot
deadlock.

Imagine you have one dm-crypt device, using an algorithm in sync mode, stacked
over another one that runs in async mode.
Then the common io thread runs both the encryption and make_request...
If it can deadlock itself here, that is a regression from the previous version.
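A toy model of how a single shared io thread could deadlock itself where per-device threads would not. It assumes, hypothetically, that the upper device's queued work sleeps until the lower device's deferred submit has run (for example, waiting on a resource only that work releases). Each queue is served by one worker that handles items in FIFO order and cannot skip past a sleeping item. `struct work_item`, `run_queues()` and `stacked_write_completes()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_DEP (-1)

/* Hypothetical model of a queued work item. */
struct work_item {
    int waits_for;  /* index of the item this one sleeps on, or NO_DEP */
    int queue;      /* which worker thread it was queued to */
    bool done;
};

/* Each queue is served by one worker that takes items in FIFO order and
 * cannot skip past an item that is sleeping. Returns true if every item
 * completes, false if no worker can make progress (a deadlock). */
static bool run_queues(struct work_item *items, size_t n, int nqueues)
{
    size_t remaining = n;
    bool progress = true;

    assert(nqueues > 0);
    while (remaining > 0 && progress) {
        progress = false;
        for (int q = 0; q < nqueues; q++) {
            /* The head of queue q is its first unfinished item. */
            for (size_t i = 0; i < n; i++) {
                struct work_item *w = &items[i];
                if (w->queue != q || w->done)
                    continue;
                /* The head runs only if what it waits for is done. */
                if (w->waits_for == NO_DEP || items[w->waits_for].done) {
                    w->done = true;
                    remaining--;
                    progress = true;
                }
                break; /* a sleeping head blocks everything behind it */
            }
        }
    }
    return remaining == 0;
}

/* Upper device's work (index 0, queue 0) sleeps until the lower device's
 * deferred submit (index 1) has run; the lower work goes to lower_queue. */
static bool stacked_write_completes(int lower_queue, int nqueues)
{
    struct work_item items[2] = {
        { .waits_for = 1,      .queue = 0,           .done = false },
        { .waits_for = NO_DEP, .queue = lower_queue, .done = false },
    };
    return run_queues(items, 2, nqueues);
}
```

With `stacked_write_completes(0, 1)` both items share one queue, the upper work sleeps at the head, and the function returns false. With `stacked_write_completes(1, 2)` the lower work has its own queue, so both complete.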

(Note that I am not blocking the patch; I think this can be solved later somehow,
but we should either be aware of this problem or prove it is safe.
In normal (sync) mode this path is not used at all, and that is the most
common situation.)

Milan