linux-kernel - Deadlocks due to per-process plugging


Message-ID: <20120711133735.GA8122@quack.suse.cz>
Date:	Wed, 11 Jul 2012 15:37:35 +0200
From:	Jan Kara <jack@...e.cz>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	linux-fsdevel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
	Jens Axboe <jaxboe@...ionio.com>
Subject: Deadlocks due to per-process plugging

  Hello,

  we've recently hit a deadlock in our QA runs which is caused by the
per-process plugging code. The problem is as follows:
  process A					process B (kjournald)
  generic_file_aio_write()
    blk_start_plug(&plug);
    ...
    somewhere in here we allocate memory and
    direct reclaim submits buffer X for IO
    ...
    ext3_write_begin()
      ext3_journal_start()
        we need more space in a journal
        so we want to checkpoint old transactions,
        we block waiting for kjournald to commit
        a currently running transaction.
						journal_commit_transaction()
						  wait for IO on buffer X
						  to complete as it is part
						  of the current transaction

  => deadlock since A waits for B and B waits for A to unplug.
BTW: I don't think this is really ext3/ext4 specific. I think other
filesystems can get into problems as well when direct reclaim queues some
IO on the plug and the process subsequently blocks without that IO
actually being dispatched.
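
To make the cycle concrete, here is a minimal userspace sketch of the
scenario above. It is a toy model, not kernel code: every name in it
(toy_plug, toy_submit, toy_unplug, scenario_deadlocks, ...) is hypothetical,
the "device" completes IO instantly once it is dispatched, and blocking is
represented only by checking whether the committer could make progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PLUGGED 8

/* Hypothetical stand-in for a buffer that has IO submitted against it. */
struct toy_buffer {
    bool io_done;   /* set once the toy "device" has processed the buffer */
};

/* Hypothetical stand-in for a per-process plug: IO queued here is held back. */
struct toy_plug {
    struct toy_buffer *held[MAX_PLUGGED];
    size_t count;
};

static void toy_start_plug(struct toy_plug *plug)
{
    plug->count = 0;
}

/* Submitting while plugged only queues the buffer; nothing reaches the device. */
static void toy_submit(struct toy_plug *plug, struct toy_buffer *buf)
{
    assert(plug->count < MAX_PLUGGED);
    buf->io_done = false;
    plug->held[plug->count++] = buf;
}

/* Unplugging dispatches everything held; the toy device completes it at once. */
static void toy_unplug(struct toy_plug *plug)
{
    for (size_t i = 0; i < plug->count; i++)
        plug->held[i]->io_done = true;
    plug->count = 0;
}

/* Process B (the committer) can proceed only once IO on buffer X is done. */
static bool committer_can_progress(const struct toy_buffer *x)
{
    return x->io_done;
}

/*
 * Process A: start a plug, let "direct reclaim" submit buffer X, then block
 * waiting for B. Returns true if neither side can make progress.
 */
static bool scenario_deadlocks(bool unplug_before_blocking)
{
    struct toy_plug plug;
    struct toy_buffer x;

    toy_start_plug(&plug);
    toy_submit(&plug, &x);          /* buffer X now sits on A's plug */
    if (unplug_before_blocking)
        toy_unplug(&plug);          /* X is dispatched before A sleeps */

    /* A is now blocked on B; B is waiting for IO on X. */
    return !committer_can_progress(&x);
}
```

In this model scenario_deadlocks(false) reports a deadlock because X never
leaves A's plug, while scenario_deadlocks(true) does not, which is only
meant to illustrate why the unplug has to happen before A blocks.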

Effectively, per-process plugging introduces a lock dependency
buffer_lock -> any lock acquired after IO submission and before the
process' queue is unplugged. This certainly creates lots of cycles in the
lock dependency graph...

I'm wondering how best to fix this. A trivial fix would be to flush
the IO plug on every schedule(), not just io_schedule(), but I guess that
can have some performance implications (the effect of plugging would be
very limited). A better (although more tedious) solution would be to push
the plugs from higher levels down into the filesystems, where they could be
managed so as not to create problematic lock dependencies (but e.g. for
ext3/ext4 that means we have to unplug after writing each page, so it is
effectively rather similar to unplugging on every schedule()).
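
The batching cost of the trivial fix can be illustrated with a small
userspace sketch. Again this is a toy model with hypothetical names
(toy_plug, toy_schedule, write_pages), not the kernel's plugging code, and
it assumes the worst case where the writer sleeps once after every page.
It only counts how many separate batches reach the "device".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-process plug that only keeps counters. */
struct toy_plug {
    size_t pending;      /* requests currently held back on the plug */
    size_t dispatches;   /* number of batches sent to the device */
    size_t dispatched;   /* total requests sent to the device */
};

static void toy_submit(struct toy_plug *p)
{
    p->pending++;
}

/* Send everything held on the plug to the device as one batch. */
static void toy_flush(struct toy_plug *p)
{
    if (p->pending == 0)
        return;
    p->dispatches++;
    p->dispatched += p->pending;
    p->pending = 0;
}

/* Toy schedule(): the policy flag decides whether sleeping flushes the plug. */
static void toy_schedule(struct toy_plug *p, bool flush_on_schedule)
{
    if (flush_on_schedule)
        toy_flush(p);
}

/*
 * Write `pages` pages, sleeping once after each page (an assumed worst
 * case), then unplug at the end. Returns the number of dispatch batches.
 */
static size_t write_pages(size_t pages, bool flush_on_schedule)
{
    struct toy_plug p = {0, 0, 0};

    for (size_t i = 0; i < pages; i++) {
        toy_submit(&p);
        toy_schedule(&p, flush_on_schedule);
    }
    toy_flush(&p);                  /* final unplug */
    assert(p.dispatched == pages);  /* nothing is lost either way */
    return p.dispatches;
}
```

With 16 pages, the model dispatches a single batch when only the final
unplug flushes, and 16 one-request batches when every toy_schedule()
flushes. Real workloads sleep far less often than this, so the sketch shows
the direction of the effect rather than its size.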

Thoughts?

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR