linux-kernel - Re: [PATCH] libata, freezer: avoid block device removal while system is frozen


Message-ID: <20131213230744.GA17954@htj.dyndns.org>
Date:	Fri, 13 Dec 2013 18:07:44 -0500
From:	Tejun Heo <tj@...nel.org>
To:	Nigel Cunningham <nigel@...elcunningham.com.au>
Cc:	"Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
	Jens Axboe <axboe@...nel.dk>, tomaz.solc@...lix.org,
	aaron.lu@...el.com, linux-kernel@...r.kernel.org,
	Oleg Nesterov <oleg@...hat.com>,
	Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
	Fengguang Wu <fengguang.wu@...el.com>
Subject: Re: [PATCH] libata, freezer: avoid block device removal while system
 is frozen

Hello, Nigel.

On Sat, Dec 14, 2013 at 09:45:59AM +1100, Nigel Cunningham wrote:
> In your first email, in the first substantial paragraph (starting
> "Now, if the rest.."), you say "libata device removal waits for the
> scheduled writeback work item to finish". I wonder if that's the
> lynchpin. If we know the device is gone, why are we trying to write
> to it?

It's just a standard part of block device removal -
invalidate_partition(), bdi_wb_shutdown().

> All pending I/O should have been flushed when suspend/hibernate
> started, and there's no point in trying to update metadata on a

Frozen or not, there's no guarantee that the bdi wb queues are empty
when the system goes to suspend.  They're likely to be empty, but
nothing ensures it.  The conversion to workqueue only makes the
behavior more deterministic.

> device we can't access, so there should be no writeback needed (and
> anything that does somehow get there should just be discarded since
> it will never succeed anyway).

Even if they'll never succeed, they still need to be issued and
drained; otherwise, we'll end up with leaked items and hung issuers.
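
To illustrate the point about draining, here is a minimal userspace
sketch in Python.  It is not kernel code: WorkItem, drain() and
discard() are hypothetical names made up for this example and only
model the idea that each queued item has an issuer that may be waiting
on its completion.

```python
import errno
import threading


class WorkItem:
    """A queued writeback request that an issuer may be waiting on (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.done = threading.Event()  # what the issuer blocks on
        self.error = None

    def complete(self, error):
        self.error = error
        self.done.set()


def drain(queue, device_present):
    """Issue every queued item; with the device gone, each one fails with -EIO.

    The item still completes, so whoever waits on it is released.
    """
    completed = []
    while queue:
        item = queue.pop(0)
        item.complete(0 if device_present else -errno.EIO)
        completed.append(item)
    return completed


def discard(queue):
    """The tempting shortcut: drop the items without completing them."""
    dropped = list(queue)
    queue.clear()
    return dropped
```

With drain(), an issuer blocked on item.done wakes up and sees the
error.  With discard(), nothing ever sets item.done, so the issuer
waits forever; that is the hung-issuer case in this simplified model.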

> Having said the above, I agree that we shouldn't need to freeze
> kernel threads and workqueues themselves. I think we should be
> giving the producers of I/O the nous needed to avoid producing I/O
> during suspend/hibernate. But perhaps I'm missing something here,
> too.

I never understood that part.  Why do we need to control the
producers?  The chain between the producer and consumer is a long one
and no matter what we do with the producers, the consumers need to be
plugged all the same.  Why bother with the producers at all?  I think
that's where all this freezable kthread business started but I don't
understand what the benefit of that is.  Not only that, the freezer is
awfully inadequate in that role too.  There's a flurry of activity
in the IO path that happens without any thread involved, and much of
it can lead to issuance of new IO, so the only thing the freezer
achieves is making existing bugs less visible, which is a bad thing
especially for suspend/resume as the failure mode often doesn't yield
to easy debugging.
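
The producer vs. consumer argument can be sketched with a toy model in
Python.  Everything here is an assumption made for illustration:
Device, ProducerThread and timer_callback() are hypothetical names and
do not correspond to kernel interfaces.  The model only shows that
freezing threads leaves threadless submission paths open, while a plug
at the consumer holds everything.

```python
from collections import deque


class Device:
    """Consumer side: dispatches requests unless it is plugged (hypothetical)."""

    def __init__(self):
        self.plugged = False
        self.pending = deque()
        self.dispatched = []

    def submit(self, request):
        self.pending.append(request)
        self._run()

    def plug(self):
        self.plugged = True

    def unplug(self):
        self.plugged = False
        self._run()

    def _run(self):
        # Dispatch in order until the queue is empty or the plug is in.
        while self.pending and not self.plugged:
            self.dispatched.append(self.pending.popleft())


class ProducerThread:
    """Producer side: a thread that the freezer can stop (hypothetical)."""

    def __init__(self, device):
        self.device = device
        self.frozen = False

    def write(self, request):
        if self.frozen:
            return False  # a frozen thread issues nothing
        self.device.submit(request)
        return True


def timer_callback(device, request):
    """IO issued from a context with no thread for the freezer to stop."""
    device.submit(request)
```

In this model, setting frozen on every ProducerThread does not stop
timer_callback() from reaching the device, whereas Device.plug() holds
requests from both paths until unplug().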

I asked the same question years ago and ISTR getting only fairly vague
answers, but this whole freezable kthread business is, as expected,
proving to be a continuous source of problems.  Let's at least find
out whether we need it and, if so, why.  Not some "I feel better
knowing things are calmer" type vagueness but the actual technical
necessity of it.

Thanks.

-- 
tejun