

Message-ID: <87pqh3ltc4.fsf@dmbot.sw.ru>
Date:	Mon, 07 Nov 2011 21:45:31 +0400
From:	Dmitry Monakhov <dmonakhov@...nvz.org>
To:	Jan Kara <jack@...e.cz>
Cc:	Kazuya Mio <k-mio@...jp.nec.com>, Jan Kara <jack@...e.cz>,
	ext4 <linux-ext4@...r.kernel.org>, Theodore Tso <tytso@....edu>,
	Andreas Dilger <adilger@...ger.ca>
Subject: Re: [BUG] aborted ext4 leads to inifinity loop in balance_dirty_pages

On Mon, 7 Nov 2011 18:29:39 +0100, Jan Kara <jack@...e.cz> wrote:
> On Mon 07-11-11 12:00:41, Dmitry Monakhov wrote:
> > On Fri, 28 Oct 2011 14:34:31 +0900, Kazuya Mio <k-mio@...jp.nec.com> wrote:
> > > 2011/10/25 22:40, Jan Kara wrote:
> > > >   Please no. Generally this boils down to what do we do with dirty data
> > > > when there's error in writing them out. Currently we just throw them away
> > > > (e.g. in media error case) but I don't think that's a generally good thing
> > > > because e.g. admin may want to copy the data to other working storage or
> > > > so. So I think we should rather keep the data and provide a mechanism for
> > > > userspace to ask kernel to get rid of the data (so that we don't eventually
> > > > run OOM).
> > > 
> > > I see. I agree with you.
> > > 
> > > >> Do you have any ideas?
> > > >   So the question is what would you like to achieve. If you just want to
> > > > unblock a thread then a solution would be to make a thread at
> > > > balance_dirty_pages() killable. If generally you want to get rid of dirty
> > > > memory, then I don't have a really good answer but throwing dirty data away
> > > > seems like a bad answer to me.
> > > 
> > > The problem is that we cannot unmount the corrupted filesystem because of
> > > the unkillable dd process. We must bring down the system to resume the
> > > service with no dirty pages. I think it is important for service continuity
> > > to be able to kill the thread stuck in balance_dirty_pages().
> > In fact you are very lucky, because dd is just deadlocked. In many cases a
> > journal abort results in a BUG_ON triggering (if the IO load is high enough).
>   Can you provide the exact kernel message? I'd be interested...
Several times I've hit the assertion in jbd2_journal_stop() marked below:
int jbd2_journal_stop(handle_t *handle)
{
        transaction_t *transaction = handle->h_transaction;
        journal_t *journal = transaction->t_journal;
        int err, wait_for_commit = 0;
        tid_t tid;
        pid_t pid;

        J_ASSERT(journal_current_handle() == handle);

        if (is_handle_aborted(handle))
                err = -EIO;
        else {
                J_ASSERT(atomic_read(&transaction->t_updates) > 0);
                /* ^^^ FAILED HERE: the J_ASSERT on t_updates above */
                err = 0;
        }
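
To make the shape of the problem concrete, here is a minimal userspace sketch of a check-then-act sequence like the one above: the abort state is tested first, and the update counter is asserted afterwards, with nothing preventing a change in between. Everything in it is hypothetical (struct model_txn, model_stop(), the "between" hook, abort_and_drain()); it is not jbd2 code, and the exact interleaving that trips the real J_ASSERT is an assumption, not something established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the jbd2 structures. */
struct model_txn {
	bool aborted;	/* stands in for the abort state seen by the handle */
	int updates;	/* stands in for transaction->t_updates */
};

/* Hook run between the abort check and the counter check. */
typedef void (*interleave_fn)(struct model_txn *);

enum stop_result {
	STOP_OK = 0,
	STOP_EIO = 1,
	STOP_ASSERT_WOULD_FAIL = 2,
};

/* Assumed interleaving: another context aborts and drops the counter. */
static void abort_and_drain(struct model_txn *t)
{
	t->aborted = true;
	t->updates = 0;
}

/*
 * Same order of operations as the excerpt: check "aborted" first, then
 * require updates > 0. Instead of crashing, report when the second check
 * would have fired.
 */
static enum stop_result model_stop(struct model_txn *t, interleave_fn between)
{
	if (t->aborted)
		return STOP_EIO;

	if (between)
		between(t);	/* window between the check and the assertion */

	if (!(t->updates > 0))
		return STOP_ASSERT_WOULD_FAIL;

	t->updates--;
	return STOP_OK;
}

/* Usage: the same call, with and without the simulated interleaving. */
void model_self_check(void)
{
	struct model_txn quiet = { false, 1 };
	struct model_txn raced = { false, 1 };

	assert(model_stop(&quiet, NULL) == STOP_OK);
	assert(model_stop(&raced, abort_and_drain) == STOP_ASSERT_WOULD_FAIL);
}
```

The hook makes the window deterministic so the effect can be reproduced in a single thread; in a real system the same window would only be hit under concurrent load.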


> 
> > This is because the transaction abort check is racy. Right now I have no
> > good fix with reasonable performance. My latest idea is to protect the
> > transaction abort check via SRCU.
>   Yeah, the code does not seem to care about races too much but I don't see
> which BUG_ON would be triggered...
> 
> 								Honza
> -- 
> Jan Kara <jack@...e.cz>
> SUSE Labs, CR
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html