Message-ID: <8737odw5xp.fsf@gooddata.com>
Date: Thu, 16 Jun 2016 16:42:58 +0200
From: Nikola Pajkovsky <nikola.pajkovsky@...ddata.com>
To: Jan Kara <jack@...e.cz>
Cc: Holger Hoffstätte
<holger@...lied-asynchrony.com>, linux-ext4@...r.kernel.org,
Jan Kara <jack@...e.com>
Subject: Re: xfstests generic/130 hang with non-4k block size ext4 on 4.7-rc1 kernel
Jan Kara <jack@...e.cz> writes:
> On Fri 10-06-16 07:52:56, Nikola Pajkovsky wrote:
>> Jan Kara <jack@...e.cz> writes:
>> > On Thu 09-06-16 09:23:29, Nikola Pajkovsky wrote:
>> >> Holger Hoffstätte <holger@...lied-asynchrony.com> writes:
>> >>
>> >> > On Wed, 08 Jun 2016 14:56:31 +0200, Jan Kara wrote:
>> >> > (snip)
>> >> >> Attached patch fixes the issue for me. I'll submit it once a full xfstests
>> >> >> run finishes for it (which may take a while as our server room is currently
>> >> >> moving to a different place).
>> >> >>
>> >> >> Honza
>> >> >> --
>> >> >> Jan Kara <jack@...e.com>
>> >> >> SUSE Labs, CR
>> >> >> From 3a120841a5d9a6c42bf196389467e9e663cf1cf8 Mon Sep 17 00:00:00 2001
>> >> >> From: Jan Kara <jack@...e.cz>
>> >> >> Date: Wed, 8 Jun 2016 10:01:45 +0200
>> >> >> Subject: [PATCH] ext4: Fix deadlock during page writeback
>> >> >>
>> >> >> Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a
>> >> >> deadlock in ext4_writepages() which was previously much harder to hit.
>> >> >> After this commit xfstest generic/130 reproduces the deadlock on small
>> >> >> filesystems.
>> >> >
>> >> > Since you marked this for -stable, just a heads-up that the previous patch
>> >> > for the data exposure was rejected from -stable (see [1]) because it
>> >> > has the mismatching "!IS_NOQUOTA(inode) &&" line, which didn't exist
>> >> > until 4.6. I removed it locally but Greg probably wants an official patch.
>> >> >
>> >> > So both this and the previous patch need to be submitted.
>> >> >
>> >> > [1] http://permalink.gmane.org/gmane.linux.kernel.stable/18074{4,5,6}
>> >>
>> >> I'm just wondering whether Jan's patch is related to the blocked
>> >> processes in the following trace. It is very hard to hit and I don't
>> >> have a reproducer.
>> >
>> > This looks like a different issue. Does the machine recover itself or is it
>> > a hard hang and you have to press a reset button?
>>
>> The machine is a bit bigger than I let on. It has 18 vCPUs and 160 GB
>> of RAM, and a dedicated mount point just for PostgreSQL data.
>>
>> Nevertheless, I was always able to ssh to the machine, so the machine itself
>> was not in a hard hang, and ext4 mostly recovered by itself (it took
>> 30 min). But I have also seen a situation where every process that touches
>> the ext4 filesystem immediately goes into D state and does not recover even
>> after an hour.
>
> If such situation happens, can you run 'echo w >/proc/sysrq-trigger' to
> dump stuck processes and also run 'iostat -x 1' for a while to see how much
> IO is happening in the system? That should tell us more.
The 'echo w >/proc/sysrq-trigger' output is too big to mail, so here is
a link:
http://expirebox.com/download/68c26e396feb8c9abb0485f857ccea3a.html
I was running iotop, and write traffic was roughly 20 KB/s.
What was a bit more interesting was the output of
cat /proc/vmstat | egrep "nr_dirty|nr_writeback"
nr_dirty was around 240 and slowly counting up, but nr_writeback was
~8800 and stayed stuck at that value for 120 s.
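For repeating this observation, a minimal Python sketch (an assumption, not
a tool used in this thread) that samples nr_dirty and nr_writeback from
/proc/vmstat and flags writeback as stalled when nr_writeback stays non-zero
and unchanged across every sample. The helper names, the 1 s interval and
the "unchanged means stalled" heuristic are all hypothetical choices.

```python
import time
from typing import Dict, List


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat content ("name value" per line) into a dict."""
    counters: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name integer" lines.
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def writeback_stalled(samples: List[Dict[str, int]]) -> bool:
    """Hypothetical heuristic: nr_writeback is non-zero and identical in
    every sample (needs at least two samples to say anything)."""
    values = [s.get("nr_writeback", 0) for s in samples]
    return len(values) >= 2 and values[0] > 0 and all(v == values[0] for v in values)


def sample_vmstat(path: str = "/proc/vmstat", count: int = 120,
                  interval: float = 1.0) -> List[Dict[str, int]]:
    """Read the counters `count` times, sleeping `interval` seconds between reads."""
    samples: List[Dict[str, int]] = []
    for i in range(count):
        with open(path) as f:
            samples.append(parse_vmstat(f.read()))
        if i + 1 < count:
            time.sleep(interval)
    return samples
```

Usage: samples = sample_vmstat(count=120) followed by
writeback_stalled(samples) covers the same 120 s window as above. Both
counters are in pages, not bytes.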
--
Nikola