linux-kernel - Re: [PATCH] writeback: Fix broken sync writeback


Date:	Mon, 15 Feb 2010 16:58:33 +0100
From:	Jan Kara <jack@...e.cz>
To:	Jan Engelhardt <jengelh@...ozas.de>
Cc:	Jan Kara <jack@...e.cz>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>, stable@...nel.org,
	gregkh@...e.de
Subject: Re: [PATCH] writeback: Fix broken sync writeback

On Mon 15-02-10 16:41:17, Jan Engelhardt wrote:
> 
> On Monday 2010-02-15 15:49, Jan Kara wrote:
> >On Sat 13-02-10 13:58:19, Jan Engelhardt wrote:
> >> >> 
> >> >> This fixes it by using the passed in page writeback count, instead of
> >> >> doing MAX_WRITEBACK_PAGES batches, which gets us much better performance
> >> >> (Jan reports it's up from ~400KB/sec to 10MB/sec) and makes sync(1)
> >> >> finish properly even when new pages are being dirtied.
>
> >> It seems so. Jens, Jan Kara, your patch does not entirely fix this.
> >> While there is no sync/fsync to be seen in these traces, I can
> >> tell there's a livelock, without Dirty decreasing at all.
> >
> >  I don't think this is directly connected with my / Jens' patch.
> 
> I start to think so too.
> 
> >Similar traces happen even without the patch (see e.g.
> >http://bugzilla.kernel.org/show_bug.cgi?id=14830). But maybe the patch
> >makes it worse... So are you able to reproduce these warnings, and
> >did they not happen without the patch?
> 
> Your patch speeds up the slow sync; without the patch, there was
> no real chance to observe the hard lockup, as the slow sync would
> take up all the time.
> 
> So far, no reproduction. It seems to be just as you say.
> 
> >  Where in the code is jbd2_journal_commit_transaction+0x218/0x15e0?
> 
> 0000000000569554 <jbd2_journal_commit_transaction>:
>   56976c:       40 04 ee 62     call  6a50f4 <schedule>
> 
> Since there is an obvious schedule() call in jbd2_journal_commit_transaction's
> C code, I think that's where it is.
  OK. Thanks. It seems some process is spending excessive time with a
transaction open (jbd2_journal_commit_transaction waits for all handles of
a transaction to be dropped). If you see the traces again, try to obtain
stack traces of all the other processes and maybe we can catch the process
and see whether it's doing something unexpected.
  The patch can have an influence on this because we now pass larger
nr_to_write to ext4_writepages so maybe that makes some corner case more
likely.
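
  For reference, the symbol+offset from the trace can be checked against
the disassembly quoted above. This is only a minimal sketch in Python: the
three constants are copied from this thread, while the helper name
resolve() is made up for illustration and is not part of any kernel tool.

```python
# Values copied from the stack trace and the disassembly quoted in this thread.
FUNC_START = 0x569554  # start of jbd2_journal_commit_transaction in the listing
OFFSET = 0x218         # the "+0x218" part of the trace entry
FUNC_SIZE = 0x15e0     # the "/0x15e0" part of the trace entry (function size)


def resolve(func_start, offset, func_size):
    """Hypothetical helper: turn symbol+offset/size into an absolute address.

    Raises ValueError if the offset does not fall inside the function.
    """
    if not 0 <= offset < func_size:
        raise ValueError("offset falls outside the function")
    return func_start + offset


addr = resolve(FUNC_START, OFFSET, FUNC_SIZE)
print(hex(addr))  # 0x56976c, the line with the call to schedule in the listing
```

The result agrees with the address of the schedule call shown in the
disassembly, which supports the reading that the task is sleeping there.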

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR