Date:	Mon, 15 Feb 2010 16:58:33 +0100
From:	Jan Kara <jack@...e.cz>
To:	Jan Engelhardt <jengelh@...ozas.de>
Cc:	Jan Kara <jack@...e.cz>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>, stable@...nel.org,
	gregkh@...e.de
Subject: Re: [PATCH] writeback: Fix broken sync writeback

On Mon 15-02-10 16:41:17, Jan Engelhardt wrote:
> 
> On Monday 2010-02-15 15:49, Jan Kara wrote:
> >On Sat 13-02-10 13:58:19, Jan Engelhardt wrote:
> >> >> 
> >> >> This fixes it by using the passed in page writeback count, instead of
> >> >> doing MAX_WRITEBACK_PAGES batches, which gets us much better performance
> >> >> (Jan reports it's up from ~400KB/sec to 10MB/sec) and makes sync(1)
> >> >> finish properly even when new pages are being dirtied.
>
> >> It seems so. Jens, Jan Kara, your patch does not entirely fix this.
> >> While there is no sync/fsync to be seen in these traces, I can
> >> tell there's a livelock, without Dirty decreasing at all.
> >
> >  I don't think this is directly connected with my / Jens' patch.
> 
> I start to think so too.
> 
> >Similar traces happen even without the patch (see e.g.
> >http://bugzilla.kernel.org/show_bug.cgi?id=14830). But maybe the patch
> >makes it worse... So are you able to reproduce these warnings and
> >without the patch they did not happen?
> 
> Your patch speeds up the slow sync; without the patch, there was
> no real chance to observe the hard lockup, as the slow sync would
> take up all time.
> 
> So far, no reproduction. It seems to be just as you say.
> 
> >  Where in the code is jbd2_journal_commit_transaction+0x218/0x15e0?
> 
> 0000000000569554 <jbd2_journal_commit_transaction>:
>   56976c:       40 04 ee 62     call  6a50f4 <schedule>
> 
> Since there is an obvious schedule() call in jbd2_journal_commit_transaction's
> C code, I think that's where it is.
  OK. Thanks. It seems some process is spending excessive time with a
transaction open (jbd2_journal_commit_transaction waits for all handles of
a transaction to be dropped). If you see the traces again, try to obtain
stack traces of all the other processes and maybe we can catch the process
and see whether it's doing something unexpected.
  The patch can have an influence on this because we now pass larger
nr_to_write to ext4_writepages so maybe that makes some corner case more
likely.
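
  As a sanity check on the symbol+offset lookup quoted above, the arithmetic
can be sketched as follows. This is only an illustration: the two addresses
are copied from the disassembly excerpt in this thread, while the helper name
resolve() and the constant names are made up for the example.

```python
# Values copied from the trace and the disassembly excerpt in this thread.
SYMBOL_START = 0x569554   # start of jbd2_journal_commit_transaction
TRACE_OFFSET = 0x218      # the "+0x218" part of the trace entry
SYMBOL_SIZE = 0x15e0      # the "/0x15e0" part of the trace entry
CALL_SITE = 0x56976c      # address of the "call <schedule>" instruction


def resolve(symbol_start: int, offset: int, size: int) -> int:
    """Return the absolute address for a symbol+offset/size trace entry.

    resolve() is a hypothetical helper for this example, not a kernel or
    binutils API.
    """
    if not 0 <= offset < size:
        raise ValueError("offset lies outside the function")
    return symbol_start + offset


resolved = resolve(SYMBOL_START, TRACE_OFFSET, SYMBOL_SIZE)
print(hex(resolved))
```

  The result equals the address of the schedule() call in the excerpt, which
agrees with the reading that the trace entry points at that call.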

								Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR
