linux-kernel - Re: Sync writeback still broken

Message-ID: <20101031122437.GA6296@quack.suse.cz>
Date:	Sun, 31 Oct 2010 13:24:37 +0100
From:	Jan Kara <jack@...e.cz>
To:	Jan Engelhardt <jengelh@...ozas.de>
Cc:	Jan Kara <jack@...e.cz>, Andrew Morton <akpm@...ux-foundation.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jens Axboe <jens.axboe@...cle.com>,
	Linux Kernel <linux-kernel@...r.kernel.org>, stable@...nel.org,
	gregkh@...e.de
Subject: Re: Sync writeback still broken

On Mon 25-10-10 01:41:48, Jan Engelhardt wrote:
> On Sunday 2010-06-27 18:44, Jan Engelhardt wrote:
> >On Monday 2010-02-15 16:41, Jan Engelhardt wrote:
> >>On Monday 2010-02-15 15:49, Jan Kara wrote:
> >>>On Sat 13-02-10 13:58:19, Jan Engelhardt wrote:
> >>>> >> 
> >>>> >> This fixes it by using the passed in page writeback count, instead of
> >>>> >> doing MAX_WRITEBACK_PAGES batches, which gets us much better performance
> >>>> >> (Jan reports it's up from ~400KB/sec to 10MB/sec) and makes sync(1)
> >>>> >> finish properly even when new pages are being dirtied.
> >>>> >
> >>>> >This seems broken.
> >>>> 
> >>>> It seems so. Jens, Jan Kara, your patch does not entirely fix this.
> >>>> While there is no sync/fsync to be seen in these traces, I can
> >>>> tell there's a livelock, without Dirty decreasing at all.
> >
> >What ultimately became of the discussion and/or the patch? 
> >
> >Your original ad-hoc patch certainly still does its job; had no need to 
> >reboot in 86 days and still counting.
> 
> I still observe this behavior on 2.6.36-rc8. This is starting to 
> get frustrating, so I will be happily following akpm's advice to 
> poke people.
  Yes, that's a good way :)

> Thread entrypoint: http://lkml.org/lkml/2010/2/12/41
> 
> Previously, many concurrent extractions of tarballs and so on have been 
> one way to trigger the issue; I now also have a rather small testcase 
> (below) that freezes the box here (which has 24G RAM, so even though I 
> do not call msync, I should be fine) sometime after memset finishes.
  I've tried your test but didn't succeed in freezing my laptop.
Everything was running smoothly, the machine even felt reasonably responsive
although it was constantly reading from and writing to disk. Also sync(1)
finished in a couple of seconds, as one would expect in an optimistic case.
  Admittedly, my laptop has only 1G of RAM, so I had to downsize the hash
table from 16G to 1G to be able to run the test, and the disk is an Intel
SSD, so the performance of the backing storage compared to the amount of
needed IO is much in my favor.
  OK, so I've taken a machine with a standard rotational drive and 28GB of
RAM and there I can see sync(1) hanging (but otherwise the machine looks
OK). Investigating further...

									Honza
-- 
Jan Kara <jack@...e.cz>
SUSE Labs, CR