Message-ID: <Pine.LNX.4.64.0810052320480.3074@hs20-bc2-1.build.redhat.com>
Date:	Sun, 5 Oct 2008 23:30:51 -0400 (EDT)
From:	Mikulas Patocka <mpatocka@...hat.com>
To:	Arjan van de Ven <arjan@...radead.org>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, agk@...hat.com, mbroz@...hat.com,
	chris@...chsys.com
Subject: Re: [PATCH 2/3] Fix fsync livelock



On Sun, 5 Oct 2008, Arjan van de Ven wrote:

> On Sun, 5 Oct 2008 20:01:46 -0400 (EDT)
> Mikulas Patocka <mpatocka@...hat.com> wrote:
> 
> > I assume that if very few people complained about the livelock till
> > now, very few people will see degraded write performance. My patch
> > blocks the writes only if the livelock happens, so if the livelock
> > doesn't happen in unpatched kernel for most people, the patch won't
> > make it worse.
> 
> I object to calling this a livelock. It's not. 

It unblocks itself only once the whole disk has been written, which can
take several hours (or days, on a many-terabyte array). So formally it is
not a livelock, but from the user's point of view it is: they see an
unkillable process stuck in 'D' state for many hours.
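
To make the failure mode concrete, here is a simplified sketch (not the
actual mm/filemap.c code; the helpers are made up) of how the wait can
stretch out when another process keeps dirtying pages in the same file:

	/* Simplified sketch only; find_next_dirty_page(),
	 * start_writeback() and wait_on_writeback() are stand-ins
	 * for the real writeback code, not actual kernel APIs. */
	void sketch_fsync_range(struct file_like *f)
	{
		unsigned long index = 0;
		struct page_like *page;

		/* Keep scanning for dirty pages.  A concurrent writer
		 * can dirty new pages behind us faster than the disk
		 * drains them, so this loop only ends when the writer
		 * stops or the whole device has been written out. */
		while ((page = find_next_dirty_page(f, &index)) != NULL) {
			start_writeback(page);		/* submit the IO */
			wait_on_writeback(page);	/* sleep in 'D' state */
			index++;
		}
	}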

> And yes, fsync is slow and lots of people are seeing that.
> It's not helped by how ext3 is implemented (where fsync is effectively
> equivalent of a sync for many cases).
> But again, moving the latency to "innocent" parties is not acceptable.
> 
> > 
> > > If the fsync() implementation isn't smart enough, sure, lets improve
> > > it. But not by shifting latency around... lets make it more
> > > efficient at submitting IO.
> > > If we need to invent something like "chained IO" where if you wait
> > > on the last of the chain, you wait on the entire chain, so be it.
> > 
> > This looks madly complicated. And ineffective, because if some page
> > was submitted before fsync() was invoked, and is under writeback
> > while fsync() is called, fsync() still has to wait on it.
> 
> so?
> just make a chain per inode always...

The point is that many fsync()s may run in parallel while there is just
one inode and just one chain. And if you add a two-word list_head to
struct page to link it onto that chain, many developers will hate it for
increasing the size of every page.
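
For illustration only (hypothetical, not a proposed patch), the chain
approach would mean something like this:

	struct page {
		...				/* existing fields */
		struct list_head fsync_chain;	/* hypothetical field:
						 * two pointers, 16 bytes
						 * on 64-bit, paid for
						 * every page of RAM */
	};

With 4 KB pages that is roughly 4 MB of extra memory per GB of RAM,
spent whether or not the system ever calls fsync().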

See the work done by Nick Piggin elsewhere in this thread. He uses just
one bit in the radix tree to mark the pages to process. But he needs to
serialize all syncs on a given file, so they no longer run in parallel.
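
Roughly, his approach looks like this (a sketch of the idea as I
understand it, with made-up helper names, not his actual patch):

	/* Sketch only; tag_dirty_pages() and write_and_wait_tagged()
	 * stand in for radix-tree tag operations, they are not real
	 * kernel functions. */
	int sketch_fsync(struct file_like *f)
	{
		mutex_lock(&f->sync_mutex);	/* serialize syncs on f */

		/* Mark the pages that are dirty right now with a single
		 * radix-tree tag bit; no per-page list_head is needed. */
		tag_dirty_pages(f, TAG_TO_SYNC);

		/* Write out and wait on tagged pages only.  Pages
		 * dirtied after the tagging pass are ignored, so the
		 * wait is bounded; but concurrent fsync()s now queue
		 * on the mutex instead of running in parallel. */
		write_and_wait_tagged(f, TAG_TO_SYNC);

		mutex_unlock(&f->sync_mutex);
		return 0;
	}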

Mikulas
