linux-kernel - Re: [PATCH] Memory management livelock

Message-ID: <Pine.LNX.4.64.0810030716310.4875@hs20-bc2-1.build.redhat.com>
Date:	Fri, 3 Oct 2008 07:26:07 -0400 (EDT)
From:	Mikulas Patocka <mpatocka@...hat.com>
To:	Nick Piggin <nickpiggin@...oo.com.au>
cc:	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, linux-mm@...r.kernel.org,
	agk@...hat.com, mbroz@...hat.com, chris@...chsys.com
Subject: Re: [PATCH] Memory management livelock

> > *What* is, forever? Data integrity syncs should have pages operated on
> > in-order, until we get to the end of the range. Circular writeback could
> > go through again, possibly, but no more than once.
> 
> OK, I have been able to reproduce it somewhat. It is not a livelock,
> but what is happening is that direct IO read basically does an fsync
> on the file before performing the IO. The fsync gets stuck behind the
> dd that is dirtying the pages, and ends up following behind it and
> doing all its IO for it.
> 
> The following patch avoids the issue for direct IO, by using the range
> syncs rather than trying to sync the whole file.
> 
> The underlying problem I guess is unchanged. Is it really a problem,
> though? The way I'd love to solve it is actually by adding another bit
> or two to the pagecache radix tree,  that can be used to transiently tag
> the tree for future operations. That way we could record the dirty and
> writeback pages up front, and then only bother with operating on them.
> 
> That's *if* it really is a problem. I don't have much pity for someone
> doing buffered IO and direct IO to the same pages of the same file :)

LVM does (that is where the bug was discovered). Basically, it scans all 
the block devices with direct IO, and if someone else does buffered IO on 
any of those devices at the same time, the scan locks up.

That fsync-vs-write livelock is quite improbable (why would an 
application do it?), although it could be used as a DoS, since it yields 
an unkillable process.

But there is another possible real-world problem, sync-vs-write: an 
admin plugs in two disks and copies data from one to the other. 
Meanwhile, some unrelated server process calls sync(). The server then 
hangs in sync() until the copy finishes.
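
The chase discussed in this thread (a whole-file sync following behind a 
writer that keeps dirtying new pages) and the two remedies mentioned in 
the quoted text (range syncs, and tagging the dirty pages up front) can be 
illustrated with a toy model. This is a minimal Python sketch, not kernel 
code, under loudly simplified assumptions: pages are list indices, the 
writer appends exactly one freshly dirtied page for every page the sync 
writes back, and `run_sync`, `WRITE_LIMIT` and the `towrite` snapshot are 
hypothetical names chosen for illustration.

```python
# Toy model only (hypothetical names, not kernel interfaces): a sync walks
# pages in index order while a concurrent "dd" appends one dirty page per
# page the sync writes back.

WRITE_LIMIT = 1000  # hypothetical cap: a sync still running after this many writes counts as stuck


def run_sync(initial_dirty, end=None, tagged=False):
    """Return the number of pages written before the sync finished, or None if it never did.

    end=None models a whole-file sync that runs up to the current EOF;
    an integer end models a range sync limited to pages [0, end).
    tagged=True snapshots the dirty set at the start and writes only those pages.
    """
    dirty = [True] * initial_dirty             # dirty[i]: page i needs writeback
    towrite = list(dirty) if tagged else None  # snapshot of the dirty set at sync start
    cursor = 0                                 # the sync walks the file in page order
    for written in range(WRITE_LIMIT):
        limit = len(dirty) if end is None else min(end, len(dirty))
        flags = towrite if tagged else dirty
        nxt = next((i for i in range(cursor, limit) if flags[i]), None)
        if nxt is None:
            return written                     # nothing left in scope: the sync completes
        dirty[nxt] = False                     # "write back" the page
        if tagged:
            towrite[nxt] = False
        cursor = nxt + 1
        dirty.append(True)                     # meanwhile the writer dirties a new page
        if tagged:
            towrite.append(False)              # pages dirtied after the snapshot are never tagged
    return None                                # still chasing the writer


print(run_sync(8))               # None: the whole-file sync never catches up
print(run_sync(8, end=8))        # 8: the range sync stops at its end
print(run_sync(8, tagged=True))  # 8: only the pages tagged up front are written
```

In this model the whole-file sync only finishes when the writer stops, 
while both remedies bound the work to pages that existed when the sync 
began. Note that the range variant only helps callers that know their 
range (such as a direct IO read); a plain sync() still needs something 
like the tagging approach.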

Mikulas