Date:	Sun, 24 Mar 2013 06:12:41 +0100 (CET)
From:	Fredrik Tolf <fredrik@...da2000.com>
To:	linux-kernel@...r.kernel.org
Subject: I/O blocked while dirty pages are being flushed

Dear list,

I've got an mmapped file (a Berkeley DB region file) with an access 
pattern that dirties some 10-40 MB of pages a couple of times per 
minute. When the VM comes around to flush these pages to disk, that 
causes loads of problems. Since the dirty pages are rather 
interspersed in the file, the flusher posts batches of some 3000-5000 
write requests to the disk queue, and since I'm using ordinary hard 
drives, such a batch can sometimes take 10-30 seconds to complete.
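
To illustrate, the access pattern is roughly of this shape (a 
much-simplified sketch of what I believe is going on, not Berkeley 
DB's actual code; the sizes and counts are made up to match my 
observations, and error handling is omitted):

    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 64 << 20;              /* 64 MB region file */
        int fd = open("__db.001", O_RDWR);  /* the DB region file */
        char *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);

        for(;;) {
            /* Dirty a few thousand pages scattered across the
             * region (~16 MB worth of 4 KB pages)... */
            for(int i = 0; i < 4000; i++)
                map[(rand() % (len / 4096)) * 4096]++;
            sleep(20);  /* ...a couple of times per minute. */
        }
    }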

While such a flush is running, I find that many a process goes into 
disk sleep waiting for it to complete. This includes the process 
manipulating the mmapped file whenever it tries to redirty a page 
currently waiting to be flushed, but also, for instance, programs that 
write() to log files (since, I guess, the buffer page backing the last 
written portion of the log file is being flushed). The common 
culprits, judging by wchan, are sleep_on_page and sleep_on_buffer. All 
these processes commonly block for up to several tens of seconds, 
which gets me into all kinds of trouble, as I'm sure you can see.

I'd like to hear your opinion on this case. Is Berkeley DB at fault 
for causing this kind of access pattern? Is the kernel at fault for 
blocking all these processes needlessly? Is the hardware at fault for 
being so hopelessly slow, meaning I should get with the times and buy 
some SSDs? Or am I at fault for not finding the obvious configuration 
settings that would avoid the problem? :)
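
For reference, the only knobs I'm aware of are the writeback 
thresholds under /proc/sys/vm. I'd guess that something like this in 
/etc/sysctl.conf (the values are just examples) would make the flusher 
kick in earlier and in smaller batches, but I don't see how it would 
stop processes from blocking on pages already under writeback:

    vm.dirty_background_bytes = 8388608   # start background flushing at 8 MB
    vm.dirty_bytes = 67108864             # throttle writers beyond 64 MB
    vm.dirty_expire_centisecs = 1500      # consider pages old after 15 s
    vm.dirty_writeback_centisecs = 500    # wake the flusher every 5 s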

I'm inclined to think that the kernel is at fault for blocking the 
processes needlessly. If the contents of the pages being flushed need to 
be preserved until the write is completed, shouldn't they be copied when 
written to, rather than blocking the writer for who-knows-how-long? It 
seems that if the kernel doesn't do this, then I'm always put at the mercy 
of the hardware, and as long as I have free memory, I shouldn't have to 
be.

However, I could also see the argument that Berkeley DB is at fault 
for causing this kind of access and such massive disk writes, and that 
it should perhaps be using SysV SHM regions or the like instead of 
disk-backed files. Or would it be possible, perhaps, to get these 
files treated more like anonymous memory, their contents not being 
flushed back to disk unless necessary?
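
That is, something along these lines instead of the file-backed mmap 
(again only a sketch; if I read the docs right, Berkeley DB can 
apparently do this itself with the DB_SYSTEM_MEM flag):

    #include <stdlib.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>

    int main(void)
    {
        size_t len = 64 << 20;

        /* An anonymous System V segment is backed by memory only:
         * it is never written back to disk, merely swapped out
         * under memory pressure. */
        int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);
        char *mem = shmat(id, NULL, 0);

        /* ... the same scattered writes as in the sketch above,
         * but with no writeback to trigger ... */

        shmdt(mem);
        shmctl(id, IPC_RMID, NULL);
        return 0;
    }

Of course, that loses the on-disk copy of the regions, though I don't 
know whether anything actually relies on the region files surviving a 
crash.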

It is also worth noting that this behavior seems to have been 
introduced somewhere between 2.6.26 and 2.6.32, because I started 
noticing it when I upgraded from Debian 5.0 to 6.0. I've since tried 
3.2.0, 3.5.4 and 3.7.1, and it appears in every one of those versions. 
Unfortunately, I can't easily go back and bisect, because the new init 
scripts don't support kernels older than 2.6.32.

I'm sorry, also, if this is completely the wrong list for such 
discussions, but I couldn't find one that matches better.

Thanks for reading my wall of text!

--

Fredrik Tolf