Date:	Fri, 31 Oct 2008 13:54:14 -0700
From:	Chad Talbott
Cc:	Andrew Morton, Michael Rubin
Subject: Metadata in sys_sync_file_range and fadvise(DONTNEED)

We are looking at adding calls to posix_fadvise(DONTNEED) to various
data logging routines.  This has two benefits:

  - frequent write-out keeps queues short, which lowers latency; the
    disk is also better utilized because writeout begins immediately

  - less useless stuff in page cache
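
As a rough illustration of the kind of call being considered, here is a
minimal sketch of a logging append followed by posix_fadvise(DONTNEED) on
the range just written.  The helper name log_append() and its
offset-tracking interface are hypothetical, not taken from the logging
code under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes at *offset, advance *offset, then
 * advise the kernel that the range just written is no longer needed in
 * the page cache.
 *
 * Returns 0 on success, -1 if pwrite() fails (errno is set), or the
 * positive error number returned by posix_fadvise().
 */
int log_append(int fd, const void *buf, size_t len, off_t *offset)
{
    const char *p = buf;
    size_t left = len;
    off_t start = *offset;

    while (left > 0) {
        ssize_t n = pwrite(fd, p, left, *offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before writing: retry */
            return -1;
        }
        p += n;
        left -= (size_t)n;
        *offset += n;
    }

    /* posix_fadvise() reports errors via its return value, not errno. */
    return posix_fadvise(fd, start, (off_t)len, POSIX_FADV_DONTNEED);
}
```

Note that the advice covers only the data range; nothing in this call
asks for the associated metadata to be written.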

One problem with fadvise() (and ext2, at least) is that associated
metadata isn't scheduled with the data.  So, for a large log file with
a high append rate, hundreds of indirect blocks are left to be written
out by periodic writeback.  This metadata consists of single blocks
spaced by 4MB, leading to spikes of very inefficient disk utilization,
deep queues and high latency.
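
The 4MB spacing follows from the ext2 block-map layout.  The sketch below
works through the arithmetic, assuming a 4 KiB block size (the message
does not state one) and ext2's 32-bit block pointers and 12 direct blocks
per inode.  The function names are illustrative only, and the count
ignores the few double- and triple-indirect parent blocks.

```c
#include <stdint.h>

#define EXT2_NDIR_BLOCKS 12   /* direct block pointers held in the inode */

/* Bytes of file data mapped by one indirect block of 32-bit pointers. */
uint64_t indirect_span_bytes(uint32_t block_size)
{
    uint64_t ptrs = block_size / sizeof(uint32_t);
    return ptrs * block_size;
}

/*
 * Approximate number of leaf indirect blocks needed to map a file of
 * file_bytes bytes.  Blocks covered by the inode's direct pointers need
 * no indirect block.
 */
uint64_t leaf_indirect_blocks(uint64_t file_bytes, uint32_t block_size)
{
    uint64_t ptrs = block_size / sizeof(uint32_t);
    uint64_t blocks = (file_bytes + block_size - 1) / block_size;

    if (blocks <= EXT2_NDIR_BLOCKS)
        return 0;
    return (blocks - EXT2_NDIR_BLOCKS + ptrs - 1) / ptrs;
}
```

With a 4096-byte block, indirect_span_bytes() gives 4194304 bytes, i.e.
one metadata block per 4 MiB of data, and a 1 GiB log file needs 256 leaf
indirect blocks, which is consistent with "hundreds" above.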

Andrew suggests a new SYNC_FILE_RANGE_METADATA flag for
sys_sync_file_range(), and leaving posix_fadvise() alone.  That will
work for my purposes, but it seems like it leaves
posix_fadvise(DONTNEED) with a performance bug on ext2 (or any other
filesystem with interleaved data/metadata).  Andrew's argument is that
people have expectations about posix_fadvise() behavior as it's been
around for years in Linux.
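
For comparison, a sketch of what the sync_file_range() route could look
like from user space, using only the flags that exist today.  The helper
name flush_and_drop() is hypothetical.  SYNC_FILE_RANGE_METADATA is only
a proposal in this thread and is not defined in existing headers, so it
appears here only in a comment.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: write out and wait on the data pages in
 * [off, off + len), then drop them from the page cache.
 * Returns 0 on success, -1 on failure with errno set.
 */
int flush_and_drop(int fd, off_t off, off_t len)
{
    unsigned int flags = SYNC_FILE_RANGE_WAIT_BEFORE |
                         SYNC_FILE_RANGE_WRITE |
                         SYNC_FILE_RANGE_WAIT_AFTER;

    /*
     * Under the proposal, a SYNC_FILE_RANGE_METADATA flag would be OR'ed
     * into flags here so the associated metadata is scheduled as well.
     * Without it, only data pages are written.
     */
    if (sync_file_range(fd, off, len, flags) != 0)
        return -1;

    /* posix_fadvise() returns the error number instead of setting errno. */
    int rc = posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
    if (rc != 0) {
        errno = rc;
        return -1;
    }
    return 0;
}
```

Because the pages are already clean when posix_fadvise() runs, the advice
here only has to drop them; the write-out policy lives entirely in the
sync_file_range() call.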

I'd like to get a consensus on what The Right Thing is, so I can move
toward implementing it and moving the logging code onto it.
