Message-Id: <201104141417.10748.ms@teamix.de>
Date: Thu, 14 Apr 2011 14:16:55 +0200
From: Martin Steigerwald <ms@...mix.de>
To: linux-kernel@...r.kernel.org
Cc: linux-mm@...r.kernel.org, Mega Maddin <maddin@...amaddin.de>
Subject: Understanding buffers / buffer cache
Please keep either linux-kernel or my address as cc, as I am only subscribed
to linux-kernel, not linux-mm.
Hi!
In this week's Linux performance analysis and tuning course, which I am
teaching, there were detailed questions about what the Linux kernel uses the
memory for that free displays under "buffers".
I know this much:
- it is for buffers that have to be written to disk at some time (as opposed
to caches, which are for reads)
- it is somewhat related to the pdflush / flush-major:minor threads; XFS
doesn't use these (but uses xfsbufd / xfssyncd instead)
- the observation is that it doesn't increase much on a simple dd, but
increases much more on a tar -xf linux-x.y.tar.gz (after an echo 3 >
/proc/sys/vm/drop_caches)
- the data written via dd instead shows up under Dirty: and then Writeback:
in /proc/meminfo
Thus I thought buffers were mainly related to metadata stuff.
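To watch these counters while running the dd or tar test, something like the
following could be used. It is only a sketch: meminfo_fields is a made-up
helper name, and it assumes a POSIX shell with awk and the usual
"Name: value kB" layout of /proc/meminfo.

```shell
#!/bin/sh
# Print selected /proc/meminfo fields (values in kB) on a single line.
# meminfo_fields is a hypothetical helper name, not an existing tool.
meminfo_fields() {
    # $1: file to read; defaults to /proc/meminfo
    awk '/^(Buffers|Cached|Dirty|Writeback):/ {
             sub(":", "", $1)              # strip the colon from the name
             printf "%s%s=%s", sep, $1, $2
             sep = " "
         }
         END { print "" }' "${1:-/proc/meminfo}"
}

# Take one sample if /proc/meminfo is available on this system.
if [ -r /proc/meminfo ]; then
    meminfo_fields
fi
```

Calling meminfo_fields once before and once after the dd or tar run (or in a
loop with sleep 1 while it runs) gives the numbers to compare.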
But one course member (on cc) dug into the kernel source and found this:
- fs/block_dev.c:
  long nr_blockdev_pages(void)
  {
          struct block_device *bdev;
          long ret = 0;
          spin_lock(&bdev_lock);
          list_for_each_entry(bdev, &all_bdevs, bd_list) {
                  ret += bdev->bd_inode->i_mapping->nrpages;
          }
          spin_unlock(&bdev_lock);
          return ret;
  }
- include/linux/fs.h:
  struct block_device {
          dev_t bd_dev; /* not a kdev_t - it's a search key */
          struct inode * bd_inode; /* will die */
  [...]
  struct inode {
          /* RCU path lookup touches following: */
  [...]
          struct address_space *i_mapping;
- And then this in lots of places:
martin@...mbhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> find -name "*.c" -or -name "*.h" | xargs grep i_mapping
./include/linux/fs.h: struct address_space *i_mapping;
./include/linux/fs.h: invalidate_mapping_pages(inode->i_mapping, 0, -1);
./include/trace/events/ext4.h: __entry->writeback_index = inode->i_mapping->writeback_index;
./include/trace/events/ext4.h: __entry->writeback_index = inode->i_mapping->writeback_index;
./kernel/cgroup.c: inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
./arch/powerpc/platforms/cell/spufs/file.c: ctx->local_store = inode->i_mapping;
./arch/powerpc/platforms/cell/spufs/file.c: ctx->cntl = inode->i_mapping;
[...]
./arch/tile/kernel/smp.c:static unsigned long __iomem *ipi_mappings[NR_CPUS];
./arch/tile/kernel/smp.c: ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
./arch/tile/kernel/smp.c: ((unsigned long __force *)ipi_mappings[cpu])[IRQ_RESCHEDULE] = 0;
[...]
including various filesystems, where it seems to be used for metadata
*and* file I/O as well as for "journal" / COW I/O. For example:
./fs/btrfs/inode.c: page = find_get_page(inode->i_mapping,
./fs/btrfs/inode.c: inode->i_mapping, start,
./fs/btrfs/inode.c: inode->i_mapping->a_ops = &btrfs_aops;
./fs/btrfs/inode.c: inode->i_mapping->backing_dev_info = &root->fs_info->bdi;
[...]
./fs/btrfs/ordered-data.c: !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY)) {
./fs/btrfs/ordered-data.c: filemap_flush(inode->i_mapping);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, end);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c: filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c: filemap_fdatawait_range(inode->i_mapping, start, orig_end);
[...]
./fs/btrfs/file.c: pages[i] = grab_cache_page(inode->i_mapping, index + i);
./fs/btrfs/file.c: current->backing_dev_info = inode->i_mapping->backing_dev_info;
./fs/btrfs/file.c: filemap_fdatawrite_range(inode->i_mapping, pos,
./fs/btrfs/file.c: inode->i_mapping,
./fs/btrfs/file.c: invalidate_mapping_pages(inode->i_mapping,
./fs/btrfs/file.c: filemap_flush(inode->i_mapping);
So what exactly are buffers used for? Is there any up-to-date and detailed
documentation, howto or explanation available? Most hits I found via search
engines are either quite short and vague or relate to really old kernel
versions.
Is there any detailed explanation available of how - as in which steps - the
Linux kernel writes certain kinds of data, like
- inode / metadata traffic
- dirty pages (ok, via pdflush / flush, as long as one process doesn't overuse
it)
- I/O from processes using system calls like write()
- direct I/O
Or do you have any hints on what source files to read in order to understand
more regarding these questions?
Thanks,
--
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90