Date:	Thu, 14 Apr 2011 14:16:55 +0200
From:	Martin Steigerwald <ms@...mix.de>
To:	linux-kernel@...r.kernel.org
Cc:	linux-mm@...r.kernel.org, Mega Maddin <maddin@...amaddin.de>
Subject: Understanding buffers / buffer cache

Please keep either linux-kernel or my address as cc, as I am only subscribed 
to linux-kernel, not linux-mm.


Hi!

In this week's Linux performance analysis and tuning course that I am teaching,
participants asked detailed questions about what the Linux kernel uses the
memory for that free displays under "buffers".

This much I know:

- it is for buffers that have to be written to disk at some time (as opposed
to caches, which are for reads)

- it is somewhat related to the pdflush / flush-major:minor threads; XFS
doesn't use these, but uses xfsbufd / xfssyncd instead

- my observation is that it doesn't increase much during a simple dd, but
increases much more during a tar -xf linux-x.y.tar.gz (after an echo 3 >
/proc/sys/vm/drop_caches)

- the data written via dd instead shows up under Dirty: and then Writeback: in
/proc/meminfo


Thus I thought buffers were mainly related to metadata stuff.


But one course member (on Cc) dug into the kernel source and found the following:

- fs/block_dev.c:

long nr_blockdev_pages(void)
{
        struct block_device *bdev;
        long ret = 0;
        spin_lock(&bdev_lock);
        list_for_each_entry(bdev, &all_bdevs, bd_list) {
                ret += bdev->bd_inode->i_mapping->nrpages;
        }
        spin_unlock(&bdev_lock);
        return ret;
}

- include/linux/fs.h:

struct block_device {
        dev_t                   bd_dev;  /* not a kdev_t - it's a search key */
        struct inode *          bd_inode;       /* will die */

[...]

struct inode {
        /* RCU path lookup touches following: */
[...]
        struct address_space    *i_mapping;


- And then this in lots of places:

martin@...mbhala:~/Computer/Shambhala/Kernel/2.6.38/linux-2.6.38.y> find -name "*.c" -or -name "*.h" | xargs grep i_mapping
./include/linux/fs.h:   struct address_space    *i_mapping;
./include/linux/fs.h:           invalidate_mapping_pages(inode->i_mapping, 0, -1);
./include/trace/events/ext4.h:          __entry->writeback_index = inode->i_mapping->writeback_index;
./include/trace/events/ext4.h:          __entry->writeback_index = inode->i_mapping->writeback_index;
./kernel/cgroup.c:              inode->i_mapping->backing_dev_info = &cgroup_backing_dev_info;
./arch/powerpc/platforms/cell/spufs/file.c:             ctx->local_store = inode->i_mapping;
./arch/powerpc/platforms/cell/spufs/file.c:             ctx->cntl = inode->i_mapping;
[...]
./arch/tile/kernel/smp.c:static unsigned long __iomem *ipi_mappings[NR_CPUS];
./arch/tile/kernel/smp.c:               ipi_mappings[cpu] = ioremap_prot(offset, PAGE_SIZE, pte);
./arch/tile/kernel/smp.c:       ((unsigned long __force *)ipi_mappings[cpu])[IRQ_RESCHEDULE] = 0;
[...]

including various filesystems, where it seems to be used for metadata *and*
file I/O, as well as for "journal" / COW I/O. For example:

./fs/btrfs/inode.c:             page = find_get_page(inode->i_mapping,
./fs/btrfs/inode.c:                                        inode->i_mapping, start,
./fs/btrfs/inode.c:             inode->i_mapping->a_ops = &btrfs_aops;
./fs/btrfs/inode.c:             inode->i_mapping->backing_dev_info = &root->fs_info->bdi;
[...]
./fs/btrfs/ordered-data.c:          !mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY)) {
./fs/btrfs/ordered-data.c:                              filemap_flush(inode->i_mapping);
./fs/btrfs/ordered-data.c:              filemap_fdatawrite_range(inode->i_mapping, start, end);
./fs/btrfs/ordered-data.c:      filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c:      filemap_fdatawrite_range(inode->i_mapping, start, orig_end);
./fs/btrfs/ordered-data.c:      filemap_fdatawait_range(inode->i_mapping, start, orig_end);
[...]
./fs/btrfs/file.c:              pages[i] = grab_cache_page(inode->i_mapping, index + i);
./fs/btrfs/file.c:      current->backing_dev_info = inode->i_mapping->backing_dev_info;
./fs/btrfs/file.c:                              filemap_fdatawrite_range(inode->i_mapping, pos,
./fs/btrfs/file.c:                                                      inode->i_mapping,
./fs/btrfs/file.c:                      invalidate_mapping_pages(inode->i_mapping,
./fs/btrfs/file.c:                      filemap_flush(inode->i_mapping);




So what exactly are buffers used for? Is there any up-to-date and detailed
documentation, howto or explanation available? Most of the hits I found via
search engines are either quite short and vague or relate to really old
kernel versions.

Is there a detailed explanation of how (as in: in which steps) the Linux
kernel writes certain kinds of data, such as

- inode / metadata traffic
- dirty pages (OK, via pdflush / flush threads, as long as no single process
overuses it)
- I/O from processes using system calls like write()
- direct I/O

Or do you have any hints on which source files to read to understand more
about these questions?

Thanks,
-- 
Martin Steigerwald - team(ix) GmbH - http://www.teamix.de
gpg: 19E3 8D42 896F D004 08AC A0CA 1E10 C593 0399 AE90
