lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 29 Mar 2007 11:27:45 +0200
From:	Jan Kara <jack@...e.cz>
To:	Xin Zhao <uszhaoxin@...il.com>
Cc:	Dave Kleikamp <shaggy@...ux.vnet.ibm.com>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>
Subject: Re: Linux page cache issue?

  Hello,

> Now I want to explain the problem that leads me to explore the Linux
> disk cache management.  This is actually from my project. In a file
> system I am working on, two files may have different inodes, but share
> the same data blocks. Of course additional block-level reference
> counting and copy-on-write mechanisms are needed to prevent operations
> on one file from disrupting the other file. But the point is, the two
> files share the same data blocks.
> 
> I hope that consequential reads to the two files can benefit from disk
> cache, since they have the same data blocks. But I noticed that Linux
> splits disk buffer cache into many small parts and associate a file's
> data with its mapping object. Linux determines whether a data page is
> cached or not by lookup the file's mapping radix tree. So this is a
> per-file radix tree. This design obviously makes each tree smaller and
> faster to look up. But this design eliminates the possibility of
> sharing disk cache across two files. For example, if a process reads
> file 2 right after file 1 (both file 1 and 2 share the same data block
> set). Even if the data blocks are already loaded in memory, but they
> can only be located via file 1's mapping object. When Linux reads file
> 2, it still think the data is not present in memory.  So the process
> still needs to load the data from disk again.
  Actually, there is one inode - the device inode - whose mapping can
contain all the blocks of the filesystem. That is basically the radix
tree you are looking for. ext3 for example uses it for accessing its
metadata (indirect blocks etc.). But you have to be really careful to
avoid aliasing issues and such when you'd like to map copies of those
pages into mappings of several different inodes (BTW ext3cow filesystem
may be interesting for you www.ext3cow.com).

								Honza

> On 3/28/07, Dave Kleikamp <shaggy@...ux.vnet.ibm.com> wrote:
> >On Wed, 2007-03-28 at 02:45 -0400, Xin Zhao wrote:
> >> Hi,
> >>
> >> If a Linux process opens and reads a file A, then it closes the file.
> >> Will Linux keep the file A's data in cache for a while in case another
> >> process opens and reads the same in a short time? I think that is what
> >> I heard before.
> >
> >Yes.
> >
> >> But after I digged into the kernel code, I am confused.
> >>
> >> When a process closes the file A, iput() will be called, which in turn
> >> calls the follows two functions:
> >> iput_final()->generic_drop_inode()
> >
> >A comment from the top of fs/dcache.c:
> >
> >/*
> > * Notes on the allocation strategy:
> > *
> > * The dcache is a master of the icache - whenever a dcache entry
> > * exists, the inode will always exist. "iput()" is done either when
> > * the dcache entry is deleted or garbage collected.
> > */
> >
> >Basically, as long a a dentry is present, iput_final won't be called on
> >the inode.
> >
> >> But from the following calling chain, we can see that file close will
> >> eventually lead to evict and free all cached pages. Actually in
> >> truncate_complete_page(), the pages will be freed.  This seems to
> >> imply that Linux has to re-read the same data from disk even if
> >> another process B read the same file right after process A closes the
> >> file. That does not make sense to me.
> >>
> >> /***calling chain ***/
> >> generic_delete_inode/generic_forget_inode()->
> >> truncate_inode_pages()->truncate_inode_pages_range()->
> >> truncate_complete_page()->remove_from_page_cache()->
> >> __remove_from_page_cache()->radix_tree_delete()
> >>
> >> Am I missing something? Can someone please provide some advise?
> >>
> >> Thanks a lot
> >> -x
> >
> >Shaggy
> >--
> >David Kleikamp
> >IBM Linux Technology Center
> >
> >
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
-- 
Jan Kara <jack@...e.cz>
SuSE CR Labs
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ