Date:	Sun, 18 Oct 2009 15:01:46 +0530
From:	Viji V Nair <viji@...oraproject.org>
To:	Theodore Tso <tytso@....edu>
Cc:	Eric Sandeen <sandeen@...hat.com>, ext3-users@...hat.com,
	linux-ext4@...r.kernel.org
Subject: Re: optimising filesystem for many small files

On Sun, Oct 18, 2009 at 3:56 AM, Theodore Tso <tytso@....edu> wrote:
> On Sat, Oct 17, 2009 at 11:26:04PM +0530, Viji V Nair wrote:
>> these files are not in a single directory, this is a pyramid
>> structure. There are total 15 pyramids and coming down from top to
>> bottom the sub directories and files  are multiplied by a factor of 4.
>>
>> The IO is scattered all over!!!! and this is a single disk file system.
>>
>> Since the python application is creating files, it is creating
>> multiple files to multiple sub directories at a time.
>
> What is the application trying to do, at a high level?  Sometimes it's
> not possible to optimize a filesystem against a badly designed
> application.  :-(

The application reads GIS data from a data source and renders map
tiles (256x256 PNG images) for different zoom levels. The tree output
for the first zoom level is as follows:

/tiles/00
`-- 000
    `-- 000
        |-- 000
        |   `-- 000
        |       `-- 000
        |           |-- 000.png
        |           `-- 001.png
        |-- 001
        |   `-- 000
        |       `-- 000
        |           |-- 000.png
        |           `-- 001.png
        `-- 002
            `-- 000
                `-- 000
                    |-- 000.png
                    `-- 001.png

At each zoom level the number of fourth-level directories is multiplied
by a factor of four, and the number of PNG images is multiplied by the
same factor.
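
As a rough sketch of the layout above: the path scheme below (two-digit zoom level, then the x and y tile indices each split into three 3-digit groups) is inferred from the tree listing, not taken from the application's code, and the 3 x 2 = 6 tiles at the first level are likewise an assumption based on the listing. The function names are made up for illustration.

```python
import posixpath

BASE_COLS, BASE_ROWS = 3, 2  # assumed from the listing: x in 000..002, y in 000..001


def tile_path(root, z, x, y, ext="png"):
    """Build a path such as /tiles/00/000/000/002/000/000/001.png (assumed scheme)."""
    parts = [
        "%02d" % z,
        "%03d" % (x // 1000000),
        "%03d" % ((x // 1000) % 1000),
        "%03d" % (x % 1000),
        "%03d" % (y // 1000000),
        "%03d" % ((y // 1000) % 1000),
        "%03d.%s" % (y % 1000, ext),
    ]
    return posixpath.join(root, *parts)


def tiles_at_zoom(z):
    """Tile count if each zoom level has four times the tiles of the previous one."""
    return BASE_COLS * BASE_ROWS * 4 ** z


for z in range(4):
    print(z, tiles_at_zoom(z))
print(tile_path("/tiles", 0, 2, 1))
```

Under these assumptions the file count grows geometrically, so the deeper zoom levels account for almost all of the files and directories.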
>
> It sounds like it is generating files distributed in subdirectories in
> a completely random order.  How are the files going to be read
> afterwards?  In the order they were created, or some other order
> different from the order in which they were read?

The applications we are using are modified versions of mapnik and
tilecache. They are single threaded, so we run 4 processes at a time,
which means only four images are being created at any point in time.
Sometimes a single image takes around 20 seconds to create, even though
plenty of system resources (memory, processors, etc.) are free. The
machine has 4G of memory and 2 x Xeon 5420 processors.
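
One way to narrow down where those 20 seconds go would be to time the rendering and the file write separately for each tile. A minimal sketch, assuming a hypothetical `render` callable that returns the PNG bytes; it stands in for the mapnik/tilecache rendering step and is not their API.

```python
import os
import time


def timed_tile_write(render, path):
    """Return (render_seconds, write_seconds) for one tile.

    `render` is a hypothetical zero-argument callable returning PNG bytes.
    `path` must include a directory component.
    """
    t0 = time.perf_counter()
    data = render()
    t1 = time.perf_counter()

    # Create the nested tile directories if they do not exist yet.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach the disk, not just the page cache
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

If the write time dominates, the filesystem or disk is the likely bottleneck; if the render time dominates, the delay is more likely in the renderer or the data source.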

I have checked for delay in the backend data source; it is on a 12Gbps
LAN and there is no delay at all.

These images are also read in the same manner.

>
> With a sufficiently bad access patterns, there may not be a lot you
> can do, other than (a) throw hardware at the problem, or (b) fix or
> redesign the application to be more intelligent (if possible).
>
>                                                    - Ted
>

The file system was created with "-i 1024 -b 1024" to get a larger number
of inodes; 50% of the images are smaller than 10KB. I have also disabled
access time updates and set a large value for the commit interval. Do
you have any other recommendations for creating the file system?
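
To put "-i 1024 -b 1024" in numbers: in mke2fs, -i is the bytes-per-inode ratio and -b is the block size, so roughly one inode is created for every 1024 bytes of space. A rough estimate follows; the 500 GB filesystem size and the 256-byte inode size are assumed values for illustration, since neither is stated above.

```python
def inode_estimate(fs_bytes, bytes_per_inode=1024, inode_size=256):
    """Approximate inode count and inode-table size for a given -i ratio."""
    inodes = fs_bytes // bytes_per_inode
    table_bytes = inodes * inode_size
    return inodes, table_bytes


def blocks_for_file(file_bytes, block_size=1024):
    """Data blocks needed for one file (ceiling division)."""
    return -(-file_bytes // block_size)


fs_bytes = 500 * 10**9  # hypothetical 500 GB disk
inodes, table = inode_estimate(fs_bytes)
print("inodes: %d, inode tables: %.0f%% of the disk"
      % (inodes, 100.0 * table / fs_bytes))
print("a 10KB tile uses", blocks_for_file(10 * 1024), "blocks of 1KB")
```

The share of the disk taken by inode tables is simply inode_size / bytes_per_inode, so it depends heavily on the inode size the filesystem was actually created with.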

Viji