Message-ID: <84c89ac10910180933p3ddb9947ye464a19ba29e4ccc@mail.gmail.com>
Date:	Sun, 18 Oct 2009 22:03:42 +0530
From:	Viji V Nair <viji@...oraproject.org>
To:	Eric Sandeen <sandeen@...hat.com>
Cc:	Theodore Tso <tytso@....edu>, ext3-users@...hat.com,
	linux-ext4@...r.kernel.org
Subject: Re: optimising filesystem for many small files

On Sun, Oct 18, 2009 at 9:04 PM, Eric Sandeen <sandeen@...hat.com> wrote:
> Viji V Nair wrote:
>>
>> On Sun, Oct 18, 2009 at 3:56 AM, Theodore Tso <tytso@....edu> wrote:
>>>
>>> On Sat, Oct 17, 2009 at 11:26:04PM +0530, Viji V Nair wrote:
>>>>
>>>> These files are not in a single directory; this is a pyramid
>>>> structure. There are 15 pyramids in total, and going from the top to
>>>> the bottom the subdirectories and files are multiplied by a factor of 4.
>>>>
>>>> The I/O is scattered all over, and this is a single-disk filesystem.
>>>>
>>>> Since the Python application is creating the files, it is writing
>>>> multiple files into multiple subdirectories at a time.
>>>
>>> What is the application trying to do, at a high level?  Sometimes it's
>>> not possible to optimize a filesystem against a badly designed
>>> application.  :-(
>>
>> The application reads the GIS data from a data source and renders the
>> map tiles (256x256 PNG images) for different zoom levels. The tree
>> output of the first zoom level is as follows:
>>
>> /tiles/00
>> `-- 000
>>    `-- 000
>>        |-- 000
>>        |   `-- 000
>>        |       `-- 000
>>        |           |-- 000.png
>>        |           `-- 001.png
>>        |-- 001
>>        |   `-- 000
>>        |       `-- 000
>>        |           |-- 000.png
>>        |           `-- 001.png
>>        `-- 002
>>            `-- 000
>>                `-- 000
>>                    |-- 000.png
>>                    `-- 001.png
>>
>> In each zoom level the fourth-level directories are multiplied by a
>> factor of four, and the number of PNG images is multiplied by the same
>> factor.
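
(Just to make that fan-out concrete, here is a rough sketch of what the
multiplication works out to. The base counts are read off the
first-zoom-level tree above; the function itself is purely illustrative.)

# Illustrative only: each zoom level multiplies the fourth-level
# directories and the PNG tiles by four.  base_dirs/base_tiles are the
# counts visible in the zoom-level tree shown above.
def tiles_at_zoom(zoom, base_dirs=3, base_tiles=6):
    factor = 4 ** zoom
    return base_dirs * factor, base_tiles * factor

for zoom in range(6):
    dirs, tiles = tiles_at_zoom(zoom)
    print("zoom %02d: %d leaf dirs, %d png tiles (per pyramid)" % (zoom, dirs, tiles))
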
>>>
>>> It sounds like it is generating files distributed in subdirectories in
>>> a completely random order.  How are the files going to be read
>>> afterwards?  In the order they were created, or some other order
>>> different from the order in which they were written?
>>
>> The applications we are using are modified versions of mapnik and
>> tilecache. These are single-threaded, so we are running 4 processes at
>> a time; only four images are being created at any one point in time.
>> Sometimes a single image takes around 20 seconds to create. I can see
>> that plenty of system resources are free: memory, processors, etc.
>> (the machines have 4G of RAM and 2 x Xeon 5420 CPUs).
>>
>> I have checked for delays in the backend data source; it is on a 12Gbps
>> LAN and there is no delay at all.
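
(For illustration only, a minimal sketch of the four-process setup
described above; render_tile() here is just a hypothetical stand-in for
the real mapnik/tilecache rendering call.)

from multiprocessing import Pool

def render_tile(tile_path):
    # Hypothetical stand-in: render one 256x256 PNG at tile_path with
    # mapnik/tilecache, then return the path when finished.
    return tile_path

def render_all(tile_paths, workers=4):
    # Four single-threaded renderers running at any one time, as above.
    pool = Pool(processes=workers)
    try:
        for _ in pool.imap_unordered(render_tile, tile_paths):
            pass
    finally:
        pool.close()
        pool.join()
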
>
> The delays are almost certainly due to the drive heads seeking like mad as
> they attempt to write data all over the disk; most filesystems are designed
> so that files in subdirectories are kept together, and new subdirectories
> are placed at relatively distant locations to make room for the files they
> will contain.
>
> In the past I've seen similar applications also slow down due to new inode
> searching heuristics in the inode allocator, but that was on ext3 and ext4
> is significantly different in that regard...
>
>> These images are also read in the same manner.
>>
>>> With sufficiently bad access patterns, there may not be a lot you
>>> can do, other than (a) throw hardware at the problem, or (b) fix or
>>> redesign the application to be more intelligent (if possible).
>>>
>>>                                                   - Ted
>>>
>>
>> The filesystem was created with "-i 1024 -b 1024" for a larger number
>> of inodes, since 50% of the images are smaller than 10KB. I have also
>> disabled access times and set a large value for the commit interval.
>> Do you have any other recommendations for creating the filesystem?
>
> I think you'd do better to change, if possible, how the application behaves.
>
> I probably don't know enough about the app, but rather than:
>
> /tiles/00
> `-- 000
>    `-- 000
>        |-- 000
>        |   `-- 000
>        |       `-- 000
>        |           |-- 000.png
>        |           `-- 001.png
>
> could it do:
>
> /tiles/00/000000000000000000.png
> /tiles/00/000000000000000001.png
>
> ...
>
> for example?  (or something similar)
>
> -Eric

The tilecache application is creating this directory structure; we would
need to change both it and our application to use a new directory tree.
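
For reference, a rough sketch of the kind of flattening Eric suggests.
The helper name and the zero-padding width are made up here; the
component tuple just mirrors the nested path shown above.

import os

def flat_tile_path(root, zoom, components):
    # Collapse the nested per-tile directory names into a single file name
    # under one directory per zoom level, along the lines suggested above.
    # 'components' are the old directory names plus the tile number,
    # e.g. ("000", "000", "000", "000", "000", "001").
    name = "".join(components) + ".png"
    return os.path.join(root, "%02d" % zoom, name)

# -> /tiles/00/000000000000000001.png
print(flat_tile_path("/tiles", 0, ("000", "000", "000", "000", "000", "001")))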

>
>> Viji
>
>
