linux-ext4 - Re: [PATCH] libfs: Fix DIO mode aligment


Message-ID: <1492F08F-A8BD-4F81-B857-99D342031949@hpe.com>
Date:   Sat, 19 Dec 2020 04:31:54 +0000
From:   "Lyashkov, Alexey" <alexey.lyashkov@....com>
To:     Andreas Dilger <adilger@...ger.ca>
CC:     "Theodore Y. Ts'o" <tytso@....edu>,
        Благодаренко Артём 
        <artem.blagodarenko@...il.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] libfs: Fix DIO mode aligment



On 19/12/2020, 01:03, "Andreas Dilger" <adilger@...ger.ca> wrote:

    On Nov 19, 2020, at 5:26 AM, Lyashkov, Alexey <alexey.lyashkov@....com> wrote:
    > 
    > Tso,
    > 
    > We hit this situation on a modern HDD with a 4k block size, after e2image was changed to use direct I/O instead of buffered I/O.

    > It would be useful to include this patch for e2image as part of this submission,
    > so that this can be tested.  I suspect that O_DIRECT would be useful for other
    > tools (e.g. e2fsck, debugfs, etc.) since the IO manager would avoid double
    > buffering the data in both the kernel and userspace.

debugfs already has a -D option. As for e2fsck, it runs in single-user mode and makes several passes over the FS,
so caching is good to have there. Don't forget that caching also lets readahead work, which is very useful when opening a large filesystem.



    > e2fsprogs tries to read the superblock at offset 1k, which causes it to set the FS block size to 1k and read the second block.
    > (Many other places exist, but this is the simplest one.)

>    Are there actually other places where it is doing sub-block-size reads from disk?

Many places.

bash-3.2$ grep -rn io_channel_set_blksize * | grep SUPERBLOCK
lib/ext2fs/undo_io.c:223:	io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
lib/ext2fs/undo_io.c:506:	io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
lib/ext2fs/closefs.c:201:		io_channel_set_blksize(fs->io, SUPERBLOCK_OFFSET);
lib/ext2fs/openfs.c:218:		io_channel_set_blksize(fs->io, SUPERBLOCK_OFFSET);
misc/mke2fs.c:2573:	io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);
misc/e2undo.c:168:	io_channel_set_blksize(channel, SUPERBLOCK_OFFSET);

There are also some places where set_blksize is called with a size that differs from the block size of the device.
In theory we can create an FS with a 1K block size, and the tools should be able to work with it.
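
To make the alignment problem concrete, here is a minimal sketch of how a 1K request could be widened to whole device sectors before it is issued with direct I/O. It is only an illustration, not the code from the patch; the names widen_to_sector and struct aligned_range are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper type: the sector-aligned range that covers a request. */
struct aligned_range {
	uint64_t start;   /* aligned offset to read from */
	uint64_t length;  /* aligned number of bytes to read */
	uint64_t skip;    /* bytes to skip in the bounce buffer to reach the data */
};

/*
 * Widen [offset, offset + len) so that both ends fall on a multiple of
 * the device sector size, as direct I/O requires.  `sector` must be non-zero.
 */
struct aligned_range widen_to_sector(uint64_t offset, uint64_t len,
				     uint64_t sector)
{
	struct aligned_range r;
	uint64_t end = offset + len;
	/* Round the end up to the next sector boundary. */
	uint64_t aligned_end = (end + sector - 1) / sector * sector;

	/* Round the start down to the previous sector boundary. */
	r.start = offset - offset % sector;
	r.skip = offset - r.start;
	r.length = aligned_end - r.start;
	return r;
}
```

For the superblock case (offset 1024, length 1024, 4096-byte sectors) this yields one read of 4096 bytes from offset 0, with the wanted data starting 1024 bytes into the buffer.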


>    It seems simpler to fix the superblock read at open to always read the first 4KB
>    into a buffer (and to make it easy to extend to 16KB or 64KB if sector sizes get
>    even larger), then find the superblock within the buffer to decide the blocksize.

And that would have to be done in many places, including metadata reads when the FS block size is 1k.
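
For reference, the suggestion quoted above (read the first 4KB into a buffer, then find the superblock inside it) could look roughly like the sketch below. It assumes the standard ext2/3/4 on-disk layout: superblock at byte offset 1024, s_log_block_size at offset 24 and s_magic at offset 56 within it, both little-endian. The function name and the FIRST_READ_SIZE constant are hypothetical and are not e2fsprogs API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SB_OFFSET        1024u   /* superblock starts 1 KiB into the device */
#define SB_LOG_BSIZE_OFF 24u     /* s_log_block_size, little-endian 32-bit */
#define SB_MAGIC_OFF     56u     /* s_magic, little-endian 16-bit */
#define EXT2_MAGIC       0xEF53u
#define FIRST_READ_SIZE  4096u   /* hypothetical size of the first aligned read */
#define MAX_LOG_BSIZE    6u      /* sanity bound for this sketch: 1024 << 6 = 64 KiB */

static uint16_t get_le16(const unsigned char *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Given the first `len` bytes of the device, return the filesystem
 * block size in bytes, or 0 if no valid superblock is found.
 */
unsigned int sb_blocksize_from_buffer(const unsigned char *buf, size_t len)
{
	const unsigned char *sb;
	uint32_t log_bsize;

	assert(buf != NULL);
	if (len < SB_OFFSET + SB_MAGIC_OFF + 2)
		return 0;               /* buffer too short to hold s_magic */

	sb = buf + SB_OFFSET;
	if (get_le16(sb + SB_MAGIC_OFF) != EXT2_MAGIC)
		return 0;               /* not an ext2/3/4 superblock */

	log_bsize = get_le32(sb + SB_LOG_BSIZE_OFF);
	if (log_bsize > MAX_LOG_BSIZE)
		return 0;               /* implausible block size */

	return 1024u << log_bsize;
}
```

The buffer would be filled by one aligned read of FIRST_READ_SIZE bytes from offset 0 (not shown). This covers only the open path; the other io_channel_set_blksize callers listed above would still need their own handling.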