Message-ID: <BN8PR04MB5812F5201C29A24CAB8C093DE7280@BN8PR04MB5812.namprd04.prod.outlook.com>
Date: Wed, 25 Dec 2019 02:57:58 +0000
From: Damien Le Moal <Damien.LeMoal@....com>
To: Randy Dunlap <rdunlap@...radead.org>,
"linux-fsdevel@...r.kernel.org" <linux-fsdevel@...r.kernel.org>,
"linux-xfs@...r.kernel.org" <linux-xfs@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
CC: Johannes Thumshirn <jth@...nel.org>,
Naohiro Aota <Naohiro.Aota@....com>,
"Darrick J . Wong" <darrick.wong@...cle.com>,
Hannes Reinecke <hare@...e.de>
Subject: Re: [PATCH v3 2/2] zonefs: Add documentation
Randy,
On 2019/12/25 10:33, Randy Dunlap wrote:
[...]
>> +Sequential zone files can only be written sequentially, starting from the file
>> +end, that is, write operations can only be append writes. Zonefs makes no
>> +attempt at accepting random writes and will fail any write request that has a
>> +start offset not corresponding to the end of the last issued write.
>> +
>> +In order to give guarantees regarding write ordering, zonefs also prevents
>> +buffered writes and mmap writes for sequential files. Only direct IO writes are
>> +accepted. There are no restrictions on read operations nor on the type of IO
>> +used to request reads (buffered IOs, direct IOs and mmap reads are all
>> +accepted).
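(For illustration only: a minimal user-space sketch of such an append
direct write, assuming a 4096 B logical block size and the /mnt/seq/0
file used in the dd example further down. This is not part of the patch.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Sketch only: append one 4 KiB direct write to a sequential zone file.
 * The buffer must be aligned for O_DIRECT and the write offset must be
 * the current file size, i.e. the zone write pointer position.
 */
static int append_4k(const char *path)
{
	struct stat st;
	void *buf;
	ssize_t ret;
	int fd;

	fd = open(path, O_WRONLY | O_DIRECT);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) || posix_memalign(&buf, 4096, 4096)) {
		close(fd);
		return -1;
	}
	memset(buf, 0, 4096);
	ret = pwrite(fd, buf, 4096, st.st_size);
	free(buf);
	close(fd);
	return ret == 4096 ? 0 : -1;
}
)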
>> +
>> +Truncating sequential zone files is allowed only down to 0, in wich case, the
>
> which
>
>> +zone is reset to rewind the file zone write pointer position to the start of
>> +the zone, or up to the zone size, in which case the file's zone is transitioned
>> +to the FULL state (finish zone operation).
>
> Just to clarify, truncate can be done to zero or the zone size, but nothing else.
> Is that correct?
Yes, that is correct. This matches the drive-side processing of the
REQ_OP_ZONE_RESET and REQ_OP_ZONE_FINISH requests, which respectively
reset the zone (bringing the file size down to 0) and transition the
zone to the full state (the file size becomes the zone size).
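From an application point of view, only those two ftruncate() sizes will
succeed. A minimal sketch, assuming zone_sz holds the file's zone size
(how it is obtained is not shown here):

#include <unistd.h>
#include <sys/types.h>

/*
 * Sketch only: zonefs accepts exactly two sizes when truncating a
 * sequential zone file.
 */
static int zonefs_truncate_seq(int fd, off_t zone_sz, int finish)
{
	/*
	 * ftruncate(fd, 0)       -> zone reset  (REQ_OP_ZONE_RESET)
	 * ftruncate(fd, zone_sz) -> zone finish (REQ_OP_ZONE_FINISH)
	 * Any other size is rejected by zonefs.
	 */
	return ftruncate(fd, finish ? zone_sz : 0);
}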
[...]
>> +# dd if=/dev/zero of=/mnt/seq/0 bs=4096 count=1 conv=notrunc oflag=direct
>> +1+0 records in
>> +1+0 records out
>> +4096 bytes (4.1 kB, 4.0 KiB) copied, 1.05112 s, 3.9 kB/s
>
> why so slow?
Indeed, that is really slow. I missed it :)
The SMR drive I used for running this was probably in low power mode
when I ran dd and needed waking up first, hence the slow response time.
Running the same again, I get:
dd if=/dev/zero of=/mnt/seq/0 count=1 bs=4096 oflag=direct conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.000482601 s, 8.5 MB/s
0.5 ms for a 4K direct write on an HDD looks OK to me (the write cache
is enabled on the HDD side).
The same on a zoned null_blk device gives:
dd if=/dev/zero of=/mnt/seq/0 count=1 bs=4096 oflag=direct conv=notrunc
1+0 records in
1+0 records out
4096 bytes (4.1 kB, 4.0 KiB) copied, 0.00017558 s, 23.3 MB/s
175us for a single 4K direct write. Looks OK too.
Thank you for all the typo & nit pointers. I will fix everything and
post a v4.
--
Damien Le Moal
Western Digital Research