linux-ext4 - Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cfeade24-81fc-ab73-1fd9-89f12a402486@kernel.dk>
Date:   Sat, 15 Apr 2023 07:15:49 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     "Darrick J. Wong" <djwong@...nel.org>,
        Christoph Hellwig <hch@...radead.org>
Cc:     Miklos Szeredi <miklos@...redi.hu>,
        Bernd Schubert <bschubert@....com>, io-uring@...r.kernel.org,
        linux-ext4@...r.kernel.org, linux-fsdevel@...r.kernel.org,
        linux-xfs@...r.kernel.org, dsingh@....com
Subject: Re: [PATCH 1/2] fs: add FMODE_DIO_PARALLEL_WRITE flag

On 4/14/23 9:36?AM, Darrick J. Wong wrote:
> On Thu, Apr 13, 2023 at 10:11:28PM -0700, Christoph Hellwig wrote:
>> On Thu, Apr 13, 2023 at 09:40:29AM +0200, Miklos Szeredi wrote:
>>> fuse_direct_write_iter():
>>>
>>> bool exclusive_lock =
>>>     !(ff->open_flags & FOPEN_PARALLEL_DIRECT_WRITES) ||
>>>     iocb->ki_flags & IOCB_APPEND ||
>>>     fuse_direct_write_extending_i_size(iocb, from);
>>>
>>> If the write is size extending, then it will take the lock exclusive.
>>> OTOH, I guess that it would be unusual for lots of  size extending
>>> writes to be done in parallel.
>>>
>>> What would be the effect of giving the  FMODE_DIO_PARALLEL_WRITE hint
>>> and then still serializing the writes?
>>
>> I have no idea how this flags work, but XFS also takes i_rwsem
>> exclusively for appends, when the positions and size aren't aligned to
>> the block size, and a few other cases.
> 
> IIUC uring wants to avoid the situation where someone sends 300 writes
> to the same file, all of which end up in background workers, and all of
> which then contend on exclusive i_rwsem.  Hence it has some hashing
> scheme that executes io requests serially if they hash to the same value
> (which iirc is the inode number?) to prevent resource waste.
> 
> This flag turns off that hashing behavior on the assumption that each of
> those 300 writes won't serialize on the other 299 writes, hence it's ok
> to start up 300 workers.
> 
> (apologies for precoffee garbled response)

Yep, that is pretty much it. If all writes to that inode are serialized
by a lock on the fs side, then we'll get a lot of contention on that
mutex. And since, originally, nothing supported async writes, everything
would get punted to the io-wq workers. io_uring added per-inode hashing
for this, so that any punt to io-wq of a write would get serialized.

IOW, it's an efficiency thing, not a correctness thing.

-- 
Jens Axboe