linux-kernel - Re: [PATCH RFC 0/9] nxt propagation + locking optimisation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2f974d5e-8aa4-5b12-d70e-668384b4a94c@kernel.dk>
Date:   Mon, 2 Mar 2020 07:39:08 -0700
From:   Jens Axboe <axboe@...nel.dk>
To:     Pavel Begunkov <asml.silence@...il.com>, io-uring@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC 0/9] nxt propagation + locking optimisation

On 3/1/20 1:33 PM, Pavel Begunkov wrote:
> On 01/03/2020 22:14, Jens Axboe wrote:
>> On 3/1/20 9:18 AM, Pavel Begunkov wrote:
>>> There are several independent parts in the patchset, but bundled
>>> to make a point.
>>> 1-2: random stuff, that implicitly used later.
>>> 3-5: restore @nxt propagation
>>> 6-8: optimise locking in io_worker_handle_work()
>>> 9: optimise io_uring refcounting
>>>
>>> The next propagation bits are done similarly as it was before, but
>>> - nxt stealing is now at top-level, but not hidden in handlers
>>> - ensure there is no with REQ_F_DONT_STEAL_NEXT
>>>
>>> [6-8] is the reason to dismiss the previous @nxt propagation appoach,
>>> I didn't found a good way to do the same. Even though it looked
>>> clearer and without new flag.
>>>
>>> Performance tested it with link-of-nops + IOSQE_ASYNC:
>>>
>>> link size: 100
>>> orig:  501 (ns per nop)
>>> 0-8:   446
>>> 0-9:   416
>>>
>>> link size: 10
>>> orig:  826
>>> 0-8:   776
>>> 0-9:   756
>>
>> This looks nice, I'll take a closer look tomorrow or later today. Seems
>> that at least patch 2 should go into 5.6 however, so may make sense to
>> order the series like that.
> 
> It's the first one modifying io-wq.c, so should be fine to pick from
> the middle as is.

Yep, just did.

>> BTW, Andres brought up a good point, and that's hashed file write works.
>> Generally they complete super fast (just copying into the page cache),
>> which means that that worker will be hammering the wq lock a lot. Since
>> work N+1 can't make any progress before N completes (since that's how
>> hashed work works), we should pull a bigger batch of these work items
>> instead of just one at the time. I think that'd potentially make a huge
>> difference for the performance of buffered writes.
> 
> Flashed the same thought. It should be easy enough for hashed
> requests. Though, general batching would make us to think about
> fairness, work stealing, etc.

There's only the one list anyway, so the work is doing to be processed
in order to begin with. Hence I don't think there's a lot of fairness to
be worried about here, we're just going to be processing the existing
work in the same order, but more efficiently. We should be getting both
better throughput and fairness if we remove all items hashed to the same
key for that one worker, only stopping if we encounter a non-hashed work
or work hashed to a different key. Because if we do, if any of that
hashed work ever needs to sleep, the next independent work can proceed
in a different worker.

> BTW, what's the point of hashing only heads of a link? Sounds like it
> can lead to the write-write collisions, which it tries to avoid.

Yeah, the linked items should be hashed as well, not sure why that isn't
done.

>> Just throwing it out there, since you're working in that space anyway
>> and the rewards will be much larger.
> 
> I will take a look, but not sure when, I yet have some hunches myself.

Thanks!

-- 
Jens Axboe