Date:   Sat, 5 Nov 2016 14:49:57 -0600
From:   Jens Axboe <axboe@...nel.dk>
To:     Ming Lei <tom.leiming@...il.com>
Cc:     Jens Axboe <axboe@...nel.dk>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        linux-block <linux-block@...r.kernel.org>,
        Christoph Hellwig <hch@....de>
Subject: Re: [PATCH 1/4] block: add scalable completion tracking of requests

On 11/04/2016 05:13 PM, Ming Lei wrote:
>>> Even though it is true, the statistics still may become a mess with rare
>>> collisions.
>>
>>
>> How so? Not saying we could not improve it, but we're trading off
>> precision for scalability. My claim is that the existing code is good
>> enough. I've run a TON of testing on it, since I've used it for multiple
>> projects, and it's been solid.
>
> +static void blk_stat_flush_batch(struct blk_rq_stat *stat)
> +{
> +       if (!stat->nr_batch)
> +               return;
> +       if (!stat->nr_samples)
> +               stat->mean = div64_s64(stat->batch, stat->nr_batch);
>
> For example, two reqs(A & B) are completed at the same time, and A is
> on CPU0, and B is on CPU1.
>
> If the last two writes in the function are reordered as observed from
> CPU1, then for B, CPU1 runs the above branch when it just sees stat->batch
> set to zero while nr_samples hasn't been updated yet, and a divide-by-zero
> is triggered.

We should probably just read nr_batch with READ_ONCE(). I'm fine
with the stats being a bit off in the rare case of a collision, but we
can't have a divide-by-zero, obviously.
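
Something along these lines is what I have in mind - just a rough sketch,
untested, and the field types plus the tail of the function (which the
quote cuts off) are my guesses, not the actual patch:

static void blk_stat_flush_batch(struct blk_rq_stat *stat)
{
	/* Snapshot once so a concurrent reset can't zero it under us. */
	u64 nr_batch = READ_ONCE(stat->nr_batch);

	if (!nr_batch)
		return;

	if (!stat->nr_samples)
		stat->mean = div64_s64(stat->batch, nr_batch);
	else
		stat->mean = div64_s64((stat->mean * stat->nr_samples) +
					stat->batch,
					stat->nr_samples + nr_batch);

	stat->nr_samples += nr_batch;
	stat->nr_batch = stat->batch = 0;
}

The divide then always uses the local snapshot, so even if another CPU
resets nr_batch in between, the worst case is a slightly skewed mean,
not a crash.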

>
> +       else {
> +               stat->mean = div64_s64((stat->mean * stat->nr_samples) +
> +                                       stat->batch,
> +                                       stat->nr_samples + stat->nr_batch);
> +       }
>
> BTW, the above 'if else' can be removed, and 'stat->mean' can always be
> computed the second way.

True, they could be collapsed.
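
Since the general expression reduces to batch / nr_batch when nr_samples
is zero, the whole thing could become just (same sketch-level caveats as
above):

	stat->mean = div64_s64((stat->mean * stat->nr_samples) + stat->batch,
			       stat->nr_samples + stat->nr_batch);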

>> Yes, that might be a good idea, since it doesn't cost us anything. For
>> the mq case, I'm hard pressed to think of areas where we could complete
>> IO in parallel on the same software queue. You'll never have a software
>> queue mapped to multiple hardware queues. So we should essentially be
>> serialized.
>
> For blk-mq, blk_mq_stat_add() is called in __blk_mq_complete_request(),
> which is often run from an interrupt handler, and the CPU serving the
> interrupt can be different from the submitting CPU for rq->mq_ctx. There
> can also be several CPUs handling interrupts originating from the same
> sw queue.
>
> BTW, I don't object to this patch actually, but maybe we should add a
> comment about this kind of race. Otherwise, someone in the future might
> find that the statistics have become inaccurate, and a comment would
> help them understand it or try to improve it.

I'm fine with documenting it. For the two use cases I have, I'm fine
with it not being 100% stable. For by far the majority of the windows,
it'll be just fine. I'll fix the divide-by-zero, though.
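
For what it's worth, a comment along these lines (wording is only a
suggestion) would probably cover it:

/*
 * Updates from blk_mq_stat_add() and blk_stat_flush_batch() are not
 * serialized against concurrent completions touching the same stats.
 * A rare collision may drop or skew a sample; that is a deliberate
 * trade-off of precision for scalability.  Just make sure the mean is
 * computed from a stable snapshot of nr_batch so we never divide by
 * zero.
 */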

-- 
Jens Axboe
