linux-kernel - Re: aio: questions with ioctx_alloc() and large num_possible

Message-Id: <965fc993-97c2-48b9-82e3-6c3444d0ffe5@linux.vnet.ibm.com>
Date:   Wed, 5 Oct 2016 14:21:27 -0300
From:   Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>
To:     Kent Overstreet <kent.overstreet@...il.com>
Cc:     linux-fsdevel@...r.kernel.org, linux-aio@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: aio: questions with ioctx_alloc() and large num_possible_cpus()

Hi Kent,

Thanks for the comments. Trying to make sense of your point helped me
understand more of the code, but some things are still unclear to me;
I'd appreciate a bit more help if you can.

Could you describe how a single thread might be unable to use all the
slots because 'up to about half of the reqs_available slots might
be on other percpu reqs_available'?

I see that the thread might be scheduled on different CPUs (say, only
2 possible CPUs) and call get_reqs_available() on both, but that only
gives one req_batch to each CPU. For req_batch to be half of
reqs_available, its denominator would need to be 2, which is not the
case with num_possible_cpus() * 4 (that is, 8). So I'm a bit confused here.

     atomic_set(&ctx->reqs_available, ctx->nr_events - 1);
     ctx->req_batch = (ctx->nr_events - 1) / (num_possible_cpus() * 4);

On 10/05/2016 03:34 AM, Kent Overstreet wrote:
>> - why "num_possible_cpus() * 4", and why "max(nr_events, <it>)" ?

> For the scheme to work - percpu allocation of slots - we have to ensure that
> there aren't too many unused slots stranded on other CPUs. The stranding is
> limited to 1/4th of the slots [snip]

By 'unused slots', do you mean the slots in the batch allocated to a
particular CPU that are not actually used by a thread on that CPU?
(e.g., after a single get_reqs_available() call, unused_slots == req_batch - 1)

Could you please explain in a bit more detail how "num_possible_cpus() * 4"
ensures the limit of 1/4th of the slots, and what scenario the math is
based on? I've been working through example values for a while now, but
haven't figured out where or how that limit comes about.
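Assuming each CPU holds at most one batch (as in the single-call example above), the 1/4th figure follows from the req_batch formula alone. Writing N for num_possible_cpus(), R for ctx->nr_events - 1 and b for req_batch (notation introduced here only for this derivation):

```latex
b = \left\lfloor \frac{R}{4N} \right\rfloor \le \frac{R}{4N}
\qquad\Longrightarrow\qquad
N \cdot b \;\le\; N \cdot \frac{R}{4N} \;=\; \frac{R}{4},
\qquad
N\,(b - 1) \;<\; \frac{R}{4}
```

So N CPUs holding one full batch each account for at most R/4 slots, and after one slot is used on each CPU fewer than R/4 remain unused. This derivation does not cover a CPU's local count growing beyond one batch.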

Thanks for your support,

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center