Message-ID: <20161005063435.mtw2keukyxwbwo2k@kmo-pixel>
Date:   Tue, 4 Oct 2016 22:34:36 -0800
From:   Kent Overstreet <kent.overstreet@...il.com>
To:     Mauricio Faria de Oliveira <mauricfo@...ux.vnet.ibm.com>
Cc:     Benjamin LaHaise <bcrl@...ck.org>,
        Alexander Viro <viro@...iv.linux.org.uk>,
        linux-fsdevel@...r.kernel.org, linux-aio@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: aio: questions with ioctx_alloc() and large num_possible_cpus()

On Tue, Oct 04, 2016 at 07:55:12PM -0300, Mauricio Faria de Oliveira wrote:
> Hi Benjamin, Kent, and others,
> 
> Would you please comment / answer about this possible problem?
> Any feedback is appreciated.
> 
> Since commit e1bdd5f27a5b ("aio: percpu reqs_available") the maximum
> number of aio nr_events may be a function of num_possible_cpus() and
> actually be /inversely proportional/ to it (i.e., more CPUs lead to
> less system-wide aio nr_events). This is a problem on larger systems.
> 
> That's because if "nr_events < num_possible_cpus() * 4" (for example
> nr_events == 1) that counts as "num_possible_cpus() * 4" into aio_nr
> and against aio_max_nr
> 
>     static struct kioctx *ioctx_alloc(unsigned nr_events)
>     ...
>         nr_events = max(nr_events, num_possible_cpus() * 4);
>         nr_events *= 2;
>     ...
>         /* limit the number of system wide aios */
>     ....
>         if (aio_nr + nr_events > (aio_max_nr * 2UL) ||
>     ...
>             err = -EAGAIN;
>     ...
>         aio_nr += ctx->max_reqs;
>     ...
> 
> That problem is easily noticeable on a common POWER8 system:  160 CPUs
> (2 sockets * 10 cores/socket * 8 threads/core = 160 CPUs) limits the max
> AIO contexts with "io_setup(1, )" to 102 out of 64k (default aio_max_nr):
> 
>     # cat /sys/devices/system/cpu/possible
>     0-159
> 
>     # cat /proc/sys/fs/aio-max-nr
>     65536
> 
>     # echo $(( 65536 / (160 * 4) ))
>     102
> 
> test-case snippet & output:
> 
>     for (i = 0; i < 65536; i++)
>         if (rc = io_setup(1, &ioctx[i]))
>             break;
> 
>     printf("rc = %d, i = %d\n", rc, i);
> 
>     > rc = -11, i = 102
> 
> (another problem is that the sysctl aio-nr grows larger than aio-max-nr,
> since it's checked against "aio_max_nr * 2")
> 
> So,
> 
> I've been trying to understand/fix this, but soon got stuck on options
> as I didn't quite get a few points.. if you could provide some insight,
> please, that would be really helpful:
> 
> - why "num_possible_cpus() * 4", and why "max(nr_events, <it>)" ?

For the scheme to work - percpu allocation of slots - we have to ensure that
there aren't too many unused slots stranded on other CPUs. The stranding is
limited to 1/4th of the slots as I figured any more than that could be too
unpredictable - the effective maximum number of in flight iocbs would vary too
much.
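
As a rough illustration of that constraint (a sketch of one reading of the
explanation above, not the kernel's actual batch computation; the helper names
per_cpu_share() and min_events() are hypothetical): if at most a quarter of the
slots may sit on the CPUs, each CPU's share rounds down to zero unless
nr_events is at least num_possible_cpus() * 4, which would explain the max().

```c
#include <assert.h>

/*
 * Hypothetical helper: slots each CPU could hold if at most 1/4 of
 * nr_events may be stranded across all CPUs. Integer division, so a
 * small nr_events on a large machine yields 0.
 */
static unsigned per_cpu_share(unsigned nr_events, unsigned cpus)
{
    assert(cpus > 0);
    return (nr_events / 4) / cpus;
}

/* Smallest nr_events that gives every CPU a share of at least one slot. */
static unsigned min_events(unsigned cpus)
{
    return cpus * 4;
}
```

Under these assumptions, per_cpu_share(1, 160) is 0, while
per_cpu_share(min_events(160), 160) is 1.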

For systems with large numbers of CPUs, what I'd prefer to do is make it per
core or per NUMA node or some such. But we don't have any infrastructure for
that equivalent to the alloc_percpu() stuff, so that's why I didn't do it at
the time.
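
The arithmetic in the quoted report can be reproduced with a small userspace
sketch. It models only the lines of ioctx_alloc() quoted above and assumes that
ctx->max_reqs equals the doubled nr_events; charged_slots() and max_contexts()
are hypothetical names, and "units" stands for num_possible_cpus() today or,
hypothetically, a core or NUMA node count.

```c
#include <assert.h>

/*
 * Slots charged against the system-wide limit by one io_setup(nr_events),
 * following the quoted excerpt: raise to units * 4, then double.
 */
static unsigned long charged_slots(unsigned long nr_events, unsigned long units)
{
    unsigned long floor = units * 4;

    if (nr_events < floor)
        nr_events = floor;
    return nr_events * 2;
}

/*
 * Number of contexts that can be created before the quoted check
 * "aio_nr + nr_events > aio_max_nr * 2UL" fails with -EAGAIN.
 * Assumes aio_nr grows by the doubled nr_events on each success.
 */
static unsigned long max_contexts(unsigned long nr_events, unsigned long units,
                                  unsigned long aio_max_nr)
{
    unsigned long cost = charged_slots(nr_events, units);
    unsigned long aio_nr = 0, n = 0;

    assert(cost > 0);
    while (aio_nr + cost <= aio_max_nr * 2UL) {
        aio_nr += cost;
        n++;
    }
    return n;
}
```

With these assumptions, max_contexts(1, 160, 65536) gives 102, matching the
reported "i = 102". Passing 20 (the core count of the same machine) instead of
160 gives 819, which hints at how much a per-core scheme could relax the limit.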
