Message-ID: <20180426222814.GA10301@localhost.localdomain>
Date:   Thu, 26 Apr 2018 19:28:14 -0300
From:   Marcelo Ricardo Leitner <marcelo.leitner@...il.com>
To:     Oleg Babin <obabin@...tuozzo.com>
Cc:     netdev@...r.kernel.org, linux-sctp@...r.kernel.org,
        "David S. Miller" <davem@...emloft.net>,
        Vlad Yasevich <vyasevich@...il.com>,
        Neil Horman <nhorman@...driver.com>,
        Xin Long <lucien.xin@...il.com>,
        Andrey Ryabinin <aryabinin@...tuozzo.com>
Subject: Re: [PATCH net-next 0/2] net/sctp: Avoid allocating high order
 memory with kmalloc()

On Fri, Apr 27, 2018 at 01:14:56AM +0300, Oleg Babin wrote:
> Hi Marcelo,
>
> On 04/24/2018 12:33 AM, Marcelo Ricardo Leitner wrote:
> > Hi,
> >
> > On Mon, Apr 23, 2018 at 09:41:04PM +0300, Oleg Babin wrote:
> >> Each SCTP association can have up to 65535 input and output streams.
> >> For each stream type an array of sctp_stream_in or sctp_stream_out
> >> structures is allocated using kmalloc_array() function. This function
> >> allocates physically contiguous memory regions, so this can lead
> >> to allocation of memory regions of very high order, i.e.:
> >>
> >>   sizeof(struct sctp_stream_out) == 24,
> >>   ((65535 * 24) / 4096) == 383 memory pages (4096 bytes per page),
> >>   which means an order-9 allocation.
> >>
> >> This can lead to memory allocation failures on systems
> >> under memory pressure.
> >
> > Did you do performance tests while actually using these 65k streams,
> > and with 256 streams (so the array gets 2 pages)?
> >
> > This will introduce another deref on each access to an element, but
> > I'm not expecting any impact due to it.
> >
>
> No, I didn't do such tests. Could you please tell me what methodology
> you usually use to measure performance properly?
>
> I'm trying to do measurements with iperf3 on an unmodified kernel and
> get very strange results like this:
...

I've been trying to fight this fluctuation for some time now but
haven't really been able to fix it yet. One thing that usually helps
(quite a lot) is increasing the socket buffer sizes and/or using
smaller messages, so there is more cushion in the buffers.

What I have seen in my tests is that when the throughput floats like
this, it is because the socket buffers swing between empty and full
and never settle into a steady state. I believe this happens because
the socket buffer size limits the amount of memory used by the socket,
rather than the amount of payload the buffer can hold. This causes a
discrepancy, especially because in SCTP we don't defragment the buffer
(TCP does, via its collapse operation), so the advertised rwnd may
turn out to be a lie in the end. That triggers rx drops, then tx cwnd
reduction, and so on. SCTP's min_rto of 1s doesn't help much in this
situation either.

With netperf, you can use -S 200000,200000 -s 200000,200000 to set
the remote (-S) and local (-s) socket buffer sizes. That should help.

Cheers,
Marcelo
