Message-ID: <B5EC0909-54CE-47D6-8930-8C9CFC243180@alertlogic.com>
Date:   Wed, 13 Nov 2019 17:03:44 +0000
From:   "Harris, Robert" <robert.harris@...rtlogic.com>
To:     Mikael Pettersson <mikpelinux@...il.com>
CC:     "tglx@...utronix.de" <tglx@...utronix.de>,
        "mingo@...hat.com" <mingo@...hat.com>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "dvhart@...radead.org" <dvhart@...radead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        "Harris, Robert" <robert.harris@...rtlogic.com>
Subject: Re: Help requested: futex(..., FUTEX_WAIT_PRIVATE, ...) returns EPERM



> On 13 Nov 2019, at 13:29, Mikael Pettersson <mikpelinux@...il.com> wrote:
>
> On Tue, Nov 12, 2019 at 6:43 PM Harris, Robert
> <robert.harris@...rtlogic.com> wrote:
>>
>> I am investigating an issue on 4.9.184 in which futex() returns EPERM
>> intermittently for
>>
>> futex(uaddr, FUTEX_WAIT_PRIVATE, val, &timeout, NULL, 0)
>>
>> The failure affects an application in an AWS lambda;  traditional
>> debugging approaches vary from difficult to impossible.  I cannot
>> reproduce the problem at will, instrument the kernel, install a new
>> kernel or get an application core dump.
>>
>> Understanding the circumstances under which EPERM can be returned for
>> FUTEX_WAIT_PRIVATE would be useful but it is not a documented failure
>> mode.  I have spent some time looking through futex.c but have not
>> found anything yet.  I would be grateful for a hint from someone more
>> knowledgeable.
>
>
> I just wanted to add that a colleague of mine reported the exact same
> issue to me two days ago: a highly threaded application (the Erlang
> VM) running in AWS lambda, futex wait calls occasionally failing with
> EPERM.  I don't have more specifics than that, I've asked for kernel
> version and the exact parameters in the failed futex call.

Thanks, that's a great data point.  One of my outstanding questions had
been "why does this happen only to us?"

When I look at the timings I can say with some confidence that the
problem stopped for us minutes after

2017 on 2019-10-23 in us-east-1
2030 on 2019-10-24 in eu-west-1
1817 on 2019-10-25 in us-west-2

(all times UTC).  I've logged a ticket with Amazon to find out what
changed.
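
For anyone else trying to capture the exact parameters of a failing
call, a minimal sketch of a logging wrapper follows.  It is not taken
from the affected application: the names futex_wait_private() and
futex_wait_logged() are hypothetical, and it assumes a 64-bit Linux
userland where struct timespec matches what SYS_futex expects.  It
treats EAGAIN, EINTR and ETIMEDOUT as the expected outcomes of a timed
wait and logs anything else, which is where an EPERM would show up.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* glibc does not export a futex() wrapper, so go through syscall(2).
 * Hypothetical helper name; the argument order mirrors the call in
 * the original report. */
static long futex_wait_private(uint32_t *uaddr, uint32_t val,
                               const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, val,
                   timeout, NULL, 0);
}

/* Returns 0 on a normal wake-up, otherwise the errno value.
 * Any errno outside the usual set for a timed wait is logged
 * together with the call parameters. */
static int futex_wait_logged(uint32_t *uaddr, uint32_t val,
                             const struct timespec *timeout)
{
    if (futex_wait_private(uaddr, val, timeout) == 0)
        return 0;

    int err = errno;    /* save before stdio can clobber it */

    switch (err) {
    case EAGAIN:        /* *uaddr != val at the time of the call */
    case EINTR:         /* interrupted by a signal */
    case ETIMEDOUT:     /* timeout expired */
        break;
    default:            /* e.g. the EPERM discussed in this thread */
        fprintf(stderr,
                "futex(%p, FUTEX_WAIT_PRIVATE, %u, {%lld, %ld}, NULL, 0)"
                " = -1, errno %d (%s)\n",
                (void *)uaddr, val,
                timeout ? (long long)timeout->tv_sec : -1LL,
                timeout ? (long)timeout->tv_nsec : -1L,
                err, strerror(err));
        break;
    }
    return err;
}
```

A caller would substitute futex_wait_logged(uaddr, val, &timeout) for
the raw call and grep stderr for "errno"; a NULL timeout is printed as
{-1, -1}.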

Robert
