linux-kernel - Re: Change in functionality of futex() system call.

Message-ID: <BANLkTinS-c2Z=LD_tJfVCEqkMaAd4KPgwg@mail.gmail.com>
Date:	Tue, 7 Jun 2011 16:12:28 -0400
From:	Andrew Lutomirski <luto@....edu>
To:	David Oliver <david@...advisors.com>
Cc:	Eric Dumazet <eric.dumazet@...il.com>,
	Darren Hart <dvhart@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-kernel@...r.kernel.org,
	Shawn Bohrer <sbohrer@...advisors.com>,
	Zachary Vonler <zvonler@...advisors.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Hugh Dickins <hughd@...gle.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...e.hu>
Subject: Re: Change in functionality of futex() system call.

On Tue, Jun 7, 2011 at 4:04 PM, David Oliver <david@...advisors.com> wrote:
> On Tue, Jun 7, 2011 at 2:53 PM, Andrew Lutomirski <luto@....edu> wrote:
>> On Tue, Jun 7, 2011 at 3:33 PM, David Oliver <david@...advisors.com> wrote:
>>> On Tue, Jun 7, 2011 at 2:19 PM, Andrew Lutomirski <luto@....edu> wrote:
>>>> On Tue, Jun 7, 2011 at 3:10 PM, David Oliver <david@...advisors.com> wrote:
>>>>> On Tue, Jun 7, 2011 at 1:43 PM, Andrew Lutomirski <luto@....edu> wrote:
>>>>>> On Tue, Jun 7, 2011 at 11:58 AM, Eric Dumazet <eric.dumazet@...il.com> wrote:
>>>>>>> On Tue, 07 Jun 2011 at 10:44 -0400, Andy Lutomirski wrote:
>>>>>>>> On 06/06/2011 11:13 PM, Darren Hart wrote:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On 06/06/2011 11:11 AM, Eric Dumazet wrote:
>>>>>>>> >> On Mon, 06 Jun 2011 at 10:53 -0700, Darren Hart wrote:
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >>> If I understand the problem correctly, RO private mapping really doesn't
>>>>>>>> >>> make any sense and we should probably explicitly not support it, while
>>>>>>>> >>> special casing the RO shared mapping in support of David's scenario.
>>>>>>>> >>>
>>>>>>>> >>
>>>>>>>> >> We supported them in 2.6.18 kernels, apparently. This might sound
>>>>>>>> >> stupid, but who knows?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > I guess this is actually the key point we need to agree on to provide a
>>>>>>>> > solution. This particular case "worked" in 2.6.18 kernels, but that
>>>>>>>> > doesn't necessarily mean it was supported, or even intentional.
>>>>>>>> >
>>>>>>>> > It sounds to me that we agree that we should support RO shared mappings.
>>>>>>>> > The question remains about whether we should introduce deliberate
>>>>>>>> > support of RO private mappings, and if so, if the forced COW approach is
>>>>>>>> > appropriate or not.
>>>>>>>> >
>>>>>>>>
>>>>>>>> I disagree.
>>>>>>>>
>>>>>>>> FUTEX_WAIT has side-effects.  Specifically, it eats one wakeup sent by
>>>>>>>> FUTEX_WAKE.  So if something uses futexes on a file mapping, then a
>>>>>>>> process with only read access could (if the semantics were changed) DoS
>>>>>>>> the other processes by spawning a bunch of threads and FUTEX_WAITing
>>>>>>>> from each of them.
>>>>>>>>
>>>>>>>> If there were a FUTEX_WAIT_NOCONSUME that did not consume a wakeup and
>>>>>>>> worked on RO mappings, I would drop my objection.
>>>>>>>
>>>>>>> If a group of cooperating processes uses a memory segment to exchange
>>>>>>> critical information, do you really think this memory segment will be
>>>>>>> readable by other unrelated processes on the machine ?
>>>>>>
>>>>>> Depends on the design.
>>>>>>
>>>>>> I have some software I'm working on that uses shared files and could
>>>>>> easily use futexes.
>>>>>>
>>>>> I have software which currently uses shared files for a one-way
>>>>> transfer of information, which is modeled precisely by the futex (as
>>>>> contrasted to the mutex) model. In this case, the number of receivers
>>>>> is undetermined, so the number of wakeups is set to maxint.
>>>>>
>>>>> The receivers are minimally trusted: they have read access to the
>>>>> files, so they cannot accidentally affect other processes' use of the
>>>>> data. Requiring my files to be writeable by all clients would require
>>>>> a serious increase in the amount of software needing to be trusted.
>>>>
>>>> What's wrong with adding a FUTEX_WAIT_NOCONSUME flag then?  Your
>>>> program can use it to get exactly the semantics it wants and my
>>>> program can use it or not depending on which semantics it wants.
>>>>
>>> 1. I would prefer not to require my programs have to check for kernel
>>> version (code named "working", "regressed", and "altered") to decide
>>> which parameters need to be sent to the futex call.
>>
>> You don't have to check for kernel version.  Just try
>> FUTEX_WAIT_NOCONSUME first and retry with FUTEX_WAIT if it returns
>> -EINVAL.
>>
> ... and punt if that gives me an EFAULT. Possible but clumsy.
> Fortunately, I'm not writing code for general consumption.
>
>> I think you've already lost on regressed kernels regardless :-/
>>
>>> 2. Doing FUTEX_WAIT_NOCONSUME would change the semantics of
>>> futex_wake() between the "working" and "altered" kernels, as it would
>>> no longer return the number of processes woken.
>>
>> True, but that change couldn't affect old code because old code
>> wouldn't use FUTEX_WAIT_NOCONSUME.
>>
> So, how would I find out the number of processes awakened by the
> futex_wake() - I only care for statistical purposes.

Add a FUTEX_WAKE_COUNT_NOCONSUME or some such magic flag.  Yeah, not so pretty.
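With the existing semantics, the count David asks about is simply the return value of FUTEX_WAKE, which reports how many waiters were woken. Below is a minimal sketch of the one-way, many-receiver pattern he describes. It assumes a 32-bit sequence counter at a fixed offset in the shared file mapping, and the helper names publish() and wait_for_update() are hypothetical. The writer bumps the counter and wakes up to INT_MAX waiters; each reader sleeps only while the counter still holds the value it last saw. It uses plain FUTEX_WAIT, so it does nothing about the consume-a-wakeup concern raised earlier in the thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc has no futex() wrapper, so go through syscall(2). Shared ops
   (no FUTEX_PRIVATE_FLAG) are used because the word is assumed to live
   in a file mapping shared between processes. */
static long futex_op(uint32_t *uaddr, int op, uint32_t val)
{
    return syscall(SYS_futex, uaddr, op, val, NULL, NULL, 0);
}

/* Writer side (hypothetical helper): advance the sequence number, then
   wake every waiter. Returns the number of waiters woken, or -1. */
static long publish(uint32_t *seq)
{
    __atomic_fetch_add(seq, 1, __ATOMIC_RELEASE);
    return futex_op(seq, FUTEX_WAKE, INT_MAX);
}

/* Reader side (hypothetical helper): block until *seq differs from
   last_seen and store the new value in *out. Returns 0 on success, or
   -1 with errno set (e.g. EFAULT when the reader's read-only mapping is
   rejected by an affected kernel). */
static int wait_for_update(uint32_t *seq, uint32_t last_seen, uint32_t *out)
{
    for (;;) {
        uint32_t cur = __atomic_load_n(seq, __ATOMIC_ACQUIRE);
        if (cur != last_seen) {
            *out = cur;
            return 0;
        }
        /* Sleeps only if *seq still equals last_seen. EAGAIN means it
           changed in the meantime, EINTR means a signal arrived; both
           just loop back and re-check. */
        if (futex_op(seq, FUTEX_WAIT, last_seen) == -1 &&
            errno != EAGAIN && errno != EINTR)
            return -1;
    }
}
```

Because every call wakes up to INT_MAX waiters, an extra (possibly untrusted) waiter cannot absorb a wakeup meant for another reader, but it does inflate the count returned by publish(), which matters if that number is kept for statistics.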

>
>>>
>>> It seems that FUTEX_WAIT_NOCONSUME would be rather like a
>>> non-consuming read on a pipe.
>>
>> More like a nonconsuming read on an eventfd, which sounds very useful.
>>  (Actually, I'm porting code from Windows to Linux right now that
>> wants that feature...)
>>
>> The reason I bring this up now is that I've been annoyed that
>> FUTEX_WAIT can be used on an R/O mapping to interfere with futexes in
>> that mapping.  Under the original semantics this would have been
>> pretty much impossible to fix, but the regression has been there for
>> long enough that we have the option right now to fix it better instead
>> of restoring the original behavior.
>>
> Since I'm not a kernel developer, the change seems very recent to me: it
> appeared about when I started finding my code failing with EFAULTs.
>
> From my perspective, that's a real case of my futexes being interfered with :).

Fair enough.  But it's a little late to prevent the regression.
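For readers who want to try the probe-and-fall-back approach discussed above (attempt FUTEX_WAIT_NOCONSUME, retry with FUTEX_WAIT if the kernel rejects it, and punt on EFAULT), here is a rough sketch. FUTEX_WAIT_NOCONSUME was only proposed in this thread and is not a mainline futex operation, so the opcode value is a placeholder and the helper name wait_preferring_noconsume() is hypothetical. Besides EINVAL, the sketch also treats ENOSYS as "unsupported", since the futex(2) man page documents ENOSYS for an unrecognized futex_op.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* HYPOTHETICAL: FUTEX_WAIT_NOCONSUME was only proposed on the list.
   100 is a placeholder that no real futex command uses. */
#define FUTEX_WAIT_NOCONSUME_PLACEHOLDER 100

enum wait_result {
    WAIT_WOKEN,  /* returned 0: woken, possibly spuriously */
    WAIT_RETRY,  /* EAGAIN (value already changed) or EINTR: re-check */
    WAIT_FAULT,  /* EFAULT, e.g. a read-only mapping on an affected kernel */
    WAIT_ERROR   /* anything else; errno is left set */
};

static long futex_wait_raw(uint32_t *uaddr, int op, uint32_t expected)
{
    return syscall(SYS_futex, uaddr, op, expected, NULL, NULL, 0);
}

/* Hypothetical helper: prefer the non-consuming wait, fall back to plain
   FUTEX_WAIT when the kernel rejects the opcode, and report EFAULT
   separately so the caller can give up cleanly. */
static enum wait_result wait_preferring_noconsume(uint32_t *uaddr,
                                                  uint32_t expected)
{
    long rc = futex_wait_raw(uaddr, FUTEX_WAIT_NOCONSUME_PLACEHOLDER, expected);
    if (rc == -1 && (errno == EINVAL || errno == ENOSYS))
        rc = futex_wait_raw(uaddr, FUTEX_WAIT, expected);

    if (rc == 0)
        return WAIT_WOKEN;
    if (errno == EAGAIN || errno == EINTR)
        return WAIT_RETRY;
    if (errno == EFAULT)
        return WAIT_FAULT;
    return WAIT_ERROR;
}
```

A caller would re-read the futex word and loop on WAIT_RETRY, and treat WAIT_FAULT the way David describes: punt.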

--Andy