Message-ID: <CAJt8pk97CM6pFr7O_ms8biCRsjkM6Mreh1XgjWhf1jgzzs7AZA@mail.gmail.com>
Date:	Fri, 20 Mar 2015 18:53:26 +0000
From:	Pavel Labath <labath@...gle.com>
To:	unlisted-recipients:; (no To-header on input)
Cc:	linux-kernel@...r.kernel.org
Subject: Re: A peculiarity in ptrace/waitpid behavior

Sending again, this time as plain text (I hope)...

On 20 March 2015 at 18:46, Pavel Labath <labath@...gle.com> wrote:
>
> Hi,
>
> thanks for the super quick response. :)
>
> I am at home now, so I don't have access to the same machine to run the test. I will run it on Monday and let you know.
>
> Meanwhile, I have tried running your test on my home machine, and it is indeed reporting "unexpected wait: stat=57f". If I understand correctly, that means the wait has reported SIGTRAP even though the tracee was in ptrace-stop.
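
For reference, here is how that status value can be read. This is a minimal sketch, assuming the usual Linux wait-status layout (low byte 0x7f marks a stop, bits 8-15 carry the signal number). The helper name `decode_stop_status` is made up for illustration and is not part of the test program.

```python
import signal


def decode_stop_status(status):
    """Return the stop signal's name if `status` encodes a stop, else None.

    Assumes the Linux wait-status layout: a low byte of 0x7f means
    "stopped", and bits 8-15 hold the number of the stopping signal.
    """
    if (status & 0xFF) != 0x7F:
        return None  # not a stop report (normal exit or killed by a signal)
    signo = (status >> 8) & 0xFF
    return signal.Signals(signo).name


# The value printed by the test: "unexpected wait: stat=57f"
print(decode_stop_status(0x57F))
```

On Linux, `os.WIFSTOPPED()` and `os.WSTOPSIG()` perform the same decoding; the bit arithmetic is spelled out here only to make the layout visible.
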
>
> I can imagine that something similar is happening in our case. Since the PTRACE_CONT and waitpid calls are happening in different threads, I can't positively say which one occurred first. So far I have assumed the sequence was PTRACE_CONT -> waitpid -> PTRACE_GETSIGINFO. However, if wait can return even though the process is stopped, then a possible sequence of events is waitpid -> PTRACE_CONT -> PTRACE_GETSIGINFO, in which case it is not surprising that the last call fails. One difference I see, though, is that in our test we are not sending any additional signals to the thread in question (at least we shouldn't be sending them, but we are sending some signals to other threads in the same process). Do you think it could still be the same issue?
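
To make the two orderings concrete, here is a toy model. It is only a sketch of the reasoning, not of kernel behaviour: the tracee is reduced to a single "in ptrace-stop" flag, `run_ordering` and the `tracee_stops` event are names invented for this illustration, and the only real-world fact relied on is that a ptrace request that needs a stopped tracee fails with ESRCH when the tracee is running.

```python
import errno


def run_ordering(events):
    """Replay tracer-side events against a toy tracee.

    Returns 0 if the final PTRACE_GETSIGINFO would succeed, or
    errno.ESRCH if it is issued while the tracee is not stopped.
    """
    in_ptrace_stop = True  # tracee is stopped after the first SIGUSR1 report
    result = None
    for event in events:
        if event == "PTRACE_CONT":
            in_ptrace_stop = False  # tracee resumes running
        elif event == "tracee_stops":
            in_ptrace_stop = True  # tracee hits a signal and stops again
        elif event == "waitpid":
            pass  # receiving a report does not change the tracee's state
        elif event == "PTRACE_GETSIGINFO":
            result = 0 if in_ptrace_stop else errno.ESRCH
        else:
            raise ValueError(f"unknown event: {event}")
    return result


# Ordering assumed so far: the tracee is resumed, stops again, and is reported.
assumed = run_ordering(
    ["PTRACE_CONT", "tracee_stops", "waitpid", "PTRACE_GETSIGINFO"]
)
# Suspected ordering: the report arrives first, then the tracee is resumed.
suspected = run_ordering(["waitpid", "PTRACE_CONT", "PTRACE_GETSIGINFO"])

print(assumed, errno.errorcode.get(suspected))
```

In the second ordering the query lands on a running tracee, which matches the ESRCH failure we observed, while the first ordering would have succeeded.
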
>
> I would be happy to test your patch. I don't think I can patch the kernel on my work machine directly, but I think I might be able to set up some sort of a test environment to try it out.
>
> regards,
> pavel
>
>
> On 20 March 2015 at 16:25, Oleg Nesterov <oleg@...hat.com> wrote:
>>
>> Hi Pavel,
>>
>> let me add lkml, we should not discuss this offlist.
>>
>> On 03/20, Pavel Labath wrote:
>> >
>> > 1) we get a waitpid() notification that the tracee got SIGUSR1
>> > 2) we do a ptrace(GETSIGINFO) to get more info
>> > 3) eventually we decide to restart the tracee with PTRACE_CONT, passing it
>> > SIGUSR1
>> > 4) immediately after that we get another waitpid notification, again with
>> > SIGUSR1, even though the thread had received no additional signals
>> > 5) we again try to do a GETSIGINFO, however this time it fails with ESRCH.
>> > Therefore, we assume that the thread has died
>>
>> I found a similar bug by code inspection some time ago. I even have
>> a fix, but I need to think more... And I even wrote the test-case ;)
>> see below.
>>
>> But so far I can't say if you hit the same problem or not. If you can
>> reproduce the problem, perhaps I can send you a debugging patch?
>>
>> Oleg.