Message-ID: <CANaxB-wGTnLX=s9aVG4ziXVtsVn672Q4BmfEX12=398_XZdmqA@mail.gmail.com>
Date: Thu, 28 Jan 2016 00:31:41 -0800
From: Andrey Wagin <avagin@...il.com>
To: LKML <linux-kernel@...r.kernel.org>,
Andy Lutomirski <luto@...capital.net>,
Cyrill Gorcunov <gorcunov@...nvz.org>,
Oleg Nesterov <oleg@...hat.com>,
"criu@...nvz.org" <criu@...nvz.org>, kvm@...r.kernel.org
Subject: x86: Hardware breakpoints are not always triggered
Hi,
We use hardware breakpoints in CRIU, and we found that sometimes we set
a breakpoint but the process doesn't stop on it.
I wrote a small reproducer for this bug. It creates two processes,
where the parent process traces the child. The parent sets a
breakpoint, and each time the child stops on it, the parent sets
the variable "xxx" to A in the child process. The child runs an infinite
loop in which it checks the variable "xxx" and then sets it to B. If the
child finds that xxx is equal to B, it exits with a non-zero code,
which means that the breakpoint was not triggered. The source code is
attached.
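For readers less familiar with the mechanism: one common way for a tracer to arm an x86 hardware breakpoint is to write the address into DR0..DR3 and the enable bits into DR7 with PTRACE_POKEUSER. The sketch below is only an illustration of that technique and is not taken from the attached reproducer, which may do this differently; the names dr7_local_enable and set_exec_breakpoint are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* R/W field encodings in DR7 (Intel SDM, debug registers). */
enum bp_type { BP_EXEC = 0, BP_WRITE = 1, BP_RW = 3 };

/*
 * Build a DR7 value that locally enables breakpoint `slot` (0..3).
 * Bit 2*slot is the local-enable bit Ln, the R/W field starts at
 * bit 16+4*slot and the LEN field at bit 18+4*slot.
 * For BP_EXEC the LEN field must be 0.
 */
uint64_t dr7_local_enable(int slot, enum bp_type type, unsigned len_field)
{
    assert(slot >= 0 && slot < 4);
    assert(len_field < 4);

    uint64_t dr7 = 1ULL << (2 * slot);
    dr7 |= (uint64_t)type << (16 + 4 * slot);
    dr7 |= (uint64_t)len_field << (18 + 4 * slot);
    return dr7;
}

#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__))
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/*
 * Hypothetical helper: arm an execution breakpoint at `addr` in a
 * tracee that is already attached and stopped. Returns 0 or -1.
 */
int set_exec_breakpoint(pid_t pid, int slot, uintptr_t addr)
{
    size_t base = offsetof(struct user, u_debugreg);
    size_t step = sizeof(((struct user *)0)->u_debugreg[0]);
    uintptr_t dr7 = (uintptr_t)dr7_local_enable(slot, BP_EXEC, 0);

    /* DRn holds the linear address of the breakpoint. */
    if (ptrace(PTRACE_POKEUSER, pid, (void *)(base + slot * step),
               (void *)addr) < 0)
        return -1;

    /* DR7 enables it. Note: this overwrites any other enabled slots. */
    if (ptrace(PTRACE_POKEUSER, pid, (void *)(base + 7 * step),
               (void *)dr7) < 0)
        return -1;

    return 0;
}
#endif
```

After a successful call the tracer would resume the child with PTRACE_CONT and expect a SIGTRAP stop when the child executes the instruction at that address.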
The reproducer uses a different breakpoint address when it is executed
with arguments than when it is executed without arguments.
Then I ran a few experiments. The bug is triggered if we run several
instances of this program in parallel in a KVM virtual machine.
[root@...2-vm ptrace]# ( while :; do ./ptrace_breakpoint > /dev/null || { echo "FAIL - $?"; break; }; done ) &
[3] 4088
[root@...2-vm ptrace]# ( while :; do ./ptrace_breakpoint > /dev/null || { echo "FAIL - $?"; break; }; done ) &
[4] 4091
[root@...2-vm ptrace]# ( while :; do ./ptrace_breakpoint 1 2 > /dev/null || { echo "FAIL - $?"; break; }; done ) &
[5] 4094
[root@...2-vm ptrace]# ( while :; do ./ptrace_breakpoint 1 2 > /dev/null || { echo "FAIL - $?"; break; }; done ) &
[6] 4097
[8] 4103
[root@...2-vm ptrace]# 0087: exit - 5
0131: exited, status=1
0126: wait: No child processes
FAIL - 3
I tried to execute the reproducer on the host (where the KVM VMs are
running), but the bug was not triggered within one hour.
When I executed the reproducer in the VM without stopping the processes
on the host, I found that the bug is triggered much faster.
[root@...2-vm ptrace]# ./ptrace_breakpoint 1
....
stop 24675
cont
child2 1
stop 24676
cont
child2 1
child2 5
0088: exit - 5
stop 24677
0132: exited, status=1
cont
0127: wait: No child processes
I know that this bug can be reproduced starting with the 4.2 kernel. I
haven't tested older versions of the kernel.
I tried to print the drX registers after a breakpoint. They look like
they are set correctly.
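As an illustration of what checking the drX values can involve, here is a minimal sketch that decodes DR6 and DR7 values such as a tracer might read back with PTRACE_PEEKUSER. The bit positions follow the Intel SDM; the helper names are hypothetical and this is not the code used for the check described above.

```c
#include <assert.h>
#include <stdint.h>

#define DR6_BS (1u << 14)   /* single-step trap flag */

/*
 * Return the lowest breakpoint slot (0..3) whose condition-detected
 * bit B0..B3 is set in DR6, or -1 if none is set.
 */
int dr6_hit_slot(uint64_t dr6)
{
    for (int slot = 0; slot < 4; slot++)
        if (dr6 & (1ULL << slot))
            return slot;
    return -1;
}

/* Return 1 if DR6 reports a single-step trap, 0 otherwise. */
int dr6_single_step(uint64_t dr6)
{
    return (dr6 & DR6_BS) != 0;
}

/*
 * Return 1 if breakpoint `slot` is enabled in DR7, either locally (Ln)
 * or globally (Gn); these are bits 2*slot and 2*slot+1.
 */
int dr7_slot_enabled(uint64_t dr7, int slot)
{
    assert(slot >= 0 && slot < 4);
    return ((dr7 >> (2 * slot)) & 3u) != 0;
}
```

A tracer could, for example, compare dr6_hit_slot() against the slot it armed after each SIGTRAP stop, and verify with dr7_slot_enabled() that the slot is still enabled before resuming the child.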
Maybe someone has an idea where the problem is or how it can be investigated.
Here is the CRIU issue for this problem:
https://github.com/xemul/criu/issues/107
Thanks,
Andrew
View attachment "ptrace_breakpoint.c" of type "text/x-csrc" (3521 bytes)