linux-kernel - x86: Hardware breakpoints are not always triggered

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Message-ID: <CANaxB-wGTnLX=s9aVG4ziXVtsVn672Q4BmfEX12=398_XZdmqA@mail.gmail.com>
Date:	Thu, 28 Jan 2016 00:31:41 -0800
From:	Andrey Wagin <avagin@...il.com>
To:	LKML <linux-kernel@...r.kernel.org>,
	Andy Lutomirski <luto@...capital.net>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Oleg Nesterov <oleg@...hat.com>,
	"criu@...nvz.org" <criu@...nvz.org>, kvm@...r.kernel.org
Subject: x86: Hardware breakpoints are not always triggered

Hi,

We use hardware breakpoints in CRIU and we found that sometimes we set
a break-point, but a process doesn't stop on it.

I write a small reproducer for this bug. It create two processes,
where a parent process traces a child. The parent process sets a
break-point and each time when the child stop on it, the parent sets
the variable "xxx" to A in a child process. The child runs an infinite
loop, where it check the variable "xxx" and sets it to B. If a child
process finds that xxx is equal to B, it exits with a non-zero code,
what means that a break-point was not triggered. The source code is
attached.

The reproducer uses a different break-point address if it is executed
with arguments than when it executed without arguments.

Then I made a few experiments. The bug is triggered, if we execute
this program a few times in a KVM virtual machine.

[root@...2-vm ptrace]# ( while :; do ./ptrace_breakpoint > /dev/null
|| { echo "FAIL - $?"; break; }; done ) &
[3] 4088
[root@...2-vm ptrace]# ( while :; do ./ptrace_breakpoint > /dev/null
|| { echo "FAIL - $?"; break; }; done ) &
[4] 4091
[root@...2-vm ptrace]# ( while :; do ./ptrace_breakpoint 1 2 >
/dev/null || { echo "FAIL - $?"; break; }; done ) &
[5] 4094
[root@...2-vm ptrace]# ( while :; do ./ptrace_breakpoint 1 2 >
/dev/null || { echo "FAIL - $?"; break; }; done ) &
[6] 4097
[8] 4103
[root@...2-vm ptrace]# 0087: exit - 5
0131: exited, status=1
0126: wait: No child processes
FAIL - 3

I tried to execute the reproducer on the host (where kvm VM-s are
running), but the bug was not triggered during one hour.

When I executed the reproducer in VM without stopping processes on the
host, I found that a bug is triggered much faster in this case.

[root@...2-vm ptrace]# ./ptrace_breakpoint 1
....
stop 24675
cont
child2 1
stop 24676
cont
child2 1
child2 5
0088: exit - 5
stop 24677
0132: exited, status=1
cont
0127: wait: No child processes

I know that this bug can be reproduced starting with the 4.2 kernel. I
haven't test older versions of the kernel.

I tried to print drX registers after a break-point. Looks like they
are set correctly.

Maybe someone has any ideas where a problem is or how it can be investigated.

Here is a criu issue for this problem:
https://github.com/xemul/criu/issues/107

Thanks,
Andrew

View attachment "ptrace_breakpoint.c" of type "text/x-csrc" (3521 bytes)