lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190520194915.GB1646@sventech.com>
Date:   Mon, 20 May 2019 12:49:15 -0700
From:   Johannes Erdfelt <johannes@...felt.com>
To:     Josh Poimboeuf <jpoimboe@...hat.com>, Jessica Yu <jeyu@...hat.com>,
        Jiri Kosina <jikos@...nel.org>,
        Miroslav Benes <mbenes@...e.cz>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ingo Molnar <mingo@...hat.com>, live-patching@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Oops caused by race between livepatch and ftrace

There exists a race condition between livepatch and ftrace that can
cause an oops similar to this one:

  BUG: unable to handle page fault for address: ffffffffc005b1d9
  #PF: supervisor write access in kernel mode
  #PF: error_code(0x0003) - permissions violation
  PGD 3ea0c067 P4D 3ea0c067 PUD 3ea0e067 PMD 3cc13067 PTE 3b8a1061
  Oops: 0003 [#1] PREEMPT SMP PTI
  CPU: 1 PID: 453 Comm: insmod Tainted: G           O  K   5.2.0-rc1-a188339ca5 #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-20181126_142135-anatol 04/01/2014
  RIP: 0010:apply_relocate_add+0xbe/0x14c
  Code: fa 0b 74 21 48 83 fa 18 74 38 48 83 fa 0a 75 40 eb 08 48 83 38 00 74 33 eb 53 83 38 00 75 4e 89 08 89 c8 eb 0a 83 38 00 75 43 <89> 08 48 63 c1 48 39 c8 74 2e eb 48 83 38 00 75 32 48 29 c1 89 08
  RSP: 0018:ffffb223c00dbb10 EFLAGS: 00010246
  RAX: ffffffffc005b1d9 RBX: 0000000000000000 RCX: ffffffff8b200060
  RDX: 000000000000000b RSI: 0000004b0000000b RDI: ffff96bdfcd33000
  RBP: ffffb223c00dbb38 R08: ffffffffc005d040 R09: ffffffffc005c1f0
  R10: ffff96bdfcd33c40 R11: ffff96bdfcd33b80 R12: 0000000000000018
  R13: ffffffffc005c1f0 R14: ffffffffc005e708 R15: ffffffff8b2fbc74
  FS:  00007f5f447beba8(0000) GS:ffff96bdff900000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffffffc005b1d9 CR3: 000000003cedc002 CR4: 0000000000360ea0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   klp_init_object_loaded+0x10f/0x219
   ? preempt_latency_start+0x21/0x57
   klp_enable_patch+0x662/0x809
   ? virt_to_head_page+0x3a/0x3c
   ? kfree+0x8c/0x126
   patch_init+0x2ed/0x1000 [livepatch_test02]
   ? 0xffffffffc0060000
   do_one_initcall+0x9f/0x1c5
   ? kmem_cache_alloc_trace+0xc4/0xd4
   ? do_init_module+0x27/0x210
   do_init_module+0x5f/0x210
   load_module+0x1c41/0x2290
   ? fsnotify_path+0x3b/0x42
   ? strstarts+0x2b/0x2b
   ? kernel_read+0x58/0x65
   __do_sys_finit_module+0x9f/0xc3
   ? __do_sys_finit_module+0x9f/0xc3
   __x64_sys_finit_module+0x1a/0x1c
   do_syscall_64+0x52/0x61
   entry_SYSCALL_64_after_hwframe+0x44/0xa9
  RIP: 0033:0x7f5f44764881
  Code: 0e 4c 8b 44 24 10 4d 8d 48 08 4c 89 4c 24 10 44 8b 4c 24 08 4d 8b 00 4c 01 c9 41 83 f9 2f 76 05 48 8b 4c 24 10 4c 8b 09 0f 05 <48> 89 c7 e8 f7 e9 fd ff 48 83 c4 58 c3 56 31 d2 be 02 00 08 00 bf
  RSP: 002b:00007ffcf1e64b80 EFLAGS: 00000212 ORIG_RAX: 0000000000000139
  RAX: ffffffffffffffda RBX: 00000000006229a0 RCX: 00007f5f44764881
  RDX: 0000000000000000 RSI: 000000000041a506 RDI: 0000000000000003
  RBP: 000000000041a506 R08: 0000000000000000 R09: 00007ffcf1e64c48
  R10: 0000000000000003 R11: 0000000000000212 R12: 0000000000000000
  R13: 00000000006228c0 R14: 0000000000000000 R15: 0000000000000000
  Modules linked in: livepatch_test03(OK+) livepatch_test01(OK+) livepatch_test11(OK+) livepatch_test06(OK+) livepatch_test04(OK+) livepatch_test02(OK+) livepatch_test10(OK+) livepatch_test16(OK+) livepatch_test18(OK-) livepatch_test12(OK)
  CR2: ffffffffc005b1d9
  ---[ end trace 52fee0aa635dd5a1 ]---

The oops occurs because both livepatch and ftrace remap the module text
section at the same. After ftrace is done, it can leave the module text
text section mapped RO when the livepatch code is expecting it to be RW.

CPU A			CPU B
--------		--------
			set_all_modules_text_rw		(module is now RW)
module_disable_ro					(module is still RW)
			set_all_modules_text_ro		(module is now RO)
apply_relocate_add					(oops)

I've reproduced it from the latest code in git (a188339ca5a3), 5.0.17,
4.19.44, 4.14.120, 4.9.177 and some kernels in between. I haven't tried
reproducing on any kernel older than 4.9. From looking at the older
versions in git, it will likely be harder to reproduce since the kernel
switched in 4.5 to remapping RW once from remapping on every write.

The oops will only occur if CONFIG_STRICT_MODULE_RWX is enabled
(previously called CONFIG_DEBUG_SET_MODULE_RONX).

I've found two ways the race condition can be reproduced:
1) loading multiple livepatches at the same time
2) loading a livepatch at the same time ftrace is patching code

They are both ultimately the same root cause since livepatch uses ftrace
to perform the patching.

I have put together a test case that can reproduce the crash using
KVM. The tarball includes a minimal kernel and initramfs, along with
a script to run qemu and the .config used to build the kernel. By
default it will attempt to reproduce by loading multiple livepatches
at the same time. Passing 'test=ftrace' to the script will attempt to
reproduce by racing with ftrace.

My test setup reproduces the race and oops more reliably by loading
multiple livepatches at the same time than with the ftrace method. It's
not 100% reproducible, so the test case may need to be run multiple
times.

It can be found here (not attached because of its size):
http://johannes.erdfelt.com/5.2.0-rc1-a188339ca5-livepatch-race.tar.gz

The simple patch of extending the module_mutex lock over the entirety
of klp_init_object_loaded fixes it from the livepatch side. This
mostly works because set_all_modules_text_{rw,ro} acquires module_mutex
as well, but it still leaves a hole in the ftrace code. A lock should
probably be held over the entirety of remapping the text sections RW.

This is complicated by the fact that remapping the text section in
ftrace is handled by arch specific code. I'm not sure what a good
solution to this is yet.

JE

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ