lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <58C08535.3070000@iogearbox.net>
Date:   Wed, 08 Mar 2017 23:27:01 +0100
From:   Daniel Borkmann <daniel@...earbox.net>
To:     Linus Torvalds <torvalds@...ux-foundation.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...nel.org>, Peter Anvin <hpa@...or.com>
CC:     Fengguang Wu <fengguang.wu@...el.com>,
        Network Development <netdev@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>, LKP <lkp@...org>,
        ast@...com, the arch/x86 maintainers <x86@...nel.org>,
        keescook@...omium.org, labbott@...hat.com, davem@...emloft.net
Subject: Re: [net/bpf] 3051bf36c2 BUG: unable to handle kernel paging request
 at 0000a7cf

[ + Kees, Laura, and Dave ]

On 03/08/2017 08:25 PM, Linus Torvalds wrote:
> Adding x86 people too, since this seems to be something off about
> ARCH_HAS_SET_MEMORY for x86-32.
>
> The code seems to be shared between x86-32 and 64, I'm not seeing why
> set_memory_r[ow]() should fail on one but not the other.
>
> Considering that it seems to be flaky even on 32-bit, maybe it's
> timing-related, or possibly related to TLB sizes or whatever (ie more
> likely hidden by a larger TLB on more modern hardware?)
>
> Anyway, just looking at change_page_attr_set_clr(), I notice that the
> page alias checking treats NX specially:
>
>          /* No alias checking for _NX bit modifications */
>          checkalias = (pgprot_val(mask_set) | pgprot_val(mask_clr)) != _PAGE_NX;
>
> which seems insane. Why would NX be different from other protection
> bits (like _PAGE_RW)?
>
> But that doesn't explain why the bpf code would have issues with this
> all only on x86-32.
>
> Maybe somebody else can see why ARCH_HAS_SET_MEMORY would depend on
> 64-bit only..

Last days I tried to narrow this down a bit, I have a dirty hack I
used just for the sake of experimenting when booting into Fengguang's
x86-32 test kernel via qemu:

  lib/Makefile      |  3 ++-
  lib/test_setmem.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
  2 files changed, 52 insertions(+), 1 deletion(-)
  create mode 100644 lib/test_setmem.c

diff --git a/lib/Makefile b/lib/Makefile
index 320ac46a..702af98 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -41,7 +41,8 @@ obj-y += bcd.o div64.o sort.o parser.o debug_locks.o random32.o \
  	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
  	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
  	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o \
-	 once.o refcount.o
+	 once.o refcount.o test_setmem.o
+
  obj-y += string_helpers.o
  obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
  obj-y += hexdump.o
diff --git a/lib/test_setmem.c b/lib/test_setmem.c
new file mode 100644
index 0000000..82b08db
--- /dev/null
+++ b/lib/test_setmem.c
@@ -0,0 +1,50 @@
+#include <linux/init.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/delay.h>
+
+#include <asm/cacheflush.h>
+
+static int __init test_setmem(void)
+{
+        const gfp_t gfp_flags = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO;
+        int ret, i, good = 0, max_runs = 1000, one = 1;
+        u32 pages, size;
+        u8 *buff;
+
+        for (i = 0; i < max_runs; i++) {
+                size  = round_up(4096*8, PAGE_SIZE);
+                pages = size / PAGE_SIZE;
+
+                buff  = __vmalloc(size, gfp_flags, PAGE_KERNEL);
+                if (buff == NULL) {
+                        printk("%s: allocation failed!\n", __func__);
+                        return 0;
+                }
+#if 1
+                buff[0] = 41;
+                buff[10] = 42;
+                buff[100] = 43;
+                buff[1000] = 44;
+		// can also be just read access, f.e. if (buff[0] + ...)
+#endif
+                WARN_ON_ONCE(set_memory_ro((unsigned long)buff, pages));
+//              msleep(0);
+
+                ret = probe_kernel_write(&buff[1], &one, sizeof(one));
+                if (!ret)
+                        printk("%s: write succeeded despite read-only!\n", __func__);
+                else
+                        good++;
+
+                WARN_ON_ONCE(set_memory_rw((unsigned long)buff, pages));
+                vfree(buff);
+        }
+
+        printk("%s: write failed due to read-only: %d/%d\n", __func__, good, max_runs);
+        return 0;
+}
+
+late_initcall(test_setmem);
-- 
2.9.3

The issue seems to be accessing buff first (can be read or write access)
and then doing set_memory_ro() doesn't make it read-only immediately,
meaning the subsequent call into probe_kernel_write() will succeed without
error.

Then, if I don't touch buff first and only do the set_memory_ro() seems
to work and probe_kernel_write() will then fail as expected due to pages
being read-only now.

Now, if I access buff, do the set_memory_ro() and then a msleep(0), for
example, it "kind of" works most of the time (see last log extract below),
and probe_kernel_write() will fail.

None of this seems an issue with x86_64 and the test_setmem runs fine all
the time, same for the actual BPF stuff.

I've also added rodata_test from the log (perhaps related).

x86_64 (access data before set_memory_ro()):

[...]
[    1.357834] Key type big_key registered
[    1.418542] test_setmem: write failed due to read-only: 1000/1000
[    1.419545]   Magic number: 9:497:653
[    1.419584] acpi LNXCPU:06: hash matches
[    1.419750] rtc_cmos 00:02: setting system clock to 2017-03-08 22:37:44 UTC (1489012664)
[    1.419777] PM: Hibernation image not present or could not be loaded.
[    1.420643] Freeing unused kernel memory: 1720K
[    1.420643] Write protecting the kernel read-only data: 14336k
[    1.420997] Freeing unused kernel memory: 1644K
[    1.421844] Freeing unused kernel memory: 652K
[    1.423054] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[    1.423057] rodata_test: all tests were successful
[...]

x86-32 (access data before set_memory_ro()):

[...]
[    8.551454] Key type trusted registered
[    8.551792] Key type encrypted registered
[    8.552274] test_setmem: write succeeded despite read-only!
[    8.552740] test_setmem: write succeeded despite read-only!
[    8.553205] test_setmem: write succeeded despite read-only!
[    8.553686] test_setmem: write succeeded despite read-only!
[    8.554179] test_setmem: write succeeded despite read-only!
[    8.554666] test_setmem: write succeeded despite read-only!
[    8.555157] test_setmem: write succeeded despite read-only!
[...]
[    9.042271] test_setmem: write succeeded despite read-only!
[    9.042741] test_setmem: write failed due to read-only: 0/1000
[    9.043343] hd: no drives specified - use hd=cyl,head,sectors on kernel command line
[    9.044034] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    9.044597] EDD information not available.
[    9.045233] Freeing unused kernel memory: 612K
[    9.045732] Write protecting the kernel text: 12604k
[    9.046234] Write protecting the kernel read-only data: 5124k
[    9.046781] rodata_test: test data was not read only
[...]

x86-32 (no access data before set_memory_ro()):

[...]
[    8.537275] Key type encrypted registered
[    8.541771] test_setmem: write failed due to read-only: 1000/1000
[    8.542481] hd: no drives specified - use hd=cyl,head,sectors on kernel command line
[    8.543265] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    8.543824] EDD information not available.
[    8.544467] Freeing unused kernel memory: 612K
[    8.544909] Write protecting the kernel text: 12604k
[    8.545399] Write protecting the kernel read-only data: 5128k
[    8.545903] rodata_test: test data was not read only
[...]

x86-32 (access data + msleep(0) before set_memory_ro()):

* run 1:

[...]
[    8.478233] Key type encrypted registered
[   20.334254] test_setmem: write succeeded despite read-only!
[   28.474980] test_setmem: write failed due to read-only: 999/1000
[   28.478187] hd: no drives specified - use hd=cyl,head,sectors on kernel command line
[   28.480476] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[   28.481342] EDD information not available.
[   28.482192] Freeing unused kernel memory: 612K
[   28.482841] Write protecting the kernel text: 12604k
[   28.483568] Write protecting the kernel read-only data: 5128k
[   28.484364] rodata_test: test data was not read only
[...]

* run 2:

[...]
[    8.472088] Key type encrypted registered
[   28.466359] test_setmem: write failed due to read-only: 1000/1000
[   28.469162] hd: no drives specified - use hd=cyl,head,sectors on kernel command line
[   28.470838] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[   28.471590] EDD information not available.
[   28.472346] Freeing unused kernel memory: 612K
[   28.472910] Write protecting the kernel text: 12604k
[   28.473542] Write protecting the kernel read-only data: 5128k
[   28.474232] rodata_test: test data was not read only
[...]

I've attached the .config and Fengguang's test script. Qemu version is
2.7.1 from Fedora. Can be run as:

./reproduce-quantal-ivb41-127-20170301175555-i386-randconfig-b0-02262304-4.10.0-04456-g3051bf3-1 arch/x86/boot/bzImage

Thanks,
Daniel

>                 Linus
>
> On Thu, Mar 2, 2017 at 12:40 PM, Daniel Borkmann <daniel@...earbox.net> wrote:
>> On 03/02/2017 09:23 PM, Fengguang Wu wrote:
>> [...]
>>>
>>> I confirm that the below patch provided by Daniel fixes the above
>>> issues on mainline kernel, too. Where should this patch be sent to?
>>
>>
>> If nobody objects, I could send it to -net tree via Dave due to being
>> BPF related, but I don't mind sending it elsewhere too (f.e. Linus
>> directly?) in order to stop your bot from continuing to send such mails.
>>
>> The issue seems only related to i386 and doesn't trigger each time with
>> Fengguang's kernel config and qemu image when I try to reproduce it.
>> set_memory_ro()/set_memory_rw() on i386 seems to work in general, but
>> when it's used/reproduced, from time to time (perhaps some corner-case?)
>> it looks like that memory area can have issues much later on after being
>> fed back to the allocator which then causes a GPF from random locations.
>> Gut feeling, it might be an issue in set_memory_*() that my commit
>> uncovered. Still looking into it, but mean-time I could just send the
>> below, sure.
>>
>> Thanks,
>> Daniel
>>
>>
>>> It'd be very noisy if all these Oops hit the upcoming RC1 kernel.
>>>
>>> Daniel thinks there may be deeper problem in i386 set_memory_rw().
>>> However that could take much longer time to debug.
>>>
>>> Thanks,
>>> Fengguang
>>> ---
>>>
>>> Re: [bpf] 9d876e79df:  BUG: unable to handle kernel paging request at
>>> 653a8346
>>>
>>>> On Tue, Feb 28, 2017 at 04:39:36PM +0100, Daniel Borkmann wrote:
>>>
>>>
>>> I have a rough feeling what it is, but I didn't have cycles to work on
>>> it yet (due to travel, sorry about that). The issue is likely shut down
>>> by just doing:
>>>
>>> ---
>>> arch/x86/Kconfig |    2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> --- linux.orig/arch/x86/Kconfig    2017-03-03 03:44:35.962022996 +0800
>>> +++ linux/arch/x86/Kconfig    2017-03-03 03:44:35.962022996 +0800
>>> @@ -54,7 +54,7 @@ config X86
>>>       select ARCH_HAS_KCOV            if X86_64
>>>       select ARCH_HAS_MMIO_FLUSH
>>>       select ARCH_HAS_PMEM_API        if X86_64
>>> -    select ARCH_HAS_SET_MEMORY
>>> +    select ARCH_HAS_SET_MEMORY        if X86_64
>>>       select ARCH_HAS_SG_CHAIN
>>>       select ARCH_HAS_STRICT_KERNEL_RWX
>>>       select ARCH_HAS_STRICT_MODULE_RWX
>>
>>


View attachment "config.bak" of type "text/plain" (97576 bytes)

View attachment "reproduce-quantal-ivb41-127-20170301175555-i386-randconfig-b0-02262304-4.10.0-04456-g3051bf3-1" of type "text/plain" (888 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ