lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <tzvhz2bdmnuhcz6o6tqcdvkdbhjvptsaqzcv5acxqyhjevztvd@grkw6pqthovb>
Date: Wed, 16 Oct 2024 12:45:49 -0400
From: "Liam R. Howlett" <Liam.Howlett@...cle.com>
To: Jann Horn <jannh@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH fix 6.12] mm: mark mas allocation in
 vms_abort_munmap_vmas as __GFP_NOFAIL

* Jann Horn <jannh@...gle.com> [241016 12:04]:
> On Wed, Oct 16, 2024 at 5:40 PM Liam R. Howlett <Liam.Howlett@...cle.com> wrote:
> > * Jann Horn <jannh@...gle.com> [241016 11:08]:
> > > vms_abort_munmap_vmas() is a recovery path where, on entry, some VMAs
> > > have already been torn down halfway (in a way we can't undo) but are
> > > still present in the maple tree.
> > >
> > > At this point, we *must* remove the VMAs from the VMA tree, otherwise
> > > we get UAF.
> >
> > Wait, the vma should be re-attached without PTEs and files closed in
> > this case.  I don't see how we are hitting the UAF in your example - we
> > shouldn't unless there is something with the driver itself and the file
> > close?
> >
> > My concern is that I am missing an error scenario.
> 
> Where are the files supposed to be closed? vms_clean_up_area() unlinks
> the VMA from the file and calls ->close() but doesn't unlink the file,
> right?

Correct.

> 
> FWIW, I tested on commit eca631b8fe80 (current mainline head), I
> didn't check whether anything in the MM tree already addresses this.
> (I probably should have...)

I have not addressed this.

> 
> > But since this is under a lock that allows sleeping, shouldn't the
> > shrinker kick in?
> 
> Yeah, I think without error injection, you'd basically only fail this
> allocation if the OOM killer has already decided to kill your task and
> the system is entirely out of memory.
> 
> OOM kills IIRC have two effects on the page allocator:
> 
> 1. they allow allocating with no watermarks (ALLOC_OOM) (based on the
> theory that you might need to allocate some memory in order to be able
> to exit and free more memory)
> 2. they allow giving up on GFP_KERNEL allocations (see the "/* Avoid
> allocations with no watermarks from looping endlessly */" part of
> __alloc_pages_slowpath())
> 
> > Should we just use __GFP_NOFAIL for the first store
> > instead of the error path?
> 
> Which first store? Do you mean get rid of vms_abort_munmap_vmas()
> entirely somehow?

Yes.. but there are other failures throughout the function so I think
this is the best plan.

> 
> > > Fixes: 4f87153e82c4 ("mm: change failure of MAP_FIXED to restoring the gap on failure")
> > > Signed-off-by: Jann Horn <jannh@...gle.com>

Thanks for fixing this.

Reviewed-by: Liam R. Howlett <Liam.Howlett@...cle.com>

> > > ---
> > > This can be tested with the following reproducer (on a kernel built with
> > > CONFIG_KASAN=y, CONFIG_FAILSLAB=y, CONFIG_FAULT_INJECTION_DEBUG_FS=y,
> > > with the reproducer running as root):
> > >
> > > ```
> > >
> > >   typeof(x) __res = (x);      \
> > >   if (__res == (typeof(x))-1) \
> > >     err(1, "SYSCHK(" #x ")"); \
> > >   __res;                      \
> > > })
> > >
> > > static void write_file(char *name, char *buf) {
> > >   int fd = open(name, O_WRONLY);
> > >   if (fd == -1)
> > >     err(1, "unable to open for writing: %s", name);
> > >   if (SYSCHK(write(fd, buf, strlen(buf))) != strlen(buf))
> > >     errx(1, "write %s", name);
> > >   SYSCHK(close(fd));
> > > }
> > >
> > > int main(void) {
> > >   // make a large area with a bunch of VMAs
> > >   char *area = SYSCHK(mmap(NULL, AREA_SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0));
> > >   for (int off=0; off<AREA_SIZE; off += 0x2000)
> > >     SYSCHK(mmap(area+off, 0x1000, PROT_READ, MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0));
> > >
> > >   // open a file whose mappings use usbdev_vm_ops, and map it in part of this area
> > >   int map_fd = SYSCHK(open("/dev/bus/usb/001/001", O_RDONLY));
> > >   void *map = SYSCHK(mmap(area, 0x1000, PROT_READ, MAP_SHARED|MAP_FIXED, map_fd, 0));
> > >   close(map_fd);
> > >
> > >   // make RWX mapping request fail late
> > >   SYSCHK(prctl(65/*PR_SET_MDWE*/, 1/*PR_MDWE_REFUSE_EXEC_GAIN*/, 0, 0, 0));
> > >
> > >   // some random other file
> > >   int fd = SYSCHK(open("/etc/passwd", O_RDONLY));
> > >
> > >   /* disarm for now */
> > >   write_file("/sys/kernel/debug/failslab/probability", "0");
> > >
> > >   /* fail allocations of maple_node... */
> > >   write_file("/sys/kernel/debug/failslab/cache-filter", "Y");
> > >   write_file("/sys/kernel/slab/maple_node/failslab", "1");
> > >   /* ... even though it's a sleepable allocation... */
> > >   write_file("/sys/kernel/debug/failslab/ignore-gfp-wait", "N");
> > >   /* ... in this task... */
> > >   write_file("/sys/kernel/debug/failslab/task-filter", "Y");
> > >   write_file("/proc/self/make-it-fail", "1");
> > >   /* ... every time... */
> > >   write_file("/sys/kernel/debug/failslab/times", "-1");
> > >   /* ... after 8 successful allocations (value chosen experimentally)... */
> > >   write_file("/sys/kernel/debug/failslab/space", "2048"); // one object is 256
> > >   /* ... and print where the allocations actually failed... */
> > >   write_file("/sys/kernel/debug/failslab/verbose", "2");
> > >   /* ... and that's it, arm it */
> > >   write_file("/sys/kernel/debug/failslab/probability", "100");
> > >
> > >   // This mmap request will fail late due to MDWE.
> > >   // The error recovery path will attempt to clear out the VMA pointers with a
> > >   // spanning_store (which will be blocked by error injection).
> > >   mmap(area, AREA_SIZE, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, fd, 0);
> > >
> > >   /* disarm */
> > >   write_file("/sys/kernel/debug/failslab/probability", "0");
> > >
> > >   SYSCHK(munmap(map, 0x1000)); // UAF expected here
> > >   system("cat /proc/$PPID/maps");
> > > }
> > > ```
> > >
> > > Result:
> > > ```
> > > FAULT_INJECTION: forcing a failure.
> > > name failslab, interval 1, probability 100, space 256, times -1
> > > CPU: 3 UID: 0 PID: 607 Comm: unmap-fail Not tainted 6.12.0-rc3-00013-geca631b8fe80 #518
> > > [...]
> > > Call Trace:
> > >  <TASK>
> > >  dump_stack_lvl+0x80/0xa0
> > >  should_fail_ex+0x4d3/0x5c0
> > > [...]
> > >  should_failslab+0xc7/0x130
> > >  kmem_cache_alloc_noprof+0x73/0x3a0
> > > [...]
> > >  mas_alloc_nodes+0x3a3/0x690
> > >  mas_nomem+0xaa/0x1d0
> > >  mas_store_gfp+0x515/0xa80
> > > [...]
> > >  mmap_region+0xa96/0x2590
> > > [...]
> > >  do_mmap+0x71e/0xfe0
> > > [...]
> > >  vm_mmap_pgoff+0x17a/0x2f0
> > > [...]
> > >  ksys_mmap_pgoff+0x2ee/0x460
> > >  do_syscall_64+0x68/0x140
> > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [...]
> > >  </TASK>
> > > mmap: unmap-fail: (607) Unable to abort munmap() operation
> > > ==================================================================
> > > BUG: KASAN: slab-use-after-free in dec_usb_memory_use_count+0x365/0x430
> > > Read of size 8 at addr ffff88810e9ba8b8 by task unmap-fail/607
> >
> > What was this pointer?
> 
> Should be the "struct usb_memory *usbm".
> 
> > >
> > > CPU: 3 UID: 0 PID: 607 Comm: unmap-fail Not tainted 6.12.0-rc3-00013-geca631b8fe80 #518
> > > [...]
> > > Call Trace:
> > >  <TASK>
> > >  dump_stack_lvl+0x66/0xa0
> > >  print_report+0xce/0x670
> > > [...]
> > >  kasan_report+0xf7/0x130
> > > [...]
> > >  dec_usb_memory_use_count+0x365/0x430
> > >  remove_vma+0x76/0x120
> > >  vms_complete_munmap_vmas+0x447/0x750
> > >  do_vmi_align_munmap+0x4b9/0x700
> > > [...]
> > >  do_vmi_munmap+0x164/0x2e0
> > >  __vm_munmap+0x128/0x2a0
> > > [...]
> > >  __x64_sys_munmap+0x59/0x80
> > >  do_syscall_64+0x68/0x140
> > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [...]
> > >  </TASK>
> > >
> > > Allocated by task 607:
> > >  kasan_save_stack+0x33/0x60
> > >  kasan_save_track+0x14/0x30
> > >  __kasan_kmalloc+0xaa/0xb0
> > >  usbdev_mmap+0x1a0/0xaf0
> > >  mmap_region+0xf6e/0x2590
> > >  do_mmap+0x71e/0xfe0
> > >  vm_mmap_pgoff+0x17a/0x2f0
> > >  ksys_mmap_pgoff+0x2ee/0x460
> > >  do_syscall_64+0x68/0x140
> > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > >
> > > Freed by task 607:
> > >  kasan_save_stack+0x33/0x60
> > >  kasan_save_track+0x14/0x30
> > >  kasan_save_free_info+0x3b/0x60
> > >  __kasan_slab_free+0x4f/0x70
> > >  kfree+0x148/0x450
> > >  vms_clean_up_area+0x188/0x220
> >
> > What line is this?
> 
> Should be the vma->vm_ops->close(vma) call. (That would call
> dec_usb_memory_use_count(), which tail-calls kfree(), so the
> dec_usb_memory_use_count() wouldn't show up in a stack trace.)
> 
> I don't have this kernel build anymore though, sorry. If you want I'll
> rebuild and get a properly symbolized trace.

Right, so it's the driver.  That makes sense.

Your patch is the best plan.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ