linux-kernel - Re: [PATCH RFC 3/3] mm/vmalloc.c: allow vread() to read out vm_map

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Y4dVWsmEUcjPBeYc@pc636>
Date:   Wed, 30 Nov 2022 14:06:34 +0100
From:   Uladzislau Rezki <urezki@...il.com>
To:     Baoquan He <bhe@...hat.com>
Cc:     Matthew Wilcox <willy@...radead.org>, linux-kernel@...r.kernel.org,
        linux-mm@...ck.org, akpm@...ux-foundation.org,
        stephen.s.brennan@...cle.com, urezki@...il.com, hch@...radead.org
Subject: Re: [PATCH RFC 3/3] mm/vmalloc.c: allow vread() to read out
 vm_map_ram areas

On Thu, Nov 24, 2022 at 05:52:34PM +0800, Baoquan He wrote:
> On 11/23/22 at 01:24pm, Matthew Wilcox wrote:
> > On Wed, Nov 23, 2022 at 11:38:54AM +0800, Baoquan He wrote:
> > > On 11/18/22 at 08:01am, Matthew Wilcox wrote:
> > > > On Wed, Nov 09, 2022 at 11:35:34AM +0800, Baoquan He wrote:
> > > > > Currently, vread() can read out vmalloc areas which is associated with
> > > > > a vm_struct. While this doesn't work for areas created by vm_map_ram()
> > > > > interface because it doesn't allocate a vm_struct. Then in vread(),
> > > > > these areas will be skipped.
> > > > > 
> > > > > Here, add a new function vb_vread() to read out areas managed by
> > > > > vmap_block specifically. Then recognize vm_map_ram areas via vmap->flags
> > > > > and handle  them respectively.
> > > > 
> > > > i don't understand how this deals with the original problem identified,
> > > > that the vread() can race with an unmap.
> > > 
> > > Thanks for checking.
> > > 
> > > I wrote a paragraph, then realized I misunderstood your concern. You are
> > > saying the comment from Uladzislau about my original draft patch, right?
> > > Paste the link of Uladzislau's reply here in case other people want to
> > > know the background:
> > > https://lore.kernel.org/all/Y1uKSmgURNEa3nQu@pc636/T/#u
> > > 
> > > When Stephen raised the issue originally, I posted a draft patch as
> > > below trying to fix it:
> > > https://lore.kernel.org/all/Y1pHTj2wuhoWmeV3@MiWiFi-R3L-srv/T/#u
> > > 
> > > In above draft patch, I tried to differentiate normal vmalloc area and
> > > vm_map_ram area with the fact that vmalloc area is associated with a
> > > vm_struct, while vm_map_ram area has ->vm as NULL. And I thought their
> > > only difference is normal vmalloc area has guard page, so its size need
> > > consider the guard page; while vm_map_ram area has no guard page, only
> > > consider its own actual size. Uladzislau's comment reminded me I was
> > > wrong. And the things we need handle are beyond that.
> > > 
> > > Currently there are three kinds of vmalloc areas in kernel:
> > > 
> > > 1) normal vmalloc areas, associated with a vm_struct, this is allocated 
> > > in __get_vm_area_node(). When freeing, it set ->vm to NULL
> > > firstly, then unmap and free vmap_area, see remove_vm_area().
> > > 
> > > 2) areas allocated via vm_map_ram() and size is larger than
> > > VMAP_MAX_ALLOC. The entire area is not associated with vm_struct, and
> > > freed at one time in vm_unmap_ram() with unmapping and freeing vmap_area;
> > > 
> > > 3) areas allocated via vm_map_ram(), then delegate to vb_alloc() when
> > > size <= VMAP_MAX_ALLOC. Its vmap_area is allocated at one time with
> > > VMAP_BLOCK_SIZE big, and split and used later through vb_alloc(), freed
> > > via vb_free(). When the entire area is dirty, it will be unmapped and
> > > freed.
> > > 
> > > Based on above facts, we need add flags to differentiate the normal
> > > vmalloc area from the vm_map_ram area, namely area 1) and 2). And we
> > > also need flags to differentiate the area 2) and 3). Because area 3) are
> > > pieces of a entire vmap_area, vb_free() will unmap the piece of area and
> > > set the part dirty, but the entire vmap_area will kept there. So when we
> > > will read area 3), we need take vb->lock and only read out the still
> > > mapped part, but not dirty or free part of the vmap_area.
> > 
> > I don't think you understand the problem.
> > 
> > Task A:			Task B:		Task C:
> > p = vm_map_ram()
> > 			vread(p);
> > 			... preempted ...
> > vm_unmap_ram(p);
> > 					q = vm_map_ram();
> > 			vread continues
> 
> 
> 
> > 
> > If C has reused the address space allocated by A, task B is now reading
> > the memory mapped by task C instead of task A.  If it hasn't, it's now
> > trying to read from unmapped, and quite possibly freed memory.  Which
> > might have been allocated by task D.
> 
> Hmm, it may not be like that.
> 
> Firstly, I would remind that vread() takes vmap_area_lock during the
> whole reading process. Before this patchset, the vread() and other area
> manipulation will have below status:
> 1) __get_vm_area_node() could only finish insert_vmap_area(), then free
> the vmap_area_lock, then warting;
> 2) __get_vm_area_node() finishs setup_vmalloc_vm()
>   2.1) doing mapping but not finished;
>   2.2) clear_vm_uninitialized_flag() is called after mapping is done;
> 3) remove_vm_area() is called to set -> = NULL, then free vmap_area_lock;
> 
> Task A:			   Task B:		     Task C:
> p = __get_vm_area_node()
> remove_vm_area(p);
> 			   vread(p);
> 
> 			   vread end 
> 					     q = __get_vm_area_node();
> 
> So, as you can see, the checking "if (!va->vm)" in vread() will filter
> out vmap_area:
> a) areas if only insert_vmap_area() is called, but ->vm is still NULL; 
> b) areas if remove_vm_area() is called to clear ->vm to NULL;
> c) areas created through vm_map_ram() since its ->vm is always NULL;
> 
> Means vread() will read out vmap_area:
> d) areas if setup_vmalloc_vm() is called;
>   1) mapping is done on areas, e.g clear_vm_uninitialized_flag() is
>        called;
>   2) mapping is being handled, just after returning from setup_vmalloc_vm();
> 
> 
> ******* after this patchset applied:
> 
> Task A:			Task B:		Task C:
> p = vm_map_ram()
> vm_unmap_ram(p);
> 			vread(p);
>                          vb_vread()
> 			vread end 
> 
> 					q = vm_map_ram();
> 
> With this patchset applied, other than normal areas, for the
> vm_map_ram() areas:
> 1) In vm_map_ram(), set vmap_area->flags = VMAP_RAM when vmap_area_lock
>    is taken; In vm_unmap_ram(), clear it wiht "va->flags &= ~VMAP_RAM"
>    when vmap_area_lock is taken;
> 2) If vmap_block, set va->flags = VMAP_RAM|VMAP_BLOCK; And set
>    vmap_block->used_map to track the used region, filter out the dirty
>    and free region;
> 3) In vb_vread(), we take vb->lock to avoid reading out dirty regions.
> 
> Please help point out what is wrong or I missed.
> 
One thing is we still can read-out un-mapped pages, i.e. a text instead:

<snip>
static void vb_free(unsigned long addr, unsigned long size)
{
	unsigned long offset;
	unsigned int order;
	struct vmap_block *vb;

	BUG_ON(offset_in_page(size));
	BUG_ON(size > PAGE_SIZE*VMAP_MAX_ALLOC);

	flush_cache_vunmap(addr, addr + size);

	order = get_order(size);
	offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
	vb = xa_load(&vmap_blocks, addr_to_vb_idx(addr));

	vunmap_range_noflush(addr, addr + size);

	if (debug_pagealloc_enabled_static())
		flush_tlb_kernel_range(addr, addr + size);

	spin_lock(&vb->lock);
...
<snip>

or am i missing something? Is it a problem? It might be. Another thing
it would be good if you upload a new patchset so it is easier to review
it.

Thanks!

--
Uladzislau Rezki