Date:   Tue, 4 Apr 2017 13:50:14 +0900
From:   Minchan Kim <minchan@...nel.org>
To:     Sergey Senozhatsky <sergey.senozhatsky.work@...il.com>
CC:     Andrew Morton <akpm@...ux-foundation.org>,
        <linux-kernel@...r.kernel.org>,
        Sergey Senozhatsky <sergey.senozhatsky@...il.com>,
        <kernel-team@....com>
Subject: Re: [PATCH 2/5] zram: partial IO refactoring

Hi Sergey,

On Tue, Apr 04, 2017 at 11:17:06AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> On (04/03/17 14:17), Minchan Kim wrote:
> > +static bool zram_special_page_read(struct zram *zram, u32 index,
> > +				struct page *page,
> > +				unsigned int offset, unsigned int len)
> > +{
> > +	struct zram_meta *meta = zram->meta;
> > +
> > +	bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
> > +	if (unlikely(!meta->table[index].handle) ||
> > +			zram_test_flag(meta, index, ZRAM_SAME)) {
> > +		void *mem;
> > +
> > +		bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
> > +		mem = kmap_atomic(page);
> > +		zram_fill_page(mem + offset, len, meta->table[index].element);
> > +		kunmap_atomic(mem);
> > +		return true;
> > +	}
> > +	bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
> > +
> > +	return false;
> > +}
> > +
> > +static bool zram_special_page_write(struct zram *zram, u32 index,
> > +					struct page *page)
> > +{
> > +	unsigned long element;
> > +	void *mem = kmap_atomic(page);
> > +
> > +	if (page_same_filled(mem, &element)) {
> > +		struct zram_meta *meta = zram->meta;
> > +
> > +		kunmap_atomic(mem);
> > +		/* Free memory associated with this sector now. */
> > +		bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
> > +		zram_free_page(zram, index);
> > +		zram_set_flag(meta, index, ZRAM_SAME);
> > +		zram_set_element(meta, index, element);
> > +		bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
> > +
> > +		atomic64_inc(&zram->stats.same_pages);
> > +		return true;
> > +	}
> > +	kunmap_atomic(mem);
> > +
> > +	return false;
> > +}
> 
> zram_special_page_read() and zram_special_page_write() have slightly
> different locking semantics.
> 
> zram_special_page_read() copies out a ZRAM_SAME page with the slot
> unlocked (can the slot get overwritten in the meantime?), while

IMHO, yes, it can be overwritten in the meantime, but that doesn't corrupt
the kernel. If such a race happens, it is the user's fault; it is the user
who should serialize those accesses. zram is a dumb block device, so it
just reads/writes the blocks the user asks for, and the one promise we
must keep is that it never corrupts the kernel. From that point of view,
I don't think it is a problem for zram_special_page_read to return stale
data.
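
Just to illustrate what I mean (a rough interleaving sketch of my own,
not something taken from the patch):

    CPU A (read)                          CPU B (write)
    zram_special_page_read()
      bit_spin_lock(ZRAM_ACCESS)
      !handle or ZRAM_SAME is set
      bit_spin_unlock(ZRAM_ACCESS)
                                          bit_spin_lock(ZRAM_ACCESS)
                                          zram_free_page()
                                          /* slot now describes new data */
                                          bit_spin_unlock(ZRAM_ACCESS)
      zram_fill_page(..., element)
      /* element may already be stale, but no freed handle is touched */

So the reader can hand back outdated data, but it never dereferences a
freed handle, which is the part that matters for kernel safety.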

> zram_special_page_write() keeps the slot locked throughout the entire
> operation.

zram_special_page_write is different: it updates the zram table's slot
via zram_set_[flag|element], so it has to be protected by zram itself,
i.e. the slot lock must be held across the whole update.
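
To make that explicit, the critical section in zram_special_page_write()
is there so that the free plus the flag/element update look atomic to
anyone else touching the same slot (same code as above, the comments are
mine):

    bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
    zram_free_page(zram, index);            /* drop the old handle/flags */
    zram_set_flag(meta, index, ZRAM_SAME);   /* mark slot as same-filled  */
    zram_set_element(meta, index, element);  /* remember the fill pattern */
    bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);

Without the lock, another writer or the free path hitting the same index
could free the handle twice or leave the flag and element out of sync,
and that would be real zram metadata corruption rather than just stale
data.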

> 
> >  static void zram_meta_free(struct zram_meta *meta, u64 disksize)
> >  {
> >  	size_t num_pages = disksize >> PAGE_SHIFT;
> > @@ -504,169 +548,104 @@ static void zram_free_page(struct zram *zram, size_t index)
> >  	zram_set_obj_size(meta, index, 0);
> >  }
> >  
> > -static int zram_decompress_page(struct zram *zram, char *mem, u32 index)
> > +static int zram_decompress_page(struct zram *zram, struct page *page, u32 index)
> >  {
> > -	int ret = 0;
> > -	unsigned char *cmem;
> > -	struct zram_meta *meta = zram->meta;
> > +	int ret;
> >  	unsigned long handle;
> >  	unsigned int size;
> > +	void *src, *dst;
> > +	struct zram_meta *meta = zram->meta;
> > +
> > +	if (zram_special_page_read(zram, index, page, 0, PAGE_SIZE))
> > +		return 0;
> >  
> >  	bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
> >  	handle = meta->table[index].handle;
> >  	size = zram_get_obj_size(meta, index);
> >  
> > -	if (!handle || zram_test_flag(meta, index, ZRAM_SAME)) {
> > -		bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
> > -		zram_fill_page(mem, PAGE_SIZE, meta->table[index].element);
> > -		return 0;
> > -	}
> > -
> > -	cmem = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
> > +	src = zs_map_object(meta->mem_pool, handle, ZS_MM_RO);
> >  	if (size == PAGE_SIZE) {
> > -		copy_page(mem, cmem);
> > +		dst = kmap_atomic(page);
> > +		copy_page(dst, src);
> > +		kunmap_atomic(dst);
> > +		ret = 0;
> >  	} else {
> >  		struct zcomp_strm *zstrm = zcomp_stream_get(zram->comp);
> >  
> > -		ret = zcomp_decompress(zstrm, cmem, size, mem);
> > +		dst = kmap_atomic(page);
> > +		ret = zcomp_decompress(zstrm, src, size, dst);
> > +		kunmap_atomic(dst);
> >  		zcomp_stream_put(zram->comp);
> >  	}
> >  	zs_unmap_object(meta->mem_pool, handle);
> >  	bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
> >  
> >  	/* Should NEVER happen. Return bio error if it does. */
> > -	if (unlikely(ret)) {
> > +	if (unlikely(ret))
> >  		pr_err("Decompression failed! err=%d, page=%u\n", ret, index);
> > -		return ret;
> > -	}
> >  
> > -	return 0;
> > +	return ret;
> >  }
> >  
> >  static int zram_bvec_read(struct zram *zram, struct bio_vec *bvec,
> > -			  u32 index, int offset)
> > +				u32 index, int offset)
> >  {
> >  	int ret;
> >  	struct page *page;
> > -	unsigned char *user_mem, *uncmem = NULL;
> > -	struct zram_meta *meta = zram->meta;
> > -	page = bvec->bv_page;
> >  
> > -	bit_spin_lock(ZRAM_ACCESS, &meta->table[index].value);
> > -	if (unlikely(!meta->table[index].handle) ||
> > -			zram_test_flag(meta, index, ZRAM_SAME)) {
> > -		bit_spin_unlock(ZRAM_ACCESS, &meta->table[index].value);
> > -		handle_same_page(bvec, meta->table[index].element);
> > +	page = bvec->bv_page;
> > +	if (zram_special_page_read(zram, index, page, bvec->bv_offset,
> > +				bvec->bv_len))
> 
> so, I think zram_bvec_read() path calls zram_special_page_read() twice:
> 
>   a) direct zram_special_page_read() call
> 
>   b) zram_decompress_page()->zram_special_page_read()
> 
> is it supposed to be so?

Yes, because zram_decompress_page is also called by zram_bvec_write in
the partial IO case. Maybe we can make it simpler by removing the
zram_special_page_read call from zram_bvec_read; I will look into it.
The call paths I have in mind are sketched below.
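
Roughly (based on this series, not on what the final code may look like):

    zram_bvec_read()
        zram_special_page_read()           <- (a) direct call
        zram_decompress_page()
            zram_special_page_read()       <- (b) called again

    zram_bvec_write()                      <- partial IO
        zram_decompress_page()             <- read-modify-write of the page
            zram_special_page_read()

So the duplication in zram_bvec_read() exists only because
zram_decompress_page() has to stay usable from the write side as well.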
