linux-kernel - Re: [RFC 1/3] zsmalloc: move pages_allocated to zs

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140814000959.GG9227@bbox>
Date:	Thu, 14 Aug 2014 09:09:59 +0900
From:	Minchan Kim <minchan@...nel.org>
To:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
Cc:	Dan Streetman <ddstreet@...e.org>, Linux-MM <linux-mm@...ck.org>,
	linux-kernel <linux-kernel@...r.kernel.org>,
	Jerome Marchand <jmarchan@...hat.com>, juno.choi@....com,
	seungho1.park@....com, Luigi Semenzato <semenzato@...gle.com>,
	Nitin Gupta <ngupta@...are.org>
Subject: Re: [RFC 1/3] zsmalloc: move pages_allocated to zs_pool

On Thu, Aug 14, 2014 at 12:25:04AM +0900, Sergey Senozhatsky wrote:
> On (08/14/14 00:13), Sergey Senozhatsky wrote:
> > > On Wed, Aug 13, 2014 at 10:14 AM, Sergey Senozhatsky
> > > <sergey.senozhatsky@...il.com> wrote:
> > > > On (08/13/14 09:59), Dan Streetman wrote:
> > > >> On Tue, Aug 5, 2014 at 4:02 AM, Minchan Kim <minchan@...nel.org> wrote:
> > > >> > Pages_allocated has counted in size_class structure and when user
> > > >> > want to see total_size_bytes, it gathers all of value from each
> > > >> > size_class to report the sum.
> > > >> >
> > > >> > It's not bad if user don't see the value often but if user start
> > > >> > to see the value frequently, it would be not a good deal for
> > > >> > performance POV.
> > > >> >
> > > >> > This patch moves the variable from size_class to zs_pool so it would
> > > >> > reduce memory footprint (from [255 * 8byte] to [sizeof(atomic_t)])
> > > >> > but it adds new locking overhead but it wouldn't be severe because
> > > >> > it's not a hot path in zs_malloc(ie, it is called only when new
> > > >> > zspage is created, not a object).
> > > >>
> > > >> Would using an atomic64_t without locking be simpler?
> > > >
> > > > it would be racy.
> > > 
> > > oh.  atomic operations aren't smp safe?  is that because other
> > > processors might use a stale value, and barriers must be added?  I
> > > guess I don't quite understand the value of atomic then. :-/
> > 
> > pool not only set the value, it also read it and make some decisions
> > based on that value:
> > 
> > 	pages_allocated += X
> > 	if (pages_allocated >= max_pages_allocated)
> > 		return 0;
> 
> 
> I mean, suppose this happens on two CPUs
> 
> max_pages_allocated is 10; current pages_allocated is 8. now you have 2 zs_malloc()
> happenning on two CPUs. each of them will do `pages_allocated += 1'. the problem is
> that both will see 10 at `if (pages_allocated >= max_pages_allocated)', so we will
> fail 2 operations, while we only were supposed to fail one.

Exactly speaking, you're saying not max_pages_allocated but pages_limited.
But I admit the race could affect max_pages_allocated, too.

I think it would be not severe if we move the feature into zram because
zram's requirement is not strict and the gap is just bounded by the number
of CPU so we could remove both spinlock and atomic.


> 
> 	-ss
> 
> > 
> > > >>
> > > >> >
> > > >> > Signed-off-by: Minchan Kim <minchan@...nel.org>
> > > >> > ---
> > > >> >  mm/zsmalloc.c | 30 ++++++++++++++++--------------
> > > >> >  1 file changed, 16 insertions(+), 14 deletions(-)
> > > >> >
> > > >> > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > > >> > index fe78189624cf..a6089bd26621 100644
> > > >> > --- a/mm/zsmalloc.c
> > > >> > +++ b/mm/zsmalloc.c
> > > >> > @@ -198,9 +198,6 @@ struct size_class {
> > > >> >
> > > >> >         spinlock_t lock;
> > > >> >
> > > >> > -       /* stats */
> > > >> > -       u64 pages_allocated;
> > > >> > -
> > > >> >         struct page *fullness_list[_ZS_NR_FULLNESS_GROUPS];
> > > >> >  };
> > > >> >
> > > >> > @@ -216,9 +213,12 @@ struct link_free {
> > > >> >  };
> > > >> >
> > > >> >  struct zs_pool {
> > > >> > +       spinlock_t stat_lock;
> > > >> > +
> > > >> >         struct size_class size_class[ZS_SIZE_CLASSES];
> > > >> >
> > > >> >         gfp_t flags;    /* allocation flags used when growing pool */
> > > >> > +       unsigned long pages_allocated;
> > > >> >  };
> > > >> >
> > > >> >  /*
> > > >> > @@ -882,6 +882,7 @@ struct zs_pool *zs_create_pool(gfp_t flags)
> > > >> >
> > > >> >         }
> > > >> >
> > > >> > +       spin_lock_init(&pool->stat_lock);
> > > >> >         pool->flags = flags;
> > > >> >
> > > >> >         return pool;
> > > >> > @@ -943,8 +944,10 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
> > > >> >                         return 0;
> > > >> >
> > > >> >                 set_zspage_mapping(first_page, class->index, ZS_EMPTY);
> > > >> > +               spin_lock(&pool->stat_lock);
> > > >> > +               pool->pages_allocated += class->pages_per_zspage;
> > > >> > +               spin_unlock(&pool->stat_lock);
> > > >> >                 spin_lock(&class->lock);
> > > >> > -               class->pages_allocated += class->pages_per_zspage;
> > > >> >         }
> > > >> >
> > > >> >         obj = (unsigned long)first_page->freelist;
> > > >> > @@ -997,14 +1000,14 @@ void zs_free(struct zs_pool *pool, unsigned long obj)
> > > >> >
> > > >> >         first_page->inuse--;
> > > >> >         fullness = fix_fullness_group(pool, first_page);
> > > >> > -
> > > >> > -       if (fullness == ZS_EMPTY)
> > > >> > -               class->pages_allocated -= class->pages_per_zspage;
> > > >> > -
> > > >> >         spin_unlock(&class->lock);
> > > >> >
> > > >> > -       if (fullness == ZS_EMPTY)
> > > >> > +       if (fullness == ZS_EMPTY) {
> > > >> > +               spin_lock(&pool->stat_lock);
> > > >> > +               pool->pages_allocated -= class->pages_per_zspage;
> > > >> > +               spin_unlock(&pool->stat_lock);
> > > >> >                 free_zspage(first_page);
> > > >> > +       }
> > > >> >  }
> > > >> >  EXPORT_SYMBOL_GPL(zs_free);
> > > >> >
> > > >> > @@ -1100,12 +1103,11 @@ EXPORT_SYMBOL_GPL(zs_unmap_object);
> > > >> >
> > > >> >  u64 zs_get_total_size_bytes(struct zs_pool *pool)
> > > >> >  {
> > > >> > -       int i;
> > > >> > -       u64 npages = 0;
> > > >> > -
> > > >> > -       for (i = 0; i < ZS_SIZE_CLASSES; i++)
> > > >> > -               npages += pool->size_class[i].pages_allocated;
> > > >> > +       u64 npages;
> > > >> >
> > > >> > +       spin_lock(&pool->stat_lock);
> > > >> > +       npages = pool->pages_allocated;
> > > >> > +       spin_unlock(&pool->stat_lock);
> > > >> >         return npages << PAGE_SHIFT;
> > > >> >  }
> > > >> >  EXPORT_SYMBOL_GPL(zs_get_total_size_bytes);
> > > >> > --
> > > >> > 2.0.0
> > > >> >
> > > >> > --
> > > >> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > > >> > the body to majordomo@...ck.org.  For more info on Linux MM,
> > > >> > see: http://www.linux-mm.org/ .
> > > >> > Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>
> > > >>
> > > 
> > 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@...ck.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@...ck.org"> email@...ck.org </a>

-- 
Kind regards,
Minchan Kim
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/