lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <000301d02e36$13a163c0$3ae42b40$@samsung.com>
Date:	Mon, 12 Jan 2015 15:04:12 +0800
From:	Chao Yu <chao2.yu@...sung.com>
To:	'Jaegeuk Kim' <jaegeuk@...nel.org>
Cc:	'Changman Lee' <cm224.lee@...sung.com>,
	linux-f2fs-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org
Subject: RE: [RFC PATCH] f2fs: add extent cache base on rb-tree

Hi Jaegeuk,

> -----Original Message-----
> From: Jaegeuk Kim [mailto:jaegeuk@...nel.org]
> Sent: Wednesday, January 07, 2015 7:01 AM
> To: Chao Yu
> Cc: 'Changman Lee'; linux-f2fs-devel@...ts.sourceforge.net; linux-kernel@...r.kernel.org
> Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> 
> Hi Chao,
> 
> On Sun, Jan 04, 2015 at 04:24:10PM +0800, Chao Yu wrote:
> > Hi Jaegeuk,
> >
> > > -----Original Message-----
> > > From: Jaegeuk Kim [mailto:jaegeuk@...nel.org]
> > > Sent: Wednesday, December 31, 2014 4:26 PM
> > > To: Chao Yu
> > > Cc: 'Changman Lee'; linux-f2fs-devel@...ts.sourceforge.net; linux-kernel@...r.kernel.org
> > > Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> > >
> > > Hi Chao,
> > >
> > > On Tue, Dec 30, 2014 at 06:10:21PM +0800, Chao Yu wrote:
> > > > Hi Jaegeuk,
> > > >
> > > > > -----Original Message-----
> > > > > From: Jaegeuk Kim [mailto:jaegeuk@...nel.org]
> > > > > Sent: Tuesday, December 30, 2014 5:23 AM
> > > > > To: Chao Yu
> > > > > Cc: 'Changman Lee'; linux-f2fs-devel@...ts.sourceforge.net;
> linux-kernel@...r.kernel.org
> > > > > Subject: Re: [RFC PATCH] f2fs: add extent cache base on rb-tree
> > > > >
> > > > > Hi Chao,
> > > > >
> > > > > On Mon, Dec 29, 2014 at 03:19:18PM +0800, Chao Yu wrote:
> > > > >
> > > > > [snip]
> > > > >
> > > > > Nice draft. :)
> > > >
> > > > Thanks! :)
> > > >
> > > > >
> > > > > >
> > > > > > Please see the draft below.
> > > > > >
> > > > > > 1) Extent management:
> > > > > > If we use global management that managing all extents which are from different
> > > > > > inodes in sbi, we will face with serious lock contention when we access these
> > > > > > extents belong to different inodes concurrently, the loss may outweights the
> > > > > > gain.
> > > > >
> > > > > Agreed.
> > > > >
> > > > > > So we choose a local management for extent which means all extents are
> > > > > > managed by inode itself to avoid above lock contention. Addtionlly, we manage
> > > > > > all extents globally by linking all inode into a global lru list for extent
> > > > > > cache shrinker.
> > > > > > Approach:
> > > > > > 	a) build extent tree/rwlock/lru list/extent count in each inode.
> > > > > > 		*extent tree: link all extent in rb-tree;
> > > > > > 		*rwlock: protect fields when accessing extent cache concurrently;
> > > > > > 		*lru list: sort all extents in accessing time order;
> > > > > > 		*extent count: record total count of extents in cache.
> > > > > > 	b) use lru shrink list in sbi to manage all inode which cached extents.
> > > > > > 		*inode will be added or repostioned in this global list whenever
> > > > > > 		extent is being access in this inode.
> > > > > > 		*use spinlock to protect this shrink list.
> > > > >
> > > > > 1. How about adding a data structure with inode number instead of referring
> > > > > inode pointer?
> > > > >
> > > > > 2. How about managing extent entries globally and setting an upper bound to
> > > > > the number of extent entries instead of limiting them per each inode?
> > > > > (The rb-tree will handle many extents per inode.)
> > > >
> > > > [assumption]
> > > > Approach A: global lru list;
> > > > Approach B: lru list per inode.
> > > >
> > > > I have considered Approach A before writing the draft, although Approach A could
> > > > provide us higher ratio hit due to global lru, it may make lock contention more
> > > > intensively and has more memory overhead descripted below. So I choose more
> > > > conservative Approach B.
> > > > 1) as extent_entry will split in Cow, we may lock extent_list more times when
> > > > move these separated extent entries in extent_list, unless we have batch mode
> > > > interface.
> > > > 2) lock overhead between shrinker and lookuper and updater.
> > > > e.g. extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
> > > > (#1) has A, C, E, G in rb-tree
> > > > (#2) has B, D, F in rb-tree
> > > > shrinker op: lock list -> trylock #1 -> unlock list -> free A -> unlock #1
> > > > 	lock list -> trylock #2 -> unlock list -> free B -> unlock #2
> > > > 	lock list -> trylock #1 -> unlock list -> free C -> unlock #1
> > > > 	...
> > >
> > > lock_list -> list_del(A) -> unlock_list -> lock #1 -> remove_tree(A) -> unlock #1
> > > -> free A.
> >
> > If unlock_list before lock #1, (A) could be modified by other thread between ->unlock_list
> > and lock #1, so one more "lookup_tree(A)" is needed under lock #1:
> > lock #1 -> lookup_tree(A) -> remove_tree(A) -> unlock #1
> >
> > >
> > > Or,
> > > lock_list -> list_del(A, B, C) -> unlock_list -> lock #1 -> remove_tree(A, C) ->
> > > unlock #1 -> free A, C -> lock #2 -> remove_tree(B) -> unlock #2
> >
> > ditto.
> > To avoid unneeded lookup_tree(A), maybe we can use this series:
> >
> > lock list -> get A from head -> trylock #1 -> try to get more node(C, E, G) of #1 in list
> > -> unlock list -> remove_tree&free(A, C, E, G) -> unlock #1
> >
> > >
> > > I think a couple of list operations do not cause severe lock contention.
> >
> > Yeah, batch operation is better.
> > But sometimes random write break extent into everywhere in shrinker's list,
> > so batch operation can gain less in this condition.
> >
> > > And, if we assing minimum length for extent caches, the contention would be
> > > distributed.
> > >
> > > This means that, it needs to manage each of all the extents in the cache should
> > > have the length larger than minimum value.
> >
> > If I do not misunderstand, what you mean is that if extent number in one inode cache
> > is bigger than minimum value, then, we will start to add all extents of this inode
> > into shrinker's list, and reposition them in the list when f2fs_{lookup,update}_extent_cache?
> >
> > >
> > > > *trylock/unlock* for all free extent entries belong to one inode will be separated
> > > > to lots of parts, resulting in more contention.
> > > > 3) it has more memory overhead in each extent entry:
> > > > Approach A: need add inode number for backref and list_head *
> > >
> > > It doesn't need to add ino. Just remain them, and shrinker can remove empty
> > > ino_entry_for_extents periodically.
> >
> > I do not understand why we don't need ino, :(.
> > Without ino, how can we get the rb-tree root pointer for rb_erase(node, root)?
> 
> Let me describe in more detail.
> 
> For example,
> 
> - global radix tree in sbi
>   --> ino #1 --> rb_tree (rb_tree, rb_tree_lock, refcount=0)
>   --> ino #2 --> rb_tree
>   --> ino #3 --> rb_tree
>                     `-> extent A
> 		    `-> extent B (fofs, blkaddr, length, *list_head=empty)
> 
> - global extent list in sbi
> 
>   extent_list: (LRU) A -> B -> C -> D -> E -> F -> G (MRU)
>   (#1) has A, C, E, G in rb-tree
>   (#2) has B, D, F in rb-tree
> 
> 1) update extents (#1)
> 
>  - lock_radix_tree
>      if (notfound #1)
>          allocate #1;
>      #1's refcount++;
>  - unlock_radix_tree
> 
>  - lock_rb_tree (#1)
>      update A
>      rb_delete E, if E->length is smaller than MIN_LEN.
>      allocate & rb_add K, if K->length is larget than MIN_LEN.
> 
>      list_lock
>      list_move(A)
>      list_del(E)
>      list_add_tail(K)
>      list_unlock
>  - unlock_rb_tree (#1)
> 
>  - #1's rafcount--;
> 
> 2) shrinker
> 
>  - list_lock
>      list_del A, B, C, ...
>  - list_unlock
> 
>  - gang_look_up_radix_tree_with_lock {
> 
>      #N's refcount++;
> 
>      lock_rb_tree (#N)
>          for_each_rb_node {
>              if (list_empty(extent->list_head)) {
>                  remove_rb_node(extent);
>                  free(extent);
>              }
> 	 }
>      unlock_rb_tree (#N)
> 
>      #N's refcount--;
>  - }
> 
>  - for_each_radix_tree_with_lock {
>      if (radix_node->refcount == 0 && rb_tree_is_empty)
>          free(radix_node);
>  - }
> 
> 3) lookup extents (#1)
> 
>  - lock_radix_tree
>      if (notfound #1)
>          goto out;
>      #1's refcount++;
>  - unlock_radix_tree
> 
>  - lock_rb_tree (#1)
>      found extent A
> 
>      list_lock
>      list_move(A)
>      list_unlock
>  - unlock_rb_tree (#1)
> 
>  - #1's rafcount--;

Really Good! Thanks for your suggestion and the work.

The following RFC patch set is written based on above detail design.

Please review and comment on it, thank you! :)

Thanks

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ