linux-kernel - TOI: introducing Split Lists hybrids of doubly linked lists within the Linux Kernel

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-Id: <077A69BC-2CCB-4EF1-AAED-EFCF53E6D1CA@earthlink.net>
Date:	Tue, 10 Mar 2015 21:18:01 -0700
From:	Mitchell Erblich <erblichs@...thlink.net>
To:	linux-kernel@...r.kernel.org
Subject: TOI: introducing Split Lists hybrids of doubly linked lists within the Linux Kernel


Please note that this proposal is from this engineer and not from the company he works for.

This SHOULD also fulfills any legal notification of work done, but not submitted to the Linux Kernel.


Transfer of Information  : Introduction to what I term as split lists
——————————-

		Double link list support has been available for decades within UNIX OSs.  There is a perceived perception that double link list support is slow because of walking the list from the head or from the tail. This proposal suggests a first a simple change to the logic to average an immediate approximate 2x performance improvement with creation of ordered double link lists. This achievement is for all operations of insertions, deletions, and searches of elements within the link list.  There are increasing difficulty versions that benefit only as the size of the link list significantly increases. The size of the base link list structure increases and additional checks are necessary as to where the start of the search is to begin.   The suggested implementation follow the syscall(system call) layouts, with a split1_ll_xxx, split2_ll_xxx, split3_ll_xxx, where xxx are the operations on the split link lists  and so on.  Only the first two will be mentioned in this TOI, but the logic starts being fairly simple and should end with the well known skip list functionality. The functionality is proposed to first be incorporated into a new header file within the linux kernel with just the 1st order split list support. Later support can be added to support the 2nd order split list and finally a skip list with dynamic / run time support to modify the list conversion from a split linked list to then the skip linked list.

		To keep the terminology simple, split1_ll will be known as just a split link list. The simplest performance change to an ordered doubly link list is to just search half the list. Yes, if it an ordered doubly linked list and the value that we search is fairly large, and we have increasing values from head to tail, with the doubly link list we can search from the tail. However, the distribution of values within the list that we search can be sparse and not equally distributed, thus the value MAY still be closer to the head. Thus we can a level of additional guarantee by increasing the size of the structure and adding additional logic. 

		Thus, for the simplest split linked list is  basically achieved by identifying a middle and then adding support to additionally search from the middle to the head or middle to the tail. This is of course additional to the original searching from the head or from the tail. I term this new ordered double link list as a split doubly link list. There is an implied assumption that almost100% of the elements to be added are not just increasing/decreasing,  else just default and add at the tail or head and walk the list for the very few exceptional unordered items.  This will in general allow us to search 1/2 of the ordered link list and thus a major performance in a guarantee in walking at most just 1/2 of the items. This was originally done by this engineer for a different UNIX OS many years ago.

	Operations are based on first searching for the location for the new element to be added or deleted, thus the approximate maximum list size is important to be known before the set of functions are called. In the past the list was assumed to be anywhere from 64 to 256 elements in size. If one is aware that this list could reach an approximate 4,096 elements in size, then a more complicated version of the split list SHOULD be used to keep major performance improvements. To support this I add additional checks at the 1/4 and 3/4 points to the list and allows the searching to only walk a maximum of 1/4 of the number of elements in the double split list, with 4 sections.

	This split ordered link list could also be implemented with synchronization for each of the separate sections of the split list to minimize contention for the list.  The determination of how to find the middle in the simplest split list is based on the well known skip list algorithm. On every two insertions into the skip list, the middle is adjusted if both insertions fall on 1st or 2nd half of the list. If they are inserted into the first half, then the middle moves one position to the head. The reverse is true for every two deletions from the 1st half. The movement of the middle comparison item follows simple logic.

	If the number of elements are expected or grown beyond say 16k elements , then it makes sense to scale the split list to an actual skip list. The skip list and the split list will stay balanced, without additional overhead and will not decrease in performance as if some periodic order of the elements are inserted into the lists. Trees will need rebalancing, rotations, etc and thus are not being included in this suggestion. For the skeptical, add elements into a tree with the values from 1 to 100. Is the tree balanced without doing extra work? Dynamic upgrades from the split lists into the skip list should be a simple matter and can be configured via a /proc tunable if wanted.  A proper split list or skip list should show 16x, 32x, 64x or more higher performance finding the location to locate/search, add, or delete an element from the currently supported link list.

	Of course, all new functions will co-exist and only engineers who are comfortable with using the new functions will then possibly see performance improvements. Yes, only if these lists are frequently walked (possibly a bottleneck)  will the performance improvements be noticeable. 

	This engineer is willing to submit an initial prototype of the above logic functions to be incorporated within the Linux Kernel if/when there is a request to do so.

	Sincerely, Mitchell Erblich
			Kernel Engineer--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/