linux-kernel - Re: XArray documentation

Message-ID: <2627399.jpLCoM7KBo@merkaba>
Date:   Fri, 24 Nov 2017 19:01:31 +0100
From:   Martin Steigerwald <martin@...htvoll.de>
To:     Matthew Wilcox <willy@...radead.org>
Cc:     linux-fsdevel@...r.kernel.org, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org,
        Matthew Wilcox <mawilcox@...rosoft.com>
Subject: Re: XArray documentation

Hi Matthew.

Matthew Wilcox - 24.11.17, 18:03:
> On Fri, Nov 24, 2017 at 05:50:41PM +0100, Martin Steigerwald wrote:
> > Matthew Wilcox - 24.11.17, 02:16:
> > > ======
> > > XArray
> > > ======
> > > 
> > > Overview
> > > ========
> > > 
> > > The XArray is an array of ULONG_MAX entries.  Each entry can be either
> > > a pointer, or an encoded value between 0 and LONG_MAX.  It is efficient
> > > when the indices used are densely clustered; hashing the object and
> > > using the hash as the index will not perform well.  A freshly-initialised
> > > XArray contains a NULL pointer at every index.  There is no difference
> > > between an entry which has never been stored to and an entry which has
> > > most recently had NULL stored to it.
> > 
> > I am no kernel developer (just provided a tiny bit of documentation a long
> > time ago)… but on reading into this, I missed:
> > 
> > What is it about? And what is it used for?
> > 
> > > "Overview" appears to be already a description of the actual
> > > implementation specifics, instead of… well an overview.
> > 
> > Of course, I am sure you all know what it is for… but someone who wants to
> > learn about the kernel is likely to be confused by such a start.
[…]
> Thank you for your comment.  I'm clearly too close to it because even
> after reading your useful critique, I'm not sure what to change.  Please
> help me!

And likely I am too far away from it, and do not understand enough of it, to
provide more concrete suggestions, but let me try. (I do understand some
programming concepts, like what an array, a pointer, a linked list or a tree
is, so I am not a complete novice here. I think the documentation should not
cover any of these basics.)

> Maybe it's that I've described the abstraction as if it's the
> implementation and put too much detail into the overview.  This might
> be clearer?
> 
> The XArray is an abstract data type which behaves like an infinitely
> large array of pointers.  The index into the array is an unsigned long.
> A freshly-initialised XArray contains a NULL pointer at every index.

Yes, I think this is clearer already.

Maybe with a few sentences on "Why does the kernel provide this?", "Where is
it used?" (if already known), "What use case is it suitable for, if I want to
implement something in the kernel (or in user space?)" and probably "How
does it differ from other data structures the kernel provides?"

I don't know whether the questions make sense to you. But those were the
questions I had in mind as I read your documentation. I do not think this
needs to be long… maybe just a few sentences that put XArray into context
before diving into the details. I think that could help new developers who
want to learn about kernel development when they learn about XArray.

And then, as you suggest, all the important implementation details.

> ----
> and then move all this information into later paragraphs:
> 
> There is no difference between an entry which has never been stored to
> and an entry which has most recently had NULL stored to it.
> Each entry in the array can be either a pointer, or an
> encoded value between 0 and LONG_MAX.
> While you can use any index, the implementation is efficient when the
> indices used are densely clustered; hashing the object and using the
> hash as the index will not perform well.

Yes.

And I notice now that you have some use case remarks in here… like "efficient 
when densely clustered". I missed these initially.
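
To illustrate the abstraction described in the quoted text, here is a minimal
user-space sketch in C. It is not the kernel's XArray API or implementation:
the toy_* names and the linked-list storage are assumptions made up for this
example. It only models the two stated semantics: every index reads back as
NULL until something is stored there, and storing NULL is indistinguishable
from never having stored anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only (hypothetical names): one list node per non-NULL entry. */
struct toy_node {
	unsigned long index;
	void *entry;
	struct toy_node *next;
};

struct toy_array {
	struct toy_node *head;
	size_t nr_nodes;	/* number of indices holding a non-NULL entry */
};

void toy_init(struct toy_array *arr)
{
	arr->head = NULL;
	arr->nr_nodes = 0;
}

/* An index that holds no entry reads back as NULL. */
void *toy_load(const struct toy_array *arr, unsigned long index)
{
	const struct toy_node *n;

	for (n = arr->head; n; n = n->next)
		if (n->index == index)
			return n->entry;
	return NULL;
}

/* Returns 0 on success, -1 on allocation failure.  Storing NULL erases. */
int toy_store(struct toy_array *arr, unsigned long index, void *entry)
{
	struct toy_node **pp, *n;

	for (pp = &arr->head; (n = *pp) != NULL; pp = &n->next) {
		if (n->index != index)
			continue;
		if (entry) {
			n->entry = entry;	/* overwrite in place */
		} else {
			*pp = n->next;		/* unlink: back to "never stored" */
			free(n);
			arr->nr_nodes--;
		}
		return 0;
	}
	if (!entry)
		return 0;	/* NULL stored over NULL: nothing to record */
	n = malloc(sizeof(*n));
	if (!n)
		return -1;
	n->index = index;
	n->entry = entry;
	n->next = arr->head;
	arr->head = n;
	arr->nr_nodes++;
	return 0;
}

/* Walks through the semantics; returns 0 if every check holds. */
int toy_demo(void)
{
	struct toy_array arr;
	int value = 42;

	toy_init(&arr);
	assert(toy_load(&arr, 1000000UL) == NULL);	/* fresh array */
	assert(toy_store(&arr, 1000000UL, &value) == 0);
	assert(toy_load(&arr, 1000000UL) == &value);
	assert(toy_store(&arr, 1000000UL, NULL) == 0);	/* erase */
	assert(toy_load(&arr, 1000000UL) == NULL);
	assert(arr.nr_nodes == 0);	/* no trace of the earlier store */
	return 0;
}
```

Calling toy_demo() from a main() is enough to run it. The linear list is
chosen only to keep the sketch short; it says nothing about how the real
structure achieves its efficiency for densely clustered indices.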

Thanks,
-- 
Martin