phc-discussions - OT Geeks don't write (was RE: [PHC] Geeks like us...)

Message-ID: <003301cf5671$d2beda50$783c8ef0$@acm.org>
Date: Sat, 12 Apr 2014 10:08:27 -0700
From: "Dennis E. Hamilton" <dennis.hamilton@....org>
To: <discussions@...sword-hashing.net>
Subject: OT Geeks don't write (was RE: [PHC] Geeks like us...)

At some point in a discussion like this, Literate Programming will be mentioned.  I figure we might as well get it over with.  (I am top posting because the exchange between Bill Cox and Alexander triggered this but this isn't a specific reply.)

APOLOGIA: This is written in form 1 (below), following my line of thinking at the keyboard.  The next rewrite would be to change the presentation, have the TLDR summary up front, use shorter sentences, fewer adjectives, and, simply put, fewer words.

OK: The TLDR.  

    I think a Literate Programming approach might be useful, even if the 
    code is provided separately rather than mechanically extracted from 
    the literate version.  Extraction should remain possible and a diff
    could be used to ensure agreement.

    The key problem I see with literate programming is that it requires 
    literate programmers.  And, because there is a formal mechanism involved,
    there has to be a community of literate-program-fluent readers to get 
    anywhere with such programs.  Despite that, the critical importance 
    of *demonstrated* care in the development of cryptographic software 
    might make literate approaches more appealing and worthwhile.

 - Dennis

Now what I wrote before that summary.

First, I don't think it is that geeks like us can't write, it is that we don't write in the sense that Bill Cox's spouse might have in mind.  Why we do or don't, and when we do or don't, is perhaps part of what draws geeks to code and not writing the Great American Novel.

I think for many of us, it is lack of training and practice.  And there likely is lack of commitment.  

I write better than I sing, dance, or play any musical instrument.  You can gauge for yourselves how likely I am to be inept at those activities if writing is my strong suit.  

I recently decided to take up the guitar and learn about music.  That has been eye-opening, intellectually intriguing, and challenging.  I am daunted by the amount of practice I will require to get beyond clumsy (too kind) novice and become a practiced musician -- not good or great, simply practiced enough to even attempt any level where I'd be comfortable playing with others or in front of an audience.  

The most important requirement I see to becoming a practiced musician is commitment and determination.  The only motivator I have for finally taking up the study of music at the age of 75 is that I want to be able to employ music and music production in some computer projects I want to undertake.  Paul McCartney says it takes about 10 years to become proficient (about right for software development, too).  I think I might need more than that and that introduces a race with my own aging faculties.

But writing. Writing is another story.  I was fortunate that writing became important very early in my career in computing.  I was determined to become better at it.  That was not necessarily reflected in my code.  And vice versa.  

In both code and technical writing, I am a serious rewriter.  But it is easy for the incidentals of code organization to get in the way of talking about the code clearly and separating the essential from the incidental.  

If we think of software specification and documentation as technical writing, I think there are five or more flavors to be found.

 1. Scientific/academic narrative.  The story of how we got to our wonderful result, a kind of depth-first, down-left tree-walk through the adventure.  This can also be thought of as a bottom-up narrative.  That's not an uncommon approach.  It has a kind of academic appeal.  And to see the forest, you have to inspect each tree.  It is easy to obscure the point with inessential material.

 2. Results-first gentility.  This can be thought of as top-down, breadth-first progressive disclosure.  It has the situation and any conclusion stated early, with supportive deepening in subsequent portions.  In business-speak, it means having the management summary up front.  It has a variety of useful qualities, especially when used to inform someone who would not have done it themselves.  A reader can go as deep as they want, and they have an early exit opportunity when they desire no more or determine the material is not of interest.  If the material is well-organized, one can cherry-pick where to go deeper as well.

The second form is, in many ways, reader friendly.  It can also attract ungrounded claims and a failure to connect the dots, but that is pretty evident to a critical eye.  This also happens in type (1) where all of the build-up reaches unsupportable conclusions.  Failure to connect the dots and unsupportable belief in causality are commonplace.

 3. Narrative in or about code that follows the somewhat arbitrary organizational requirements of code and other components that go into producing an operating program.

 4. Technical narrative that reflects literary pretensions and needs a good editing.  An easy example is the Wikipedia article on Literate Programming, <http://en.wikipedia.org/w/index.php?title=Literate_programming&oldid=598927993>, where I defy you to know what this is about without already knowing.  When grounded examples are introduced, it starts to make some sense.

 5. Anything with too many adjectives, grand claims, and a complete failure to connect the dots between what the claimed objective is and how the possibly chosen-first solution genuinely achieves that aim.  False contrasts are common.  Critical judgment is sacrificed to magical thinking.

 6. Etc.

As much as I hate it, knowing your audience definitely matters.  Also, critical thinking is required.  And, for software, critical thinking requires mastery of the exposition of multiple levels of abstraction.  (4, above, strikes me as an example where the levels of abstraction are collapsed in unfortunate ways.  Compare with how Donald Knuth writes about his Literate Programming baby.)  I always write for myself first in order to express my thoughts at all.  Then I have to rewrite it for an audience other than myself.  I often speak too soon (as here, and also here: <http://orcmid.wordpress.com/2014/04/11/guardian-dictator-manager-mentor-or-collaborator-be/>).

Then there are the various ideas about Literate Programming, where a program is talked about and it is also presented, in its entirety, but in chunks that are stitched into the narrative about the program, what it is for, how its purpose is accomplished, and other qualities, including performance analysis, determination of correctness, and, for crypto-geeks, the security analysis and security arguments.  

In a literate program, the code is presented in a manner that supports the exposition, not in the form in which it is used in construction of the program.  Mechanical tools are used to extract the code in proper form for that construction.  A consequence is that maintenance is at the literate program level and not at the submitted-to-the-compiler level.  This raises a number of barriers to the eager take-up of literate programming.  A significant clash has to do with conventional code-and-fix iterative design/debugging processes.
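The extraction step, and the diff against separately provided code suggested in the TLDR, can be sketched in a few lines.  What follows is a minimal illustration only, assuming a noweb-style notation (a chunk opens with `<<name>>=`, closes with `@`, and is referenced as `<<name>>`).  The function names, the root chunk name `*`, and the toy hash are all hypothetical and are not taken from any PHC submission or existing tool.

```python
import difflib
import re

# Assumed noweb-style markers: "<<name>>=" opens a chunk, "@" closes it.
CHUNK_DEF = re.compile(r"^<<(.+?)>>=\s*$")
CHUNK_REF = re.compile(r"^(\s*)<<(.+?)>>\s*$")


def parse_chunks(literate_text):
    """Collect named code chunks; prose outside chunks is ignored."""
    chunks = {}
    current = None
    for line in literate_text.splitlines():
        start = CHUNK_DEF.match(line)
        if start:
            current = start.group(1)
            chunks.setdefault(current, [])
        elif line.strip() == "@":
            current = None
        elif current is not None:
            chunks[current].append(line)
    return chunks


def tangle(chunks, name, indent=""):
    """Expand chunk `name` recursively into compiler-ready line order."""
    out = []
    for line in chunks[name]:
        ref = CHUNK_REF.match(line)
        if ref:
            # A reference inherits the indentation of the line it sits on.
            out.extend(tangle(chunks, ref.group(2), indent + ref.group(1)))
        else:
            out.append(indent + line)
    return out


def agreement_diff(literate_text, shipped_source, root="*"):
    """Return unified-diff lines; an empty list means the two agree."""
    extracted = tangle(parse_chunks(literate_text), root)
    return list(difflib.unified_diff(
        extracted, shipped_source.splitlines(),
        fromfile="tangled", tofile="shipped", lineterm=""))


# Hypothetical literate document: prose interleaved with chunks.
DOC = """\
The hash mixes each byte into the state.
<<*>>=
def toy_hash(data):
    state = 0
    <<mix each byte>>
    return state
@
Mixing is a multiply-and-add, reduced to 32 bits.
<<mix each byte>>=
for byte in data:
    state = (state * 31 + byte) & 0xFFFFFFFF
@
"""

# Hypothetical separately provided source file.
SHIPPED = """\
def toy_hash(data):
    state = 0
    for byte in data:
        state = (state * 31 + byte) & 0xFFFFFFFF
    return state
"""

if __name__ == "__main__":
    diff = agreement_diff(DOC, SHIPPED)
    print("in agreement" if not diff else "\n".join(diff))
```

In this arrangement the code can be maintained at the submitted-to-the-compiler level if the authors prefer, with the diff serving only as a check that the narrative and the shipped file have not drifted apart.
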

The fundamental problem with literate programming is that it requires literate programmers.  

As much as I subscribe to the ambition of literate programming, I have found myself unable to let go of the requisite code structure that much.  It frustrates me that I can't envision the final code composition from the literate narrative.  It may simply be that I haven't jumped in and practiced it enough.  

I also see literate programming as a barrier, in practice, to the cultivation of open-source projects.  Open-source projects of any scale have barriers aplenty, but the shared-culture demands of a literate programming approach are perhaps one barrier too far relative to what programmers already know and tolerate as part of getting their heads around an open-source project.

I still haven't given up on literate programming.  But I think, like musicianship, it takes more commitment and practice than I have given it.  I haven't given up on having it be appealing in open-source projects.  I don't see the key.  So I am going to practice and see what can be revealed about making literate programming accessible in open-source work.

Now, I have no idea if anyone has provided literate programming successfully in presentation of a cryptographic-software reference implementation.  I think presenting a highly-optimized solution would be more difficult.  It seems to me that one needs an abstracted solution to appeal to in getting to a customization around intricate target-system details.  Perhaps a way to practice would be to take apart one of the NIST or RSA specifications, extracting a literate exposition that weaves reference code (and test vectors) into it.
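As a small illustration of weaving test vectors into an exposition, here is a sketch in which the vectors sit in a table next to the prose and are checked mechanically.  It assumes Python's standard hashlib as a stand-in for the reference code and uses two widely published SHA-256 known-answer values (the empty message and "abc").  The table layout and the function name are hypothetical.

```python
import hashlib

# (description, message, expected SHA-256 digest in hex), laid out as
# they might appear beside the prose that explains each case.
VECTORS = [
    ("empty message", b"",
     "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),
    ("three-byte message abc", b"abc",
     "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
]


def failed_vectors(hash_function, vectors):
    """Return the descriptions of vectors the implementation gets wrong."""
    failures = []
    for description, message, expected_hex in vectors:
        if hash_function(message).hex() != expected_hex:
            failures.append(description)
    return failures


def reference_sha256(message):
    """Stand-in for the reference code under discussion."""
    return hashlib.sha256(message).digest()


if __name__ == "__main__":
    failures = failed_vectors(reference_sha256, VECTORS)
    print("all vectors pass" if not failures else failures)
```

Passing the hash as a parameter means the same table could be run against an abstracted reference version and an optimized one alike.
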

 - Dennis



-----Original Message-----
From: Solar Designer [mailto:solar@...nwall.com] 
Sent: Saturday, April 12, 2014 01:15
To: discussions@...sword-hashing.net
Subject: Re: [PHC] Geeks like us can't write...

On Sat, Apr 12, 2014 at 12:28:33AM -0400, Bill Cox wrote:
[ ... ]
> Here's the sad part.  In the PHC, to varying degrees, good code is
> anti-correlated with good writing.  I may not be able to write a good
> paper, but I can tell when I'm reading one.  I can also tell when I'm
> reading inspired code.

While I have no problem accepting the "can't write" criticism, I think
in case of yescrypt the poor quality of the paper is largely caused by
me allocating a disproportionate amount of time to design and code, and
very little to writing the paper.  

[ ... ]

> My wife is a writer.  She married me anyway :-)  I hear her complain
> all the time that her interns or other writers "buried the lead",
> which is when we don't put the most interesting stuff at the top to
> help those of us who are too lazy to read papers carefully.
> 
> Solar Designer, as well as several of the more amazing coders, buried
> the lead.  I want to fill in the "Strengths" section of the wiki for
> Yescrypt, and I thought just taking a quick peek at his paper would
> help me with the bullet points.  Nope!  The Catena paper made whole
> chapter headings out of bullet-point strengths, but Yescrypt buries
> the goodness in the details.

Yeah.  Basically, I focused on meeting the formal requirements for the
submission, which primarily required the specification (in the case of
yescrypt, it's how it differs from scrypt).  What I think I should do,
if/when I do find time, is re-arrange the text such that I start with a
list of bullet points, then proceed with specification, and then proceed
with analysis.  Right now, I have specification intermixed with bits of
analysis, and as you noted there are no bullet points.  (I also need to
add specification of pwxform.)

> Anyway, without a list of bullet points for Yescrypt to copy (like I
> did for Catena) I'm having to do what I suck at: write them myself!

Sorry about that!

> So, what are the most impressive strengths of Yescript, since Solar
> Designer didn't list them at the top of his paper?  I'm thinking:
> 
> - Most full featured and flexible entry in the PHC

Yes, although some other entries have their unique/exotic features that
are not in yescrypt.  I am thinking of Makwa and PolyPassHash.

BTW, I couldn't have included a bullet point like that before seeing a
full set of other entries.

> - Script upwards compatible

"Scrypt".  Yes.

> - Highly tuned, and very fast

Maybe "Highly tunable, and very efficient"?

> - Suitable for a variety of applications and platforms

This overlaps with "full featured and flexible".

I also like to call yescrypt "scalable", in the sense that it provides
adequate security over a wider range of memory cost settings than many
other password hashing schemes do.  In particular, it's stronger than
scrypt at low memory cost settings, and it allows for higher memory cost
settings than bcrypt does.  Similar comparisons may be made against many
PHC entries as well.

> Broad array of tunable defenses (needs it's own list):
> - High external RAM usage, high DRAM bandwidth, high cache bandwidth,
> any one of which can limit an attacker
> - Tested and proven Bcrypt-like GPU defense, even while busting out of
> cache into main memory
> - Compute time hardened with serial multiplication chains
> - Optimised a wide range of CPUs from older AMD to the latest Intel Haswell
> - ROM optimized for improved authentication server performance and defense

Right.

Regarding the "wide range of CPUs", I have to note that (un)fortunately
the current compile-time defaults for pwxform are tuned for fairly
recent CPUs, which means Bulldozer or better, and Sandy Bridge or better.
On e.g. a Pentium 3, these same defaults result in yescrypt losing to
bcrypt in terms of GPU defense.  But the code is scalable down, with the
bcrypt-like anti-GPU defense intact, even for a Pentium 3 - with
different compile-time settings, which I need to make runtime tunable
via the flags.  (Pentium 3 itself might not be relevant these days, but
unfortunately SIMD-less builds are.  This is where bcrypt may win.)

> So, I'm afraid my effort at listing "Strengths" bullets is going to
> fall short.  Several outstanding coders in this competition have a
> similar problem.

I fully agree that it's better when the authors themselves write good
descriptions.

I appreciate your help with this!

Alexander