linux-kernel - Re: [PATCH 1/3] Docs: An initial automarkup extension for sphinx

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87k1dbrziw.fsf@intel.com>
Date:   Mon, 24 Jun 2019 14:30:47 +0300
From:   Jani Nikula <jani.nikula@...ux.intel.com>
To:     Jonathan Corbet <corbet@....net>, linux-doc@...r.kernel.org
Cc:     Matthew Wilcox <willy@...radead.org>,
        Mauro Carvalho Chehab <mchehab+samsung@...nel.org>,
        linux-kernel@...r.kernel.org, Jonathan Corbet <corbet@....net>
Subject: Re: [PATCH 1/3] Docs: An initial automarkup extension for sphinx

On Fri, 21 Jun 2019, Jonathan Corbet <corbet@....net> wrote:
> Rather than fill our text files with :c:func:`function()` syntax, just do
> the markup via a hook into the sphinx build process.
>
> Signed-off-by: Jonathan Corbet <corbet@....net>
> ---
>  Documentation/conf.py              |  3 +-
>  Documentation/sphinx/automarkup.py | 80 ++++++++++++++++++++++++++++++
>  2 files changed, 82 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/sphinx/automarkup.py
>
> diff --git a/Documentation/conf.py b/Documentation/conf.py
> index 7ace3f8852bd..a502baecbb85 100644
> --- a/Documentation/conf.py
> +++ b/Documentation/conf.py
> @@ -34,7 +34,8 @@ needs_sphinx = '1.3'
>  # Add any Sphinx extension module names here, as strings. They can be
>  # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
>  # ones.
> -extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain', 'kfigure', 'sphinx.ext.ifconfig']
> +extensions = ['kerneldoc', 'rstFlatTable', 'kernel_include', 'cdomain',
> +              'kfigure', 'sphinx.ext.ifconfig', 'automarkup']
>  
>  # The name of the math extension changed on Sphinx 1.4
>  if (major == 1 and minor > 3) or (major > 1):
> diff --git a/Documentation/sphinx/automarkup.py b/Documentation/sphinx/automarkup.py
> new file mode 100644
> index 000000000000..14b09b5d145e
> --- /dev/null
> +++ b/Documentation/sphinx/automarkup.py
> @@ -0,0 +1,80 @@
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright 2019 Jonathan Corbet <corbet@....net>
> +#
> +# Apply kernel-specific tweaks after the initial document processing
> +# has been done.
> +#
> +from docutils import nodes
> +from sphinx import addnodes
> +import re
> +
> +#
> +# Regex nastiness.  Of course.
> +# Try to identify "function()" that's not already marked up some
> +# other way.  Sphinx doesn't like a lot of stuff right after a
> +# :c:func: block (i.e. ":c:func:`mmap()`s" flakes out), so the last
> +# bit tries to restrict matches to things that won't create trouble.
> +#
> +RE_function = re.compile(r'([\w_][\w\d_]+\(\))')
> +
> +#
> +# The DVB docs create references for these basic system calls, leading
> +# to lots of confusing links.  So just don't link them.
> +#
> +Skipfuncs = [ 'open', 'close', 'write' ]
> +
> +#
> +# Find all occurrences of function() and try to replace them with
> +# appropriate cross references.
> +#
> +def markup_funcs(docname, app, node):
> +    cdom = app.env.domains['c']
> +    t = node.astext()
> +    done = 0
> +    repl = [ ]
> +    for m in RE_function.finditer(t):
> +        #
> +        # Include any text prior to function() as a normal text node.
> +        #
> +        if m.start() > done:
> +            repl.append(nodes.Text(t[done:m.start()]))
> +        #
> +        # Go through the dance of getting an xref out of the C domain
> +        #
> +        target = m.group(1)[:-2]
> +        target_text = nodes.Text(target + '()')
> +        xref = None
> +        if target not in Skipfuncs:
> +            lit_text = nodes.literal(classes=['xref', 'c', 'c-func'])
> +            lit_text += target_text
> +            pxref = addnodes.pending_xref('', refdomain = 'c',
> +                                          reftype = 'function',
> +                                          reftarget = target, modname = None,
> +                                          classname = None)
> +            xref = cdom.resolve_xref(app.env, docname, app.builder,
> +                                     'function', target, pxref, lit_text)
> +        #
> +        # Toss the xref into the list if we got it; otherwise just put
> +        # the function text.
> +        #
> +        if xref:
> +            repl.append(xref)
> +        else:
> +            repl.append(target_text)
> +        done = m.end()
> +    if done < len(t):
> +        repl.append(nodes.Text(t[done:]))
> +    return repl
> +
> +def auto_markup(app, doctree, name):
> +    for para in doctree.traverse(nodes.paragraph):
> +        for node in para.traverse(nodes.Text):
> +            if not isinstance(node.parent, nodes.literal):
> +                node.parent.replace(node, markup_funcs(name, app, node))

I think overall this is a better approach than preprocessing. Thanks for
doing this!

I toyed with something like this before, and the key difference here
seems to be ignoring literal blocks. The problem seemed to be that
replacing blocks with syntax highlighting also removed the syntax
highlighting, with no way that I could find to bring it back.

Obviously it would be great to be able to add the cross references in
more places than just bulk text nodes, but this is a good start.

BR,
Jani.


> +
> +def setup(app):
> +    app.connect('doctree-resolved', auto_markup)
> +    return {
> +        'parallel_read_safe': True,
> +        'parallel_write_safe': True,
> +        }

-- 
Jani Nikula, Intel Open Source Graphics Center