Message-ID: <CALCETrXn+zwyduvTXR8N0e0-ZgO9=F-BSTPfFmcJYJHJgV2+xQ@mail.gmail.com>
Date:	Fri, 19 Feb 2016 08:40:49 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	"Luis R. Rodriguez" <mcgrof@...nel.org>
Cc:	"H. Peter Anvin" <hpa@...or.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
	X86 ML <x86@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Boris Ostrovsky <boris.ostrovsky@...cle.com>,
	Rusty Russell <rusty@...tcorp.com.au>,
	David Vrabel <david.vrabel@...rix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
	Michael Brown <mcb30@...e.org>,
	Juergen Gross <jgross@...e.com>,
	Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
	Xen Devel <xen-devel@...ts.xensource.com>
Subject: Re: [RFC v2 2/6] x86/init: use linker tables to simplify x86 init and
 annotate dependencies

On Fri, Feb 19, 2016 at 6:15 AM, Luis R. Rodriguez <mcgrof@...nel.org> wrote:
> Any failure on the x86 init path can be catastrophic.
> A simple shift of a call from one place to another can
> easily break things. Likewise adding a new call to
> one path without considering all x86 requirements
> can make certain x86 run time environments crash.
> We currently account for these requirements through
> peer code review and run time testing. We could do
> much better if we had a clean and simple way to annotate
> strong semantics for run time requirements, init sequence
> dependencies, and detection mechanisms for additions of
> new x86 init sequences.

Please document this in a way that will be useful for people trying to
understand what the code does as opposed to just people who are trying
to understand why you wrote it.  More below.



> +/**
> + * struct x86_init_fn - x86 generic kernel init call
> + *
> + * Linux x86 features vary in complexity, features may require work done at
> + * different levels of the full x86 init sequence. Today there are also two
> + * different possible entry points for Linux on x86, one for bare metal, KVM
> + * and Xen HVM, and another for Xen PV guests / dom0.  Assuming a bootloader
> + * has set up 64-bit mode, roughly the x86 init sequence follows this path:
> + *
> + * Bare metal, KVM, Xen HVM                      Xen PV / dom0
> + *       startup_64()                             startup_xen()
> + *              \                                     /
> + *      x86_64_start_kernel()                 xen_start_kernel()
> + *                           \               /
> + *                      x86_64_start_reservations()
> + *                                   |
> + *                              start_kernel()
> + *                              [   ...        ]
> + *                              [ setup_arch() ]
> + *                              [   ...        ]
> + *                                  init
> + *


I don't think this paragraph below is necessary.  I also think it's
very confusing.  Keep in mind that the reader has no idea what a
"level" is at this point and that the reader also doesn't need to
think about terms like "paravirtualization yielding".

> + * x86_64_start_kernel() and xen_start_kernel() are the respective first C code
> + * entry starting points. The different entry points exist to enable Xen to
> + * skip a lot of hardware setup already done and managed on behalf of the
> + * hypervisor, we refer to this as "paravirtualization yielding". The different
> + * levels of init calls on the x86 init sequence exist to account for these
> + * slight differences and requirements. These different entry points also share
> + * a common entry x86 specific path, x86_64_start_reservations().
> + *

And here, I don't even know what a "feature" is.

> + * A generic x86 feature can have different initialization calls, one on each
> + * of the different main x86 init sequences, but must also address both entry
> + * points in order to work properly across the board on all supported x86
> + * subarchitectures. Since x86 features can also have dependencies on other
> + * setup code or features, x86 features can at times be subordinate to other
> + * x86 features, or conditions. struct x86_init_fn enables feature developers
> + * to annotate dependency relationships to ensure subsequent init calls only
> + * run once a subordinate's dependencies have run. When needed custom
> + * dependency requirements can also be spelled out through a custom dependency
> + * checker. In order to account for the dual entry point nature of x86-64 Linux
> + * for "paravirtualization yielding" and to make annotations for support for
> + * these explicit each struct x86_init_fn must specify supported
> + * subarchitectures. The earliest x86-64 code can read the subarchitecture
> + * though is after load_idt(), as such the earliest we can currently rely on
> + * subarchitecture for semantics and a common init sequences is on the shared
> + * common x86_64_start_reservations().  Each struct x86_init_fn must also
> + * declare a two-digit decimal number to impose an ordering relative to other
> + * features when required.

I'm totally lost in the paragraph above.


> + *
> + * x86_init_fn enables strong semantics and dependencies to be defined and
> + * implemented on the full x86 initialization sequence.

Please try explaining what this is, instead.  For example:

An x86_init_fn represents a function to be called at a certain point
during initialization on a specific set of subarchitectures.
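
To make that concrete, here's the kind of usage example I'd like the
kerneldoc to lead with.  This is only a sketch -- the field names are
taken from your struct below and the subarch bits from your own
example; I'm not assuming any particular registration macro:

static void __init my_feature_early_init(void)
{
        /* runs from x86_64_start_reservations(); memory is not up yet */
}

static struct x86_init_fn my_feature_init_fn = {
        .order_level            = 30,
        .supp_hardware_subarch  = BIT(X86_SUBARCH_PC) | BIT(X86_SUBARCH_XEN),
        .early_init             = my_feature_early_init,
};

i.e. "call my_feature_early_init() at order level 30 on the PC and Xen
subarchitectures".  If that's what the structure means, say so.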

> + *
> + * @order_level: must be set, linker order level, this corresponds to the table
> + *     section sub-table index, we record this only for semantic validation
> + *     purposes.

I read this as "this is purely a debugging feature".  Is it?  I think it's not.

>  Order-level is always required however you typically would
> + *     only use X86_INIT_NORMAL*() and leave ordering to be done by placement
> + *     of code in a C file and the order of objects through a Makefile. Custom
> + *     order-levels can be used when order on C file and order of objects on
> + *     Makefiles does not suffice or much further refinements are needed.


Assuming I understand this correctly, here's how I'd write it:

@order_level: The temporal order of this x86_init_fn.  Lower
order_level numbers are called first.  Ties are broken by order found
in the Makefile and then by order in the C file.

Note, however, that my proposed explanation can't be right because it
appears to conflict with "depend".  Please adjust accordingly.
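
Modulo that "depend" interaction, the implementation note could then be
a one-liner: x86_init_fn_sort() is a stable sort keyed on order_level.
Sketch only, using your field names:

/* Lower order_level runs first; a stable sort keeps link order for ties. */
static int x86_init_fn_cmp(const struct x86_init_fn *a,
                           const struct x86_init_fn *b)
{
        if (a->order_level != b->order_level)
                return a->order_level < b->order_level ? -1 : 1;
        return 0;
}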

> + * @supp_hardware_subarch: must be set, it represents the bitmask of supported
> + *     subarchitectures.  We require each struct x86_init_fn to have this set
> + *     to require developer considerations for each supported x86
> + *     subarchitecture and to build strong annotations of different possible
> + *     run time states particularly in consideration for the two main
> + *     different entry points for x86 Linux, to account for paravirtualization
> + *     yielding.

Too much motivation, too little documentation.

@supp_hardware_subarch: A bitmask of subarchitectures on which to call
this init function.

--- start big deletion ---

> + *
> + *     The subarchitecture is read by the kernel at early boot from the
> + *     struct boot_params hardware_subarch. Support for the subarchitecture
> + *     exists as of x86 boot protocol 2.07. The bootloader would have set up
> + *     the respective hardware_subarch on the boot sector as per
> + *     Documentation/x86/boot.txt.
> + *
> + *     What x86 entry point is used is determined at run time by the
> + *     bootloader. Linux pv_ops was designed to help enable to build one Linux
> + *     binary to support bare metal and different hypervisors.  pv_ops setup
> + *     code however is limited in that all pv_ops setup code is run late in
> + *     the x86 init sequence, during setup_arch(). In fact cpu_has_hypervisor
> + *     only works after early_cpu_init() during setup_arch(). If an x86
> + *     feature requires an earlier determination of what hypervisor was used,
> + *     or if it needs to annotate only support for certain hypervisors, the
> + *     x86 hardware_subarch should be set by the bootloader and
> + *     @supp_hardware_subarch set by the x86 feature. Using hardware_subarch
> + *     enables x86 features to fill the semantic gap between the Linux x86
> + *     entry point used and what pv_ops has to offer through a hypervisor
> + *     agnostic mechanism.

--- end big deletion ---

> + *
> + *     Each supported subarchitecture is set using the respective
> + *     X86_SUBARCH_* as a bit in the bitmask. For instance if a feature
> + *     is supported on PC and Xen subarchitectures only you would set this
> + *     bitmask to:
> + *
> + *             BIT(X86_SUBARCH_PC) |
> + *             BIT(X86_SUBARCH_XEN);

I like this part, but how about "For instance, if an init function
should be called on PC and Xen subarchitectures only, you would set
the bitmask to..."?

> + *
> + * @detect: optional, if set returns true if the feature has been detected to
> + *     be required, it returns false if the feature has been detected to not
> + *     be required.

I have absolutely no idea what this means.

> + * @depend: optional, if set this set of init routines must be called prior to
> + *     the init routine whose respective detect routine we have set this
> + *     depends callback to. This is only used for sorting purposes given
> + *     all current init callbacks have a void return type. Sorting is
> + *     implemented via x86_init_fn_sort(), it must be called only once,
> + *     however you can delay sorting until you need it if you can ensure
> + *     only @order_level and @supp_hardware_subarch can account for proper
> + *     ordering and dependency requirements for all init sequences prior.
> + *     If you do not have a depend callback set, it's assumed the order level
> + *     (__x86_init_fn(level)) set by the init routine suffices to set the
> + *     order for when the feature's respective callbacks are called with
> + *     respect to other calls. Sorting of init calls with the same order level
> + *     is determined by linker order, determined by order placement on C code
> + *     and order listed on a Makefile. A routine that depends on another is
> + *     known as being subordinate to the init routine it depends on. Routines
> + *     that are subordinate must have an order-level of lower priority or
> + *     equal priority than the order-level of the init sequence it depends on.

I don't understand this at all.  I assume you're saying that some kind
of topological sorting happens.  This leads to a question: why is
"depend" a function pointer?  Shouldn't it be x86_init_fn*?  Also,
what happens if you depend on something that is disabled on the
running subarch?  And what happens if the depend-implied order is
inconsistent with order_level?

I would hope that order_level breaks ties in the topological sort and
that there's a big fat warning if the order_level ordering is
inconsistent with the topological ordering.  Preferably the warning
would be at build time, in which case it could be an error.
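
To spell out the check I'm asking for (a sketch, written against the
interface I *think* you want, i.e. with depend changed to a
struct x86_init_fn * as suggested above -- not against your current
code):

static void __init x86_init_fn_check_order(struct x86_init_fn *start,
                                           struct x86_init_fn *finish)
{
        struct x86_init_fn *p;

        for (p = start; p < finish; p++) {
                /* an entry must not be ordered ahead of what it depends on */
                if (p->depend && p->order_level < p->depend->order_level)
                        pr_warn("x86/init: order_level %u conflicts with dependency at order_level %u\n",
                                p->order_level, p->depend->order_level);
        }
}

Catching the same inconsistency at build time would be even better.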

> + * @early_init: required, routine which will run in x86_64_start_reservations()
> + *     after we ensure boot_params.hdr.hardware_subarch is accessible and
> + *     properly set. Memory is not yet available. This is the earliest we can
> + *     currently define a common shared callback since all callbacks need to
> + *     check for boot_params.hdr.hardware_subarch, and this only becomes
> + *     accessible on x86-64 after load_idt().

What's this for?  Under what conditions is it called?  What order?
Why is it part of x86_init_fn at all?

> + * @flags: optional, bitmask of enum x86_init_fn_flags

What are these flags?  What do they do?

> + */
> +struct x86_init_fn {
> +       __u32 order_level;
> +       __u32 supp_hardware_subarch;
> +       bool (*detect)(void);
> +       bool (*depend)(void);
> +       void (*early_init)(void);
> +       __u32 flags;
> +};
> +
> +/**
> + * enum x86_init_fn_flags: flags for init sequences
> + *
> + * X86_INIT_FINISH_IF_DETECTED: tells the core that once this init sequence
> + *     has completed it can break out of the loop for init sequences on
> + *     its own level.

What does this mean?

> + * X86_INIT_DETECTED: private flag. Used by the x86 core to annotate that this
> + *     init sequence has been detected and all of its callbacks
> + *     must be run during initialization.

Please make this an entry in a new field scratch_space or similar and
just document the entire field as "private to the init core code".
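
I.e., something like this (sketch):

struct x86_init_fn {
        __u32 order_level;
        __u32 supp_hardware_subarch;
        bool (*detect)(void);
        bool (*depend)(void);
        void (*early_init)(void);
        __u32 flags;
        __u32 scratch_space;    /* private to the init core code */
};

with X86_INIT_DETECTED tracked in scratch_space instead of being
exposed in the documented flags.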

> +static struct x86_init_fn *x86_init_fn_find_dep(struct x86_init_fn *start,
> +                                               struct x86_init_fn *finish,
> +                                               struct x86_init_fn *q)
> +{
> +       struct x86_init_fn *p;
> +
> +       if (!q)
> +               return NULL;
> +
> +       for (p = start; p < finish; p++)
> +               if (p->detect == q->depend)
> +                       return p;

That's very strange indeed, and it doesn't seem consistent with my
explanation.  Please fix up the docs to explain what's going on.
Again, as a reviewer and eventual user of this code, I do not need to
know what historical problem it solves.  I need to know what the code
does and how to use it.

> +
> +void __ref x86_init_fn_init_tables(void)

Why is this __ref and not __init?

--Andy
