[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.0902072301200.11936@vinegar-pot.mit.edu>
Date: Mon, 9 Feb 2009 20:01:46 -0500 (EST)
From: Tim Abbott <tabbott@....EDU>
To: Rusty Russell <rusty@...tcorp.com.au>
cc: Jeff Arnold <jbarnold@....edu>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org,
Denys Vlasenko <vda.linux@...glemail.com>,
Anders Kaseorg <andersk@....edu>,
Waseem Daher <wdaher@....edu>,
Nikanth Karthikesan <knikanth@...e.de>
Subject: Re: [PATCH 7/7] Ksplice: Support updating x86-32 and x86-64
On Sat, 7 Feb 2009, Rusty Russell wrote:
> The temptation to invent new nomenclature is one we geeks constantly
> wrestle with. But when you have a *genuinely* good idea, it's not necessary.
> Unfortunately, changing names is a PITA, but it's not going to get easier.
>
> ksplice_update is a good name, but the rest of your nomenclature is
> horribly general. So before I become so experienced with it I can't see the
> flaws, let me suggest:
>
> pack -> ksplice_mod_change
> helper module -> old_code
> primary module -> new_code
OK, I agree that these are improvements. We'll make the change for the
next version of the patch series.
> > +unexpected_running_task (P->A, A->R): Ksplice aborted the operation because
> > +Ksplice observed a running task during the kernel stack check, at a time when
> > +Ksplice expected all tasks to be stopped by stop_machine.
>
> I'll have to check the code, but this should now just be a subset of
> "unexpected", no?
The story here is that normally when an unexpected error occurs, we print
something to the Ksplice debugfs file indicating what the actual error
was. unexpected_running_task is special in that it occurs within
stop_machine, where allocating memory with GFP_KERNEL to print to the
debugfs file is not allowed. So, we use a different abort_cause for this
particular unexpected error.
> > +/**
> > + * struct ksplice_symbol - Ksplice's analogue of an ELF symbol
> > + * @name: The ELF name of the symbol
> > + * @label: A unique Ksplice name for the symbol
> > + * @vals: A linked list of possible values for the symbol, or NULL
> > + * @value: The value of the symbol (valid when vals is NULL)
> > + **/
> > +struct ksplice_symbol {
> > + const char *name;
> > + const char *label;
> > +/* private: */
> > + struct list_head *vals;
> > + unsigned long value;
> > +};
>
> I'd have to look at the usage, but might this be neater as a standard
> linked list, eg:
>
> struct ksplice_symval {
> struct list_head list;
> unsigned long val;
> };
>
> struct ksplice_symbol {
> ...
> struct list_head vals;
> /* We always have one, so this saves dynamic alloc in common case? */
> struct ksplice_symval first_val;
> };
>
> (You could link off the first_val.list, but it we don't have iterators for
> such a "headless" list).
ksplice_symbol structures are generated by userspace. Using a pointer to
a list_head rather than a list_head means userspace doesn't need to know
the structure of a list_head.
The story with "vals" and "value" is the following. If we don't yet know
the value of the symbol, vals is a pointer to a list of candidate values
(obtained by lookups in kallsyms and the exported symbol table). Once we
have resolved the symbol using run-pre matching, we store its address in
the value field and set vals to NULL. One checks whether a symbol has
been resolved by checking whether vals is NULL.
The documentation of this struct is a bit terse and the names "vals" and
"value" are confusing; would renaming vals to "candidate_values" and
adding something like the above explanation in a comment improve the
situation?
> > +/**
> > + * struct ksplice_reloc - Ksplice's analogue of an ELF relocation
> > + * @blank_addr: The address of the relocation's storage unit
> > + * @symbol: The ksplice_symbol associated with this relocation
> > + * @howto: The information regarding the relocation type
> > + * @addend: The ELF addend of the relocation
> > + **/
> > +struct ksplice_reloc {
> > + unsigned long blank_addr;
> > + struct ksplice_symbol *symbol;
> > + const struct ksplice_reloc_howto *howto;
>
> howto is an interesting name. I *think* this is what we usually call "ops"?
ksplice_reloc_howto is named after libbfd's reloc_howto_type, which is a
structure containing metadata for how to apply a relocation of that type
(how many bytes long the relocation is, whether it is relative to the
program counter, etc.). ksplice_reloc_howto has essentially the same
purpose and contents as libbfd's reloc_howto_type.
> > +enum ksplice_reloc_howto_type {
> > + KSPLICE_HOWTO_RELOC,
> > + KSPLICE_HOWTO_RELOC_PATCH,
> > + KSPLICE_HOWTO_DATE,
> > + KSPLICE_HOWTO_TIME,
> > + KSPLICE_HOWTO_BUG,
> > + KSPLICE_HOWTO_EXTABLE,
> > +};
>
> OK, there are two problems with really reviewing this code. (1) it does
> everything; it's not in stages.
The actual process is in stages, so I assume you're referring to the fact
that we're submitting it as a single 2500-line patch; I comment on this
below.
> (2) it isn't complete unless you see a ksplice module which interacts
> with it.
Right. For this I recommend creating an update using the Ksplice
"standalone" code available at [1] and a simple patch such as [2], and
actually looking at the modules generated. I recommend the
ksplice-inspect utility included in the Ksplice distribution for seeing
the objects inside the helper and primary modules. This will show you the
data that the update provides to the kernel; the primary and helper
modules themselves just load the data into a ksplice_pack and then call
init/cleanup_ksplice_pack as appropriate.
[1] http://www.ksplice.com/dist/ksplice-0.9.6-src.tar.gz
[2] http://www.ksplice.com/example-update
> Could we start with something really simple and work up? I know it's a
> serious amount of work to split like this. How about a version which
> just patches time and date? Should also be easy to manually create a
> ksplice module to test that as well, no? Then say, substitute a new
> function. Finally work up to what you have now, with handling
> alternatives and everything.
Before I answer your question, I should explain that these are types of
Ksplice relocations, not types of patches. Normal relocations have type
KSPLICE_HOWTO_RELOC; if you want to start simple, just ignore the rest and
look at what happens with that type.
Most of the other Ksplice relocation types are classes of things that
should be handled through the same mechanism as relocations. For example,
KSPLICE_HOWTO_DATE and KSPLICE_HOWTO_TIME are Ksplice relocations that
Ksplice generates in order to run-pre match strings containing __DATE__
and __TIME__. These are challenging for run-pre matching because the date
and time when the pre code was compiled will not be the same as the date
and time when those compilation units in the running kernel were compiled.
Ksplice handles this by generating special Ksplice relocations that
instruct Ksplice to do a date/time format check rather than a byte-by-byte
comparison at a particular location.
Anyway, your question was whether we can strip down Ksplice to just the
"bare essentials" that would be easier to review.
We could save about 200 lines of kernel code by removing support for
altinstructions/parainstructions/smp_locks, KSPLICE_HOWTO_DATE,
KSPLICE_HOWTO_TIME, KSPLICE_HOWTO_BUG and KSPLICE_HOWTO_EXTABLE. One
could perhaps save another 150 lines of code by removing the optimizations
we have in critical pathways, i.e. replacing binary searches and the O(1)
amortized scan of relocations with linear searches, and another 70 or so
if we make it not work on CONFIG_DEBUG_RODATA kernels. I could probably
cut another 100 or so lines of code by removing code that handles some
miscellaneous uncommon situations. The result would be slow, unable to
run-pre match most compilation units, and have safety problems when
patching compilation units containing altinstructions (etc.). So, not
something suitable for merge, but perhaps a bit easier to review.
Is this the sort of thing that you're looking for? I can get you a patch
along those lines fairly quickly.
> Otherwise the important review (design, not spelling and style) is not
> really possible in finite time.
My recommended path for understanding the code is to first read
__apply_patches and the things it calls, which is the code run inside
stop_machine, including the stack check. Then I'd look at finalize_pack,
which is basically the consumer of all the information obtained in run-pre
matching. To understand run-pre matching, I'd start with trying to
understand run_pre_cmp, and then move up the chain of callers from there.
Let us know if there's anything we can do to help with reviewing the code.
We'd be happy to answer any questions about how the code works.
> But trivial comments follow:
>
> > +/* List of all ksplice modules and the module they patch */
> > +extern struct list_head ksplice_module_list;
>
> Just call it ksplice_modules. It's obvious from its usage that it's a list.
Agreed.
> > +int init_ksplice_pack(struct ksplice_pack *pack);
>
> I think would more usually be called "register_ksplice_pack"?
>
> > + * cleanup_ksplice_pack() - Cleans up a pack
>
> unregister... ?
We choose the name cleanup because unlike a typical unregister function,
cleanup_ksplice_pack is currently called twice for each Ksplice update;
once when the helper module is unloaded, and once when the primary module
is unloaded. The extra call is necessary to avoid leaks if you unload the
helper without applying the update.
Perhaps the right answer is to keep the current name and add a comment
documenting this behavior?
> > +typedef int __bitwise__ abort_t;
> > +
> > +#define OK ((__force abort_t) 0)
> > +#define NO_MATCH ((__force abort_t) 1)
> > +#define CODE_BUSY ((__force abort_t) 2)
> > +#define MODULE_BUSY ((__force abort_t) 3)
> > +#define OUT_OF_MEMORY ((__force abort_t) 4)
> > +#define FAILED_TO_FIND ((__force abort_t) 5)
> > +#define ALREADY_REVERSED ((__force abort_t) 6)
> > +#define MISSING_EXPORT ((__force abort_t) 7)
> > +#define UNEXPECTED_RUNNING_TASK ((__force abort_t) 8)
> > +#define UNEXPECTED ((__force abort_t) 9)
> > +#define TARGET_NOT_LOADED ((__force abort_t) 10)
> > +#define CALL_FAILED ((__force abort_t) 11)
>
> Yech... I'm not sold on abort_t; how about standard errno? Yes, you have
> to get a bit creative, say: -ENOENT, -EBUSY, -ETXTBSY, -ENOMEM, -ENOENT,
> -EEXIST, -ENOEXEC, *, *, -ENODEV, (whatever that call returned).
The code actually used to make creative use of errno values as you
suggest. It resulted in frequent confusion about which return value to
use, and mistakes involving the wrong return value were common. After we
switched to abort_t, return value errors essentially stopped happening.
So I think that keeping abort_t might be worthwhile.
We use a typedef so that sparse can check for mistakes involving returning
a number rather than an abort_t; we could probably convert it into an
enum at this point.
> > +static struct update *init_ksplice_update(const char *kid);
> > +static void cleanup_ksplice_update(struct update *update);
> > +static void maybe_cleanup_ksplice_update(struct update *update);
> > +static void add_to_update(struct ksplice_pack *pack, struct update *update);
> > +static int ksplice_sysfs_init(struct update *update);
>
> Please don't talk about what you're going to talk about: order properly
> and don't keep up in suspense :)
>
> [ cut 170 more lines of introduction ]
Heh. We actually sorted the functions in ksplice.c to read well from top
to bottom while preparing this patch series.
I can re-sort our functions to eliminate the prototypes if that would be
preferred. Perhaps there should be a note about use of prototypes in
Documentation/CodingStyle?
> > +#include "ksplice-arch.c"
>
> Hmm, that's a little unclean. include/linux/ksplice_arch.h
> and treat it as a normal .o I'd say.
Are you suggesting that include/linux/ksplice_arch.h would give prototypes
for write_reloc_value and read_reloc_value along with the functions
currently in ksplice_arch.c, and they'd be implemented in a ksplice_arch.c
that would be linked in with the rest of Ksplice?
I believe that this doesn't work because a module under kernel/ can't
depend on objects under arch/ (or vice versa).
I agree that this is a little unclean but don't know how to improve it;
suggestions for how to clean this up are welcome.
> > +int init_ksplice_pack(struct ksplice_pack *pack)
> ...
> > + sort(pack->helper_relocs,
> > + pack->helper_relocs_end - pack->helper_relocs,
> > + sizeof(*pack->helper_relocs), compare_relocs, NULL);
> > + sort(pack->primary_relocs,
> > + pack->primary_relocs_end - pack->primary_relocs,
> > + sizeof(*pack->primary_relocs), compare_relocs, NULL);
> > + sort(pack->helper_sections,
> > + pack->helper_sections_end - pack->helper_sections,
> > + sizeof(*pack->helper_sections), compare_section_labels, NULL);
>
> Can't be sorted in userspace?
helper_sections could be sorted in userspace; the relocs arrays could not.
They are sorted by a value which depends on where the ELF sections of the
helper and primary modules are loaded.
Also, these only need to be sorted because the kernel implementation uses
a binary search; if we were to replace a binary search with a hashing
algorithm, then userspace wouldn't need to sort these arrays. We might
want to keep this implementation detail out of the userspace/kernelspace
API.
By the way, we use binary search a good deal, and I know we're not the
only users of a simple binary search, for example search_extable in
lib/extable.c. Grepping for "binary search" suggests there are a few
dozen of them in the kernel. Should we split out bsearch into a
lib/bsearch.c in the next patch series?
> > +static bool starts_with(const char *str, const char *prefix)
> > +{
> > + return strncmp(str, prefix, strlen(prefix)) == 0;
> > +}
>
> Hmm, I call this strstarts() in CCAN. Turns out it's a fairly common need,
> I'll implement a common version as a separate patch.
strstarts is a good name for this function. I've made the change for the
next version of the patch series.
-Tim Abbott
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists