linux-kernel - Re: [PATCH 7/7] Ksplice: Support updating x86-32 and x86-64

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.00.0902072301200.11936@vinegar-pot.mit.edu>
Date:	Mon, 9 Feb 2009 20:01:46 -0500 (EST)
From:	Tim Abbott <tabbott@....EDU>
To:	Rusty Russell <rusty@...tcorp.com.au>
cc:	Jeff Arnold <jbarnold@....edu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Denys Vlasenko <vda.linux@...glemail.com>,
	Anders Kaseorg <andersk@....edu>,
	Waseem Daher <wdaher@....edu>,
	Nikanth Karthikesan <knikanth@...e.de>
Subject: Re: [PATCH 7/7] Ksplice: Support updating x86-32 and x86-64

On Sat, 7 Feb 2009, Rusty Russell wrote:

>    The temptation to invent new nomenclature is one we geeks constantly
> wrestle with.  But when you have a *genuinely* good idea, it's not necessary.
> Unfortunately, changing names is a PITA, but it's not going to get easier.
> 
>    ksplice_update is a good name, but the rest of your nomenclature is
> horribly general.  So before I become so experienced with it I can't see the
> flaws, let me suggest:
> 
> 	pack -> ksplice_mod_change
> 	helper module -> old_code
> 	primary module -> new_code

OK, I agree that these are improvements.  We'll make the change for the 
next version of the patch series.

> > +unexpected_running_task (P->A, A->R): Ksplice aborted the operation because
> > +Ksplice observed a running task during the kernel stack check, at a time when
> > +Ksplice expected all tasks to be stopped by stop_machine.
> 
> I'll have to check the code, but this should now just be a subset of
> "unexpected", no?

The story here is that normally when an unexpected error occurs, we print 
something to the Ksplice debugfs file indicating what the actual error 
was.  unexpected_running_task is special in that it occurs within 
stop_machine, where allocating memory with GFP_KERNEL to print to the 
debugfs file is not allowed.  So, we use a different abort_cause for this 
particular unexpected error.

> > +/**
> > + * struct ksplice_symbol - Ksplice's analogue of an ELF symbol
> > + * @name:	The ELF name of the symbol
> > + * @label:	A unique Ksplice name for the symbol
> > + * @vals:	A linked list of possible values for the symbol, or NULL
> > + * @value:	The value of the symbol (valid when vals is NULL)
> > + **/
> > +struct ksplice_symbol {
> > +	const char *name;
> > +	const char *label;
> > +/* private: */
> > +	struct list_head *vals;
> > +	unsigned long value;
> > +};
> 
> I'd have to look at the usage, but might this be neater as a standard
> linked list, eg:
> 
> struct ksplice_symval {
> 	struct list_head list;
> 	unsigned long val;
> };
> 
> struct ksplice_symbol {
> 	...
> 	struct list_head vals;
> 	/* We always have one, so this saves dynamic alloc in common case? */
> 	struct ksplice_symval first_val;
> };
> 
> (You could link off the first_val.list, but it we don't have iterators for
> such a "headless" list).

ksplice_symbol structures are generated by userspace.  Using a pointer to 
a list_head rather than a list_head means userspace doesn't need to know 
the structure of a list_head.

The story with "vals" and "value" is the following.  If we don't yet know 
the value of the symbol, vals is a pointer to a list of candidate values 
(obtained by lookups in kallsyms and the exported symbol table).  Once we 
have resolved the symbol using run-pre matching, we store its address in 
the value field and set vals to NULL.  One checks whether a symbol has 
been resolved by checking whether vals is NULL.

The documentation of this struct is a bit terse and the names "vals" and 
"value" are confusing; would renaming vals to "candidate_values" and 
adding something like the above explanation in a comment improve the 
situation?

> > +/**
> > + * struct ksplice_reloc - Ksplice's analogue of an ELF relocation
> > + * @blank_addr:		The address of the relocation's storage unit
> > + * @symbol:		The ksplice_symbol associated with this relocation
> > + * @howto:		The information regarding the relocation type
> > + * @addend:		The ELF addend of the relocation
> > + **/
> > +struct ksplice_reloc {
> > +	unsigned long blank_addr;
> > +	struct ksplice_symbol *symbol;
> > +	const struct ksplice_reloc_howto *howto;
> 
> howto is an interesting name.  I *think* this is what we usually call "ops"?

ksplice_reloc_howto is named after libbfd's reloc_howto_type, which is a 
structure containing metadata for how to apply a relocation of that type 
(how many bytes long the relocation is, whether it is relative to the 
program counter, etc.).  ksplice_reloc_howto has essentially the same 
purpose and contents as libbfd's reloc_howto_type.  

> > +enum ksplice_reloc_howto_type {
> > +	KSPLICE_HOWTO_RELOC,
> > +	KSPLICE_HOWTO_RELOC_PATCH,
> > +	KSPLICE_HOWTO_DATE,
> > +	KSPLICE_HOWTO_TIME,
> > +	KSPLICE_HOWTO_BUG,
> > +	KSPLICE_HOWTO_EXTABLE,
> > +};
> 
> OK, there are two problems with really reviewing this code.  (1) it does
> everything; it's not in stages.  

The actual process is in stages, so I assume you're referring to the fact 
that we're submitting it as a single 2500-line patch; I comment on this 
below.

> (2) it isn't complete unless you see a ksplice module which interacts 
> with it.

Right.  For this I recommend creating an update using the Ksplice 
"standalone" code available at [1] and a simple patch such as [2], and 
actually looking at the modules generated.  I recommend the 
ksplice-inspect utility included in the Ksplice distribution for seeing 
the objects inside the helper and primary modules.  This will show you the 
data that the update provides to the kernel; the primary and helper 
modules themselves just load the data into a ksplice_pack and then call 
init/cleanup_ksplice_pack as appropriate.

[1] http://www.ksplice.com/dist/ksplice-0.9.6-src.tar.gz 
[2] http://www.ksplice.com/example-update

> Could we start with something really simple and work up?  I know it's a 
> serious amount of work to split like this.  How about a version which 
> just patches time and date?  Should also be easy to manually create a 
> ksplice module to test that as well, no?  Then say, substitute a new 
> function. Finally work up to what you have now, with handling 
> alternatives and everything.

Before I answer your question, I should explain that these are types of 
Ksplice relocations, not types of patches.  Normal relocations have type 
KSPLICE_HOWTO_RELOC; if you want to start simple, just ignore the rest and 
look at what happens with that type.

Most of the other Ksplice relocation types are classes of things that 
should be handled through the same mechanism as relocations.  For example, 
KSPLICE_HOWTO_DATE and KSPLICE_HOWTO_TIME are Ksplice relocations that 
Ksplice generates in order to run-pre match strings containing __DATE__ 
and __TIME__.  These are challenging for run-pre matching because the date 
and time when the pre code was compiled will not be the same as the date 
and time when those compilation units in the running kernel were compiled.  
Ksplice handles this by generating special Ksplice relocations that 
instruct Ksplice to do a date/time format check rather than a byte-by-byte 
comparison at a particular location.


Anyway, your question was whether we can strip down Ksplice to just the 
"bare essentials" that would be easier to review.

We could save about 200 lines of kernel code by removing support for 
altinstructions/parainstructions/smp_locks, KSPLICE_HOWTO_DATE, 
KSPLICE_HOWTO_TIME, KSPLICE_HOWTO_BUG and KSPLICE_HOWTO_EXTABLE.  One 
could perhaps save another 150 lines of code by removing the optimizations 
we have in critical pathways, i.e. replacing binary searches and the O(1) 
amortized scan of relocations with linear searches, and another 70 or so 
if we make it not work on CONFIG_DEBUG_RODATA kernels.  I could probably 
cut another 100 or so lines of code by removing code that handles some 
miscellaneous uncommon situations.  The result would be slow, unable to 
run-pre match most compilation units, and have safety problems when 
patching compilation units containing altinstructions (etc.).  So, not 
something suitable for merge, but perhaps a bit easier to review.

Is this the sort of thing that you're looking for?  I can get you a patch 
along those lines fairly quickly.

> Otherwise the important review (design, not spelling and style) is not
> really possible in finite time.

My recommended path for understanding the code is to first read 
__apply_patches and the things it calls, which is the code run inside 
stop_machine, including the stack check.  Then I'd look at finalize_pack, 
which is basically the consumer of all the information obtained in run-pre 
matching.  To understand run-pre matching, I'd start with trying to 
understand run_pre_cmp, and then move up the chain of callers from there.

Let us know if there's anything we can do to help with reviewing the code.  
We'd be happy to answer any questions about how the code works.

> But trivial comments follow:
> 
> > +/* List of all ksplice modules and the module they patch */
> > +extern struct list_head ksplice_module_list;
> 
> Just call it ksplice_modules.  It's obvious from its usage that it's a list.

Agreed.

> > +int init_ksplice_pack(struct ksplice_pack *pack);
> 
> I think would more usually be called "register_ksplice_pack"?
>
> > + * cleanup_ksplice_pack() - Cleans up a pack
> 
> unregister... ?

We choose the name cleanup because unlike a typical unregister function, 
cleanup_ksplice_pack is currently called twice for each Ksplice update; 
once when the helper module is unloaded, and once when the primary module 
is unloaded.  The extra call is necessary to avoid leaks if you unload the 
helper without applying the update.  

Perhaps the right answer is to keep the current name and add a comment 
documenting this behavior?

> > +typedef int __bitwise__ abort_t;
> > +
> > +#define OK ((__force abort_t) 0)
> > +#define NO_MATCH ((__force abort_t) 1)
> > +#define CODE_BUSY ((__force abort_t) 2)
> > +#define MODULE_BUSY ((__force abort_t) 3)
> > +#define OUT_OF_MEMORY ((__force abort_t) 4)
> > +#define FAILED_TO_FIND ((__force abort_t) 5)
> > +#define ALREADY_REVERSED ((__force abort_t) 6)
> > +#define MISSING_EXPORT ((__force abort_t) 7)
> > +#define UNEXPECTED_RUNNING_TASK ((__force abort_t) 8)
> > +#define UNEXPECTED ((__force abort_t) 9)
> > +#define TARGET_NOT_LOADED ((__force abort_t) 10)
> > +#define CALL_FAILED ((__force abort_t) 11)
> 
> Yech... I'm not sold on abort_t; how about standard errno?  Yes, you have
> to get a bit creative, say: -ENOENT, -EBUSY, -ETXTBSY, -ENOMEM, -ENOENT,
> -EEXIST, -ENOEXEC, *, *, -ENODEV, (whatever that call returned).

The code actually used to make creative use of errno values as you 
suggest.  It resulted in frequent confusion about which return value to 
use, and mistakes involving the wrong return value were common.  After we 
switched to abort_t, return value errors essentially stopped happening.  
So I think that keeping abort_t might be worthwhile.

We use a typedef so that sparse can check for mistakes involving returning 
a number rather than an abort_t; we could probably convert it into an 
enum at this point.

> > +static struct update *init_ksplice_update(const char *kid);
> > +static void cleanup_ksplice_update(struct update *update);
> > +static void maybe_cleanup_ksplice_update(struct update *update);
> > +static void add_to_update(struct ksplice_pack *pack, struct update *update);
> > +static int ksplice_sysfs_init(struct update *update);
> 
> Please don't talk about what you're going to talk about: order properly
> and don't keep up in suspense :)
> 
> [ cut 170 more lines of introduction ]

Heh.  We actually sorted the functions in ksplice.c to read well from top 
to bottom while preparing this patch series.

I can re-sort our functions to eliminate the prototypes if that would be 
preferred.  Perhaps there should be a note about use of prototypes in 
Documentation/CodingStyle?

> > +#include "ksplice-arch.c"
> 
> Hmm, that's a little unclean.  include/linux/ksplice_arch.h
> and treat it as a normal .o I'd say.

Are you suggesting that include/linux/ksplice_arch.h would give prototypes 
for write_reloc_value and read_reloc_value along with the functions 
currently in ksplice_arch.c, and they'd be implemented in a ksplice_arch.c 
that would be linked in with the rest of Ksplice?

I believe that this doesn't work because a module under kernel/ can't 
depend on objects under arch/ (or vice versa).

I agree that this is a little unclean but don't know how to improve it; 
suggestions for how to clean this up are welcome.

> > +int init_ksplice_pack(struct ksplice_pack *pack)
> ...
> > +	sort(pack->helper_relocs,
> > +	     pack->helper_relocs_end - pack->helper_relocs,
> > +	     sizeof(*pack->helper_relocs), compare_relocs, NULL);
> > +	sort(pack->primary_relocs,
> > +	     pack->primary_relocs_end - pack->primary_relocs,
> > +	     sizeof(*pack->primary_relocs), compare_relocs, NULL);
> > +	sort(pack->helper_sections,
> > +	     pack->helper_sections_end - pack->helper_sections,
> > +	     sizeof(*pack->helper_sections), compare_section_labels, NULL);
> 
> Can't be sorted in userspace?

helper_sections could be sorted in userspace; the relocs arrays could not.  
They are sorted by a value which depends on where the ELF sections of the 
helper and primary modules are loaded.

Also, these only need to be sorted because the kernel implementation uses 
a binary search; if we were to replace a binary search with a hashing 
algorithm, then userspace wouldn't need to sort these arrays.  We might 
want to keep this implementation detail out of the userspace/kernelspace 
API.

By the way, we use binary search a good deal, and I know we're not the 
only users of a simple binary search, for example search_extable in 
lib/extable.c.  Grepping for "binary search" suggests there are a few 
dozen of them in the kernel.  Should we split out bsearch into a 
lib/bsearch.c in the next patch series?

> > +static bool starts_with(const char *str, const char *prefix)
> > +{
> > +	return strncmp(str, prefix, strlen(prefix)) == 0;
> > +}
> 
> Hmm, I call this strstarts() in CCAN.  Turns out it's a fairly common need,
> I'll implement a common version as a separate patch.

strstarts is a good name for this function.  I've made the change for the 
next version of the patch series.

	-Tim Abbott

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/