lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20151213211842.GR20997@ZenIV.linux.org.uk>
Date:	Sun, 13 Dec 2015 21:18:42 +0000
From:	Al Viro <viro@...IV.linux.org.uk>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	linux-kernel@...r.kernel.org
Subject: [RFC] get_free_pages/free_pages signature change

	All the way back to v0.01 the page allocator had been representing
addresses as unsigned long.  The interface went through several iterations;
the current primitives are __get_free_pages() and free_pages(), with
__get_free_page() and get_zeroed_page() as convenience wrappers for the
former and free_page() - for the latter.  Allocating functions are
returning unsigned long; freeing ones take unsigned long as an argument.

	Unsurprisingly, there are both the call sites that treat an address
as integer and ones that treat it as a pointer.  Whether we use unsigned long
or void *, one of those classes will need to use explicit typecasts.

	The thing is, "pointer" call sites heavily outnumber the "integer"
ones.  The places where we pass to e.g. free_pages() (unsigned long)pointer
account for 80% of all call sites.  For other functions in that set the
percentages vary, but they all are well above 50%.  Moreover, quite a bit of
the rest comes from the places like
	unsigned long buf = get_zeroed_page(GFP_KERNEL);
	...
	code using buf only in form of (char *)buf 
	...
	free_page(buf);
If we count such places as "pointer" ones (they would also be better off
with pointer-returning variant, but the change would have to be slightly
more extensive), the fraction of "pointer" call sites for e.g. free_pages()
goes at least up to 90%.

	In other words, switching to void * for return values of allocating
and argument of freeing functions would reduce the amount of boilerplate
quite nicely.  What's more, fewer casts means better chance for typechecking
to catch more bugs.  Example caught while looking into free_pages() situation:
drivers/media/platform/davinci/dm644x_ccdc.c:ccdc_update_raw_params() has
        fpc_physaddr = (unsigned int *)config_params->fault_pxl.fpc_table_addr;
        fpc_virtaddr = (unsigned int *)phys_to_virt(
                                (unsigned long)fpc_physaddr);   
        /*
         * Allocate memory for FPC table if current
         * FPC table buffer is not big enough to
         * accommodate FPC Number requested
         */
        if (raw_params->fault_pxl.fp_num != config_params->fault_pxl.fp_num) {
                if (fpc_physaddr != NULL) {
                        free_pages((unsigned long)fpc_physaddr,
which is obviously wrong.  The values expected by phys_to_virt() and
by free_pages() are not the same.  In the same file, ccdc_close() does
        fpc_physaddr = (unsigned int *)config_params->fault_pxl.fpc_table_addr;

        if (fpc_physaddr != NULL) {
                fpc_virtaddr = (unsigned int *)
                    phys_to_virt((unsigned long)fpc_physaddr);
                free_pages((unsigned long)fpc_virtaddr,
                           get_order(config_params->fault_pxl.fp_num *
                           FP_NUM_BYTES));
which makes a lot more sense - the virtual address is passed to free_pages(),
not the physical one.  And no, on arm/davinci phys_to_virt() is not a no-op;
the only reason why the ccdc_update_raw_params() does not blow up all the time
is that normally it's only called once, with fpc_table_addr still being zero,
so that free_pages() in there isn't reached at all.  With fewer casts (starting
with "make fpc_physaddr an integer, keep fpc_virtaddr a pointer, don't cast
that crap at all") it would've been caught just fine...

	I wonder if the conversion is feasible; what I've got so far seems
to be doable with relatively little PITA.

1) __get_free_page()/__get_free_pages() is the easiest of the entire bunch -
we can add get_free_page()/get_free_pages() that would return void * at any
point (the former would require renaming local inlines with the same name in
drivers/block/xen-blkback/blkback.c and drivers/xen/xen-scsiback.c,
the latter is completely unused).  Conversion to those could go piecemeal,
the originals becoming something like
#define __get_free_page(gfp) ((unsigned long)get_free_page(gfp))
in the commit introducing void * variant.  Eventually we could drop those
defines, or leave them for as long as we want - doesn't really matter.

2) get_zeroed_page() has 275 call sites at the moment.  205 of them
immediately cast to a pointer.  Out of remaining 70, about a half (29) is in
two files among drivers/net/wireless/*/debugfs.c; both of those would be
better off from switching to seq_file anyway, and that might be worth
doing as preliminary step.  With that done, it would be a matter of
replacing return type of get_zeroed_page with void * and adding casts to
unsigned long in remaining 41 locations.  Most of those can be eliminated
in immediate followups, but those can go into individual subsystems' trees;
the flagday part would have to be at the very end of a merge window and it
can be generated pretty much automatically.

3) free_page() and free_pages() are the trickiest part; the nasty part is
that we have __free_page()/__free_pages(), which *do* take a pointer and
not to the area being freed - to relevant struct page.  I think it's still
worth switching free_page/free_pages to void *, but I'd probably make them
macros along the lines of
extern void __free_page_misspelled(void);
extern void __free_page_pointer(void *);
#define free_page(p) \
	__builtin_choose_expr( \
		__builtin_types_compatible_p(struct page *,typeof(f)), \
			__free_page_misspelled(), \
	__builtin_choose_expr( \
		__builtin_types_compatible_p(const struct page *,typeof(f)), \
			__free_page_misspelled(), \
	__free_page_pointer(p)))
so that errors of that sort would still be caught.

This will have to be a flagday change, preferably right next to the
get_zeroed_page() one.  I'd done such a patch for free_pages(), and it
ended up with 31 call site (of 231) with cast to void * and the rest with
casts removed.  free_page() one will be about three times bigger.  I'm not
sure what's the best tactics here - one solution is to have dumb and
straightforward "add a cast to void * at every call site", immediately
followed by a bunch of "collapse the useless casts" per-subsystem patches
that would go into relevant trees.  That's very hard to fuck up, but more
noisy than it has to be.  Another variant is to have removals of obvious
casts from pointers to unsigned long mixed into the flagday one, but that's
slightly more dangerous - there are places where the boilerplate cast
to unsigned long in free_pages() argument is done even to unsigned long
expression.  Of course that requires the same care in followup patches in the
dumb approach.  And missed cast to void * would immediately show in the
build - it's not as if it would be quiet.

In any case, all followup patches would be in the same cycle; I hadn't done
the full proof-of-concept series yet, but free_pages() part (about a quarter
of the total) had taken one afternoon.  It's certainly feasible.

Impact on the out-of-tree code (including development branches in the trees
that get pulled into mainline) would be trivial - loud noise on build
fixed by a one-liner at the location the warning points to, possibly with
a followup along the lines of "now we can drop those 5 casts simply by
keeping the result of get_zeroed_page/get_free_page/etc. as struct foo *
rather than unsigned long; let's do it".

What do you think of that plan?  If the answer is "hell, no - leave it alone,
we are not going to do that kind of change no matter what"... well, I can
live with that, seeing that we'd lived with that boilerplate for what, 24
years by now?  But if it's "such and such parts of that outline is no-go
because of..." (or, better yet, "post the patch series and we'll see"), I'd
like to try to get it done.  Comments?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ