linux-kernel - Re: [2.6 patch] kernel/sched*: optimize inlining

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Sun, 11 May 2008 13:18:20 +0200
From:	Andreas Mohr <andi@...as.de>
To:	Adrian Bunk <bunk@...nel.org>
Cc:	linux-kernel@...r.kernel.org
Subject: Re: [2.6 patch] kernel/sched*: optimize inlining

[manual reply, no access to original content]

Hi,

rather NACKish here (from my minor side, that is, since there are no
useful explanations, and in the case of a lack of explanations no
backing numbers either which would have been helpful to resolve this ;).

"x86: add optimized inlining"
(http://kerneltrap.org/mailarchive/git-commits-head/2008/4/26/1612644)
does not really say anything relevant to your patch, AFAICS.

That one simply says that previously every inline was force-inlined (ugh),
which now gcc is allowed to properly decide by itself now. This, however,
does _NOT_ imply that it's now somehow fully sufficient for a perfect outcome
to simply remove all open-coded "inline"s.

This part here:

> -static inline int task_current(struct rq *rq, struct task_struct *p)
> +static int task_current(struct rq *rq, struct task_struct *p)
>  {
>         return rq->curr == p;
>  }
>  
>  #ifndef __ARCH_WANT_UNLOCKED_CTXSW
> -static inline int task_running(struct rq *rq, struct task_struct *p)
> +static int task_running(struct rq *rq, struct task_struct *p)
>  {
>         return task_current(rq, p);
>  }
>  

is especially "nice" since it's now a non-inlined multi-call, with 2
parameters each and 6 callsites within sched.c.

Using this admittedly worst-caseish part as a simple compile testcase, I get
67013 vs. previously 66920 bytes for sched.o on x86 (gcc 4.2.3, 2.6.26-rc1).
Removing all "inline"s, we end up at 67891 even.
Size indicates that it's an increase in complexity, but size is
not even the problem here, the problem is that these calls are used in
inner scheduling / migration code, AFAICS, where a simple pointer
comparison now turns into a 2-parametric multi-call.
Examining this thing more closely (objdump, nm), I see that it did
inline task_running and chose to not inline task_current (which it was
before, and that seems like a wrong compiler decision).

OK, I know that it's not "chic" to argue about inlining stuff,
especially since it's such a b**ch to manage (do we inline this or don't
we, and what if the function silently grows to become an inlining
penalty?).
But in this case you seem to have brushed all criticism aside with an
argument (pointing to the other patch) that to me doesn't seem to hold
in the first place.
Also, sched.c isn't architecture-specific, IOW it will be used by
architectures with all kinds of ranges of "implementation dumbness".
Some of those may have lots of issues due to weak instruction caches or
badly-performing function invocation.

So, again,
- non-"inline"d functions sometimes (sufficiently frequently?) do not seem
  to get inlined no matter how small
- sched.c is very central, non-architecture-specific code
- looking at a "nice" size reduction really does _not_ matter (relatively)
  for such a centrally, frequently used code, what matters is performance
- the consequences of this change to me seem just a wee bit too gross
  to swallow

As long as compiler intelligence in the realms of inlining seems
spotty, I'd refrain from cleaning up centrally used, critical code.
Removing inlines in driver code which is rarely used (with the objective
of gaining size reductions, of course) is an entirely different discussion
with easily more positive outcome, I'd say.
If even x86 gcc (4.2.3) seems to do the wrong thing here, then I really don't
want to know what much less prominently compiler-optimized architectures
would end up with.

Just my thoughts (keep it or burn it),

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/