Message-ID: <540B6819.9030201@ahsoftware.de>
Date: Sat, 06 Sep 2014 22:01:29 +0200
From: Alexander Holler <holler@...oftware.de>
To: Rob Landley <rob@...dley.net>
CC: Rogelio Serrano <rogelio.serrano@...il.com>,
Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <peterz@...radead.org>,
Måns Rullgård <mans@...sr.com>,
Steven Rostedt <rostedt@...dmis.org>,
Christopher Barry <christopher.r.barry@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: OT: Open letter to the Linux World
Am 05.09.2014 08:31, schrieb Alexander Holler:
> Am 04.09.2014 21:18, schrieb Rob Landley:
>
>> What's actually wrong with C++ at a language design level.
>>
>> Short version:
>
> OMG.
>
> It's better than C. In almost every aspect. Stop. Nothing else. Of
> course, if you want to write something like systemd in Python, Perl,
> Pascal, Modula or Erlang, feel free to do so. And if you want more
> security bugs, feel free to still use C for string handling instead of
> std::string. Or still write your own sorted list for every structure (or
> just don't, and go the slow way because you can't find the time to do it
> right in C). And ...
>
> You don't have to understand how templates do work to use e.g.
> std::string. Other people do hard stuff for you. So don't panic.
I've brought up the criticism of using C in a critical and very
security-sensitive piece of software in userland, so I've decided that a
bit more explanation might make sense.
First, as you don't seem to have noticed, or don't know, or are ignoring
the difference, let me repeat that this thread is about a piece of
software which runs in userland. So please leave out any comments from
Linus where he talks about kernelspace. I'm pretty sure he knows the
difference.
Now let me bring up a very small piece of code which you can find in a
similar fashion in almost every piece of software which comes into contact
with strings. And not just in one place or function, but in dozens or
even hundreds of places (inline, not in functions) in one project.
First in C++:
    void foo_bar(const std::string& foo, const std::string& bar,
                 std::string& foobar)
    {
        foobar = foo + bar;
    }
For those who don't know C++, this concatenates the two strings named
foo and bar and puts the result into foobar.
Now an example of how you would have to do that in C:
    char *foo_bar(const char *foo, const char *bar)
    {
        char *foobar = malloc(strlen(foo) + strlen(bar));
        strcpy(foobar, foo);
        strcat(foobar, bar);
        return foobar;
    }
Do you see the difference and spot all the problems?
At first I thought about not posting the answer, to see the responses, but
that would just have ended with a lot of people calling me a fool
and/or assuming I can't write proper C. And it carries the risk that
some inexperienced people might copy and paste the code and use it.
So, first of all: THE ABOVE EXAMPLE IN C IS BROKEN.
The very first problem is that foobar is allocated with the wrong size,
because the size doesn't account for the terminating null byte. This is a
very common bug, already found in countless places.
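To make the size arithmetic concrete, here is a minimal sketch. The helper name concat_buffer_size is hypothetical and only illustrates the calculation: strlen() counts the characters before the terminator, so the buffer needs one extra byte.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of bytes a buffer needs in order to hold the
// concatenation of two C strings, including the terminating null byte.
// strlen() does not count the terminator, hence the "+ 1".
std::size_t concat_buffer_size(const char *foo, const char *bar)
{
    return std::strlen(foo) + std::strlen(bar) + 1;
}
```

For "foo" and "bar" the two strlen() calls add up to 6, while the result "foobar" occupies 7 bytes in memory.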
But there are several more problems:
- What happens if foo or bar isn't terminated with a null byte?
- What happens if malloc fails?
- Who is the owner of foo, bar and/or foobar? Does the caller still own
foo and bar afterwards? Will the caller own foobar? (That is, who is
responsible for freeing foo, bar and foobar once they are no longer used?)
So now we extend the above C example:
    char *foo_bar(const char *foo, const char *bar)
    {
        char *foobar;
        if (!foo || !bar)
            return NULL;
        foobar = malloc(strlen(foo) + strlen(bar) + 1);
        if (!foobar)
            return NULL;
        strcpy(foobar, foo);
        strcat(foobar, bar);
        return foobar;
    }
This still has some problems. First, the caller has to check whether
foo_bar() has returned NULL. Forgetting that check is a very common bug,
also found in countless places.
Next, there is still the unsolvable problem of what happens if foo or
bar isn't terminated with a null byte (in other words, if they aren't C
strings).
So you have to check all callers up to the source of foo and bar to be
sure the program doesn't crash in the possibly far, far away place called
foo_bar().
And still no comment about ownership. That means someone who just looks
at the prototype or sees a call of foo_bar() somewhere has no idea about
the ownership of foo, bar and the returned foobar without a comment.
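For comparison, a minimal C++ sketch of how the signature alone can answer the ownership question, and how a std::string carries its own length. The names foo_bar_owned and from_raw are hypothetical and used only for illustration.

```cpp
#include <cstddef>
#include <string>

// Inputs are taken by const reference: the caller keeps ownership of them.
// The result is returned by value: the caller owns it, and its memory is
// released automatically when it goes out of scope.
std::string foo_bar_owned(const std::string& foo, const std::string& bar)
{
    return foo + bar;
}

// A std::string stores its length explicitly, so raw bytes without a
// terminating null byte can be wrapped as long as the length is known.
std::string from_raw(const char *data, std::size_t len)
{
    return std::string(data, len);
}
```

If memory runs out, operator+ throws std::bad_alloc instead of returning a null pointer which the caller might forget to check.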
So just this very simple piece of string handling in C already leaves
several questions open and is 17 lines long, all of which have to be
reviewed very carefully (e.g. to not miss the off-by-one bug).
Compare this with the 4 lines in C++, which are almost impossible to get
wrong or to misuse.
And, again, this thread is about a piece of software which runs with
process ID 1, wants to control the whole system and owns all permissions
to modify the system in almost every possible way. It doesn't run as
some user with restricted permissions or in chroot or something
similar. Some parts might do, but for sure not all (read again the above
"far far away").
And now some stats. I've just checked out systemd:
git grep -E "strcat|strncat|strcpy|strncpy|strlen" | wc -l
570
git grep -E "strcat|strncat|strcpy|strncpy|memcpy|strlen" | wc -l
850
OK, not every one of those places might be part of pid 1. And several
places are trivial calls like strlen("ATTR"), but it gives an idea of how
many places exist in systemd which might contain a problem which isn't
trivial to spot.
And regardless of how clever and experienced the people writing this
piece of software are, everyone is prone to making e.g. such an
off-by-one error.
Maybe he writes the piece of code after having worked 12 hours, maybe he
got interrupted while writing the code and continued it a day later,
maybe it's full moon or maybe his last meal wasn't like it should have been.
Whatever.
This means every piece of code in that piece of software has to be
reviewed multiple times (reviewers aren't perfect either), and, as long as
the software changes, every piece which changes has to be reviewed
multiple times again.
I could continue with examples for lists, sets and similar data
structures, which have to be invented again and again and again in C,
whereas in C++ people can reuse code which is already in use by
many, many people in many, many other projects.
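As a small sketch of that kind of reuse, a sorted list without duplicates needs nothing beyond the standard containers. The function name sorted_unique is hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Returns the given names in sorted order with duplicates removed.
// std::set keeps its elements ordered and unique, so no hand-written
// sorted list is needed.
std::vector<std::string> sorted_unique(const std::vector<std::string>& names)
{
    std::set<std::string> ordered(names.begin(), names.end());
    return std::vector<std::string>(ordered.begin(), ordered.end());
}
```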
And to come to your argument about how simple everything in C is: just
look at the macro container_of(). I wouldn't say it's so simple
that everyone understands what it does. And it's just part of the Linux
kernel, which means limited documentation and many people who have never
heard of it, compared with stuff which can be found in standard libraries.
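For readers who haven't met it, the idea behind container_of() can be sketched roughly as follows. This is not the kernel macro itself (which is written in GNU C), only an illustration of the same pointer arithmetic; the types list_node and item and the function item_from_node are hypothetical.

```cpp
#include <cstddef>

// Hypothetical types: a list node embedded inside a larger structure.
struct list_node {
    list_node *next;
};

struct item {
    int value;
    list_node node;
};

// Given a pointer to the embedded node, step back by the offset of the
// member inside the enclosing struct to get a pointer to the whole item.
// This assumes the node really is embedded in an item; nothing checks that.
item *item_from_node(list_node *n)
{
    return reinterpret_cast<item *>(
        reinterpret_cast<char *>(n) - offsetof(item, node));
}
```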
And, again, this is not about kernelspace. It isn't the Linux kernel,
where Linus has managed to organize an army of people who look
at every line again and again (and still sometimes miss a bug).
Most software projects don't have as many resources (human or not)
available as the Linux kernel. In fact, the kernel is an absolute exception.
So you just don't want to use error-prone C in new, non-trivial
projects (unless really necessary) where a failure in the code is a
major problem.
Doing so just means nothing has been learned from the (of course relatively
short) history of software development.
Alexander Holler
PS: Please don't try to tell me that even the above C++ example ends up
as code similar to the C code. std::string is used by even more
people than review the Linux kernel code. Besides that, it was
designed and reviewed by clever people too. And, just to repeat it
again, we are talking about userspace, not kernelspace.