linux-kernel - Re: OT: Open letter to the Linux World

Message-ID: <540B6819.9030201@ahsoftware.de>
Date:	Sat, 06 Sep 2014 22:01:29 +0200
From:	Alexander Holler <holler@...oftware.de>
To:	Rob Landley <rob@...dley.net>
CC:	Rogelio Serrano <rogelio.serrano@...il.com>,
	Borislav Petkov <bp@...en8.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Måns Rullgård <mans@...sr.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Christopher Barry <christopher.r.barry@...il.com>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: OT: Open letter to the Linux World

On 05.09.2014 08:31, Alexander Holler wrote:
> On 04.09.2014 21:18, Rob Landley wrote:
>
>> What's actually wrong with C++ at a language design level.
>>
>> Short version:
>
> OMG.
>
> It's better than C. In almost every aspect. Stop. Nothing else. Of
> course, if you want to write something like systemd in Python, Perl,
> Pascal, Modula or Erlang, feel free to do so. And if you want more
> security bugs, feel free to keep using C for string handling instead of
> std::string. Or keep writing your own sorted list for every structure (or
> just don't and go the slow way, because you don't find the time to do it
> right in C). And ...
>
> You don't have to understand how templates work to use e.g.
> std::string. Other people do the hard stuff for you. So don't panic.


I raised the criticism of using C in a critical and very
security-sensitive piece of userland software, so I decided that a bit
more explanation might make sense.

First, since you seem not to have noticed, not to know, or to be ignoring
the difference, let me repeat that this thread is about a piece of
software that runs in userland. So please keep any comments in which
Linus talks about kernelspace out of it. I'm pretty sure he knows the
difference.

Now let me bring up a very small piece of code which you can find in a
similar form in almost every piece of software that deals with strings.
And not just in one place or function, but in dozens or even hundreds of
places (inline, not in functions) within one project.

First in C++:

void foo_bar(const std::string& foo, const std::string& bar, 
std::string& foobar)
{
	foobar = foo + bar;
}

For those who don't know C++, this concatenates the two strings named
foo and bar and puts the result into foobar.

Now an example of how you would have to do that in C:

char *foo_bar(const char *foo, const char *bar)
{
	char *foobar = malloc(strlen(foo) + strlen(bar));

	strcpy(foobar, foo);
	strcat(foobar, bar);

	return foobar;
}

Do you see the difference and spot all the problems?

At first I thought about not posting the answer, to see the responses, but
that would just have ended with a lot of people calling me a fool
and/or assuming I can't write proper C. It also carries the risk that
some inexperienced people might copy, paste and use it.

So first of all: THE ABOVE EXAMPLE IN C IS BROKEN.

The very first problem is that foobar is allocated with the wrong size,
because the allocation doesn't account for the terminating null byte. A
very common problem, already found in countless places.
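
To make the off-by-one concrete, here is a minimal sketch of the size
calculation. The helper name concat_buffer_size() is made up for this
illustration and does not come from any library:

```cpp
#include <cassert>   // for the assert() mentioned in the note below
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of bytes a buffer needs in order to hold
// the concatenation of two C strings, including the terminating null byte.
std::size_t concat_buffer_size(const char *foo, const char *bar)
{
	// strlen() counts the characters only, so the terminator is added here.
	return std::strlen(foo) + std::strlen(bar) + 1;
}
```

For "foo" and "bar" the two strlen() calls add up to 6, but the result
"foobar" needs 7 bytes, so assert(concat_buffer_size("foo", "bar") == 7)
holds. The broken example allocates only 6 and strcat() writes one byte
past the end.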

But there are several more problems:

- What happens if foo or bar isn't terminated with a null byte?

- What happens if malloc fails?

- Who is the owner of foo, bar and/or foobar? Does the caller still own
foo and bar afterwards? Will the caller own foobar? (In other words, who
is responsible for freeing foo, bar and foobar once they are no longer
used?)

So now we extend the above C example:

char *foo_bar(const char *foo, const char *bar)
{
	char *foobar;

	if (!foo || !bar)
		return NULL;

	foobar = malloc(strlen(foo) + strlen(bar) + 1);

	if (!foobar)
		return NULL;

	strcpy(foobar, foo);
	strcat(foobar, bar);

	return foobar;
}

This still has some problems. First, the caller has to check whether
foo_bar() returned NULL. Forgetting that check is another very common
bug, already found in countless places.

Next, there is still the unsolvable problem of what happens if foo or
bar isn't terminated with a null byte (in other words, they aren't C
strings).
So you have to check all callers, up to the source of foo and bar, to be
sure the program doesn't crash in the possibly far, far away place called
foo_bar().
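
For comparison, a minimal sketch of how the terminator question goes
away once the length travels with the data. It assumes a C++17 compiler
(for std::string_view), and the name foo_bar_sv() is made up for this
illustration:

```cpp
#include <cassert>   // for the assert() mentioned in the note below
#include <string>
#include <string_view>

// Hypothetical variant: each std::string_view carries its own length, so
// neither input has to end with a null byte.
std::string foo_bar_sv(std::string_view foo, std::string_view bar)
{
	std::string foobar;
	foobar.reserve(foo.size() + bar.size()); // one allocation up front
	foobar.append(foo);
	foobar.append(bar);
	return foobar;
}
```

With const char raw[3] = {'a', 'b', 'c'}; (no terminator), the call
foo_bar_sv(std::string_view(raw, 3), "de") yields "abcde". The caller
still has to pass the correct length, but nothing has to scan memory
looking for a null byte.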

And still no comment about ownership. That means someone who just looks 
at the prototype or sees a call of foo_bar() somewhere has no idea about 
the ownership of foo, bar and the returned foobar without a comment.
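
As a sketch of how the ownership question can be answered by the
prototype itself, the extended C function can be wrapped so that the
return type says who frees the result. The names FreeDeleter,
OwnedCString and foo_bar_owned() are made up for this illustration:

```cpp
#include <cassert>   // for the assert() mentioned in the note below
#include <cstdlib>
#include <cstring>
#include <memory>

// The extended C function from above, compiled as C++ (hence the cast).
char *foo_bar(const char *foo, const char *bar)
{
	if (!foo || !bar)
		return nullptr;

	char *foobar = static_cast<char *>(
		std::malloc(std::strlen(foo) + std::strlen(bar) + 1));
	if (!foobar)
		return nullptr;

	std::strcpy(foobar, foo);
	std::strcat(foobar, bar);
	return foobar;
}

// Hypothetical deleter: memory obtained from malloc() goes back via free().
struct FreeDeleter {
	void operator()(char *p) const { std::free(p); }
};
using OwnedCString = std::unique_ptr<char, FreeDeleter>;

// The return type states that the caller owns the result; foo and bar are
// only read and stay with the caller.
OwnedCString foo_bar_owned(const char *foo, const char *bar)
{
	return OwnedCString(foo_bar(foo, bar));
}
```

A caller writes auto s = foo_bar_owned("foo", "bar"); and can check
assert(s) before using s.get(). The buffer is freed automatically when s
goes out of scope. The NULL check is still needed, the wrapper only
settles who frees what.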

So even this very simple piece of string handling in C still leaves
several questions open and is 17 lines long, all of which have to be
reviewed very carefully (e.g. to not miss the off-by-one bug).
Compare this with the 4 lines in C++, which are almost impossible to get
wrong or to use wrong.

And, again, this thread is about a piece of software which runs with 
process ID 1, wants to control the whole system and owns all permissions 
to modify the system in almost every possible way. It doesn't run as
some user with restricted permissions or in chroot or something
similar. Some parts might do, but for sure not all (read again the above 
"far far away").

And now some stats. I've just checked out systemd:

git grep -E "strcat|strncat|strcpy|strncpy|strlen" | wc -l
570

git grep -E "strcat|strncat|strcpy|strncpy|memcpy|strlen" | wc -l
850

OK, not every one of those places might be part of pid 1. And several of
them are trivial calls like strlen("ATTR"), but it gives an idea of how
many places exist in systemd which might contain a problem which isn't
trivial to spot.

And regardless of how clever and experienced the people writing this
piece of software are, everyone is prone to making a mistake such as an
off-by-one bug.
Maybe he writes the piece of code after having worked 12 hours, maybe he
got interrupted while writing the code and continued it a day later,
maybe it's full moon or maybe his last meal wasn't what it should have been.

Whatever.

This means every piece of code in that piece of software has to be
reviewed multiple times (reviewers aren't perfect either), and, as long
as the software changes, every piece which changes has to be reviewed
multiple times again.

I could continue with examples for lists, sets and similar data
structures, which have to be invented again and again and again in C,
whereas in C++ people can reuse code which is already in use by
many, many people in many, many other projects.
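
A minimal sketch of what that reuse looks like for a sorted collection.
The types Job, ByPriority and JobQueue and the function names_in_order()
are made up for this illustration and have nothing to do with systemd:

```cpp
#include <cassert>   // for the assert() mentioned in the note below
#include <set>
#include <string>
#include <vector>

// Hypothetical element type.
struct Job {
	int priority;
	std::string name;
};

// Ordering rule: lower priority value first.
struct ByPriority {
	bool operator()(const Job &a, const Job &b) const
	{
		return a.priority < b.priority;
	}
};

// The standard container keeps the elements sorted on every insert.
using JobQueue = std::multiset<Job, ByPriority>;

// Collects the job names in the order the container maintains.
std::vector<std::string> names_in_order(const JobQueue &jobs)
{
	std::vector<std::string> names;
	for (const Job &job : jobs)
		names.push_back(job.name);
	return names;
}
```

Inserting {30, "c"}, {10, "a"} and {20, "b"} in that order and then
calling names_in_order() gives "a", "b", "c". The only project-specific
code is the comparison. The tree, its balancing and the memory handling
come from the standard library.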

And to come to your argument about how simple everything in C is: just
look at the macro container_of(). I wouldn't say it's so simple
that everyone understands what it does. And it's just part of the Linux
kernel, which means limited documentation and many people who have never
heard of it, compared with stuff which can be found in standard libraries.
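
For readers who have never seen it, here is a simplified sketch of the
idea behind container_of(), written so that it compiles as C++. It is
not the kernel's definition (which adds a type check using GNU
extensions), and the structs list_node and item as well as the function
item_from_node() are made up for this illustration:

```cpp
#include <cassert>   // for the assert() mentioned in the note below
#include <cstddef>

// Simplified sketch: given a pointer to a member, step back by the
// member's offset to reach the start of the enclosing structure.
#define container_of(ptr, type, member) \
	(reinterpret_cast<type *>(reinterpret_cast<char *>(ptr) - offsetof(type, member)))

// Hypothetical intrusive list node, embedded in the payload struct.
struct list_node {
	list_node *next;
};

struct item {
	int value;
	list_node node;
};

// List code only ever sees list_node pointers; this recovers the item.
item *item_from_node(list_node *n)
{
	return container_of(n, item, node);
}
```

With item it{42, {nullptr}}; the check
assert(item_from_node(&it.node) == &it) holds. Nothing stops a caller
from passing a list_node that is not embedded in an item, and then the
result is a pointer into unrelated memory.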

And, again, this is not about kernelspace. It isn't the Linux kernel,
where Linus has managed to organize an army of people who look
at every line again and again (and still sometimes miss a bug).
Most software projects don't have as many resources (human or not)
available as the Linux kernel. In fact the kernel is an absolute
exception.

So you just don't want to use error-prone C in new and non-trivial
projects (unless it is really necessary) where a failure in the code is a
major problem.

Doing so just means nothing has been learned from the (of course
relatively short) history of software development.

Alexander Holler

PS: Please don't try to tell me that even the above C++ example ends up
as code similar to the C code. std::string is used by even more
people than review the Linux kernel code. Besides that, it was
designed and reviewed by clever people too. And, just to repeat it
again, we are talking about userspace, not kernelspace.
