linux-ext4 - Re: [PATCH v3 08/12] ext2fs: nls: Support UTF-8 11.0 with NFKD normalization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [day] [month] [year] [list]

Message-ID: <20181130161251.GA3512@thunk.org>
Date:   Fri, 30 Nov 2018 11:12:51 -0500
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Gabriel Krisman Bertazi <krisman@...labora.com>
Cc:     kernel@...labora.com, linux-ext4@...r.kernel.org,
        Gabriel Krisman Bertazi <krisman@...labora.co.uk>
Subject: Re: [PATCH v3 08/12] ext2fs: nls: Support UTF-8 11.0 with NFKD
 normalization

On Mon, Nov 26, 2018 at 05:19:45PM -0500, Gabriel Krisman Bertazi wrote:
> +static int utf8_casefold(const struct nls_table *table,
> +			  const unsigned char *str, size_t len,
> +			  unsigned char *dest, size_t dlen)
> +{
> +	const struct utf8data *data = utf8nfkdicf(UNICODE_AGE(10,0,0));
> +	struct utf8cursor cur;
> +	size_t nlen = 0;
> +
> +	if (utf8ncursor(&cur, data, str, len) < 0)
> +		goto invalid_seq;
> +
> +	for (nlen = 0; nlen < dlen; nlen++) {
> +		dest[nlen] = utf8byte(&cur);
> +		if (!dest[nlen])
> +			return nlen;
> +		if (dest[nlen] == -1)
> +			break;
> +	}
> +invalid_seq:
> +	/* Treat the sequence as a binary blob. */
> +	memcpy(dest, str, len);
> +	return len;
> +
> +}

So it looks like the interface is if the destination buffer is too
small OR if the string is not a valid UTF-8 string, we treat it as a
binary blob.  I wonder if we would be better off if this function
actually signalling that there is a problem?  (Buffer too small,
invalid UTF-8 string).

It's fine to treat it as a binary blob, and copy it out to the
destination buffer, but I can imagine be use cases where knowing this
will be useful.  *Especially* the destination buffer too small case;
I'm actually a little nervous about having it silently ignoring that
error condition and just copying the binary blob.

Also, there *really* needs to be a check before dlen is assumed to be
>= len in the memcpy after the invalid_seq label.

							- Ted