[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160224051346.GD30919@packer-debian-8-amd64.digitalocean.com>
Date: Wed, 24 Feb 2016 00:13:47 -0500
From: Jessica Yu <jeyu@...hat.com>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Rasmus Villemoes <linux@...musvillemoes.dk>,
Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
Kees Cook <keescook@...omium.org>, linux-kernel@...r.kernel.org
Subject: Re: sscanf: implement basic character sets
+++ Andrew Morton [23/02/16 14:05 -0800]:
>On Tue, 23 Feb 2016 15:38:22 -0500 Jessica Yu <jeyu@...hat.com> wrote:
>
>> Implement basic character sets for the '%[]' conversion specifier.
>>
>> The '%[]' conversion specifier matches a nonempty sequence of characters
>> from the specified set of accepted (or with '^', rejected) characters
>> between the brackets. The substring matched is to be made up of characters
>> in (or not in) the set. This implementation differs from its glibc
>> counterpart in that it does not support character ranges (e.g., 'a-z' or
>> '0-9'), the hyphen '-' is *not* a special character, and the brackets
>> themselves cannot be matched.
>>
>> Signed-off-by: Jessica Yu <jeyu@...hat.com>
>> ---
>>
>> This patch adds support for the '%[' conversion specifier for sscanf().
>> This is useful in cases where we'd like to match substrings delimited by
>> something other than spaces. The original motivation for this patch
>> actually came from a livepatch discussion (See: https://lkml.org/lkml/2016/2/8/790),
>> where we were trying to come up with a clean way to parse symbol names with
>> substrings delimited by periods and commas.
>
> It would be better to include the justification right here in the
> changelog please.
> Not via some link-to-discussion and definitely not
> below the ^--- marker! It's very important.
Thanks for the corrections Andrew. I am however slightly confused, are
you suggesting that I should provide a much more thorough explanation
about the motivation here in the changelog (below the ^--- marker), or
would this be better suited for a (separate) cover letter?
>The deviation from the glibc behaviour is a bit of a worry,
>particularly as it is done in a non-back-compat manner: code which
>assumes "-" is non-magic might break if someone later adds range
>support.
>
>Presumably we can live with that - there won't be many callsites and
>they can be grepped for. But please, let's get a description of all
>these considerations into the code as a comment. Probably it would be
>helpful to include a little usage example in that comment.
Hm, that is a very good point. At the moment we can be sure there
aren't any users of sscanf() using the %[ conversion specifier, as it
doesn't exist yet :-) But yes, this behavior should be documented
clearly in a comment, so future users will be aware..
>> --- a/lib/vsprintf.c
>> +++ b/lib/vsprintf.c
>> @@ -2714,6 +2714,47 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
>> num++;
>> }
>> continue;
>> + case '[':
>> + {
>> + char *s = (char *)va_arg(args, char *);
>> + char *set;
>> + size_t (*op)(const char *str, const char *set);
>> + size_t len = 0;
>> + bool negate = (*(fmt) == '^');
>> +
>> + if (field_width == -1)
>> + field_width = SHRT_MAX;
>> +
>> + op = negate ? &strcspn : &strspn;
>> + if (negate)
>> + fmt++;
>> +
>> + len = strcspn(fmt, "]");
>> + /* invalid format; stop here */
>> + if (!len)
>> + return num;
>> +
>> + set = kstrndup(fmt, len, GFP_KERNEL);
>
>Embedding a GFP_KERNEL allocation into vsscanf is problematic - it
>limits the situations in which this functionality can be used.
>
>afaict the allocation is there merely so we can null-terminate the
>string so we can use existing library functions (strcspn, strspn). Is
>that compromise really worth it? We could pretty easily convert
>strcspn() into
>
> strcnspn(const char *s, const char *reject, size_t len)
>
>and convert strcspn() to call that (ifndef __HAVE_ARCH_STRCSPN)
>
>In fact I think we could still use strspn() and strcspn() on `fmt'
>directly? We just need to check for the return value exceeding `len'
>and if so, treat that as a no-match?
>
Perhaps we can use Rasmus' bitmap solution, as it avoids the
allocation altogether and it doesn't need to use strspn()/strcspn().
>> + if (!set)
>> + return num;
>> +
>> + /* advance fmt past ']' */
>> + fmt += len + 1;
>> +
>> + len = op(str, set);
>> + /* no matches */
>> + if (!len) {
>> + kfree(set);
>> + return num;
>> + }
>> +
>> + while (len-- && field_width--)
>> + *s++ = *str++;
>> + *s = '\0';
>> + kfree(set);
>> + num++;
>> + }
>> + continue;
>> case 'o':
>> base = 8;
>> break;
>
Powered by blists - more mailing lists