linux-kernel - Re: [PATCH v4 1/2] tools api: add a lightweight buffered reading api

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP-5=fW_5pPZTA3bXdT9d9Tt_d5aJw=4bf_fr9eqzGPfoVs3aQ@mail.gmail.com>
Date:   Mon, 13 Apr 2020 09:21:53 -0700
From:   Ian Rogers <irogers@...gle.com>
To:     Namhyung Kim <namhyung@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Arnaldo Carvalho de Melo <acme@...nel.org>,
        Mark Rutland <mark.rutland@....com>,
        Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
        Jiri Olsa <jolsa@...hat.com>, Petr Mladek <pmladek@...e.com>,
        Andrey Zhizhikin <andrey.z@...il.com>,
        Kefeng Wang <wangkefeng.wang@...wei.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Kan Liang <kan.liang@...ux.intel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        linux-perf-users <linux-perf-users@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH v4 1/2] tools api: add a lightweight buffered reading api

On Mon, Apr 13, 2020 at 12:29 AM Namhyung Kim <namhyung@...nel.org> wrote:
>
> Hi Ian,
>
> On Sat, Apr 11, 2020 at 3:42 PM Ian Rogers <irogers@...gle.com> wrote:
> >
> > The synthesize benchmark shows the majority of execution time going to
> > fgets and sscanf, necessary to parse /proc/pid/maps. Add a new buffered
> > reading library that will be used to replace these calls in a follow-up
> > CL. Add tests for the library to perf test.
> >
> > v4 adds the test file missed in v3.
> >
> > Signed-off-by: Ian Rogers <irogers@...gle.com>
> > ---
> > +/* Read a hexadecimal value with no 0x prefix into the out argument hex. If the
> > + * first character isn't hexadecimal returns -2, io->eof returns -1, otherwise
> > + * returns the character after the hexadecimal value which may be -1 for eof.
>
> I'm not sure returning -1 is good when it actually reads something and
> meets EOF.
> Although it would have a valid value, users might consider it an error IMHO.
> Why not returning 0 instead? (I'm ok with -1 for the later use of the API).

Thanks for the feedback! In the code for /proc/pid/maps this is a
hypothetical, but I think having the API right is important. I didn't
go with 0 as you mention 0 'could' encode a character, for example,
7fffabcd\0 wouldn't be distinguishable from 7fffabcd<EOF>. The updated
code distinguishes the cases as 0 meaning character \0, -1 meaning EOF
and -2 meaning bad encoding. Your worry is that a hex number that's
next to EOF will get a result of -1 showing the EOF came next. and
code that does 'if ( .. < 0)' would trigger. While clunky, it'd be
possible in those cases to change the code to 'if ( .. < -1)'. So my
thoughts are:
1) being able to tell apart the 3 cases could be important - this is
all hypothetical;
2) keeping EOF and error as negative numbers has a degree of consistency;
3) using -1 for EOF comes from get_char, it'd be nice to have one
value mean EOF.
Perhaps the issue is the name of the function? It isn't a standard API
to return the next character, but it simplified things for me as I
didn't need to add a 'rewind' operation. The function names could be
something like io__get_hex_then_char and io__get_dec_then_char, EOF
for the 'then_char' part would be more consistent. I'd tried to keep
the names short and have a load bearing comment, which isn't ideal but
generally I believe the style is that function names are kept short.
Let me know what you think.

Thanks,
Ian

> > + * If the read value is larger than a u64 the high-order bits will be dropped.
> > + */
> > +static inline int io__get_hex(struct io *io, __u64 *hex)
> > +{
> > +       bool first_read = true;
> > +
> > +       *hex = 0;
> > +       while (true) {
> > +               int ch = io__get_char(io);
> > +
> > +               if (ch < 0)
> > +                       return ch;
> > +               if (ch >= '0' && ch <= '9')
> > +                       *hex = (*hex << 4) | (ch - '0');
> > +               else if (ch >= 'a' && ch <= 'f')
> > +                       *hex = (*hex << 4) | (ch - 'a' + 10);
> > +               else if (ch >= 'A' && ch <= 'F')
> > +                       *hex = (*hex << 4) | (ch - 'A' + 10);
> > +               else if (first_read)
> > +                       return -2;
> > +               else
> > +                       return ch;
> > +               first_read = false;
> > +       }
> > +}
> > +
> > +/* Read a positive decimal value with out argument dec. If the first character
> > + * isn't a decimal returns -2, io->eof returns -1, otherwise returns the
> > + * character after the decimal value which may be -1 for eof. If the read value
> > + * is larger than a u64 the high-order bits will be dropped.
>
> Ditto.
>
> Thanks
> Namhyung
>
>
> > + */
> > +static inline int io__get_dec(struct io *io, __u64 *dec)
> > +{
> > +       bool first_read = true;
> > +
> > +       *dec = 0;
> > +       while (true) {
> > +               int ch = io__get_char(io);
> > +
> > +               if (ch < 0)
> > +                       return ch;
> > +               if (ch >= '0' && ch <= '9')
> > +                       *dec = (*dec * 10) + ch - '0';
> > +               else if (first_read)
> > +                       return -2;
> > +               else
> > +                       return ch;
> > +               first_read = false;
> > +       }
> > +}
> > +
> > +#endif /* __API_IO__ */