linux-kernel - Re: [PATCH] Implementation of the sendgroup() system call


Date:	Wed, 6 May 2009 07:25:34 -0400 (EDT)
From:	Tim Brecht <brecht@...uwaterloo.ca>
To:	Andi Kleen <andi@...stfloor.org>
cc:	Elad Lahav <elahav@...uwaterloo.ca>,
	Elad Lahav <elahav@...terloo.ca>, linux-kernel@...r.kernel.org,
	netdev@...r.kernel.org
Subject: Re: [PATCH] Implementation of the sendgroup() system call



On Mon, 4 May 2009, Andi Kleen wrote:

> On Mon, May 04, 2009 at 09:44:31AM -0400, Elad Lahav wrote:
>>> My guess it's more the copies than the calls?
>> It's a factor of both. This is why we also created the sendgroup()
>> implementation that uses a tight loop of in-kernel calls to sendmsg()
>> as a means for evaluating the cost of mode switches. It is definitely
>> not negligible (exact numbers depend on the size of the group and the
>> size of the payload, of course).
>
> How much is non negligible in your case?

As you can see from Elad's posting it can be pretty
significant.

>>
>>> It sounds like you want sendfile() for UDP.
>> Do you mean by having a per-recipient sendfile() call for the same
>> file? Leaving the cost of the system call aside, this solution does
>> not work well with the kind of real-time data that we've been working
>> with (live streaming, online games). You would have to write the
>> payload to the file as it is being generated and call sendfile() after
>> each such write.
>
> You can mmap the file.

There are a few problems with using mmap and sendfile:

1) One would really want something like sendfilev where
    one could specify multiple recipients in one syscall
    (in order to save on the mode switches).

2) I don't know what it would be like for UDP but for
    TCP one of the big problems with mmap/sendfile
    for zero copy is that the application
    doesn't know when the kernel has finished sending
    the data. As a result one can only reuse the mmapped buffer
    if there is some way for the application to deduce
    that the kernel is finished sending the data.
    Even if the application can deduce this it can
    often be long after the kernel has sent the data
    and as a result memory buffers can accumulate
    unnecessarily. We've had this problem trying to use
    this approach in a high-performance web server.

3) I think that including recipient-specific data
    would be cumbersome and would probably require extra
    system calls. Possibly
       write(for prepend)
       sendfile(for common)
       write(for append)
    Unless one copies the common data into prestaged
    areas in user space ... which results in the copying
    we are trying to avoid.

    Perhaps if writev were able to
    write from an mmapped file with zero copies,
    a single recipient could be sent recipient-specific
    and common data with one system call.
    However, this approach would still require one system
    call per recipient.
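
To make the "one system call per recipient" baseline concrete, here is a
rough user-space sketch. It is not the sendgroup() patch and not a zero-copy
path: it uses the standard scatter-gather sendmsg() so that each UDP datagram
carries a recipient-specific prefix, the common payload, and a
recipient-specific suffix, with the kernel still copying the data on every
call. The struct recipient type and the send_to_group() name are made up for
illustration only.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>

/* Hypothetical per-recipient record (not from the patch): destination
 * address plus optional bytes to prepend and append to the common data. */
struct recipient {
    struct sockaddr_in addr;
    const void *prefix;
    size_t prefix_len;
    const void *suffix;
    size_t suffix_len;
};

/* Send prefix + common + suffix to each recipient as one UDP datagram.
 * This costs one sendmsg() call (one mode switch) and one copy of the
 * common data per recipient. Returns the number of datagrams that were
 * handed to the kernel in full. */
static size_t send_to_group(int fd, const void *common, size_t common_len,
                            const struct recipient *rcpt, size_t n)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        struct iovec iov[3];
        struct msghdr msg;
        size_t total = rcpt[i].prefix_len + common_len + rcpt[i].suffix_len;
        ssize_t r;

        /* Gather the three pieces without staging them in one buffer. */
        iov[0].iov_base = (void *)rcpt[i].prefix;
        iov[0].iov_len  = rcpt[i].prefix_len;
        iov[1].iov_base = (void *)common;
        iov[1].iov_len  = common_len;
        iov[2].iov_base = (void *)rcpt[i].suffix;
        iov[2].iov_len  = rcpt[i].suffix_len;

        memset(&msg, 0, sizeof msg);
        msg.msg_name    = (void *)&rcpt[i].addr;
        msg.msg_namelen = sizeof rcpt[i].addr;
        msg.msg_iov     = iov;
        msg.msg_iovlen  = 3;

        r = sendmsg(fd, &msg, 0);
        if (r >= 0 && (size_t)r == total)
            sent++;
    }
    return sent;
}
```

With an unconnected UDP socket fd and an array of n recipients, a call such
as send_to_group(fd, payload, payload_len, rcpts, n) issues n system calls.
That per-recipient cost is what a single group send would be trying to
amortize.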
