Message-ID: <CAF=yD-Kx+nOf_56TvWvRo=pupjf_VH+E_0pKBMawx+N=7BsOpQ@mail.gmail.com>
Date: Thu, 31 Aug 2017 23:31:54 -0400
From: Willem de Bruijn <willemdebruijn.kernel@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: Network Development <netdev@...r.kernel.org>,
David Miller <davem@...emloft.net>,
Willem de Bruijn <willemb@...gle.com>
Subject: Re: [PATCH net-next] doc: document MSG_ZEROCOPY

On Thu, Aug 31, 2017 at 11:10 PM, Alexei Starovoitov
<alexei.starovoitov@...il.com> wrote:
> On Thu, Aug 31, 2017 at 11:04:41PM -0400, Willem de Bruijn wrote:
>> On Thu, Aug 31, 2017 at 10:10 PM, Alexei Starovoitov
>> <alexei.starovoitov@...il.com> wrote:
>> > On Thu, Aug 31, 2017 at 05:00:13PM -0400, Willem de Bruijn wrote:
>> >> From: Willem de Bruijn <willemb@...gle.com>
>> >>
>> >> Documentation for this feature was missing from the patchset.
>> >> Copied a lot from the netdev 2.1 paper, addressing some small
>> >> interface changes since then.
>> >>
>> >> Signed-off-by: Willem de Bruijn <willemb@...gle.com>
>> > ...
>> >> +Notification Batching
>> >> +~~~~~~~~~~~~~~~~~~~~~
>> >> +
>> >> +Multiple outstanding packets can be read at once using the recvmmsg
>> >> +call. This is often not needed. In each message the kernel returns not
>> >> +a single value, but a range. It coalesces consecutive notifications
>> >> +while one is outstanding for reception on the error queue.
>> >> +
>> >> +When a new notification is about to be queued, it checks whether the
>> >> +new value extends the range of the notification at the tail of the
>> >> +queue. If so, it drops the new notification packet and instead increases
>> >> +the range upper value of the outstanding notification.
>> >
>> > Would it make sense to mention that max notification range is 32-bit?
>> > So each 4Gbyte of xmit bytes there will be a notification.
>> > In modern 40Gbps NICs it's not a lot. Means that there will be
>> > at least one notification every second.
>> > Or I misread the code?
>>
>> You're right. The doc does mention that the counter and range
>> are 32-bit. I can state more explicitly that that bounds the working
>> set size to 4GB. Do you expect this to be problematic? Processing
>> a single notification per 4GB of data should not be a significant
>> cost in itself.
>
> I think 4GB is fine. Just there was an idea that in cases when
> notification of transmission can be known by other means

Some kind of unspoofable response from the peer (i.e., not just
a TCP ACK), or a kernel mechanism independent from the error
queue? The first does not guarantee that a retransmit is
not in progress.

> the user space
> could have skipped reading errqueue completely, but looks like it
> still needs to poll.

If a process has no need to see the notification, say because
it is sending out a buffer that is constant for the process lifetime,
then it could conceivably skip the recv, and the poll along with it.
The code as written will not coalesce more than 4GB of data, but that
could be revised.
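
To make the coalescing rule concrete, here is a minimal userspace sketch of
the behavior described in the quoted documentation: a new notification is
merged into the one at the tail of the queue if it extends that range, and
is queued separately otherwise. This is not the kernel code. The names
zc_range, zc_queue, zc_try_extend and zc_enqueue and the fixed queue size
are hypothetical, and the cap is modeled only as "the span must fit in a
32-bit range"; how that span maps onto bytes sent is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one queued notification: an inclusive
 * range [lo, hi] of 32-bit counters. Not a kernel structure. */
struct zc_range {
	uint32_t lo;
	uint32_t hi;
};

#define ZC_QUEUE_MAX 16	/* arbitrary size for this sketch */

struct zc_queue {
	struct zc_range r[ZC_QUEUE_MAX];
	size_t len;
};

/* Try to merge [lo, hi] into the tail range. Returns true if merged. */
static bool zc_try_extend(struct zc_range *tail, uint32_t lo, uint32_t hi)
{
	uint64_t sum_len;

	/* The new range must start right after the tail (with wraparound). */
	if (lo != (uint32_t)(tail->hi + 1))
		return false;

	/* Assumed cap: the merged span must stay below 2^32 entries. */
	sum_len = (uint64_t)(uint32_t)(tail->hi - tail->lo) + 1 +
		  (uint64_t)(uint32_t)(hi - lo) + 1;
	if (sum_len >= (1ULL << 32))
		return false;

	tail->hi = hi;
	return true;
}

/* Queue a notification, coalescing with the tail when possible.
 * Returns false only if the queue is full and no merge happened. */
static bool zc_enqueue(struct zc_queue *q, uint32_t lo, uint32_t hi)
{
	assert(q != NULL);

	if (q->len > 0 && zc_try_extend(&q->r[q->len - 1], lo, hi))
		return true;	/* new notification dropped, tail widened */

	if (q->len == ZC_QUEUE_MAX)
		return false;

	q->r[q->len].lo = lo;
	q->r[q->len].hi = hi;
	q->len++;
	return true;
}
```

In this model, enqueueing [0, 0] and then [1, 3] leaves a single entry
[0, 3], while a later [10, 10] does not extend the tail and becomes a
second entry. A tail that already spans nearly the whole 32-bit space
refuses the merge, which is the case where a fresh notification has to be
queued and eventually read.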