Message-ID: <CAJoZ4U0PNtk2e_x4D3N_fFN32uTVe6Z+FPg7ofG69KjQzsZ7xA@mail.gmail.com>
Date: Wed, 16 Oct 2013 11:10:58 -0400
From: Kyle Hubert <khubert@...il.com>
To: Stephen Hemminger <stephen@...workplumber.org>
Cc: netdev@...r.kernel.org
Subject: Re: SW csum errors
On Mon, Oct 14, 2013 at 4:58 PM, Stephen Hemminger
<stephen@...workplumber.org> wrote:
> On Mon, 14 Oct 2013 16:13:15 -0400
> Kyle Hubert <khubert@...il.com> wrote:
>
>> My problem is rather specific. I am working on an RDMA device, and we
>> have full end to end reliability. However, one of the initial spins of
>> our chip had some errors, since fixed, where the csum was unreliable.
>> So, we did exactly what Dave Miller warned not to do in the linked
>> message. We ran outgoing IP packets through the SKB checksum
>> function. Unfortunately, we occasionally saw NFS csum errors on full
>> MTU packets.
>>
>> Here is his response:
>>
>> http://marc.info/?l=linux-netdev&m=128286758300676&w=2
>>
>> Relevant portion:
>>
>> "
>> Paged SKBs can have references to page cache pages and similar. These
>> can be updated asynchronously to the transmit, there is no locking at
>> all to freeze the contents, and therefore full checksum offload is
>> required to support SG correctly.
>>
>> So don't get the idea to do the checksum in software in the infiniband
>> layer, and advertize hw checksumming support, to get around this :-)
>> "
>>
>> Now that those chips have long gone, I am left pondering about these
>> packets "corrupted" before the device transfers them. Can I get more
>> information about these paged SKBs with asynchronous modifications?
>> How does NFS use them?
>
> You would have to either mark the pages as copy on write or copy the data.
> Setting COW is expensive because you have to coordinate with other CPUs
> on SMP. Not sure exactly how.
>
> You can demonstrate this with either sendfile() or NFS where underlying
> file contents are being modified while packet is in the queue.
Thanks, I didn't realize it was as simple as file-backed pages being
changed. Yes, our device does support SG, so we do have zero-copy
sendfile() support. I'll concoct a simple test to prove this.
-Kyle
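
The failure mode discussed in this thread can be sketched in user space. This is only an illustrative model, not kernel code and not the test Kyle ran: a bytearray stands in for a page cache page, a memoryview stands in for a paged SKB fragment that references the page without copying it, and the names internet_checksum and simulate_transmit are hypothetical. The checksum is the standard 16-bit ones'-complement sum from RFC 1071.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"          # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def simulate_transmit(page: bytearray, modify_in_flight: bool) -> bool:
    """Return True if the receiver would see a valid checksum.

    Hypothetical model: 'page' plays the role of a file-backed page and
    'frag' the role of an SKB fragment pointing at it (no copy is made).
    """
    frag = memoryview(page)

    # Step 1: checksum computed in software at queue time.
    csum = internet_checksum(bytes(frag))

    # Step 2: the file is updated while the packet sits in the queue.
    # Nothing freezes the page contents, so the write goes through.
    if modify_in_flight:
        page[0] ^= 0xFF

    # Step 3: the device reads the page only now, at transmit time.
    wire = bytes(frag)
    return internet_checksum(wire) == csum


if __name__ == "__main__":
    print(simulate_transmit(bytearray(b"x" * 1500), modify_in_flight=False))
    print(simulate_transmit(bytearray(b"x" * 1500), modify_in_flight=True))
```

In this model the second call reports a mismatch because the checksum was taken before the page changed, while the payload was read afterwards. Computing the checksum at the moment the data is read (as full hardware offload does) or copying the data first, as Stephen suggests, closes that window.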