Message-ID: <alpine.DEB.2.00.1003221146260.17230@router.home>
Date:	Mon, 22 Mar 2010 11:51:08 -0500 (CDT)
From:	Christoph Lameter <cl@...ux-foundation.org>
To:	Andi Kleen <andi@...stfloor.org>
cc:	David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: Add PGM protocol support to the IP stack

On Mon, 22 Mar 2010, Andi Kleen wrote:

> Multicast reliable kernel protocols are somewhat new, I guess one
> would need to make sure to come up with a clean generic interface
> for them first.

It has been around for a long time in another OS. I wonder if I should use
the socket API realized there as a model or come up with something new
from scratch?

What I have right now is:

1. Opening a socket

        A. Native PGM

                fd = socket(AF_INET, SOCK_RDM, IPPROTO_PGM)

        B. PGM over UDP

                fd = socket(AF_INET, SOCK_RDM, IPPROTO_UDP)

        C. PGM over SHM (?)

                fd = socket(AF_UNIX, SOCK_RDM, 0)
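
A minimal sketch of the three variants above, assuming a kernel that implements them. The enum and helper names are made up for illustration. IPPROTO_PGM is defined locally as 113 (the IANA protocol number for PGM) because not every libc header provides it. On a kernel without PGM support socket() is expected to fail.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IPPROTO_PGM
#define IPPROTO_PGM 113 /* IANA protocol number for PGM; missing from some libc headers */
#endif

/* Hypothetical names for the three transports listed above. */
enum pgm_transport { PGM_NATIVE, PGM_OVER_UDP, PGM_OVER_SHM };

/* Fill in the socket() arguments proposed for one transport. */
static int pgm_socket_args(enum pgm_transport t, int *domain, int *type, int *proto)
{
        assert(domain && type && proto);
        *type = SOCK_RDM;
        switch (t) {
        case PGM_NATIVE:   *domain = AF_INET; *proto = IPPROTO_PGM; return 0;
        case PGM_OVER_UDP: *domain = AF_INET; *proto = IPPROTO_UDP; return 0;
        case PGM_OVER_SHM: *domain = AF_UNIX; *proto = 0;           return 0;
        }
        return -1;
}

/* Returns a descriptor, or -1 with errno set (expected without PGM support). */
static int pgm_open(enum pgm_transport t)
{
        int domain, type, proto;

        if (pgm_socket_args(t, &domain, &type, &proto) < 0)
                return -1;
        return socket(domain, type, proto);
}
```

A caller would check the result of pgm_open(PGM_NATIVE) and could fall back to PGM_OVER_UDP when the native protocol is not available.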


2. Binding to a multicast address

        A. Sender

                Connect the socket to a MC address and port using connect().

                Note that the port is significant since multiple streams on different
                ports can be run over the same MC addr.

        B. Receiver

                I. Bind the socket to the MC address and port of interest.

                II. Listen to the socket.

                        Process will wait until a PGM packet destined to the port of interest
                        is received.

                III. Accept a connection.

                        Establishes a session. Data can then be received.
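
The sender and receiver steps above could look roughly like this. It is only a sketch: the helper names are hypothetical, and it assumes the socket behaves like other connection-oriented sockets, so the blocking wait happens in accept() rather than in listen().

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build a sockaddr_in for an IPv4 multicast group and port.
 * Returns 0 on success, -1 if the string is not a multicast address. */
static int pgm_mc_addr(const char *group, uint16_t port, struct sockaddr_in *sa)
{
        assert(group && sa);
        memset(sa, 0, sizeof(*sa));
        sa->sin_family = AF_INET;
        sa->sin_port = htons(port);
        if (inet_pton(AF_INET, group, &sa->sin_addr) != 1)
                return -1;
        if (!IN_MULTICAST(ntohl(sa->sin_addr.s_addr)))
                return -1;
        return 0;
}

/* Sender: connect() the socket to the MC address and port. */
static int pgm_sender_attach(int fd, const struct sockaddr_in *sa)
{
        return connect(fd, (const struct sockaddr *)sa, sizeof(*sa));
}

/* Receiver: bind(), listen(), then accept() one session.
 * Returns the session descriptor, or -1 with errno set. */
static int pgm_receiver_accept(int fd, const struct sockaddr_in *sa)
{
        if (bind(fd, (const struct sockaddr *)sa, sizeof(*sa)) < 0)
                return -1;
        if (listen(fd, 1) < 0)
                return -1;
        return accept(fd, NULL, NULL);
}
```

For example, pgm_mc_addr("239.1.2.3", 7500, &sa) fills in an address for an arbitrarily chosen group and port, which is then passed to either helper.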


3. Sending and receiving

        Use the usual socket read and write operations and the various flavors of waiting
        for a packet via select, poll, epoll etc.

        Message sizes are determined by the number of bytes in a single sendmsg() unless
        overridden by the RM_SET_MESSAGE_BOUNDARY socket option.

        The sender will block when the send window is full unless a non-blocking write is performed.

        The receiver shows the usual wait semantics. If the stream is set to unreliable then
        packets may arrive in random order. If the socket is set to RM_RECEIVE_ONLY then packets may
        just be missing.
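
Since only the ordinary socket calls are involved, the send and receive paths can be sketched without anything PGM specific. The two helper names are hypothetical. The sketch assumes that a full send window is reported as EAGAIN/EWOULDBLOCK on a non-blocking write, as it is for other socket types.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for a message, then read it.
 * Returns bytes read, 0 on timeout, -1 on error. */
static ssize_t pgm_recv_wait(int fd, void *buf, size_t len, int timeout_ms)
{
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int n;

        assert(buf);
        n = poll(&pfd, 1, timeout_ms);
        if (n <= 0)
                return n;       /* 0 = timeout, -1 = error */
        return recv(fd, buf, len, 0);
}

/* Non-blocking send: returns bytes sent, 0 if the call would have
 * blocked (send window full), -1 on any other error. */
static ssize_t pgm_send_nowait(int fd, const void *buf, size_t len)
{
        ssize_t n = send(fd, buf, len, MSG_DONTWAIT);

        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
                return 0;
        return n;
}
```

The same helpers work on any datagram-style descriptor, so they can be tried on an AF_UNIX socketpair before a PGM socket exists.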

4. Transmitter Socket Options


        A. Setting the window size / rate.

                struct pgm_send_window x;
                x.RateKbitsPerSec = 56;
                x.WindowSizeInMSecs = 60000;
                x.WindowSizeInBytes = 10000000;

                setsockopt(fd, SOCK_RDM, RM_RATE_WINDOW_SIZE, &x, sizeof(x));

                Default is sending at 56Kbps with a buffer of 10 Megabytes and buffering for a minute.
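
The three window fields are not independent if the rate is taken to be the window size in bytes divided by the window time. A small sketch of that arithmetic, assuming 1 kbit = 1000 bits and 8 bits per byte (the helper names are hypothetical, and the struct is a local copy with fixed-width types):

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the proposed structure, using fixed-width types. */
struct pgm_send_window {
        uint64_t RateKbitsPerSec;
        uint64_t WindowSizeInMSecs;
        uint64_t WindowSizeInBytes;
};

/* Bytes that fit in the window:
 * (kbit/s * 1000 / 8) bytes/s * (msecs / 1000) s = rate * msecs / 8. */
static uint64_t pgm_window_bytes(uint64_t rate_kbits, uint64_t window_msecs)
{
        return rate_kbits * window_msecs / 8;
}

/* Returns 1 if the three fields agree with each other, 0 otherwise. */
static int pgm_send_window_consistent(const struct pgm_send_window *w)
{
        assert(w);
        return w->WindowSizeInBytes ==
               pgm_window_bytes(w->RateKbitsPerSec, w->WindowSizeInMSecs);
}
```

Under these assumptions 56 kbit/s over 60000 ms corresponds to 420000 bytes, not 10000000, so one of the three values would have to give if the relation is meant to hold strictly.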

        B. FEC mode

                struct pgm_fec_info x;

                x.FECBlockSize = 255;
                x.FECProActivePackets = 0;
                x.FECGroupSize = 0;
                x.fFECOnDemandParityEnabled = 1;

                setsockopt(fd, SOCK_RDM, RM_USE_FEC, &x, sizeof(x));


5. Receiver Socket Options

        None?


Possible Extensions

        RM_UNORDERED    Accept packets out of order, avoiding delays when packets arrive
                        out of sequence. Missing packets are still NAKed.

        RM_RECEIVE_ONLY Simply ignore missed packets. Do not send any replies.



Existing socket options in the other OS (X denotes that this looks like
it is screwy and should be avoided)

/* PGM socket options */

/* Transmitter */
#define RM_LATEJOIN                             1       /* X Not supported on receive so why have it? */
#define RM_RATE_WINDOW_SIZE                     2       /* See struct pgm_send_window */
#define RM_SEND_WINDOW_ADV_RATE                 3       /* X Increase of send window in percentage of window */
#define RM_SENDER_STATISTICS                    4       /* see struct pgm_sender_stats */
#define RM_SENDER_WINDOW_ADVANCE_METHOD         5       /* X seems obsolete */
#define RM_SET_MCAST_TTL                        6       /* X Can be set via IP_MULTICAST_TTL */
#define RM_SET_MESSAGE_BOUNDARY                 7       /* Fix the size of the messages in bytes */
#define RM_SET_SEND_IF                          8       /* X use IP_MULTICAST_IF etc instead */
#define RM_USE_FEC                              9

/* Receiver */
#define RM_ADD_RECEIVE_IF                       100     /* X ???? IP_MULTICAST_IF instead? */
#define RM_DEL_RECEIVE_IF                       101     /* X IP_MULTICAST_IF */
#define RM_HIGH_SPEED_INTRANET_OPT              102     /* X PGM should adapt automatically to high speed networks */
#define RM_RECEIVER_STATISTICS                  103     /* See struct pgm_receiver_stats */


/* Socket API structures (established by M$DN) */
struct pgm_receiver_stats {
        u64     NumODataPacketsReceived;        /* Number of ODATA (original) sequences */
        u64     NumRDataPacketsReceived;        /* Number of RDATA (repair) sequences */
        u64     NumDuplicateDataPackets;        /* Duplicate sequences */
        u64     DataBytesReceived;
        u64     TotalBytesReceived;
        u64     RateKBitsPerSecOverall;         /* Receive rate since start of session X */
        u64     RateKBitsPerSecLast;            /* Receive rate for last second X*/
        u64     TrailingEdgeSeqId;              /* Oldest sequence in the receive window */
        u64     LeadingEdgeSeqId;               /* Newest sequence in the receive window */
        u64     AverageSequencesInWindow;       /* Average number of sequences in receive window X */
        u64     MinSequencesInWindow;           /* The minimum number of sequences */
        u64     MaxSequencesInWindow;           /* The maximum number of sequences */
        u64     FirstNakSequenceNumber;         /* First outstanding NAK sequence number */
        u64     NumPendingNaks;                 /* Number of sequences waiting for NCF */
        u64     NumOutstandingNaks;             /* Number of sequences waiting for RDATA */
        u64     NumDataPacketsBuffered;         /* Number of packets currently buffered */
        u64     TotalSelectiveNaksSent;         /* Number of NAKs sent total */
        u64     TotalParityNaksSent;            /* Number of parity NAKs sent */
};

struct pgm_sender_stats {
        u64     DataBytesSent;
        u64     TotalBytesSent;
        u64     NaksReceived;
        u64     NaksReceivedTooLate;            /* NAKs received after send window advanced */
        u64     NumOutstandingNaks;             /* Number of NAKs awaiting response */
        u64     NumNaksAfterRData;              /* Number of NAKs ignored because they arrived after RDATA was sent */
        u64     RepairPacketsSent;
        u64     BufferSpaceAvailable;           /* Number of partial messages dropped */
        u64     TrailingEdgeSeqId;              /* Oldest sequence id in window */
        u64     LeadingEdgeSeqId;               /* Newest sequence id in window */
        u64     RateKBitsPerSecOverall;         /* Rate since start of session X */
        u64     RateKBitsPerSecLast;            /* Rate in last second X */
        u64     TotalODataPacketsSent;          /* Total data packets transmitted */
};

/* Setup of sender: RateKbitsPerSec = WindowSizeInBytes * 8 / WindowSizeInMSecs */
struct pgm_send_window {
        u64     RateKbitsPerSec;                /* Allowed rate for the sender in kbits per second */
        u64     WindowSizeInMSecs;              /* Send window size in time */
        u64     WindowSizeInBytes;              /* Window size in bytes */
};

struct pgm_fec_info {
        u16     FECBlockSize;                   /* Maximum number of packets for a group. Default and max = 255 */
        u16     FECProActivePackets;            /* Number of proactive packets per group. */
        u8      FECGroupSize;                   /* Number of packets to be treated as a group. Power of two */
        int     fFECOnDemandParityEnabled;      /* Allow sender to send parity repair packets */
};
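
As an illustration of what the statistics counters are good for, the share of repair traffic seen by a receiver can be derived from two of them. This is only a sketch: the struct is a trimmed stand-in for pgm_receiver_stats with fixed-width types, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed stand-in for pgm_receiver_stats with only the fields used here. */
struct pgm_receiver_stats_subset {
        uint64_t NumODataPacketsReceived;       /* ODATA (original) sequences */
        uint64_t NumRDataPacketsReceived;       /* RDATA (repair) sequences */
};

/* Share of received data packets that were repairs, in percent (0..100).
 * Returns 0 when nothing has been received yet. */
static unsigned pgm_repair_percent(const struct pgm_receiver_stats_subset *s)
{
        uint64_t total;

        assert(s);
        total = s->NumODataPacketsReceived + s->NumRDataPacketsReceived;
        if (total == 0)
                return 0;
        return (unsigned)(s->NumRDataPacketsReceived * 100 / total);
}
```

A monitoring tool could poll the counters periodically and flag a receiver whose repair share keeps growing.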

