The Berkeley and Microsoft socket models are mostly compatible at the source code level, but they are not so cross-platform in practice.
Let's examine some subtle differences in their implementations. We found these differences while writing a cross-platform RPC that redirects the network calls of a process from one OS to another.
Socket Types
1. BSD:
int
2. Win:
SOCKET // typedef for UINT_PTR, a pointer-sized unsigned integer
On 32-bit systems both types are 4 bytes wide, so they map onto each other without problems. On 64-bit Windows, however, SOCKET is 8 bytes, twice the size of int.
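To see why a plain cast is not enough on 64-bit Windows, here is a minimal sketch. It models SOCKET with a hypothetical stand-in type, WinSocket, defined as std::uintptr_t (SOCKET is a typedef for UINT_PTR), so it compiles on any system; fitsInInt() is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the Windows SOCKET type: like UINT_PTR,
// it is an unsigned integer as wide as a pointer.
using WinSocket = std::uintptr_t;

// The Windows INVALID_SOCKET value is (SOCKET)(~0): all bits set.
const WinSocket kInvalidSocket = ~static_cast<WinSocket>(0);

// True if the handle can be stored in a non-negative int without losing bits.
bool fitsInInt(WinSocket handle)
{
    return handle <= static_cast<WinSocket>(std::numeric_limits<int>::max());
}
```

Small handle values survive the cast, but nothing guarantees that a 64-bit handle stays small, which is one reason to map handles to descriptors through a table rather than cast them.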
On BSD, a socket descriptor is no different from a file descriptor. This means that some system calls accept both socket and file descriptors (for example, such commonly used calls as close(), fcntl(), and ioctl()).
There is one more side effect that appears in some cases. On systems that support the Berkeley model, socket descriptors have small numerical values (less than 100), and descriptors created in succession usually differ by 1, because a new descriptor takes the lowest unused number. In the Microsoft model, socket handles have values of roughly 200 and more right from the start, and in our tests handles created in succession differed by sizeof(SOCKET).
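The lowest-unused-number rule is easy to observe on Linux. In this sketch, demonstrateReuse() and the Reuse struct are illustrative names; it opens /dev/null instead of sockets so that it needs no network, but open() and socket() draw numbers from the same descriptor table.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Descriptor numbers observed by demonstrateReuse() (illustrative names).
struct Reuse
{
    int first;
    int second;
    int third;
};

// Opens /dev/null twice, closes the first descriptor, and opens again.
// POSIX requires open() to return the lowest unused descriptor, so the
// third call receives the number that was just released.
Reuse demonstrateReuse()
{
    Reuse r;
    r.first = open("/dev/null", O_RDONLY);
    r.second = open("/dev/null", O_RDONLY);
    close(r.first);
    r.third = open("/dev/null", O_RDONLY);
    close(r.second);
    close(r.third);
    return r;
}
```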
Error Handling
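- BSD: calls return -1 and set the global variable errno.
- Win: calls return -1 (the SOCKET_ERROR macro), and the error status is retrieved with WSAGetLastError(). Calls that return a socket, such as socket() and accept(), return INVALID_SOCKET instead.
errno constants and Windows error codes have completely different values. Winsock codes start at WSABASEERR (10000): for example, ECONNREFUSED is 111 on Linux, while the corresponding WSAECONNREFUSED is 10061.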
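Code that must build on both platforms often hides this difference behind a small helper. This is a minimal sketch: lastSocketError() is a hypothetical name, while WSAGetLastError() and errno are the real platform facilities. Note that the two branches still return values from different numbering schemes.

```cpp
#ifdef _WIN32
#include <winsock2.h>
// Winsock keeps the last error per thread and does not update errno.
inline int lastSocketError() { return WSAGetLastError(); }
#else
#include <cerrno>
// On BSD/Linux, socket calls report failures through errno.
inline int lastSocketError() { return errno; }
#endif
#include <cassert>
```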
Socket creation
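socket(int af, int type, int protocol);
Constants for the first argument (the address family) differ between BSD and Windows: AF_INET is 2 on both systems, but AF_INET6 is 10 on Linux and 23 on Windows. Constants for the second argument, such as SOCK_STREAM and SOCK_DGRAM, coincide so far.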
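Here is a sketch of what the domain2WSAdomain() and WSAdomain2domain() helpers from the Implementation Details section might look like. It handles only AF_UNSPEC, AF_INET, and AF_INET6; the WIN_ constant names are hypothetical, while their values are the ones in the Windows SDK headers.

```cpp
#include <cassert>
#include <sys/socket.h>

// Windows address family values; the WIN_ prefix avoids clashing with Linux macros.
const int WIN_AF_UNSPEC = 0;
const int WIN_AF_INET   = 2;
const int WIN_AF_INET6  = 23;

// Linux address family -> Windows address family; -1 if unsupported here.
int domain2WSAdomain(int domain)
{
    switch (domain)
    {
    case AF_UNSPEC: return WIN_AF_UNSPEC;
    case AF_INET:   return WIN_AF_INET;
    case AF_INET6:  return WIN_AF_INET6;   // AF_INET6 is 10 on Linux
    default:        return -1;
    }
}

// Windows address family -> Linux address family; -1 if unsupported here.
int WSAdomain2domain(int domain)
{
    switch (domain)
    {
    case WIN_AF_UNSPEC: return AF_UNSPEC;
    case WIN_AF_INET:   return AF_INET;
    case WIN_AF_INET6:  return AF_INET6;
    default:            return -1;
    }
}
```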
Socket Setting
1. BSD:
getsockopt(int sockfd, int level, int option_name, void *option_value, socklen_t *option_len);
setsockopt(int sockfd, int level, int option_name, void const *option_value, socklen_t option_len);
2. Win:
getsockopt(SOCKET sock, int level, int option_name, char *option_value, int *option_len);
setsockopt(SOCKET sock, int level, int option_name, char const *option_value, int option_len);
The constants for the second and third arguments (the level and the option name) mostly have different values on BSD and Windows: SOL_SOCKET, for example, is 1 on Linux and 0xffff on Windows.
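Converting option constants works the same way. This is a minimal sketch of the sockopt2WSA() helper declared in the Implementation Details section, limited to a few SOL_SOCKET options; the WIN_ names are hypothetical, and the values are taken from winsock2.h.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Windows values; the WIN_ prefix avoids clashing with Linux macros.
const int WIN_SOL_SOCKET   = 0xffff;
const int WIN_SO_REUSEADDR = 0x0004;
const int WIN_SO_KEEPALIVE = 0x0008;
const int WIN_SO_BROADCAST = 0x0020;
const int WIN_SO_SNDBUF    = 0x1001;
const int WIN_SO_RCVBUF    = 0x1002;
const int WIN_SO_ERROR     = 0x1007;

// Converts a Linux (level, option) pair in place; -1 marks an unsupported option.
// Only SOL_SOCKET is handled: other levels pass through unchanged, which works
// for IPPROTO_TCP/TCP_NODELAY but would need its own table for IPPROTO_IP options.
void sockopt2WSA(int *level, int *option)
{
    if (*level != SOL_SOCKET)
        return;
    *level = WIN_SOL_SOCKET;
    switch (*option)
    {
    case SO_REUSEADDR: *option = WIN_SO_REUSEADDR; break;
    case SO_KEEPALIVE: *option = WIN_SO_KEEPALIVE; break;
    case SO_BROADCAST: *option = WIN_SO_BROADCAST; break;
    case SO_SNDBUF:    *option = WIN_SO_SNDBUF;    break;
    case SO_RCVBUF:    *option = WIN_SO_RCVBUF;    break;
    case SO_ERROR:     *option = WIN_SO_ERROR;     break;
    default:           *option = -1;               break;
    }
}
```

Matching values is not the whole story: SO_REUSEADDR, for instance, allows binding to a port that another socket is actively using on Windows, unlike on Linux.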
Socket Setting 2
1. BSD:
fcntl(int fd, int cmd, ...);
2. Win:
ioctlsocket(SOCKET sock, long cmd, long unsigned *arg);
The only completely correct correspondence is the following:
fcntl(descriptor, F_SETFL, O_NONBLOCK) -> ioctlsocket(descriptor, FIONBIO, pointer to a u_long that is nonzero when O_NONBLOCK is set and zero otherwise).
The numerical values of the flags must be taken from the target system: FIONBIO, for example, is 0x5421 on Linux and 0x8004667E on Windows.
For a call of the fcntl(descriptor, F_GETFL) type, we can simply return 0 or O_RDWR, since Winsock has no documented call for reading back the blocking mode of a socket.
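On Linux, the FIONBIO request that ioctlsocket() uses is also available through ioctl(), which makes it easy to check that it toggles the same O_NONBLOCK state that fcntl() reports. A sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

// Switches non-blocking mode with FIONBIO, the request Winsock's ioctlsocket()
// also uses. The argument is a flag: nonzero turns non-blocking mode on.
bool setNonBlockingViaIoctl(int fd, bool on)
{
    int arg = on ? 1 : 0;   // Linux reads an int here; Winsock reads a u_long
    return ioctl(fd, FIONBIO, &arg) == 0;
}

// Reads the mode back with fcntl(F_GETFL); Winsock has no such query.
bool isNonBlocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags != -1 && (flags & O_NONBLOCK) != 0;
}
```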
Socket Setting 3
1. BSD:
ioctl(int fd, int cmd, ...);
2. Win:
ioctlsocket(SOCKET sock, long cmd, long unsigned *arg);
We have not yet come across real code that calls ioctl() with a socket as the first argument.
Work with DNS
getaddrinfo(char const *node, char const *service, struct addrinfo const *hints, struct addrinfo **res)
1. BSD:
struct addrinfo
{
int ai_flags;
int ai_family;
int ai_socktype;
int ai_protocol;
socklen_t ai_addrlen;
struct sockaddr *ai_addr;
char *ai_canonname;
struct addrinfo *ai_next;
};
2. Win:
typedef struct addrinfo
{
int ai_flags;
int ai_family;
int ai_socktype;
int ai_protocol;
size_t ai_addrlen;
char *ai_canonname;
struct sockaddr *ai_addr;
struct addrinfo *ai_next;
} ADDRINFOA, *PADDRINFOA;
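Pay attention to the member order of these structures: ai_addr and ai_canonname have different offsets from the beginning of the structure. The developers simply declared them in a different order (or mixed them up?). The BSD layout shown above is the glibc one used on Linux. In addition, ai_addrlen is a socklen_t on Linux but a size_t on Windows, which is 8 bytes wide on 64-bit systems.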
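The difference in member order can be checked with offsetof(). This sketch compares glibc's struct addrinfo with WinAddrInfoLayout, a hypothetical mirror of the Windows layout written with Linux types so that both can be measured on one machine.

```cpp
#include <cassert>
#include <cstddef>
#include <netdb.h>
#include <sys/socket.h>

// Hypothetical mirror of the Windows ADDRINFOA layout, used only to compare offsets.
struct WinAddrInfoLayout
{
    int ai_flags;
    int ai_family;
    int ai_socktype;
    int ai_protocol;
    std::size_t ai_addrlen;        // size_t on Windows, socklen_t on Linux
    char *ai_canonname;            // declared before ai_addr on Windows
    struct sockaddr *ai_addr;
    WinAddrInfoLayout *ai_next;
};

// True if the glibc layout places ai_addr before ai_canonname.
bool glibcPutsAddrFirst()
{
    return offsetof(addrinfo, ai_addr) < offsetof(addrinfo, ai_canonname);
}
```

On both 32-bit and 64-bit Linux, glibc's ai_addr sits at the same offset as the Windows ai_canonname, so copying one structure into the other with memcpy() swaps the two pointers.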
Data Transfer
1. BSD:
recv(int sockfd, void *buffer, size_t length, int flags);
recvfrom(int sockfd, void *buffer, size_t length, int flags, struct sockaddr *from, socklen_t *fromlen);
send(int sockfd, void const *buffer, size_t length, int flags);
sendto(int sockfd, void const *buffer, size_t length, int flags, struct sockaddr const *to, socklen_t tolen);
2. Win:
recv(SOCKET sock, char *buffer, int length, int flags);
recvfrom(SOCKET sock, char *buffer, int length, int flags, struct sockaddr *from, int *fromlen);
send(SOCKET sock, char const *buffer, int length, int flags);
sendto(SOCKET sock, char const *buffer, int length, int flags, struct sockaddr const *to, int tolen);
Some flags for the fourth argument coincide on BSD and Windows (MSG_OOB, MSG_PEEK, MSG_DONTROUTE), but others differ: MSG_WAITALL is 0x100 on Linux and 0x8 on Windows, and MSG_DONTWAIT has no Windows counterpart at all.
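A sketch of the msgFlags2WSAmsgFlags() helper declared in the Implementation Details section. The WIN_ names are hypothetical; the values come from winsock2.h. Returning -1 for unsupported flags is our own convention, so that the caller can fail the call (with EINVAL, for example) instead of silently changing its semantics.

```cpp
#include <cassert>
#include <sys/socket.h>

// Windows values; the WIN_ prefix avoids clashing with Linux macros.
const int WIN_MSG_OOB       = 0x1;
const int WIN_MSG_PEEK      = 0x2;
const int WIN_MSG_DONTROUTE = 0x4;
const int WIN_MSG_WAITALL   = 0x8;

// Converts Linux recv()/send() flags to Winsock flags.
// Returns -1 if a flag has no Windows counterpart (e.g. MSG_DONTWAIT).
int msgFlags2WSAmsgFlags(int flags)
{
    int result = 0;
    if (flags & MSG_OOB)       result |= WIN_MSG_OOB;
    if (flags & MSG_PEEK)      result |= WIN_MSG_PEEK;
    if (flags & MSG_DONTROUTE) result |= WIN_MSG_DONTROUTE;
    if (flags & MSG_WAITALL)   result |= WIN_MSG_WAITALL;
    // Windows has no SIGPIPE, so MSG_NOSIGNAL is accepted and dropped.
    int const known = MSG_OOB | MSG_PEEK | MSG_DONTROUTE | MSG_WAITALL | MSG_NOSIGNAL;
    return (flags & ~known) ? -1 : result;
}
```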
Waiting for Operations
1. BSD:
poll(struct pollfd *fds, nfds_t nfds, int timeout);
struct pollfd
{
int fd;
short events;
short revents;
};
2. Win:
WSAPoll(WSAPOLLFD *fds, ULONG nfds, INT timeout);
typedef struct pollfd
{
SOCKET fd;
SHORT events;
SHORT revents;
} WSAPOLLFD, *PWSAPOLLFD;
Flag constants for the second and third members of the pollfd structure (events and revents) have completely different values on BSD and Windows: POLLIN, for example, is 0x0001 on Linux and 0x0300 on Windows. WSAPoll() is available only in Windows 6.0 (Vista) and later.
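A sketch of the poll2WSAPoll() and WSAPoll2poll() pair declared in the Implementation Details section. It translates only a handful of flags; the WIN_ names are hypothetical, while their values are those in winsock2.h.

```cpp
#include <cassert>
#include <poll.h>

// WSAPoll flag values; the WIN_ prefix avoids clashing with Linux macros.
const short WIN_POLLRDNORM = 0x0100;
const short WIN_POLLRDBAND = 0x0200;
const short WIN_POLLIN     = WIN_POLLRDNORM | WIN_POLLRDBAND;
const short WIN_POLLWRNORM = 0x0010;
const short WIN_POLLOUT    = WIN_POLLWRNORM;
const short WIN_POLLERR    = 0x0001;
const short WIN_POLLHUP    = 0x0002;
const short WIN_POLLNVAL   = 0x0004;

// Requested events: only POLLIN and POLLOUT are translated in this sketch.
short poll2WSAPoll(short flags)
{
    short result = 0;
    if (flags & POLLIN)  result |= WIN_POLLIN;
    if (flags & POLLOUT) result |= WIN_POLLOUT;
    return result;
}

// Returned events: any read-type bit becomes POLLIN, any write-type bit POLLOUT.
short WSAPoll2poll(short flags)
{
    short result = 0;
    if (flags & WIN_POLLIN)   result |= POLLIN;
    if (flags & WIN_POLLOUT)  result |= POLLOUT;
    if (flags & WIN_POLLERR)  result |= POLLERR;
    if (flags & WIN_POLLHUP)  result |= POLLHUP;
    if (flags & WIN_POLLNVAL) result |= POLLNVAL;
    return result;
}
```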
Waiting for Operations 2
1. BSD:
select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);
typedef struct
{
long fds_bits[FD_SETSIZE / (8 * sizeof(long))];
} fd_set;
2. Win:
select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval const *timeout);
typedef struct fd_set
{
u_int fd_count;
SOCKET fd_array[FD_SETSIZE];
} fd_set;
The problem with select() appears when the fd_set structure has to be mapped from one system to the other. Let's recall how select() works. The call accepts three sets of sockets, which it checks for readability, writability, and errors during some period of time. You add your own socket to one of these sets with the FD_SET(socket, set) macro, check whether a socket is in a set with FD_ISSET(socket, set), remove one socket from a set with FD_CLR(socket, set), and remove all sockets with FD_ZERO(set). When the call returns, select() leaves in each set only those sockets that reached the expected state within the timeout given by the last argument.
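On the Linux side, the sequence of macros described above looks like this in practice. waitReadable() is an illustrative helper built only on the standard select() API.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

// Returns true if fd becomes readable within timeoutMs milliseconds.
bool waitReadable(int fd, int timeoutMs)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    struct timeval tv;
    tv.tv_sec = timeoutMs / 1000;
    tv.tv_usec = (timeoutMs % 1000) * 1000;
    // On BSD/Linux the first argument is the highest descriptor + 1.
    int n = select(fd + 1, &readfds, nullptr, nullptr, &tv);
    // select() removes descriptors that are not ready from the set.
    return n > 0 && FD_ISSET(fd, &readfds);
}
```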
On BSD, adding a socket to a set means setting the bit whose number equals the socket descriptor. FD_SETSIZE is usually 1024. The first argument of select() is one greater than the highest socket descriptor in any of the three sets. Since the bit in the fds_bits array is set without any range check, the program behavior is undefined for descriptor values equal to or greater than FD_SETSIZE. This rather unreliable implementation of select() is a remnant of computers with little memory. It also means that the int -> SOCKET conversion, and vice versa, matters here: every descriptor we hand to the program must stay below FD_SETSIZE.
On Windows, adding a socket to a set means inserting it into the fd_array array at index fd_count and then incrementing fd_count. FD_SETSIZE is usually 64. The first argument of select() is ignored entirely.
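Moving a set between the two representations means walking the descriptor map in one direction and searching it in the other. Here is a sketch in the spirit of the select2WSASelect() and WSASelect2select() helpers declared in the Implementation Details section; WinFdSet mirrors the Windows fd_set, and bitsToArray(), arrayToBits(), and SocketMap are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <sys/select.h>

// Hypothetical mirror of the Windows fd_set: a counted array of handles.
const unsigned WIN_FD_SETSIZE = 64;
struct WinFdSet
{
    unsigned fd_count;
    std::uintptr_t fd_array[WIN_FD_SETSIZE];
};

typedef std::map<int, std::uintptr_t> SocketMap; // pseudodescriptor -> handle

// Linux bit set -> Windows array. Returns false if the array overflows.
bool bitsToArray(const fd_set *bsd, int nfds, WinFdSet *win, const SocketMap &sockets)
{
    win->fd_count = 0;
    for (SocketMap::const_iterator i = sockets.begin(); i != sockets.end(); ++i)
    {
        if (i->first >= nfds || !FD_ISSET(i->first, bsd))
            continue;
        if (win->fd_count == WIN_FD_SETSIZE)
            return false;
        win->fd_array[win->fd_count++] = i->second;
    }
    return true;
}

// Windows array (as left by select()) -> Linux bit set.
void arrayToBits(const WinFdSet *win, fd_set *bsd, const SocketMap &sockets)
{
    FD_ZERO(bsd);
    for (unsigned k = 0; k < win->fd_count; ++k)
        for (SocketMap::const_iterator i = sockets.begin(); i != sockets.end(); ++i)
            if (i->second == win->fd_array[k])
                FD_SET(i->first, bsd);
}
```

The overflow check matters because a Linux set can hold up to 1024 descriptors, while the default Windows fd_set holds only 64.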
Implementation Details
Here is some useful code that I used in my project.
First, we assume that we have somehow managed to redirect the standard network calls to the glibc library to our own implementations (for example, see https://www.apriorit.com/dev-blog/181-elf-hook). We also have a synchronous RPC mechanism that serializes the parameters and transfers the calls from Linux to Windows. Finally, all the required Windows constants are declared under names that do not clash with their Linux counterparts.
Since the socket types differ between the systems, the following class for converting descriptors during call transfer proves useful:
#include <map>
class SocketsStorage
{
public:
    bool hasSocket(int const fd);
    int addSocket(SOCKET const handle);
    void removeSocket(int const fd);
    SOCKET convert(int const fd);
    int convert(SOCKET const handle);
private:
    typedef std::map<int, SOCKET> sockets_map;
    sockets_map map_;
};
Its implementation can look as follows:
bool SocketsStorage::hasSocket(int const fd)
{
    return map_.find(fd) != map_.end();
}
int SocketsStorage::addSocket(SOCKET const handle)
{
    if (INVALID_SOCKET == handle)
        return -1; // INVALID_SOCKET truncated to int
    // big enough to avoid conflicts with file descriptors, but less than FD_SETSIZE
    static int const min = FD_SETSIZE - FD_SETSIZE / 4;
    static int const max = FD_SETSIZE;
    for (int fd = min; fd < max; ++fd)
    {
        if (map_.end() == map_.find(fd))
        {
            map_[fd] = handle;
            return fd;
        }
    }
    return -1; // all pseudodescriptors are in use
}
void SocketsStorage::removeSocket(int const fd)
{
    map_.erase(fd);
}
SOCKET SocketsStorage::convert(int const fd)
{
    sockets_map::const_iterator i = map_.find(fd);
    return map_.end() != i ? i->second : INVALID_SOCKET;
}
int SocketsStorage::convert(SOCKET const handle)
{
    for (sockets_map::const_iterator i = map_.begin(); i != map_.end(); ++i)
        if (handle == i->second)
            return i->first;
    return -1;
}
For the first socket created, the pseudodescriptor is 768 (with FD_SETSIZE equal to 1024). That is far above the real descriptors, whose values start at about 6, yet still less than FD_SETSIZE, so select() works correctly when FD_SET fills its bit set.
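A pseudodescriptor is safe only as long as the process does not also have a real file open under the same number. The sketch below shows a hypothetical guard, isRealDescriptorOpen(), that addSocket() could call before handing out a number; it relies on fcntl(F_GETFD) failing with EBADF for a descriptor that is not open.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

// First pseudodescriptor used by addSocket(): 768 when FD_SETSIZE is 1024.
constexpr int kFirstPseudoFd = FD_SETSIZE - FD_SETSIZE / 4;

// True if fd is currently an open descriptor in this process.
// Errors other than EBADF are treated as "open" to stay on the safe side.
bool isRealDescriptorOpen(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

The check only covers the moment of allocation: the kernel may still give the same number to a file opened later, so processes with very many open files can outgrow this scheme.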
We also need functions that convert constants for certain calls:
void select2WSASelect(fd_set *bsd, fd_set_ *win, sockets_map &sockets);
void WSASelect2select(fd_set_ *win, fd_set *bsd, sockets_map &sockets);
int WSA2errno(int const WSA);
int domain2WSAdomain(int domain);
int WSAdomain2domain(int domain);
void WSA2sockopt(int *level, int *option);
void sockopt2WSA(int *level, int *option);
short WSAPoll2poll(short const flags);
short poll2WSAPoll(short const flags);
int msgFlags2WSAmsgFlags(int flags);
int WSAmsgFlags2msgFlags(int flags);
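As an example of these helpers, here is a sketch of what WSA2errno() might contain. The table covers only common codes (the WSAE* values from winsock2.h, written as numbers here), and falling back to EIO for everything else is our assumption rather than a rule.

```cpp
#include <cassert>
#include <cerrno>

// Maps a Winsock error code (as returned by WSAGetLastError()) to errno.
int WSA2errno(int const WSA)
{
    switch (WSA)
    {
    case 0:     return 0;
    case 10004: return EINTR;         // WSAEINTR
    case 10009: return EBADF;         // WSAEBADF
    case 10013: return EACCES;        // WSAEACCES
    case 10014: return EFAULT;        // WSAEFAULT
    case 10022: return EINVAL;        // WSAEINVAL
    case 10024: return EMFILE;        // WSAEMFILE
    case 10035: return EWOULDBLOCK;   // WSAEWOULDBLOCK
    case 10036: return EINPROGRESS;   // WSAEINPROGRESS
    case 10037: return EALREADY;      // WSAEALREADY
    case 10038: return ENOTSOCK;      // WSAENOTSOCK
    case 10048: return EADDRINUSE;    // WSAEADDRINUSE
    case 10054: return ECONNRESET;    // WSAECONNRESET
    case 10057: return ENOTCONN;      // WSAENOTCONN
    case 10060: return ETIMEDOUT;     // WSAETIMEDOUT
    case 10061: return ECONNREFUSED;  // WSAECONNREFUSED
    default:    return EIO;           // no close equivalent in this sketch
    }
}
```

The right mapping can depend on the call: a non-blocking connect() fails with WSAEWOULDBLOCK on Windows but with EINPROGRESS on Linux, so a connect() wrapper should translate that case itself.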
Here are example implementations of some of the redirected functions:
int socket(int domain, int type, int protocol)
{
    int ret = -1;
    errno = 0;
    RPC_SOCKET_REQUEST request;
    RPC_SOCKET_RESPONSE response;
    request.af = BSD2WSA::domain2WSAdomain(domain);
    request.type = type;
    request.protocol = protocol;
    if (response = static_cast<RPC_SOCKET_RESPONSE>(sendSyncRequest(request)))
    {
        ret = g_sockets.addSocket(response.socket);
        if (INVALID_SOCKET == response.socket)
            errno = BSD2WSA::WSA2errno(response.error); // a field cannot be named errno: errno is a macro
        else if (-1 == ret)
            errno = EMFILE; // no free pseudodescriptor left
    }
    else
        errno = EAGAIN;
    return ret;
}
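The socket() wrapper above passes type through unchanged. On Linux, however, type may also carry the SOCK_NONBLOCK and SOCK_CLOEXEC bits, which Winsock does not understand. A hypothetical helper that separates them could look like this:

```cpp
#include <cassert>
#include <sys/socket.h>

// Result of splitting a Linux socket() type argument (hypothetical helper).
struct SplitType
{
    int baseType;     // e.g. SOCK_STREAM, safe to forward to Windows
    bool nonBlocking; // to be applied separately once the socket exists
};

// Strips the Linux-only SOCK_NONBLOCK and SOCK_CLOEXEC bits from type.
SplitType splitSocketType(int type)
{
    SplitType result;
    result.nonBlocking = (type & SOCK_NONBLOCK) != 0;
    result.baseType = type & ~(SOCK_NONBLOCK | SOCK_CLOEXEC);
    return result;
}
```

The non-blocking bit can then be applied on the Windows side with ioctlsocket(FIONBIO) once the socket exists; close-on-exec has no counterpart to forward.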
int poll(struct pollfd *fds, nfds_t nfds, int timeout)
{
    int ret = -1;
    errno = 0;
    RPC_POLL_REQUEST request;
    RPC_POLL_RESPONSE response;
    request.nfds = nfds;
    request.timeout = timeout;
    for (nfds_t i = 0; i < nfds; ++i)
    {
        if (g_sockets.hasSocket(fds[i].fd))
        {
            request.fds[i].fd = g_sockets.convert(fds[i].fd);
            request.fds[i].events = BSD2WSA::poll2WSAPoll(fds[i].events);
        }
        else
        {
            request.fds[i].fd = INVALID_SOCKET; // not a redirected socket: WSAPoll ignores negative fd values
            request.fds[i].events = 0;
        }
        request.fds[i].revents = 0;
    }
    if (response = static_cast<RPC_POLL_RESPONSE>(sendSyncRequest(request)))
    {
        ret = response.ret;
        if (SOCKET_ERROR == ret)
            errno = BSD2WSA::WSA2errno(response.error);
        else
            for (nfds_t i = 0; i < nfds; ++i)
                fds[i].revents = g_sockets.hasSocket(fds[i].fd) ? BSD2WSA::WSAPoll2poll(response.fds[i].revents) : 0;
    }
    else
        errno = EAGAIN;
    return ret;
}
int getaddrinfo(char const *node, char const *service, struct addrinfo const *hints, struct addrinfo **res)
{
    int ret = EAI_SYSTEM; // EAI_SYSTEM tells the caller to look at errno
    errno = 0;
    RPC_GETADDRINFO_REQUEST request;
    RPC_GETADDRINFO_RESPONSE response;
    request.node = node;
    request.service = service;
    // hints may be NULL; zeroed fields stand in for it (AF_UNSPEC is 0 on both systems).
    // Apart from AI_PASSIVE, AI_CANONNAME and AI_NUMERICHOST, ai_flags values also differ.
    request.ai_flags = hints ? hints->ai_flags : 0;
    request.ai_family = hints ? BSD2WSA::domain2WSAdomain(hints->ai_family) : 0;
    request.ai_socktype = hints ? hints->ai_socktype : 0;
    request.ai_protocol = hints ? hints->ai_protocol : 0;
    request.res = res;
    if (response = static_cast<RPC_GETADDRINFO_RESPONSE>(sendSyncRequest(request)))
    {
        if (0 != response.ret) // Windows returns a nonzero WSA error code on failure
            errno = BSD2WSA::WSA2errno(response.error);
        else
        {
            ret = 0;
            struct addrinfo *prev = 0;
            *res = 0;
            for (PADDRINFOA p = response.res; p; p = p->ai_next)
            {
                // Copy field by field: the layouts differ, so memcpy() would swap ai_addr and ai_canonname.
                struct addrinfo *q = (struct addrinfo *)::calloc(1, sizeof(struct addrinfo));
                q->ai_flags = p->ai_flags;
                q->ai_family = BSD2WSA::WSAdomain2domain(p->ai_family);
                q->ai_socktype = p->ai_socktype;
                q->ai_protocol = p->ai_protocol;
                if (p->ai_addr && p->ai_addrlen)
                {
                    // Allocate ai_addrlen bytes: sockaddr_in6 does not fit into struct sockaddr.
                    q->ai_addrlen = (socklen_t)p->ai_addrlen;
                    q->ai_addr = (struct sockaddr *)::malloc(q->ai_addrlen);
                    ::memcpy(q->ai_addr, p->ai_addr, q->ai_addrlen);
                    // The family field inside the address also holds a Windows value.
                    q->ai_addr->sa_family = (sa_family_t)BSD2WSA::WSAdomain2domain(q->ai_addr->sa_family);
                }
                if (p->ai_canonname)
                {
                    size_t len = ::strlen(p->ai_canonname);
                    len = len > 0x100 ? 0x100 : len; // in case there was an error during transfer
                    q->ai_canonname = (char *)::malloc(len + 1);
                    ::memcpy(q->ai_canonname, p->ai_canonname, len);
                    q->ai_canonname[len] = 0;
                }
                if (prev)
                    prev->ai_next = q;
                else
                    *res = q; // only for the first node
                prev = q;
            }
        }
    }
    else
        errno = EAGAIN;
    return ret;
}
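Because this getaddrinfo() builds its result with separate malloc() calls, the program's freeaddrinfo() calls should be redirected as well: glibc allocates ai_addr inside the node itself, so its own freeaddrinfo() does not free ai_addr separately and those blocks would leak. A sketch of the matching release function, under the hypothetical name freeRedirectedAddrInfo():

```cpp
#include <cassert>
#include <cstdlib>
#include <netdb.h>

// Frees a list built by the redirected getaddrinfo(): each node, its ai_addr,
// and its ai_canonname were allocated separately with malloc()/calloc().
// Returns the number of nodes released.
int freeRedirectedAddrInfo(struct addrinfo *list)
{
    int count = 0;
    while (list)
    {
        struct addrinfo *next = list->ai_next;
        std::free(list->ai_addr);      // free(NULL) is a no-op
        std::free(list->ai_canonname);
        std::free(list);
        list = next;
        ++count;
    }
    return count;
}
```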
ssize_t recv(int sockfd, void *buffer, size_t length, int flags)
{
    ssize_t ret = -1;
    errno = 0;
    RPC_RECV_REQUEST request;
    RPC_RECV_RESPONSE response;
    request.s = g_sockets.convert(sockfd);
    request.len = length;
    request.flags = BSD2WSA::msgFlags2WSAmsgFlags(flags);
    request.buf = buffer;
    if (response = static_cast<RPC_RECV_RESPONSE>(sendSyncRequest(request)))
    {
        ret = response.ret;
        if (SOCKET_ERROR == ret)
            errno = BSD2WSA::WSA2errno(response.error);
        else if (buffer && ret > 0)
            ::memcpy(buffer, response.buf, (size_t)ret > length ? length : (size_t)ret); // never write past the caller's buffer
    }
    else
        errno = EAGAIN;
    return ret;
}
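#include <dlfcn.h> // dlsym(), RTLD_NEXT (glibc exposes RTLD_NEXT with _GNU_SOURCE)
int close(int fd)
{
    int ret = -1;
    errno = 0;
    if (g_sockets.hasSocket(fd))
    {
        RPC_CLOSESOCKET_REQUEST request;
        RPC_CLOSESOCKET_RESPONSE response;
        request.s = g_sockets.convert(fd);
        if (response = static_cast<RPC_CLOSESOCKET_RESPONSE>(sendSyncRequest(request)))
        {
            ret = response.ret;
            if (SOCKET_ERROR == ret)
                errno = BSD2WSA::WSA2errno(response.error);
            else
                g_sockets.removeSocket(fd); // the pseudodescriptor can be reused
        }
        else
            errno = EAGAIN;
    }
    else
    {
        // A real file descriptor: hand it to the libc close(). Since this function is
        // itself named close, calling ::close(fd) here would recurse into it.
        typedef int (*close_fn)(int);
        static close_fn const real_close = reinterpret_cast<close_fn>(::dlsym(RTLD_NEXT, "close"));
        ret = real_close(fd);
    }
    return ret;
}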
The complete example of the code with all definitions is attached to the article.