History log of /freebsd-head/sys/sys/socket.h
Revision Date Author Comments
1a7f90755fb06c8dc4e880e0706c5b05a6c47c00 03-Oct-2020 melifaro <melifaro@FreeBSD.org> Introduce scalable route multipath.

This change is based on the nexthop objects landed in D24232.

The change introduces the concept of nexthop groups.
Each group contains the collection of nexthops with their
relative weights and a dataplane-optimized structure to enable
efficient nexthop selection.

Similar to nexthops, nexthop groups are immutable. The dataplane part
gets compiled during group creation and is basically an array of
nexthop pointers, compiled w.r.t their weights.
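
A minimal sketch of how a weighted slot array of this kind could be built and used. It is not the kernel's code: the names compile_slots(), select_nexthop() and MAX_SLOTS are hypothetical, and reducing the weights by their greatest common divisor is an assumption, chosen because it reproduces the Slots column of the netstat output in this message (weights 10 and 20 give 1 and 2 slots).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 64	/* hypothetical cap, mirrors the group width above */

/* Greatest common divisor; gcd_u32(0, x) == x. */
static uint32_t
gcd_u32(uint32_t a, uint32_t b)
{
	while (b != 0) {
		uint32_t t = a % b;

		a = b;
		b = t;
	}
	return (a);
}

/*
 * Fill slots[] with nexthop indexes so that nexthop i occupies
 * weights[i] / gcd(all weights) consecutive entries.
 * Returns the number of slots used, or 0 if a weight is zero or
 * the result does not fit into max_slots entries.
 */
static size_t
compile_slots(const uint32_t *weights, size_t n, uint32_t *slots,
    size_t max_slots)
{
	uint32_t g = 0;
	size_t used = 0;

	assert(weights != NULL && slots != NULL);
	for (size_t i = 0; i < n; i++) {
		if (weights[i] == 0)
			return (0);
		g = gcd_u32(g, weights[i]);
	}
	for (size_t i = 0; i < n; i++) {
		uint32_t count = weights[i] / g;

		if (count > max_slots - used)
			return (0);
		for (uint32_t c = 0; c < count; c++)
			slots[used++] = (uint32_t)i;
	}
	return (used);
}

/* Dataplane side: one modulo and one array read per lookup. */
static uint32_t
select_nexthop(const uint32_t *slots, size_t nslots, uint32_t flowid)
{
	assert(nslots > 0);
	return (slots[flowid % nslots]);
}
```

With weights {10, 20} the array becomes {0, 1, 1}, so two out of three flow IDs map to the second nexthop.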

With this change, `rt_nhop` field of `struct rtentry` contains either
nexthop or nexthop group. They are distinguished by the presence of
All dataplane lookup functions return a pointer to the nexthop object,
leaving nexthop group details inside the routing subsystem.

User-visible changes:

The change is intended to be backward-compatible: all non-mpath operations
should work as before with ROUTE_MPATH and net.route.multipath=1.

All routes now come with a weight; the default weight is 1 and the maximum is 2^24-1.

Current maximum multipath group width is statically set to 64.
This will become sysctl-tunable in the followup changes.

Using functionality:
* Recompile kernel with ROUTE_MPATH
* set net.route.multipath to 1

route add -6 2001:db8::/32 2001:db8::2 -weight 10
route add -6 2001:db8::/32 2001:db8::3 -weight 20

netstat -6On

Nexthop groups data

GrpIdx  NhIdx   Weight  Slots   Gateway                                 Netif     Refcnt
1       ------- ------- ------- --------------------------------------- --------- 1
        13      10      1       2001:db8::2                             vlan2
        14      20      2       2001:db8::3                             vlan2

Next steps:
* Land outbound hashing for locally-originated routes ( D26523 ).
* Fix net/bird multipath (net/frr seems to work fine)
* Set net.route.multipath=1 by default

Tested by: olivier
Reviewed by: glebius
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D26449
a1c2e427c840e7270d299071291ad1c3f0da02ed 01-Sep-2020 mjg <mjg@FreeBSD.org> sys: clean up empty lines in .c and .h files
cb9f5d61bb3de67bf41c3138a9b89c667e82b78b 19-Aug-2020 rmacklem <rmacklem@FreeBSD.org> Add the MSG_TLSAPPDATA flag to indicate "return ENXIO" for non-application TLS
data records.

The kernel RPC cannot process non-application data records when
using TLS. It must do an upcall to a userspace daemon that will
call SSL_read() to process them.

This patch adds a new flag called MSG_TLSAPPDATA that the kernel
RPC can use to tell soreceive() to return ENXIO instead of a non-application
data record, when that is what is at the top of the receive queue.
I put the code in #ifdef KERN_TLS/#endif, although it will build without
that, so that it is recognized as only useful when KERN_TLS is enabled.
The alternative to doing this is to have the kernel RPC re-queue the
non-application data message after receiving it, but that seems more
complicated and might introduce message ordering issues when there
are multiple non-application data records one after another.

I do not know what, if any, changes will be required to support TLS1.3.

Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D25923
7a58be336ed276a7c0024a657b39f496d003de5d 20-May-2020 whu <whu@FreeBSD.org> HyperV socket implementation for FreeBSD

This change adds Hyper-V socket feature in FreeBSD. New socket address
family AF_HYPERV and its kernel support are added.

Submitted by: Wei Hu <weh@microsoft.com>
Reviewed by: Dexuan Cui <decui@microsoft.com>
Relnotes: yes
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D24061
4eda536b2e707981db4813a6d786610e223536ed 12-Apr-2020 melifaro <melifaro@FreeBSD.org> Introduce nexthop objects and new routing KPI.

This is the foundational change for the routing subsystem rearchitecture.
More details and goals are available in https://reviews.freebsd.org/D24141 .

This patch introduces the concept of nexthop objects and a new nexthop-based
routing KPI.

Nexthops are objects, containing all necessary information for performing
the packet output decision. Output interface, mtu, flags, gw address go
there. For most of the cases, these objects will serve the same role as
the struct rtentry is currently serving.
Typically there will be low tens of such objects for the router even with
multiple BGP full-views, as these objects will be shared between routing
entries. This allows storing more information in the nexthop.

New KPI:

struct nhop_object *fib4_lookup(uint32_t fibnum, struct in_addr dst,
uint32_t scopeid, uint32_t flags, uint32_t flowid);
struct nhop_object *fib6_lookup(uint32_t fibnum, const struct in6_addr *dst6,
uint32_t scopeid, uint32_t flags, uint32_t flowid);

These two functions are intended to replace all flavours of
<in_|in6_>rtalloc[1]<_ign><_fib>, mpath functions and the previous
fib[46]-generation functions.

Upon successful lookup, they return nexthop object which is guaranteed to
exist within current NET_EPOCH. If longer lifetime is desired, one can
specify NHR_REF as a flag and get a referenced version of the nexthop.
Reference semantic closely resembles rtentry one, allowing sed-style conversion.

Additionally, another 2 functions are introduced to support uRPF functionality
inside variety of our firewalls. Their primary goal is to hide the multipath
implementation details inside the routing subsystem, greatly simplifying
firewalls implementation:

int fib4_lookup_urpf(uint32_t fibnum, struct in_addr dst, uint32_t scopeid,
uint32_t flags, const struct ifnet *src_if);
int fib6_lookup_urpf(uint32_t fibnum, const struct in6_addr *dst6, uint32_t scopeid,
uint32_t flags, const struct ifnet *src_if);

All functions have a separate scopeid argument, paving way to eliminating IPv6 scope
embedding and allowing to support IPv4 link-locals in the future.

Structure changes:
* rtentry gets new 'rt_nhop' pointer, slightly growing the overall size.
* rib_head gets new 'rnh_preadd' callback pointer, slightly growing overall sz.

Old KPI:
During the transition, the old and new KPIs will coexist. As there are another 4-5
decent-sized conversion patches, it will probably take a couple of weeks.
To support both KPIs, fields not required by the new KPI (most of rtentry) have to be
kept, resulting in a temporary size increase.
Once conversion is finished, rtentry will notably shrink.

More details:
* architectural overview: https://reviews.freebsd.org/D24141
* list of the next changes: https://reviews.freebsd.org/D24232

Reviewed by: ae,glebius(initial version)
Differential Revision: https://reviews.freebsd.org/D24232
3e0ad7d79480a094c94213f0f8b94253918edc55 21-Aug-2018 tuexen <tuexen@FreeBSD.org> Add SOL_SOCKET level socket option with name SO_DOMAIN to get
the domain of a socket.

This is helpful when testing and Solaris and Linux have the same
socket option using the same name.

Reviewed by: bcr@, rrs@
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D16791
d0aeaa5af7f77964d05bcb54e2f5cfdeb187edaf 06-Jun-2018 sbruno <sbruno@FreeBSD.org> Load balance sockets with new SO_REUSEPORT_LB option.

This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures:
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

As in DragonflyBSD, a load balance group is limited to 256 pcbs (256 programs or
threads sharing the same socket).

This is a substantially different contribution as compared to its original
incarnation at svn r332894 and reverted at svn r332967. Thanks to rwatson@
for the substantive feedback that is included in this commit.

Submitted by: Johannes Lundberg <johalun0@gmail.com>
Obtained from: DragonflyBSD
Relnotes: Yes
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003
257e6e5563df4d5f0a939c72acdef3c49c440e38 24-Apr-2018 sbruno <sbruno@FreeBSD.org> Revert r332894 at the request of the submitter.

Submitted by: Johannes Lundberg <johalun0_gmail.com>
Sponsored by: Limelight Networks
bbf7d4dd035a71710ac94fe1ada4d99244102159 23-Apr-2018 sbruno <sbruno@FreeBSD.org> Load balance sockets with new SO_REUSEPORT_LB option

This patch adds a new socket option, SO_REUSEPORT_LB, which allows multiple
programs or threads to bind to the same port and incoming connections will be
load balanced using a hash function.

Most of the code was copied from a similar patch for DragonflyBSD.

However, in DragonflyBSD, load balancing is a global on/off setting and can not
be set per socket. This patch allows for simultaneous use of both the current
SO_REUSEPORT and the new SO_REUSEPORT_LB options on the same system.

Required changes to structures
Globally change so_options from 16 to 32 bit value to allow for more options.
Add hashtable in pcbinfo to hold all SO_REUSEPORT_LB sockets.

As in DragonflyBSD, a load balance group is limited to 256 pcbs
(256 programs or threads sharing the same socket).

Submitted by: Johannes Lundberg <johanlun0@gmail.com>
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11003
4736ccfd9c3411d50371d7f21f9450a47c19047e 20-Nov-2017 pfg <pfg@FreeBSD.org> sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
b5d757f3ab961d1a40b3706646c72a4be99308ad 07-Nov-2017 kib <kib@FreeBSD.org> Use hardware timestamps to report packet timestamps for SO_TIMESTAMP
and other similar socket options.

Provide new control message SCM_TIME_INFO to supply information about
timestamp. Currently it indicates that the timestamp was
hardware-assisted and high-precision, for software timestamps the
message is not returned. Reserved fields are added to ABI to report
additional info about it, it is expected that raw hardware clock value
might be useful for some applications.

Reviewed by: gallatin (previous version), hselasky
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D12638
e35d543ec17b735ba76bb3311ed4a430d6cc945e 08-Jun-2017 glebius <glebius@FreeBSD.org> Listening sockets improvements.

o Separate fields of struct socket that belong to listening from
fields that belong to normal dataflow, and unionize them. This
shrinks the structure a bit.
- Take out selinfo's from the socket buffers into the socket. The
first reason is to support braindamaged scenario when a socket is
added to kevent(2) and then listen(2) is cast on it. The second
reason is that there is future plan to make socket buffers pluggable,
so that for a dataflow socket a socket buffer can be changed, and
in this case we also want to keep same selinfos through the lifetime
of a socket.
- Remove struct so_accf. Since now listening stuff no longer
affects struct socket size, just move its fields into listening part
of the union.
- Provide sol_upcall field and enforce that so_upcall_set() may be called
only on a dataflow socket, which has buffers, and for listening sockets
provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
- Add a mutex to socket, to be used instead of socket buffer lock to lock
fields of struct socket that don't belong to a socket buffer.
- Allow to acquire two socket locks, but the first one must belong to a
listening socket.
- Make soref()/sorele() to use atomic(9). This allows in some situations
to do soref() without owning socket lock. There is place for improvement
here, it is possible to make sorele() also to lock optionally.
- Most protocols aren't touched by this change, except UNIX local sockets.
See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
listening sockets: provide function solisten_dequeue(), and use it in
the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
infiniband, rpc.

o UNIX local sockets.
- Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
local sockets. Most races exist around spawning a new socket, when we
are connecting to a local listening socket. To cover them, we need to
hold locks on both PCBs when spawning a third one. This means holding
them across sonewconn(). This creates a LOR between pcb locks and
- To fix the new LOR, abandon the global unp_list_lock in favor of global
unp_link_lock. Indeed, separating these two locks didn't provide us any
extra parallelism in the UNIX sockets.
- Now call into uipc_attach() may happen with unp_link_lock held if we
are accepting, or without unp_link_lock if we are just creating
a socket.
- Another problem in UNIX sockets is that uipc_close() basically did nothing
for a listening socket. The vnode remained opened for connections. This
is fixed by removing vnode in uipc_close(). Maybe the right way would be
to do it for all sockets (not only listening), simply move the vnode
teardown from uipc_detach() to uipc_close()?

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9770
7e6cabd06e6caa6a02eeb86308dc0cb3f27e10da 28-Feb-2017 imp <imp@FreeBSD.org> Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
57e936605b6954f8840ca19d8b57bf944f536efc 18-Jan-2017 glebius <glebius@FreeBSD.org> Format and sort MSG_* flags, to prevent misedits in future. There is no
functional change.
7c0bc68f8a2fa4cc3d769c2c603e79dc808d2bc6 18-Jan-2017 glebius <glebius@FreeBSD.org> Fix regression from r311568: collision of MSG_NOSIGNAL with MSG_MORETOCOME
led to delayed send of data sent with sendto(MSG_NOSIGNAL).

Submitted by: rrs
efa6326974ec2cdb6721fec731bcd86758d0877c 18-Jan-2017 hselasky <hselasky@FreeBSD.org> Implement kernel support for hardware rate limited sockets.

- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API supports rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. Upon the next ip_output(), allocation of a new "snd_tag" will be attempted.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network

Reviewed by: wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision: https://reviews.freebsd.org/D3687
Sponsored by: Mellanox Technologies
MFC after: 3 months
701697521cdd74f2f7cfee4e18bd672544300bb3 16-Jan-2017 sobomax <sobomax@FreeBSD.org> Add a new socket option SO_TS_CLOCK to pick from several different clock
sources to return timestamps when SO_TIMESTAMP is enabled. Two additional
clock sources are:

o nanosecond resolution realtime clock (equivalent of CLOCK_REALTIME);
o nanosecond resolution monotonic clock (equivalent of CLOCK_MONOTONIC).

In addition to this, this option provides unified interface to get bintime
(equivalent of using SO_BINTIME), except it is also supported with IPv6 where
SO_BINTIME has never been supported. The long term plan is to deprecate
SO_BINTIME and move everything to using SO_TS_CLOCK.
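
A minimal sketch of selecting a timestamp clock as described above. The helper name enable_rx_timestamps() is hypothetical. The SO_TS_CLOCK branch is compiled only where the option is defined; elsewhere the sketch just enables SO_TIMESTAMP with its default clock, which is an assumption made for portability.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Enable receive timestamps on fd and, where SO_TS_CLOCK exists,
 * request the nanosecond-resolution realtime clock.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
enable_rx_timestamps(int fd)
{
	int on = 1;

	assert(fd >= 0);
	if (setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on)) == -1)
		return (-1);
#ifdef SO_TS_CLOCK
	{
		int clock = SO_TS_REALTIME;

		if (setsockopt(fd, SOL_SOCKET, SO_TS_CLOCK, &clock,
		    sizeof(clock)) == -1)
			return (-1);
	}
#endif
	return (0);
}
```

The timestamps then arrive as control messages on recvmsg(2); parsing them is left out of this sketch.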

Idea for this enhancement has been briefly discussed on the Net session
during dev summit in Ottawa last June and the general input was positive.

This change is believed to benefit network benchmarks/profiling as well
as other scenarios where precise time of arrival measurement is necessary.

There are two regression test cases as part of this commit: one extends unix
domain test code (unix_cmsg) to test new SCM_XXX types and another one
implements a totally new test case which exchanges UDP packets between two
processes using both conventional methods (i.e. calling clock_gettime(2)
before recv(2) and after send(2)), as well as using setsockopt()+recv() in
receive path. The resulting delays are checked for sanity for all supported
clock types.

Reviewed by: adrian, gnn
Differential Revision: https://reviews.freebsd.org/D9171
cb5debd34fd20768247322904944cd3c69836b10 06-Jan-2017 jhb <jhb@FreeBSD.org> Set MORETOCOME for AIO write requests on a socket.

Add a MSG_MORETOCOME message flag. When this flag is set, sosend*
set PRUS_MORETOCOME when invoking the protocol send method. The aio
worker tasks for sending on a socket set this flag when there are
additional write jobs waiting on the socket buffer.

Reviewed by: adrian
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D8955
b739d603442d3aa17f07d55918ca3a07eefa24f7 17-Nov-2016 glebius <glebius@FreeBSD.org> Add flag SF_USER_READAHEAD to sendfile(2). When specified, the syscall won't
do any speculations about readahead, and use exactly the amount of readahead
specified by user. E.g. setting SF_FLAGS(0, SF_USER_READAHEAD) will guarantee
that no readahead at all will be performed.
25d51f7a7413a1c11861f0b38015ef6a871d9a36 31-May-2016 ed <ed@FreeBSD.org> Make CMSG_*() work without having NULL available.

The <sys/socket.h> is not supposed to declare NULL, according to POSIX.
Our implementation complies with that, meaning that we need to make sure
that CMSG_*() doesn't use it.
00d578928eca75be320b36d37543a7e2a4f9fbdb 27-May-2016 grehan <grehan@FreeBSD.org> Create branch for bhyve graphics import.
e05176a63dbba4794d3d611cf9072885b3cf1eb3 29-Mar-2016 glebius <glebius@FreeBSD.org> The sendfile(2) allows to send extra data from userspace before the file
data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.

With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case the size of the headers is bigger than the socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.

o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.

Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by: Vitalij Satanivskij <satan ukr.net>
Sponsored by: Netflix
6c0e620fdbcd382232aa0d3be852301f2a75876d 29-Jan-2016 kib <kib@FreeBSD.org> Add implementations of sendmmsg(3) and recvmmsg(3) functions which
wrap sendmsg(2) and recvmsg(2) into batch send and receive operation.
The goal of this implementation is only to provide API compatibility
with Linux.
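
A minimal sketch of the wrapping idea described above: a batch send built from one sendmsg(2) call per message. This is not the libc implementation. struct batch_msg and send_batch() are hypothetical stand-ins for struct mmsghdr and sendmmsg(3), and cancellation handling is not modelled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical stand-in for struct mmsghdr. */
struct batch_msg {
	struct msghdr	hdr;	/* message to send */
	ssize_t		len;	/* bytes sent for this entry */
};

/*
 * Send up to vlen messages, one sendmsg(2) call each, stopping at the
 * first failure. Returns the number of messages sent, or -1 if the
 * very first one failed (errno is left as set by sendmsg).
 */
static int
send_batch(int fd, struct batch_msg *vec, size_t vlen, int flags)
{
	size_t i;

	assert(vec != NULL || vlen == 0);
	for (i = 0; i < vlen; i++) {
		ssize_t n = sendmsg(fd, &vec[i].hdr, flags);

		if (n == -1)
			break;
		vec[i].len = n;
	}
	if (i == 0 && vlen > 0)
		return (-1);
	return ((int)i);
}
```

Reporting a partial count instead of an error once at least one message went out matches the usual batch-call convention, since the caller can retry from the first unsent entry.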

The cancellation behaviour of the functions is not quite right, but
due to relative rare use of cancellation it is considered acceptable
compared with the complexity of the correct implementation. If
functions are reimplemented as syscalls, the fix would come almost
trivial. The direct use of the syscall trampolines instead of libc
wrappers for sendmsg(2) and recvmsg(2) is to avoid data loss on

Submitted by: Boris Astardzhiev <boris.astardzhiev@gmail.com>
Discussed with: jilles (cancellation behaviour)
MFC after: 1 month
aaa09777e1d9fde5591814af536025be01a0182f 08-Jan-2016 glebius <glebius@FreeBSD.org> New sendfile(2) syscall. A joint effort of NGINX and Netflix from 2013 and
up to now.

The new sendfile is the code that Netflix uses to send their multiple tens
of gigabits of data per second. The new implementation features asynchronous
I/O, when I/O operations are launched, but not awaited to be complete. An
explanation of why such behavior is beneficial compared to old one is
going to be too long for a commit message, so we will skip it here.

Additional features of new syscall are extra flags, which provide an
application more control over data sent. The SF_NOCACHE flag tells
kernel that data shouldn't be cached after it was sent. The SF_READAHEAD()
macro allows to specify readahead size in pages.

The new syscall is a drop-in replacement. No modifications are required
to applications. One can take nginx binary for stable/10 and run it
successfully on head. Although SF_NODISKIO lost its original sense, as now
sendfile doesn't block, and now means something completely different (tm),
using the new sendfile the old way is absolutely safe.

Celebrates: Netflix global launch!
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
Relnotes: yes
53273c84d07812ce5a86db41d425a50e20397b64 11-Nov-2014 glebius <glebius@FreeBSD.org> Remove SF_KQUEUE code. This code was developed at Netflix, but was not
ever used. It didn't go into stable/10, neither was documented.
It might be useful, but we collectively decided to remove it, rather
than leave it abandoned and unmaintained. It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.

Sponsored by: Netflix
d9d6b88f18b92428357f47580e9e331def5c7f9d 25-Feb-2014 jhb <jhb@FreeBSD.org> Remove more constants related to static sysctl nodes. The MAXID constants
were primarily used to size the sysctl name list macros that were removed
in r254295. A few other constants either did not have an associated
sysctl node, or the associated node used OID_AUTO instead.

PR: ports/184525 (exp-run)
eb1a5f8de9f7ea602c373a710f531abbf81141c4 21-Feb-2014 gjb <gjb@FreeBSD.org> Move ^/user/gjb/hacking/release-embedded up one directory, and remove
^/user/gjb/hacking since this is likely to be merged to head/ soon.

Sponsored by: The FreeBSD Foundation
103d7982d20f88e2b4adc26fa3f034fab18d6635 17-Jan-2014 adrian <adrian@FreeBSD.org> Implement the extension api for sendfile to allow for kqueue notifications.

This is still under a bit of flux, as the final API hasn't been nailed
down. It's also unclear whether we should define the two new types in the
header or not - it may allow bad code to compile that shouldn't (ie,
since uintX's are defined, the developer may not include sys/types.h.)

Reviewed by: peter, imp, bde
Sponsored by: Netflix, Inc.
6b01bbf146ab195243a8e7d43bb11f8835c76af8 27-Dec-2013 gjb <gjb@FreeBSD.org> Copy head@r259933 -> user/gjb/hacking/release-embedded for initial
inclusion of (at least) arm builds with the release.

Sponsored by: The FreeBSD Foundation
a437be72574480c1aedb04a776cc369e323fba4a 26-Aug-2013 jhb <jhb@FreeBSD.org> Remove most of the remaining sysctl name list macros. They were only
ever intended for use in sysctl(8) and it has not used them for many

Reviewed by: bde
Tested by: exp-run by bdrewery
722a1a5e5d54a4935a4136368f443f6c88ca0d71 15-Aug-2013 glebius <glebius@FreeBSD.org> Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by: kib
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
5f141f7f1fc16a09ae4b1cc9defb6db495128161 09-Aug-2013 jeff <jeff@FreeBSD.org> - Reserve a special AF for SDP. The one we were incorrectly using before
was taken by another AF.

Sponsored by: EMC / Isilon Storage Division
35a29fbe26a03efc3163059252a25c0b6f632657 13-Jun-2013 kevlo <kevlo@FreeBSD.org> Add PF_IEEE80211 definition.

Reviewed by: rpaulo
299afd25fd6dead39bf5c78572782db885579911 01-May-2013 jilles <jilles@FreeBSD.org> Add accept4() system call.

The accept4() function, compared to accept(), allows setting the new file
descriptor atomically close-on-exec and explicitly controlling the
non-blocking status on the new socket. (Note that the latter point means
that accept() is not equivalent to any form of accept4().)

The linuxulator's accept4 implementation leaves a race window where the new
file descriptor is not close-on-exec because it calls sys_accept(). This
implementation leaves no such race window (by using falloc() flags). The
linuxulator could be fixed and simplified by using the new code.

Like accept(), accept4() is async-signal-safe, a cancellation point and
permitted in capability mode.
c0fce542aaa122a3a09b673246b404ab20f2dc42 30-Mar-2013 jilles <jilles@FreeBSD.org> Improve namespacing in <sys/socket.h>:

* MSG_NOSIGNAL is in POSIX.1-2008.
* PRU_FLUSH_* (SCTP) are not in POSIX.
* bindat()/connectat() are not in POSIX.

Discussed with: rrs (PRU_FLUSH_*)
c9066bd014b20089911abc91a8c87ef738498a28 19-Mar-2013 jilles <jilles@FreeBSD.org> Implement SOCK_CLOEXEC, SOCK_NONBLOCK and MSG_CMSG_CLOEXEC.

This change allows creating file descriptors with close-on-exec set in some
situations. SOCK_CLOEXEC and SOCK_NONBLOCK can be OR'ed in socket() and
socketpair()'s type parameter, and MSG_CMSG_CLOEXEC to recvmsg() makes file
descriptors (SCM_RIGHTS) atomically close-on-exec.
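
A minimal sketch of the socketpair() case described above. The helper names make_pair() and has_cloexec_nonblock() are hypothetical; the second exists only to show that both properties are already in place when the call returns, with no separate fcntl(2) step in which another thread could fork and exec.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Create a connected AF_UNIX stream pair whose two descriptors are
 * close-on-exec and non-blocking from the moment they exist.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int
make_pair(int sv[2])
{
	assert(sv != NULL);
	return (socketpair(AF_UNIX,
	    SOCK_STREAM | SOCK_CLOEXEC | SOCK_NONBLOCK, 0, sv));
}

/* Return 1 if fd has FD_CLOEXEC and O_NONBLOCK set, 0 if not, -1 on error. */
static int
has_cloexec_nonblock(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);
	int flflags = fcntl(fd, F_GETFL);

	if (fdflags == -1 || flflags == -1)
		return (-1);
	return ((fdflags & FD_CLOEXEC) != 0 && (flflags & O_NONBLOCK) != 0);
}
```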

The numerical values for SOCK_CLOEXEC and SOCK_NONBLOCK are as in NetBSD.
MSG_CMSG_CLOEXEC is the first free bit for MSG_*.

The SOCK_* flags are not passed to MAC because this may cause incorrect
failures and can be done later via fcntl() anyway. On the other hand, audit
is expected to cope with the new flags.

For MSG_CMSG_CLOEXEC, unp_externalize() is extended to take a flags

Reviewed by: kib
702516e70b2669b5076691a0b760b4a37a8c06a2 02-Mar-2013 pjd <pjd@FreeBSD.org> - Implement two new system calls:

int bindat(int fd, int s, const struct sockaddr *addr, socklen_t addrlen);
int connectat(int fd, int s, const struct sockaddr *name, socklen_t namelen);

which allow to bind and connect respectively to a UNIX domain socket with a
path relative to the directory associated with the given file descriptor 'fd'.

- Add manual pages for the new syscalls.

- Make the new syscalls available for processes in capability mode sandbox.

- Add capability rights CAP_BINDAT and CAP_CONNECTAT that have to be present on
the directory descriptor for the syscalls to work.

- Update audit(4) to support those two new syscalls and to handle path
in sockaddr_un structure relative to the given directory descriptor.

- Update procstat(1) to recognize the new capability rights.

- Document the new capability rights in cap_rights_limit(2).

Sponsored by: The FreeBSD Foundation
Discussed with: rwatson, jilles, kib, des
b55183a894ce23691df0c9c6b6fec645a777d401 01-Feb-2013 jhb <jhb@FreeBSD.org> Add placeholder constants to reserve a portion of the socket option
name space for use by downstream vendors to add custom options.

MFC after: 2 weeks
79e65f67d444ac3cdd9179557856fdbd87e82852 26-Feb-2012 kib <kib@FreeBSD.org> Add SO_PROTOCOL/SO_PROTOTYPE socket SOL_SOCKET-level option to get the
socket protocol number. This is useful since the socket type can
be implemented by different protocols in the same protocol family,
e.g. SOCK_STREAM may be provided by both TCP and SCTP.

Submitted by: Jukka A. Ukkonen <jau iki fi>
PR: kern/162352
Discussed with: bz
Reviewed by: glebius
MFC after: 2 weeks
38bf0509a452ec918d5b5c9c887b4f3a213ecc19 11-Feb-2012 bz <bz@FreeBSD.org> Properly name the sysctl to "iflistl" rather than "iflist2", which had been
the prototype name and slipped in in r231505.

Spotted in a reply from: bde
MFC after: 3 days
d05091db1d58d82a2d4c3cb7c1d505fd42a0a13f 11-Feb-2012 bz <bz@FreeBSD.org> Introduce a new NET_RT_IFLISTL API to query the address list. It works
on extended and extensible structs if_msghdrl and ifa_msghdrl. This
will allow us to extend both the msghdrl structs and eventually if_data
in the future without breaking the ABI.

Bump __FreeBSD_version to allow ports to more easily detect the new API.

Reviewed by: glebius, brooks
MFC after: 3 days
933648d638b9e6ad349de688d2137a3353d4ef9c 17-Apr-2011 jilles <jilles@FreeBSD.org> Allow using CMSG_NXTHDR with -Wcast-align.

If various checks are omitted, the CMSG_NXTHDR macro expands to
(struct cmsghdr *)((char *)(cmsg) + \
_ALIGN(((struct cmsghdr *)(cmsg))->cmsg_len))

Although there is no alignment problem (assuming cmsg is properly aligned
and _ALIGN is correct), this violates -Wcast-align on strict-alignment
architectures. Therefore an intermediate cast to void * is appropriate here.

There is no workaround other than not using -Wcast-align.
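
A minimal sketch of the usual control-message walk that exercises CMSG_NXTHDR. The helper names count_cmsgs() and find_cmsg() are hypothetical. Code of this shape is what triggers the -Wcast-align diagnostic inside the macro expansion on strict-alignment architectures.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Count the control messages attached to msg. */
static size_t
count_cmsgs(struct msghdr *msg)
{
	size_t n = 0;

	assert(msg != NULL);
	for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
	    c = CMSG_NXTHDR(msg, c))
		n++;
	return (n);
}

/* Return the first control message with the given level and type, or NULL. */
static struct cmsghdr *
find_cmsg(struct msghdr *msg, int level, int type)
{
	struct cmsghdr *c;

	assert(msg != NULL);
	for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
		if (c->cmsg_level == level && c->cmsg_type == type)
			return (c);
	}
	return (NULL);
}
```

Callers should keep the control buffer aligned for struct cmsghdr (for example by placing it in a union with one) and zero it before building messages into it.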

MFC after: 2 weeks
d5e8d236f4009fc2611f996c317e94b2c8649cf5 12-Nov-2010 luigi <luigi@FreeBSD.org> This commit implements the SO_USER_COOKIE socket option, which lets
you tag a socket with an uint32_t value. The cookie can then be
used by the kernel for various purposes, e.g. setting the skipto
rule or pipe number in ipfw (this is the reason SO_USER_COOKIE has
been implemented; however there is nothing ipfw-specific in its

The ipfw-related code that uses the option will be committed separately.

This change adds a field to 'struct socket', but the struct is not
part of any driver or userland-visible ABI so the change should be

See the discussion at

Idea and code from Paul Joe, small modifications and manpage
changes by myself.

Submitted by: Paul Joe
MFC after: 1 week
09f9c897d33c41618ada06fbbcf1a9b3812dee53 19-Oct-2010 jamie <jamie@FreeBSD.org> A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.
f1216d1f0ade038907195fc114b7e630623b402c 19-Mar-2010 delphij <delphij@FreeBSD.org> Create a custom branch where I will be able to do the merge.
d6d319f0caf23f5768b3a8150a81bd8b89d66ac5 12-Jan-2010 brooks <brooks@FreeBSD.org> MFC r201955:
Improve the comment about CMGROUP_MAX.
5d17dfc76dafb332d1aa93c109301e64bcc8d835 09-Jan-2010 brooks <brooks@FreeBSD.org> Improve the comment about CMGROUP_MAX.

MFC after: 3 days
e645b495eda0a345c1b9caa5f932817c25234633 08-Sep-2009 phk <phk@FreeBSD.org> Get rid of the _NO_NAMESPACE_POLLUTION kludge by creating an
architecture specific include file containing the _ALIGN*
stuff which <sys/socket.h> needs.
c8ab4ab72e60824c72824342463fc881d7c7885e 08-Sep-2009 phk <phk@FreeBSD.org> Move the duplicate definition of struct sockaddr_storage to its own
include file, and include this where the previous duplicate definitions were.

Static program checkers like FlexeLint rightfully take a dim view of
duplicate definitions, even if they currently are identical.
19b6af98ec71398e77874582eb84ec5310c7156f 22-Nov-2008 dfr <dfr@FreeBSD.org> Clone Kip's Xen on stable/6 tree so that I can work on improving FreeBSD/amd64
performance in Xen's HVM mode.
cf5320822f93810742e3d4a1ac8202db8482e633 19-Oct-2008 lulf <lulf@FreeBSD.org> - Import the HEAD csup code which is the basis for the cvsmode work.
67cdccd7407c73e563b5f541bf5171314ed88c92 08-Aug-2008 delphij <delphij@FreeBSD.org> Add prototype definition for setfib(2) to sys/socket.h.
72b2c46b5408fc54c24932cb4578e6fa9f07a0f3 31-Jul-2008 kmacy <kmacy@FreeBSD.org> MFC accessor functions for socket fields.
45e15b689a758d70dff5046c44c1c4203ca37cb3 30-Jul-2008 kmacy <kmacy@FreeBSD.org> add socket options for disabling TOE
dc8d54c205784683ec1aae7ecf1f24fe1f6cb2c0 24-Jul-2008 julian <julian@FreeBSD.org> MFC an ABI compatible implementation of Multiple routing tables.
See the commit message for
version 1.129 (svn change # 178888) for more info.

Obtained from: Ironport (Cisco Systems)
565bc001a58e180375c810cb3401b67fd051972d 21-Jul-2008 kmacy <kmacy@FreeBSD.org> Add accessor functions for socket fields.

MFC after: 1 week
1dfc5c98a4f7c32163dfdc61e390ccf805385108 09-May-2008 julian <julian@FreeBSD.org> Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.

From my notes:


One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows
packet streams to be routed by more than just the destination address.


I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X] when for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
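
The indexing rule described above can be modelled in a few lines. This is a sketch only: rt_tables_model, fib_slot, struct fib_head and NUM_FIBS are hypothetical stand-ins, not the kernel's actual declarations, and the value 16 is taken from the table limit mentioned earlier in these notes:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>		/* AF_INET, AF_MAX */

#define	NUM_FIBS	16	/* assumed table limit for this model */

struct fib_head;		/* opaque stand-in for a per-family trie head */

/* Row = FIB number, column = protocol family. */
static struct fib_head *rt_tables_model[NUM_FIBS][AF_MAX + 1];

/*
 * Return the slot that family af would consult for FIB number fib.
 * Every family except IPv4 is forced to row 0, so code unaware of
 * multiple FIBs keeps seeing what looks like the old 1-D array.
 */
static struct fib_head **
fib_slot(int fib, int af)
{
	if (fib < 0 || fib >= NUM_FIBS || af < 0 || af > AF_MAX)
		return (NULL);
	if (af != AF_INET)
		fib = 0;
	return (&rt_tables_model[fib][af]);
}
```

In this model fib_slot(3, AF_INET) selects row 3, while fib_slot(3, AF_INET6) falls back to row 0.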

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice..

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routes

ipfw has grown 2 new keywords:

setfib N ip from any to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. it will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. this will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing the data.
instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
ee2393a548fd513cbaef4a7b55481c5a4fc6a646 29-Apr-2008 rrs <rrs@FreeBSD.org> Add pru_flush routine so a transport can
flush itself during Shutdown
5759bc8cd397f57572e3323bc985823017292c42 14-Apr-2008 rrs <rrs@FreeBSD.org> Add pru_flush routine so a transport can
flush itself during Shutdown

MFC after: 1 week
13132840a1ae0f962ec527dbcb06f4ab9ff6e629 03-Feb-2008 phk <phk@FreeBSD.org> Give sendfile(2) a SF_SYNC flag which makes it wait until all mbufs
referencing the files VM pages are returned from the network stack,
making changes to the file safe.

This flag does not guarantee that the data has been transmitted to the
other end.
facc60167bbbf9e19b3968e7e98afe0f6af4231a 12-Dec-2007 kmacy <kmacy@FreeBSD.org> Fix style issues with initial TCP offload commit

Requested by: rwatson
Submitted by: rwatson
95a448c7cbf19f26d0ed071ced5ce83aeeb2c4b2 12-Dec-2007 kmacy <kmacy@FreeBSD.org> Add driver independent interface to offload active established TCP connections

Reviewed by: silby
7100da0db27ffd5bfd649ef9860d16032761e89b 18-Sep-2007 alfred <alfred@FreeBSD.org> Reserve AF_ constants for vendors by giving them the odd numbered
AF_ constants ranging from 39 to 133.

Approved by: re (kensmith)
ffd77d9ba5a1376d64ccbb2909a7179c05de81bc 12-Jun-2007 bms <bms@FreeBSD.org> Import rewrite of IPv4 socket multicast layer to support source-specific
and protocol-independent host mode multicast. The code is written to
accommodate IPv6, IGMPv3 and MLDv2 with only a little additional work.

This change only pertains to FreeBSD's use as a multicast end-station and
does not concern multicast routing; for an IGMPv3/MLDv2 router
implementation, consider the XORP project.

The work is based on Wilbert de Graaf's IGMPv3 code drop for FreeBSD 4.6,
which is available at: http://www.kloosterhof.com/wilbert/igmpv3.html

* IPv4 multicast socket processing is now moved out of ip_output.c
into a new module, in_mcast.c.
* The in_mcast.c module implements the IPv4 legacy any-source API in
terms of the protocol-independent source-specific API.
* Source filters are lazy allocated as the common case does not use them.
They are part of per inpcb state and are covered by the inpcb lock.
* struct ip_mreqn is now supported to allow applications to specify
multicast joins by interface index in the legacy IPv4 any-source API.
* In UDP, an incoming multicast datagram only requires that the source
port matches the 4-tuple if the socket was already bound by source port.
An unbound socket SHOULD be able to receive multicasts sent from an
ephemeral source port.
* The UDP socket multicast filter mode defaults to exclusive, that is,
sources present in the per-socket list will be blocked from delivery.
* The RFC 3678 userland functions have been added to libc: setsourcefilter,
getsourcefilter, setipv4sourcefilter, getipv4sourcefilter.
* Definitions for IGMPv3 are merged but not yet used.
* struct sockaddr_storage is now referenced from <netinet/in.h>. It
is therefore defined there if not already declared in the same way
as for the C99 types.
* The RFC 1724 hack (specify addresses to IP_MULTICAST_IF
which are then interpreted as interface indexes) is now deprecated.
* A patch for the Rhyolite.com routed in the FreeBSD base system
is available in the -net archives. This only affects individuals
running RIPv1 or RIPv2 via point-to-point and/or unnumbered interfaces.
* Make IPv6 detach path similar to IPv4's in code flow; functionally same.
* Bump __FreeBSD_version to 700048; see UPDATING.

This work was financially supported by another FreeBSD committer.

Obtained from: p4://bms_netdev
Submitted by: Wilbert de Graaf (original work)
Reviewed by: rwatson (locking), silence from fenner,
net@ (but with encouragement)
7e51e6f339fdd9e1e88a4aeb84ce426f003cb0ac 22-May-2007 mtm <mtm@FreeBSD.org> MFC:
sys/sys/socket.h ver. 1.93
lib/libc/net/rthdr.c ver. 1.9
date: 2007/04/19 15:48:16; author: mtm; state: Exp; lines: +4 -2
Make inet6_rth_* family of functions more compliant with RFC3542:
1. CMSG_NXTHDR(mhdr, cmsg) is supposed to dereference cmsg and return
the next header in the chain. If cmsg is NULL it should return
the first header, behaving essentially like CMSG_FIRSTHDR().
2. inet6_rth_(space|init|add) should do basic checking on their input
to verify that the number of headers (segments) is
between 0 and 127 inclusive.
b79eceb7f37331e0724c1eb4b575b3b779b8a47b 19-Apr-2007 mtm <mtm@FreeBSD.org> Make inet6_rth_* family of functions more compliant with RFC3542:
1. CMSG_NXTHDR(mhdr, cmsg) is supposed to dereference cmsg and return
the next header in the chain. If cmsg is NULL it should return
the first header, behaving essentially like CMSG_FIRSTHDR().
2. inet6_rth_(space|init|add) should do basic checking on their input
to verify that the number of headers (segments) is
between 0 and 127 inclusive.

MFC-After: 1 month
3d3e3f2242423b47549f89486754bc40030fbe9f 03-Nov-2006 rrs <rrs@FreeBSD.org> Ok, here it is, we finally add SCTP to current. Note that this
work is not just mine, but it is also the works of Peter Lei
and Michael Tuexen. They both are my two key other developers
working on the project.. and they need ata-boy's too:
I did do a make sysent which updated the
syscall's and sysproto.. I hope that is correct... without
it you don't build since we have new syscalls for SCTP :-0

So go out and look at the NOTES, add
option SCTP (make sure inet and inet6 are present too)
and play with SCTP.

I will see about committing some test tools I have after I
figure out where I should place them. I also have a
lib (libsctp.a) that adds some of the missing socketapi
functions that I need to put into lib's.. I will talk
to George about this :-)

There may still be some 64 bit issues in here, none of
us have a 64 bit processor to test with yet.. Michael
may have a MAC but that's another beast too..

If you have a mac and want to use SCTP contact Michael
he maintains a web site with a loadable module with
this code :-)

Reviewed by: gnn
Approved by: gnn
42a69dcadeb2a9e395f7747f2d88a1ea76673757 02-Nov-2006 andre <andre@FreeBSD.org> Rewrite kern_sendfile() to work in two loops, the inner which turns as many
VM pages into mbufs as it can -- up to the free send socket buffer space.
The outer loop then drops the whole mbuf chain into the send socket buffer,
calls tcp_output() on it and then waits until 50% of the socket buffer are
free again to repeat the cycle. This way tcp_output() gets the full amount
of data to work with and can issue up to 64K sends for TSO to chop up in
the network adapter without using any CPU cycles. Thus it gets very efficient
especially with the readahead the VM and I/O system do.

The previous sendfile(2) code simply looped over the file, turned each 4K
page into an mbuf and sent it off. This had the effect that TSO could only
generate 2 packets per send instead of up to 44 at its maximum of 64K.
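
The 2-versus-44 figures can be reproduced with simple integer arithmetic. The sketch below assumes an MSS of 1460 bytes (a 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers, no TCP options); that value is an assumption, as the commit message does not state it, and full_segments is a hypothetical helper:

```c
#include <stddef.h>

/*
 * Number of full MSS-sized segments that fit in a single send of
 * `bytes` bytes. full_segments is an illustrative helper only.
 */
static size_t
full_segments(size_t bytes, size_t mss)
{
	return (mss == 0 ? 0 : bytes / mss);
}
```

With the assumed MSS, full_segments(4096, 1460) is 2 and full_segments(65536, 1460) is 44, matching the numbers quoted above.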

Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of
sleeping on mbuf allocation failures.

Benchmarking shows significant improvements (95% confidence):
45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO)
83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO)

(Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver
DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back
at 1000Base-TX full duplex.)

Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 month
a245550432954c7f1d3eebf64cf691435cc5f2f2 26-Jul-2006 sam <sam@FreeBSD.org> add support for 802.11 packet injection via bpf

Together with: Andrea Bittau <a.bittau@cs.ucl.ac.uk>
Reviewed by: arch@
MFC after: 1 month
bb13f5c766414b14b1a9e533c1d7ff42deec2d64 27-Sep-2005 rwatson <rwatson@FreeBSD.org> Merge uipc_socket.c:1.249, socket.h:1.89 from HEAD to RELENG_6:

Add three new read-only socket options, which allow regression tests
and other applications to query the state of the stack regarding the
accept queue on a listen socket:

SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog)
SO_LISTENQLEN Return the value of so_qlen (complete sockets)
SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets)

Minor white space tweaks to existing socket options to make them

Discussed with: andre

Approved by: re (scottl)
bfb05b4b93adffb0e9783a141f4a30195b6a6cb9 18-Sep-2005 rwatson <rwatson@FreeBSD.org> Add three new read-only socket options, which allow regression tests
and other applications to query the state of the stack regarding the
accept queue on a listen socket:

SO_LISTENQLIMIT Return the value of so_qlimit (socket backlog)
SO_LISTENQLEN Return the value of so_qlen (complete sockets)
SO_LISTENINCQLEN Return the value of so_incqlen (incomplete sockets)

Minor white space tweaks to existing socket options to make them

Discussed with: andre
MFC after: 1 week
bdcac6ad82d9d15d367abad3a4d81e966455070b 13-Apr-2005 mdodd <mdodd@FreeBSD.org> Implement unix(4) socket options LOCAL_CREDS and LOCAL_CONNWAIT.

- Add unp_addsockcred() (for LOCAL_CREDS).
- Add an argument to unp_connect2() to differentiate between

Obtained from: NetBSD (with some changes)
93318a5e79cd3c39a99f56cbb64d664291aff16f 09-Mar-2005 alfred <alfred@FreeBSD.org> Make MSG_NOSIGNAL available to native programs.
Bump FreeBSD_version to note this change.

Reviewed by: sobomax
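
A short sketch of what a native program can do with the flag, assuming only send(2) and MSG_NOSIGNAL; send_quiet is a hypothetical helper name. Writing to a socket whose peer is gone then reports EPIPE through errno instead of raising SIGPIPE:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Send len bytes on s without risking SIGPIPE.
 * Returns 0 on success or the errno value on failure.
 * send_quiet is an illustrative name, not a system API.
 */
static int
send_quiet(int s, const void *buf, size_t len)
{
	if (send(s, buf, len, MSG_NOSIGNAL) == -1)
		return (errno);
	return (0);
}
```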
b795e2430a08adbe58c55709aea6123ff893cd2c 08-Mar-2005 sobomax <sobomax@FreeBSD.org> Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to suppress
SIGPIPE signal for the duration of the sendto-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting

Suggested by: alfred
f0bf889d0d2ea7d83fd3b67266a98c89cdf14853 07-Jan-2005 imp <imp@FreeBSD.org> /* -> /*- for license, minor formatting changes
8932ce4fb41694ceb46a12ae914556cbd9d98e3b 29-Nov-2004 ps <ps@FreeBSD.org> If soreceive() is called from a socket callback, there's no reason
to do a window update to the peer (thru an ACK) from soreceive()
itself. TCP will do that upon return from the socket callback.
Sending a window update from soreceive() results in a lock reversal.

Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
000bdd697d665b156279bc31444d15997339b7f6 11-Aug-2004 andre <andre@FreeBSD.org> RFC 2292 requires to check msg_controllen, in case that the kernel returns
an empty list for some reasons.

Obtained from:

NetBSD: socket.h,v 1.62 2001/09/07 08:13:01 itojun
OpenBSD: socket.h,v 1.39 2001/09/07 16:45:25 itojun

MFC after: 2 weeks
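
A minimal sketch of the behaviour this check provides, assuming only the standard CMSG_FIRSTHDR macro; has_control is a hypothetical helper. When msg_controllen is too small to hold a struct cmsghdr, the macro yields a null pointer rather than a pointer into an empty buffer:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Return 1 if msg carries at least one control message header,
 * 0 otherwise. has_control is an illustrative name only.
 */
static int
has_control(struct msghdr *msg)
{
	return (CMSG_FIRSTHDR(msg) != NULL);
}
```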
3a094836b49d08d76bd712b5b411157e96c4878d 16-Jul-2004 harti <harti@FreeBSD.org> According to POSIX sys/socket.h must define CMSG_NXTHDR but must not
define NULL. This means we cannot use NULL in the definition of CMSG_NXTHDR.
So replace NULL with 0.

PR: kern/60309
Submitted by: Jeff King <peff-freebsd@peff.net>
967b923f982b26fe406c1acdd924f406900cc18e 01-Jun-2004 truckman <truckman@FreeBSD.org> Whitespace correction - #define should be followed by a tab.
d503c79cad595bd26f835062ce4bb6ef858857eb 01-Jun-2004 truckman <truckman@FreeBSD.org> Add MSG_NBIO flag option to soreceive() and sosend() that causes
them to behave the same as if the SS_NBIO socket flag had been set
for this call. The SS_NBIO flag for ordinary sockets is set by
fcntl(fd, F_SETFL, O_NONBLOCK).

Pass the MSG_NBIO flag to the soreceive() and sosend() calls in
fifo_read() and fifo_write() instead of frobbing the SS_NBIO flag
on the underlying socket for each I/O operation. The O_NONBLOCK
flag is a property of the descriptor, and unlike ordinary sockets,
fifos may be referenced by multiple descriptors.
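
For contrast with the per-call MSG_NBIO flag, which is used inside the kernel, this is roughly how userland sets the per-descriptor O_NONBLOCK flag mentioned above. A sketch assuming only fcntl(2); set_nonblocking is a hypothetical helper name:

```c
#include <fcntl.h>

/*
 * Turn on O_NONBLOCK for descriptor fd while preserving its other
 * file status flags. Returns 0 on success, -1 on error.
 * set_nonblocking is an illustrative name, not a system API.
 */
static int
set_nonblocking(int fd)
{
	int flags;

	if ((flags = fcntl(fd, F_GETFL, 0)) == -1)
		return (-1);
	if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
		return (-1);
	return (0);
}
```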
7cd8d7675e61081cb557719b3aad6c1e290ccf3f 10-May-2004 emax <emax@FreeBSD.org> Move a few Bluetooth defines into system include files

Reviewed by: imp
9e1925f5605eb84767aaedbc35ab643a8f9f90a5 07-Apr-2004 imp <imp@FreeBSD.org> Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
deef4b06daa12b66b70e4705d33e8d8f6a587f8c 14-Mar-2004 mdodd <mdodd@FreeBSD.org> Define AF_ARP/PF_ARP.
9428de17dec59f3de2ef6169eb732876b548dce7 08-Feb-2004 silby <silby@FreeBSD.org> Add the SF_NODISKIO flag to sendfile. This flag causes sendfile to be
mindful of blocking on disk I/O and instead return EBUSY when such
blocking would occur.

Results from the DeBox project indicate that blocking on disk I/O
can slow the performance of a kqueue/poll based webserver. Using
a flag such as SF_NODISKIO and throwing connections that would block
to helper processes/threads helped increase performance.

Currently, only the Flash webserver uses this flag, although it could
probably be applied to thttpd with relative ease.

Idea by: Yaoping Ruan & Vivek Pai
35592de77bccf63789e2fd7396b81328733450f9 31-Jan-2004 phk <phk@FreeBSD.org> Introduce the SO_BINTIME option which takes a high-resolution timestamp
at packet arrival.

For benchmarking purposes SO_BINTIME is preferable to SO_TIMEVAL
since it has higher resolution and lower overhead. Simultaneous
use of the two options is possible and they will return consistent

This introduces an extra test and a function call for SO_TIMEVAL, but I have
not been able to measure that.
c4b798a73d1f2d92567266fb70735b27ddba63cf 24-Dec-2003 alfred <alfred@FreeBSD.org> Add restrict qualifiers.

PR: 44394
Submitted by: Craig Rodrigues <rodrige@attbi.com>
0cfaf203b6dc7906d6cf6e6df3d4999dec616f45 14-Nov-2003 bms <bms@FreeBSD.org> Add a sysctl MIB, NET_RT_IFMALIST, to retrieve multicast group memberships
in a protocol-independent way.

Submitted by: harti
fbc7526e8f7461209707f4f096b1ec1f9f4aca7b 05-Mar-2003 peter <peter@FreeBSD.org> Finish driving a stake through the heart of netns and the associated
ifdefs scattered around the place - it's dead, Jim!

The SMB stuff had stolen AF_NS, make it official.
7d819be55789099285aa8b8d940c2203a750872a 26-Feb-2003 mike <mike@FreeBSD.org> Move the typedef for size_t into _iovec.h, so that size_t is available
for struct iovec.
de21d6bfa0525729ca2075eaed679d0d458c3eb6 28-Dec-2002 phk <phk@FreeBSD.org> It is bad style to define the same structure in multiple header
files which might be included together.

Things like debuggers and lint-like programs get their knickers in
a twist (rightly so one might add) when they find different locations
for the same named struct depending on which .h file were included

This is a stellar example of Very Bad Thinking on the part of the
standards dudes who wrote that both sys/uio.h and sys/socket.h
should define struct iovec the same way.

Fix this by putting struct iovec into its own miniature sys/_iovec.h
file and #include that from sys/socket.h and sys/uio.h.

Sensible people could just put iovec into sys/_types.h but there
is probably some standard or other which will be violated if we
did something that horrible.
0347b391165bf804160d9154893976db5620243a 14-Dec-2002 fenner <fenner@FreeBSD.org> Add prototype for sockatmark().
02c7d00fbdd4f6cff671b03b1be3f8a4c81e053c 13-Nov-2002 mike <mike@FreeBSD.org> Fix a constant in the standard namespace not to depend on another
constant in the BSD namespace.
7460c4d6d43240f32f039a9f1fa31e33e4b8446e 12-Oct-2002 mike <mike@FreeBSD.org> o Add typedefs for size_t and ssize_t.
o Add typedefs for gid_t, off_t, pid_t, and uid_t in the non-standards
o Add struct iovec (also defined in <sys/uio.h>).
o Add visibility conditionals to avoid defining non-standard
extensions in the standards case.
o Change spelling of some types so they work without including
<sys/types.h> (u_char -> unsigned char, u_short -> unsigned short,
int64 -> __int64, caddr_t -> char *)
o Add comments about missing restrict type-qualifiers and missing
9e6f796b0d2083dcc48c062853660f96db0a3c8d 21-Aug-2002 mike <mike@FreeBSD.org> o Merge <machine/ansi.h> and <machine/types.h> into a new header
called <machine/_types.h>.
o <machine/ansi.h> will continue to live so it can define MD clock
macros, which are only MD because of gratuitous differences between
o Change all headers to make use of this. This mainly involves
#ifdef _BSD_FOO_T_
typedef _BSD_FOO_T_ foo_t;
#undef _BSD_FOO_T_
typedef __foo_t foo_t;

Concept by: bde
Reviewed by: jake, obrien
619c88aeeb99f6aa38801e896d23cbb5def3a151 20-Jun-2002 alfred <alfred@FreeBSD.org> Implement SO_NOSIGPIPE option for sockets. This allows one to request that
an EPIPE error return not generate SIGPIPE on sockets.

Submitted by: lioux
Inspired by: Darwin
d895407bfb783edba3e36e85e1abc98c8070eaec 14-Jun-2002 rwatson <rwatson@FreeBSD.org> Reserve two constants for managing socket MAC labels via socket options.
6d8d0aaf500294945cbb814d18063d760830b399 14-Jun-2002 rwatson <rwatson@FreeBSD.org> Whitespace consistency.
5217090ee45857447f7ff5e0d1d9f193b68b4adb 11-Jun-2002 wollman <wollman@FreeBSD.org> SO_PRIVSTATE has been commented out for long enough now....
3ff28a5819e5627cd1f0beaf945d5e5543ccf5d1 02-Jun-2002 alfred <alfred@FreeBSD.org> bde noticed that SOMAXCONN breaks pretty badly as an option for LINT.
so back it out.
39f7a31d8080bfe4427a83cb28ee01fef0e3831a 20-Apr-2002 mike <mike@FreeBSD.org> Add sa_family_t type to <sys/_types.h> and typedefs to <netinet/in.h>
and <sys/socket.h>. Previously, sa_family_t was only typedef'd in
3b2d03b60a11ce28e58a87212bcccedd306f2c81 19-Mar-2002 alfred <alfred@FreeBSD.org> Remove __P
08c0b74dd59e51ffeffadf02915c9d0f243d00b4 03-Feb-2002 markm <markm@FreeBSD.org> Zero functional difference; make some integer constants unsigned, as
they are used in unsigned context. This shuts lint(1) up in a few
significant ways with "signed/unsigned" arithmetic warnings.
40da6b02e6f93a023ffa59f81d5dbb34506aac5c 05-Sep-2001 obrien <obrien@FreeBSD.org> style(9) the structure definitions.
93040d6fc0d3c6dcb06ba87e27e7ff7f3814a9be 12-Jun-2001 ume <ume@FreeBSD.org> FreeBSD already avoided namespace pollution (rev.1.45).

Submitted by: bde
832f8d224926758a9ae0b23a6b45353e44fbc87a 11-Jun-2001 ume <ume@FreeBSD.org> Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problem after the snap was out were fixed.
There are many many changes since last KAME merge.

- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks
95124478aefed6b12e0ad0b14da2cd59d0464be8 23-Apr-2001 grog <grog@FreeBSD.org> Add address families AF_SLOW and AF_SCLUSTER. These are used by the
Sitara QoSworks box.

Obtained from: Sitara Networks Inc.
d6216a6396cb28149e9fd3da298b3c91f85d7963 13-Apr-2001 alfred <alfred@FreeBSD.org> Make SOMAXCONN a kernel option.

Submitted by: Terry Lambert <terry@lambert.org>
5066c83435ca425d3acd22ee09aead4c11a76aef 22-Mar-2001 alfred <alfred@FreeBSD.org> Remove struct cmessage from sys/socket.h and reintroduce the private

Requested by: wollman
281af9370ca4060089860089233faf33682090ff 22-Mar-2001 alfred <alfred@FreeBSD.org> Hopefully fix some of the bugs in passing credentials over UNIX domain sockets.

Make struct cmessage visible from socket.h (about 4 places were
defining it for themselves which wasn't good)

Make __rpc_get_local_uid() useable and give it prototype that's

Fix some issues with printing out usernames from rpcbind and keyserv.
995431ad38ffd45cc9f4e66ed933c4859582e4e2 17-Feb-2001 bde <bde@FreeBSD.org> Fixed disordering in previous commit.
bf66c2eda88095d4910d396920c45781fd19db78 15-Feb-2001 ume <ume@FreeBSD.org> Correct 2nd argument of getnameinfo(3) to socklen_t.

Reviewed by: itojun
bdeaad1d1352639499878a962c61a3dad908a7b4 19-Dec-2000 assar <assar@FreeBSD.org> remove pfctlinput
d2c58fba40aed01f738fa46576a1827211e1d614 22-Nov-2000 asmodai <asmodai@FreeBSD.org> Reduce number of #ifdef nestings.

Submitted by: bde
97f7cec0968d034bffbf124dc2946ada93fb521a 08-Nov-2000 asmodai <asmodai@FreeBSD.org> Fix CMSG and ALIGN macro usage.
Previously we had to include <machine/param.h> or <sys/param.h> bogusly
due to the fact that <sys/socket.h> CMSG macros needed the ALIGN macro,
which was defined in param.h. However, including param.h was a disaster
for namespace pollution.
This solution, as contributed by shin a while ago, fixes it elegantly
by wrapping the definitions around some namespace pollution preventer
This patch was long overdue.
This should allow any network programmer to use <sys/socket.h> as

PR: 19971, 20530
Submitted by: Martin Kaeske <MartinKaeske@lausitz.net>
Mark Andrews <Mark.Andrews@nominum.com>
Patch submitted by: shin
Reviewed by: bde
bf5d99e0d9410f4f325c87984be29cae61527433 22-Sep-2000 asmodai <asmodai@FreeBSD.org> Document which RFC introduced CMSG_SPACE() and CMSG_LEN().
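
A minimal sketch of how CMSG_SPACE() and CMSG_LEN() divide the work when
building a control message: CMSG_SPACE() sizes the buffer (header plus
payload plus alignment padding), while CMSG_LEN() is the value stored in
cmsg_len. The helper name pack_fd is hypothetical, and passing a descriptor
with SCM_RIGHTS is used only as a familiar payload; it is not something
these commits changed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: describe one SCM_RIGHTS control message carrying
 * a single descriptor. buf must be suitably aligned for struct cmsghdr
 * (a union with struct cmsghdr is the usual way to get that).
 * Returns 0 on success, -1 if buf is too small.
 */
static int
pack_fd(struct msghdr *msg, void *buf, size_t buflen, int fd)
{
	struct cmsghdr *cm;

	assert(msg != NULL && buf != NULL);
	/* CMSG_SPACE(): bytes the whole message occupies, padding included. */
	if (buflen < CMSG_SPACE(sizeof(int)))
		return (-1);
	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(int));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	/* CMSG_LEN(): header plus payload, without trailing padding. */
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(fd));
	return (0);
}
```

Only <sys/types.h> and <sys/socket.h> are needed for the macros here, which
is the property the namespace fix above was after.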
aaaeed879f8b0f2b838f330c11eda3a1f8dab842 22-Sep-2000 asmodai <asmodai@FreeBSD.org> Fix comment about the bsd-api-new-02a draft. This has been superseded
by RFC 2553.
e3e72a583b17917255252eaad0aded3476d3652b 20-Jun-2000 alfred <alfred@FreeBSD.org> return of the accept filter part II

accept filters are now loadable as well as able to be compiled into
the kernel.

two accept filters are provided: one that returns sockets when data
arrives, the other when an http request is completed (doesn't work
with 0.9 requests)

Reviewed by: jmg
b88daebd21c57cbc2bca2a7c5b4c24b2db6f6598 18-Jun-2000 alfred <alfred@FreeBSD.org> backout accept optimizations.

Requested by: jmg, dcs, jdp, nate
dcf66cb4e24edd35d222b9fc01940f7b17909c9b 15-Jun-2000 alfred <alfred@FreeBSD.org> add socketoptions DELAYACCEPT and HTTPACCEPT which will not allow an accept()
until the incoming connection has either data waiting or what looks like a
HTTP request header already in the socketbuffer. This ought to reduce
the context switch time and overhead for processing requests.

The initial idea and code for HTTPACCEPT came from Yahoo engineers and has
been cleaned up; a more lightweight DELAYACCEPT for non-http servers
has been added.

Reviewed by: silence on hackers.
b42951578188c5aab5c9f8cbcde4a743f8092cdc 02-Apr-2000 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'ALSA'.
282ee285816797c45c9c7a1663f63fea4ab18566 11-Mar-2000 shin <shin@FreeBSD.org> Fix sockaddr_storage related macro definition, following the ss_family member type change.
(Currently no effect, but needed for future portability)

Approved by: jkh

Reviewed by: bde
73d476cc6479c6344b1bd61ca4519254b46d544c 03-Mar-2000 shin <shin@FreeBSD.org> CMSG_XXX macros alignment fixes to follow RFC2292.

Approved by: jkh

Submitted by: Partly from tech@openbsd
Reviewed by: itojun
8813e718dc87a6dcf42bd2743686c7a74df222ca 13-Jan-2000 shin <shin@FreeBSD.org> Change struct sockaddr_storage member name, because the following change
is very likely to become consensus, judging by recent ietf/ipng mailing list
discussion. The recent KAME repository and other KAME-patched BSDs
have also applied it.


Makeworld is confirmed, and no application should be affected by this change
15b9bcb121e1f3735a2c98a11afdb52a03301d7e 29-Dec-1999 peter <peter@FreeBSD.org> Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistent with the other
BSDs, which made this change quite some time ago. More commits to come.
50ba589c666f7d356304339b9cfc7fc9d173ad8d 22-Dec-1999 shin <shin@FreeBSD.org> IPSEC support in the kernel.
pr_input() routines prototype is also changed to support IPSEC and IPV6
chained protocol headers.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
2431275ac48499acc89eff888d82d8352f3e175b 24-Nov-1999 phk <phk@FreeBSD.org> General clean-up of socket.h and associated sources to synchronise up
with NetBSD and the Single Unix Specification v2.

This updates some structures with other, almost equivalent types and
effort is under way to get the whole more consistent.

Also removes a double definition of INET6 and some other clean-ups.

Reviewed by: green, bde, phk
Some part obtained from: NetBSD, SUSv2 specification
7efc91cadcfeb421fc4d02ba94db784616f3714c 05-Nov-1999 shin <shin@FreeBSD.org> KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project
a316fc2921e595d318c2021355aefda57ed9b0e0 22-Oct-1999 julian <julian@FreeBSD.org> Add missing entries in a structure.
8c2a435f28152413be887b091d56aeaac9f2d827 21-Oct-1999 julian <julian@FreeBSD.org> fix typo
c5c63975d538cf48ceb99ba48c341293676d15c0 21-Oct-1999 julian <julian@FreeBSD.org> Whistle's Netgraph link-layer (sometimes more) networking infrastructure.
Been in production for 3 years now. Gives Instant Frame relay to if_sr
and if_ar drivers, and PPPOE support soon. See:
for on-line manual pages.

Reviewed by: Doug Rabson (dfr@freebsd.org)
Obtained from: Whistle CVS tree
010a32d6458bd7e412e002c28e3878b4705e31d4 15-Oct-1999 msmith <msmith@FreeBSD.org> Implement pseudo_AF_HDRCMPLT, which controls the state of the 'header
completion' flag. If set, the interface output routine will assume that
the packet already has a valid link-level source address. This defaults
to off (the address is overwritten)

PR: kern/10680
Submitted by: "Christopher N . Harrell" <cnh@mindspring.net>
Obtained from: NetBSD
3b842d34e82312a8004a7ecd65ccdb837ef72ac1 28-Aug-1999 peter <peter@FreeBSD.org> $Id$ -> $FreeBSD$
b178f74f12a0446656640ae873e0bc71057f5de3 05-Nov-1998 dg <dg@FreeBSD.org> Implemented zero-copy TCP/IP extensions via sendfile(2) - send a
file to a stream socket. sendfile(2) is similar to implementations in
HP-UX, Linux, and other systems, but the API is more extensive and
addresses many of the complaints that the Apache Group and others have
had with those other implementations. Thanks to Marc Slemko of the
Apache Group for helping me work out the best API for this.
Anyway, this has the "net" result of speeding up sends of files over
TCP/IP sockets by about 10X (that is to say, uses 1/10th of the CPU
cycles) when compared to a traditional read/write loop.
cfbddb7f4672d13b48619c9c88bd327832ec70a9 15-Sep-1998 phk <phk@FreeBSD.org> (this is an extract from src/share/examples/atm/README)

HARP | Host ATM Research Platform


What is this stuff?
The Advanced Networking Group (ANG) at the Minnesota Supercomputer Center,
Inc. (MSCI), as part of its work on the MAGIC Gigabit Testbed, developed
the Host ATM Research Platform (HARP) software, which allows IP hosts to
communicate over ATM networks using standard protocols. It is intended to
be a high-quality platform for IP/ATM research.

HARP provides a way for IP hosts to connect to ATM networks. It supports
standard methods of communication using IP over ATM. A host's standard IP
software sends and receives datagrams via a HARP ATM interface. HARP provides
functionality similar to (and typically replaces) vendor-provided ATM device
driver software.

HARP includes full source code, making it possible for researchers to
experiment with different approaches to running IP over ATM. HARP is
self-contained; it requires no other licenses or commercial software packages.

HARP implements support for the IETF Classical IP model for using IP over ATM
networks, including:

o IETF ATMARP address resolution client
o IETF ATMARP address resolution server
o UNI 3.1 and 3.0 signalling protocols
o Fore Systems's SPANS signalling protocol

What's supported
The following are supported by HARP 3:

o ATM Host Interfaces
- FORE Systems, Inc. SBA-200 and SBA-200E ATM SBus Adapters
- FORE Systems, Inc. PCA-200E ATM PCI Adapters
- Efficient Networks, Inc. ENI-155p ATM PCI Adapters

o ATM Signalling Protocols
- The ATM Forum UNI 3.1 signalling protocol
- The ATM Forum UNI 3.0 signalling protocol
- The ATM Forum ILMI address registration
- FORE Systems's proprietary SPANS signalling protocol
- Permanent Virtual Channels (PVCs)

o IETF "Classical IP and ARP over ATM" model
- RFC 1483, "Multiprotocol Encapsulation over ATM Adaptation Layer 5"
- RFC 1577, "Classical IP and ARP over ATM"
- RFC 1626, "Default IP MTU for use over ATM AAL5"
- RFC 1755, "ATM Signaling Support for IP over ATM"
- RFC 2225, "Classical IP and ARP over ATM"
- RFC 2334, "Server Cache Synchronization Protocol (SCSP)"
- Internet Draft draft-ietf-ion-scsp-atmarp-00.txt,
"A Distributed ATMARP Service Using SCSP"

o ATM Sockets interface
- The file atm-sockets.txt contains further information

What's not supported
The following major features of the above list are not currently supported:

o UNI point-to-multipoint support
o Driver support for Traffic Control/Quality of Service
o SPANS multicast and MPP support
o SPANS signalling using Efficient adapters

This software was developed under the sponsorship of the Defense Advanced
Research Projects Agency (DARPA).

Reviewed (lightly) by: phk
Submitted by: Network Computing Services, Inc.
f91828ff268ccdd9f1d9d725b84774b1a384c553 12-Sep-1998 wollman <wollman@FreeBSD.org> Define the Posix.1g names for the howto argument to shutdown(2).
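
A minimal sketch of the howto names in use, assuming they are the
SHUT_RD, SHUT_WR and SHUT_RDWR constants found in current <sys/socket.h>
(the commit message does not list them). The function name
half_close_demo is illustrative only. It closes the write side of one end
of a socket pair and checks that the peer drains the pending data and then
sees end-of-file.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Illustrative only: half-close a connected stream socket with SHUT_WR.
 * Returns 0 if the peer reads the queued bytes followed by EOF, -1 otherwise.
 */
static int
half_close_demo(void)
{
	int sv[2];
	char buf[8];
	int ok = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (-1);
	if (write(sv[0], "hi", 2) == 2 &&
	    shutdown(sv[0], SHUT_WR) == 0 &&		/* no more sends on sv[0] */
	    read(sv[1], buf, sizeof(buf)) == 2 &&	/* queued data still arrives */
	    read(sv[1], buf, sizeof(buf)) == 0)		/* then end-of-file */
		ok = 0;
	close(sv[0]);
	close(sv[1]);
	return (ok);
}
```

SHUT_RD and SHUT_RDWR take the place of SHUT_WR to disable receives, or
both directions, in the same call.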
b3fa7a8575a69cb459088af76d741da3679fcb8d 01-Feb-1998 alex <alex@FreeBSD.org> Added inet6 to CTL_NET_NAMES.
0506343883d62f6649f7bbaf1a436133cef6261d 11-Jan-1998 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'jb'.
7c6e96080c4fb49bf912942804477d202a53396c 10-Jan-1998 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'JB'.
7fb46d49218076b8a6b7f0ddeb7afd3d18187df8 21-Dec-1997 bde <bde@FreeBSD.org> Moved some declarations from <sys/socket.h> to the correct places, and
fixed everything that depended on them being misplaced.
36e7a51ea1dedf0fc860ff3106aee1db1ab3b1f5 12-Oct-1997 phk <phk@FreeBSD.org> Last major round (Unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of

A couple of finer points by: bde
4542c1cf5d7077caf33d6d9468f5e647cd9d19e5 16-Aug-1997 wollman <wollman@FreeBSD.org> Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to now malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
2b17ef6899aa09fa383a9e0b48a836cf1fb59768 09-May-1997 kjc <kjc@FreeBSD.org> merge ATM driver
ad536a313dc75fe14e3803f74a90162161ec7b99 30-Apr-1997 wollman <wollman@FreeBSD.org> Remove SO_PRIVSTATE socket option; it is no longer necessary, nor implemented
in the kernel. inetd should automatically notice that it has gone away
once it is recompiled.
cdd7ea42624c882c277e717173e0d9f2d4edfbd1 21-Mar-1997 wpaul <wpaul@FreeBSD.org> Add support to sendmsg()/recvmsg() for passing credentials between
processes using AF_LOCAL sockets. This hack is going to be used with
Secure RPC to duplicate a feature of STREAMS which has no real counterpart
in sockets (with STREAMS/TLI, you can apparently use t_getinfo() to learn
the UID of a local process on the other side of a transport endpoint).

What happens is this: the client sets up a sendmsg() call with ancillary
data using the SCM_CREDS socket-level control message type. It does not
need to fill in the structure. When the kernel notices the data,
unp_internalize() fills in the cmsgcred structure with the sending
process' credentials (UID, EUID, GID, and ancillary groups). This data
is later delivered to the receiving process. The receiver can then
perform the following tests:

- Did the client send ancillary data?
o Yes, proceed.
o No, refuse to authenticate the client.

- Did the client send data of type SCM_CREDS?
o Yes, proceed.
o No, refuse to authenticate the client.

- Is the cmsgcred structure the right size?
o Yes, proceed.
o No, signal a possible error.

The receiver can now inspect the credential information and use it to
authenticate the client.
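
The three receiver-side tests above can be sketched roughly as follows.
This is an illustration under stated assumptions, not the code used by
Secure RPC: the enum and the function check_cred_cmsg are hypothetical
names, and the expected type and payload size are passed in as parameters
(on FreeBSD they would presumably be SCM_CREDS and
sizeof(struct cmsgcred)) so that the sketch also compiles on systems
without those definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical result codes, one per test in the list above. */
enum cred_check {
	CRED_OK = 0,
	CRED_NO_ANCILLARY,	/* client sent no ancillary data */
	CRED_WRONG_TYPE,	/* ancillary data is not of the wanted type */
	CRED_BAD_SIZE		/* payload is not the expected structure size */
};

/*
 * Inspect the first control message of a msghdr filled in by recvmsg(2).
 * want_type and want_len are assumptions supplied by the caller.
 */
static enum cred_check
check_cred_cmsg(struct msghdr *msg, int want_type, size_t want_len)
{
	struct cmsghdr *cm;

	assert(msg != NULL);
	cm = CMSG_FIRSTHDR(msg);
	if (cm == NULL)
		return (CRED_NO_ANCILLARY);
	if (cm->cmsg_level != SOL_SOCKET || cm->cmsg_type != want_type)
		return (CRED_WRONG_TYPE);
	if (cm->cmsg_len != CMSG_LEN(want_len))
		return (CRED_BAD_SIZE);
	return (CRED_OK);
}
```

A receiver would refuse to authenticate on the first two failures and
report a possible error on the third, and only read the credentials out of
CMSG_DATA() once CRED_OK is returned.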
94b6d727947e1242356988da003ea702d41a97de 22-Feb-1997 peter <peter@FreeBSD.org> Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
808a36ef658c1810327b5d329469bcf5dad24b28 14-Jan-1997 jkh <jkh@FreeBSD.org> Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
d8c7a9570872ddd0575d6ef9f25a2df4a74f5768 30-Aug-1996 peter <peter@FreeBSD.org> Grab the next slot for AF_INET6/PF_INET6, the resolver uses it.
c244429aa628b07406d26d351444a51dcc59ad09 15-Aug-1996 jdp <jdp@FreeBSD.org> Fix a typo in the #define for PF_RTIP, even though I doubt it will
ever make one bit of difference to anybody.
358b715a24f5eb80c6403f2a5cb4a97daa8f37d3 18-Jun-1996 wollman <wollman@FreeBSD.org> When bringing the netkey stuff over, I forgot that I had decided to change
AF_KEY into pseudo_AF_KEY, and defined PF_KEY incorrectly. Fix.

Noticed by: pst
845782b7e052a1136c206b3e7ae58942a6a725bd 14-Jun-1996 wollman <wollman@FreeBSD.org> This is the `netkey' kernel key-management service (the PF_KEY analogue
to PF_ROUTE) from NRL's IPv6 distribution, heavily modified by me for
better source layout, formatting, and textual conventions. I am told
that this code is no longer under active development, but it's a useful
hack for those interested in doing work on network security, key management,
etc. This code has only been tested twice, so it should be considered
highly experimental.

Obtained from: ftp.ripe.net
9ea36adbecaec6093c2c6769fbc0e405e5618487 09-May-1996 wollman <wollman@FreeBSD.org> Make it possible to return more than one piece of control information
(PR #1178).
Define a new SO_TIMESTAMP socket option for datagram sockets to return
packet-arrival timestamps as control information (PR #1179).

Submitted by: Louis Mamakos <loiue@TransSys.com>
73a498e93ef77f792f958b4a1ea0d9ad0490888a 11-Mar-1996 peter <peter@FreeBSD.org> Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[new sys/syscallargs.h file, to be "cvs rm"ed]
88a3e24de1bb1e786a6f5373009c12057bebad20 07-Feb-1996 wollman <wollman@FreeBSD.org> Define a new socket option, SO_PRIVSTATE. Getting it returns the state
of the SS_PRIV flag in so_state; setting it always clears same.
f3dd75a38d66ed54a0f2660b0a27d177fb33f068 30-Jan-1996 mpp <mpp@FreeBSD.org> Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.
d0328cb98caa8e43e38f976c0547ee89fb80a737 05-Jan-1996 dg <dg@FreeBSD.org> Increased default SOMAXCONN from 32 to 128. 128 is the largest value I
consider "safe" for most systems. Note that this is (has been for some
time) also tunable with sysctl (via kern.somaxconn) should the operator
wish to increase this value even higher. Also note that 128 is what
the Netscape WWW server reportedly asks for.
86f1bc4514fdcfd255f37f3218fe234bdc3664fc 05-Nov-1995 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'LINUX'.
7a80a4be51b01e07c37ffe3976bc4adfab0ff03d 13-Sep-1995 dg <dg@FreeBSD.org> Increased SOMAXCONN from 5 to 32. 5 was too small a value for just about
any reasonably busy machine, and by any measure is a lousy "max" value.
32 was chosen after a careful analysis of typical listen queue depths
on several busy Internet servers (both web and ftp). I also intend to
add a statistics counter for dropped connection requests due to the limit
being exceeded.
2e14d9ebc3d3592c67bdf625af9ebe0dfc386653 14-Mar-1995 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'MATT_THOMAS'.
897c89c0d9833b3b7deadd0f70b1e18c4d98de3a 07-Feb-1995 wollman <wollman@FreeBSD.org> Merge in the socket-level support for Transaction TCP from the OLAH_TTCP

Submitted by: Andras Olah <olah@cs.utwente.nl>
5d7722cc7e910b9204c266b26795f0db14242903 05-Jan-1995 se <se@FreeBSD.org> Submitted by: Wolfgang Stanglmeier <wolf@dentaro.GUN.de>
Reviewed by: <wollman>
First hooks and defines for the ISDN driver,
that soon will see the light ...
5251e75557bb5804b6c692b0cdc8e97ff673855f 08-Oct-1994 phk <phk@FreeBSD.org> Added prototypes here and there. Moved pfctlinput into socket.h.
b1b8768e6a30f06704cfedaeae8f3397ad760a4e 02-Oct-1994 phk <phk@FreeBSD.org> Prototypes, prototypes and even more prototypes. Not quite done yet, but
getting closer all the time.
34cd81d75f398ee455e61969b118639dacbfd7a6 23-Sep-1994 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'MACKERRAS'.
e16baf7a5fe7ac1453381d0017ed1dcdeefbc995 07-Aug-1994 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'SUNRPC'.
8d205697aac53476badf354623abd4e1c7bc5aff 02-Aug-1994 dg <dg@FreeBSD.org> Added $Id$
8fb65ce818b3e3c6f165b583b910af24000768a5 24-May-1994 rgrimes <rgrimes@FreeBSD.org> BSD 4.4 Lite Kernel Sources