History log of /freebsd-head/sys/net/if_var.h
Revision Date Author Comments
47f7dad6688c8dfa75b6813f7362fa38cb9978b6 12-Jul-2020 melifaro <melifaro@FreeBSD.org> Switch inet6 default route subscription to the new rib subscription api.

Old subscription model allowed only single customer.

Switch inet6 to the new subscription api and eliminate the old model.

Differential Revision: https://reviews.freebsd.org/D25615
29b95a0fda87e076e693ff2759dd9773cf187ec3 23-May-2020 melifaro <melifaro@FreeBSD.org> Move <add|del|change>_route() functions to route_ctl.c in preparation of
multipath control plane changes described in D24141.

Currently route.c contains core routing init/teardown functions, route table
manipulation functions and various helper functions, resulting in >2KLOC
file in total. This change moves most of the route table manipulation parts
to a dedicated file, simplifying planned multipath changes and making
route.c more manageable.

Differential Revision: https://reviews.freebsd.org/D24870
2338a28d9e4f3ed6044df834ff39c4c2dc11c290 29-Apr-2020 melifaro <melifaro@FreeBSD.org> Add nhop to the ifa_rtrequest() callback.

With the upcoming multipath changes described in D24141,
rt->rt_nhop can potentially point to a nexthop group instead of
an individual nhop.
To simplify caller handling of such cases, change ifa_rtrequest() callback
to pass changed nhop directly.

Differential Revision: https://reviews.freebsd.org/D24604
ebeaf1673a4eaed2cb977ec59f0cbae552feff56 17-Apr-2020 melifaro <melifaro@FreeBSD.org> Finish r191148: replace rtentry with route in if_bridge if_output() callback.

Generic if_output() callback signature was modified to use struct route
instead of struct rtentry in r191148, back in 2009.

Quoting commit message:

Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Fix bridge_output() to match this signature and update the remaining
comment in if_var.h.

Reviewed by: kp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D24394
3ea844b1da3f287b59141a9acb13ebed50efbb42 09-Mar-2020 gallatin <gallatin@FreeBSD.org> make lacp's use_numa hashing aware of send tags

When I did the use_numa support, I missed the fact that there is
a separate hash function for send tag nic selection. So when
use_numa is enabled, ktls offload does not work properly, as it
does not reliably allocate a send tag on the proper egress nic
since different egress nics are selected for send-tag allocation
and packet transmit. To fix this, this change:

- refactors lacp_select_tx_port_by_hash() and
lacp_select_tx_port() to make lacp_select_tx_port_by_hash()
always called by lacp_select_tx_port()

- pre-shifts flowids to convert them to hashes when calling lacp_select_tx_port_by_hash()

- adds a numa_domain field to if_snd_tag_alloc_params

- plumbs the numa domain into places where we allocate send tags
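
A minimal sketch of the idea behind the list above, not the lagg(4) code itself: when send-tag allocation and packet transmit both pre-shift the flowid and go through the same hash-based selector, they cannot disagree about the egress port. All names here (select_port_by_hash, select_tx_port, select_snd_tag_port) and the shift value are hypothetical, the shift is assumed to be smaller than 32, and the NUMA-domain filtering is left out.

```c
#include <stdint.h>

/* Hypothetical selector: map a hash onto one of nports egress ports. */
static unsigned
select_port_by_hash(uint32_t hash, unsigned nports)
{
	if (nports == 0)
		return (0);	/* nothing to choose from */
	return (hash % nports);
}

/* Transmit path: pre-shift the flowid into a hash, then use the common selector. */
static unsigned
select_tx_port(uint32_t flowid, unsigned shift, unsigned nports)
{
	return (select_port_by_hash(flowid >> shift, nports));
}

/* Send-tag allocation path: same conversion, same selector, same answer. */
static unsigned
select_snd_tag_port(uint32_t flowid, unsigned shift, unsigned nports)
{
	return (select_port_by_hash(flowid >> shift, nports));
}
```

With both paths funnelled through one selector, a flow whose send tag was allocated on a given port is also transmitted on that port, which is the property the commit message says was missing.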

In testing with NIC TLS setup on a NUMA machine, I see thousands
of output errors before the change when enabling
kern.ipc.tls.ifnet.permitted=1. After the change, I see no
errors, and I see the NIC sysctl counters showing active TLS
offload sessions.

Reviewed by: rrs, hselasky, jhb
Sponsored by: Netflix
89c301457c7757b2c3d524e337a56c1cb6014450 03-Mar-2020 brooks <brooks@FreeBSD.org> Expose ifr_buffer_get_(buffer|length) outside if.c.

This is a preparatory commit for D23933.

Reviewed by: jhb
cac424b2914736ad0b7958cbcd0cb2a876d8fcde 26-Feb-2020 rrs <rrs@FreeBSD.org> This commit expands tcp_ratelimit to be able to handle cards
like the mlx-c5 and c6 that require a "setup" routine before
the tcp_ratelimit code can declare and use a rate. I add the
setup routine to if_var as well as fix tcp_ratelimit to call it.
I also revisit the rates so that in the case of a mlx card
of type c5/6 we will use about 100 rates concentrated in the range
where the most gain can be had (1-200Mbps). Note that I have
tested these on a c5 and they work and perform well. In fact
in an unloaded system they pace right to the correct rate (great
job mlx!). There will be a further commit here from Hans that
will add the respective changes to the mlx driver to support this
work (which I was testing with).

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D23647
efe8e39f9c1874f194dc90613a81e54df8d5210e 15-Jan-2020 glebius <glebius@FreeBSD.org> - Move global network epoch definition to epoch.h, as more different
subsystems tend to need to know about it, and including if_var.h is
huge header pollution for them. Polluting possible non-network
users with single symbol seems much lesser evil.
- Remove non-preemptible network epoch. Not used yet, and unlikely
to get used in close future.
4d54a3986281aacfdd53fc83ffe25187b48aeb4f 29-Oct-2019 glebius <glebius@FreeBSD.org> There is a long standing problem with multicast programming for NICs
and IPv6. With IPv6 we may call if_addmulti() in context of processing
of an incoming packet. Usually this is interrupt context. While most
of the NIC drivers are able to reprogram multicast filters without
sleeping, some of them can't. An example is e1000 family of drivers.
With iflib conversion the problem was somewhat hidden. Iflib processes
packets in private taskqueue, so going to sleep doesn't trigger an
assertion. However, the sleep would block operation of the driver and
following incoming packets would fill the ring and eventually would
start being dropped. Enabling epoch for the full time of a packet
processing again started to trigger assertions for e1000.

Fix this problem once and for all using a general taskqueue to call
if_ioctl() method in all cases when if_addmulti() is called in a
non sleeping context. Note that nobody cares about returned value.

Reviewed by: hselasky, kib
Differential Revision: https://reviews.freebsd.org/D22154
8ec25643caf24ae806634d1e4e6ceb4512ce33ca 21-Oct-2019 glebius <glebius@FreeBSD.org> Remove obsoleted KPIs that were used to access interface address lists.
f3a0ee41db2b870d4a81d36ad4357bce1b1b840c 17-Oct-2019 cem <cem@FreeBSD.org> Split out a more generic debugnet(4) from netdump(4)

Debugnet is a simplistic and specialized panic- or debug-time reliable
datagram transport. It can drive a single connection at a time and is
currently unidirectional (debug/panic machine transmit to remote server
only).

It is mostly a verbatim code lift from netdump(4). Netdump(4) remains
the only consumer (until the rest of this patch series lands).

The INET-specific logic has been extracted somewhat more thoroughly than
previously in netdump(4), into debugnet_inet.c. UDP-layer logic and up, as
much as possible as is protocol-independent, remains in debugnet.c. The
separation is not perfect and future improvement is welcome. Supporting
INET6 is a long-term goal.

Much of the diff is "gratuitous" renaming from 'netdump_' or 'nd_' to
'debugnet_' or 'dn_' -- sorry. I thought keeping the netdump name on the
generic module would be more confusing than the refactoring.

The only functional change here is the mbuf allocation / tracking. Instead
of initiating solely on netdump-configured interface(s) at dumpon(8)
configuration time, we watch for any debugnet-enabled NIC for link
activation and query it for mbuf parameters at that time. If they exceed
the existing high-water mark allocation, we re-allocate and track the new
high-water mark. Otherwise, we leave the pre-panic mbuf allocation alone.
In a future patch in this series, this will allow initiating netdump from
panic ddb(4) without pre-panic configuration.
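
The high-water mark tracking described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the debugnet(4) implementation: struct mbuf_params, its fields, and hwm_update() are hypothetical names, and "exceeds" is taken to mean that any single parameter is larger than the recorded maximum.

```c
#include <stdbool.h>

/* Hypothetical per-NIC mbuf requirements reported at link activation. */
struct mbuf_params {
	unsigned nmbuf;		/* number of mbufs */
	unsigned ncluster;	/* number of clusters */
	unsigned clsize;	/* cluster size in bytes */
};

/* High-water mark of what has been pre-allocated so far. */
static struct mbuf_params hwm;

static unsigned
umax(unsigned a, unsigned b)
{
	return (a > b ? a : b);
}

/*
 * Returns true when the caller would need to re-allocate, i.e. when any
 * requested value exceeds the recorded high-water mark. Otherwise the
 * existing pre-panic allocation is left alone and false is returned.
 */
static bool
hwm_update(const struct mbuf_params *req)
{
	if (req->nmbuf <= hwm.nmbuf && req->ncluster <= hwm.ncluster &&
	    req->clsize <= hwm.clsize)
		return (false);
	hwm.nmbuf = umax(hwm.nmbuf, req->nmbuf);
	hwm.ncluster = umax(hwm.ncluster, req->ncluster);
	hwm.clsize = umax(hwm.clsize, req->clsize);
	return (true);
}
```

A NIC that asks for less than what is already reserved triggers no work, so repeated link flaps on small interfaces would not churn the allocation.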

No other functional change intended.

Reviewed by: markj (earlier version)
Some discussion with: emaste, jhb
Objection from: marius
Differential Revision: https://reviews.freebsd.org/D21421
d234ae58eee2f08094c50f8d917c827d51c5f65c 15-Oct-2019 hselasky <hselasky@FreeBSD.org> The two functions ifnet_byindex() and ifnet_byindex_locked() are exactly the
same after the network stack was epochified. Merge the two into one function
and cleanup all uses of ifnet_byindex_locked().

While at it:
- Add branch prediction macros.
- Make sure the ifnet pointer is only dereferenced once,
also when code optimisation is disabled.

Sponsored by: Mellanox Technologies
05b77fd4771d34e898089ab664b658446cd9901a 10-Oct-2019 glebius <glebius@FreeBSD.org> Add two extra functions that basically give the count of addresses
on an interface. Such functions could have been implemented on top of
the if_foreach_llm?addr(), but several drivers need counting,
so avoid copy-n-paste inside the drivers.
f4c0c06c4798255f82d0b0982b213fb38398086b 10-Oct-2019 glebius <glebius@FreeBSD.org> Provide new KPI for network drivers to access lists of interface
addresses. The KPI reveals neither how addresses are stored nor how
access to them is synchronized, and it does not expose struct ifaddr
and struct ifmaddr.
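
As a userland illustration of a callback-style accessor that hides storage and synchronization from its callers, consider the sketch below. It is not the kernel KPI: struct iface, struct hw_addr, iface_foreach_addr() and iface_addr_count() are hypothetical stand-ins, and the convention that the iterator returns the sum of the callback's return values is an assumption of this sketch.

```c
#include <stddef.h>

struct hw_addr {
	unsigned char octets[6];
};

/* Storage layout; in this sketch only the iterator below looks inside it. */
struct iface {
	struct hw_addr addrs[8];
	size_t naddrs;
};

/* Callback: receives the caller's argument, one address, and the running total. */
typedef unsigned (*addr_cb_t)(void *arg, const struct hw_addr *addr,
    unsigned count);

/* Walk every address, summing what the callback returns. */
static unsigned
iface_foreach_addr(const struct iface *ifp, addr_cb_t cb, void *arg)
{
	unsigned total = 0;
	size_t i;

	for (i = 0; i < ifp->naddrs; i++)
		total += cb(arg, &ifp->addrs[i], total);
	return (total);
}

/* Counting callback: every address contributes one. */
static unsigned
count_cb(void *arg, const struct hw_addr *addr, unsigned count)
{
	(void)arg;
	(void)addr;
	(void)count;
	return (1);
}

/* Convenience wrapper so callers do not each re-implement counting. */
static unsigned
iface_addr_count(const struct iface *ifp)
{
	return (iface_foreach_addr(ifp, count_cb, NULL));
}
```

Because callers only ever supply a callback, the list type or locking scheme behind the iterator could change without touching them.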

Reviewed by: gallatin, erj, hselasky, philip, stevek
Differential Revision: https://reviews.freebsd.org/D21943
1cf31620c802b9e665385827148ab45a22cef571 27-Aug-2019 jhb <jhb@FreeBSD.org> Add kernel-side support for in-kernel TLS.

KTLS adds support for in-kernel framing and encryption of Transport
Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports
offload of TLS for transmitted data. Key negotiation must still be
performed in userland. Once completed, transmit session keys for a
connection are provided to the kernel via a new TCP_TXTLS_ENABLE
socket option. All subsequent data transmitted on the socket is
placed into TLS frames and encrypted using the supplied keys.

Any data written to a KTLS-enabled socket via write(2), aio_write(2),
or sendfile(2) is assumed to be application data and is encoded in TLS
frames with an application data type. Individual records can be sent
with a custom type (e.g. handshake messages) via sendmsg(2) with a new
control message (TLS_SET_RECORD_TYPE) specifying the record type.

At present, rekeying is not supported though the in-kernel framework
should support rekeying.

KTLS makes use of the recently added unmapped mbufs to store TLS
frames in the socket buffer. Each TLS frame is described by a single
ext_pgs mbuf. The ext_pgs structure contains the header of the TLS
record (and trailer for encrypted records) as well as references to
the associated TLS session.

KTLS supports two primary methods of encrypting TLS frames: software
TLS and ifnet TLS.

Software TLS marks mbufs holding socket data as not ready via
M_NOTREADY similar to sendfile(2) when TLS framing information is
added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then
called to schedule TLS frames for encryption. In the case of
sendfile(2), sendfile_iodone() calls ktls_enqueue() instead of pru_ready(), leaving
the mbufs marked M_NOTREADY until encryption is completed. For other
writes (vn_sendfile when pages are available, write(2), etc.), the
PRUS_NOTREADY is set when invoking pru_send() along with invoking
ktls_enqueue().

A pool of worker threads (the "KTLS" kernel process) encrypts TLS
frames queued via ktls_enqueue(). Each TLS frame is temporarily
mapped using the direct map and passed to a software encryption
backend to perform the actual encryption.

(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if
someone wished to make this work on architectures without a direct
map.)

KTLS supports pluggable software encryption backends. Internally,
Netflix uses proprietary pure-software backends. This commit includes
a simple backend in a new ktls_ocf.ko module that uses the kernel's
OpenCrypto framework to provide AES-GCM encryption of TLS frames. As
a result, software TLS is now a bit of a misnomer as it can make use
of hardware crypto accelerators.

Once software encryption has finished, the TLS frame mbufs are marked
ready via pru_ready(). At this point, the encrypted data appears as
regular payload to the TCP stack stored in unmapped mbufs.

ifnet TLS permits a NIC to offload the TLS encryption and TCP
segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS)
is allocated on the interface a socket is routed over and associated
with a TLS session. TLS records for a TLS session using ifnet TLS are
not marked M_NOTREADY but are passed down the stack unencrypted. The
ip_output_send() and ip6_output_send() helper functions that apply
send tags to outbound IP packets verify that the send tag of the TLS
record matches the outbound interface. If so, the packet is tagged
with the TLS send tag and sent to the interface. The NIC device
driver must recognize packets with the TLS send tag and schedule them
for TLS encryption and TCP segmentation. If the outbound
interface does not match the interface in the TLS send tag, the packet
is dropped. In addition, a task is scheduled to refresh the TLS send
tag for the TLS session. If a new TLS send tag cannot be allocated,
the connection is dropped. If a new TLS send tag is allocated,
however, subsequent packets will be tagged with the correct TLS send
tag. (This latter case has been tested by configuring both ports of a
Chelsio T6 in a lagg and failing over from one port to another. As
the connections migrated to the new port, new TLS send tags were
allocated for the new port and connections resumed without being
dropped.)

ifnet TLS can be enabled and disabled on supported network interfaces
via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported
across both vlan devices and lagg interfaces using failover, lacp with
flowid enabled, or lacp with flowid enabled.

Applications may request the current KTLS mode of a connection via a
new TCP_TXTLS_MODE socket option. They can also use this socket
option to toggle between software and ifnet TLS modes.

In addition, a testing tool is available in tools/tools/switch_tls.
This is modeled on tcpdrop and uses similar syntax. However, instead
of dropping connections, -s is used to force KTLS connections to
switch to software TLS and -i is used to switch to ifnet TLS.

Various sysctls and counters are available under the kern.ipc.tls
sysctl node. The kern.ipc.tls.enable node must be set to true to
enable KTLS (it is off by default). The use of unmapped mbufs must
also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.

KTLS is enabled via the KERN_TLS kernel option.

This patch is the culmination of years of work by several folks
including Scott Long and Randall Stewart for the original design and
implementation; Drew Gallatin for several optimizations including the
use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records
awaiting software encryption, and pluggable software crypto backends;
and John Baldwin for modifications to support hardware TLS offload.

Reviewed by: gallatin, hselasky, rrs
Obtained from: Netflix
Sponsored by: Netflix, Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21277
8a34b17735d7079d0019f78ac9030811e8670d30 01-Aug-2019 rrs <rrs@FreeBSD.org> This adds the third step in getting BBR into the tree. BBR and
an updated rack depend on having access to the new
ratelimit api in this commit.

Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D20953
250e158ddf52459661439141407bca505d199c19 20-May-2019 cem <cem@FreeBSD.org> Extract eventfilter declarations to sys/_eventfilter.h

This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
accdb3810dee12b48cf729b4ab7e8132b9ca219b 22-Apr-2019 gallatin <gallatin@FreeBSD.org> Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory

This commit adds new if_alloc_domain() and if_alloc_dev() methods to
allocate ifnets. When called with a domain on a NUMA machine,
if_alloc_domain() will record the NUMA domain in the ifnet, and it will
allocate the ifnet struct from memory which is local to that NUMA
node. Similarly, if_alloc_dev() is a wrapper for if_alloc_domain()
which uses a driver supplied device_t to call if_alloc_domain() with
the appropriate domain.

Note that the new if_numa_domain field fits in an alignment pad in
struct ifnet, and so does not alter the size of the structure.

Reviewed by: glebius, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19930
9978a7d9242f744fc7473e287332c8df88e33e3e 31-Jan-2019 glebius <glebius@FreeBSD.org> New pfil(9) KPI together with newborn pfil API and control utility.

The KPI has been reviewed and cleansed of features that were planned
back 20 years ago and never implemented. The pfil(9) internals have
been made opaque to protocols with only returned types and function
declarations exposed. The KPI is made more strict, but at the same time
more extensible, as kernel uses same command structures that userland
ioctl uses.

In a nutshell, [KA]PI is about declaring filtering points, declaring
filters and linking and unlinking them together.

New [KA]PI makes it possible to reconfigure pfil(9) configuration:
change order of hooks, rehook filter from one filtering point to a
different one, disconnect a hook on output leaving it on input only,
prepend/append a filter to existing list of filters.

Now it is possible for a single packet filter to provide multiple rulesets
that may be linked to different points. Think of per-interface ACLs in
Cisco or Juniper. None of the existing packet filters support that yet;
however, limited usage is already possible, e.g. the default ruleset can
be moved to a single interface, as soon as interfaces provide their
filtering points.

Another future feature is the possibility to create pfil heads that provide
not an mbuf pointer but just a memory pointer with length. That would
allow filtering at very early stages of a packet lifecycle, e.g. when
packet has just been received by a NIC and no mbuf was yet allocated.

Differential Revision: https://reviews.freebsd.org/D18951
6d8cc191f953b3680c5e5911afc66b7c1f8e6c4b 09-Jan-2019 glebius <glebius@FreeBSD.org> Mechanical cleanup of epoch(9) usage in network stack.

- Remove macros that covertly create epoch_tracker on thread stack. Such
macros are quite unsafe, e.g. they will produce buggy code if the same macro
is used in nested scopes. Explicitly declare epoch_tracker always.

- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.

This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.

Discussed with: jtl, gallatin
4508d14d92282d441b7414a32edba5cb0a3fc307 13-Nov-2018 glebius <glebius@FreeBSD.org> For compatibility KPI functions like if_addr_rlock() that used to have
mutexes but now are converted to epoch(9) use thread-private epoch_tracker.
Embedding tracker into ifnet(9) or ifnet derived structures creates a
non-reentrant function that will fail miserably if called simultaneously from
two different contexts.
A thread private tracker will provide a single tracker that would allow to
call these functions safely. It doesn't allow nested call, but this is not
expected from compatibility KPIs.

Reviewed by: markj
8d3e25d418ef68d6a15c5e94e25d40be89dadbbe 21-Oct-2018 ae <ae@FreeBSD.org> Add ifaddr_event_ext event. It is similar to ifaddr_event, but the
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.

MFC after: 3 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17100
cc6dd2b04691a8ab51639f7f556447e99e9fba4b 29-Sep-2018 dim <dim@FreeBSD.org> Merge ^/head r338988 through r339014.
90ffd2da8f1c5a00d78d561858f5b1e75cef38e9 29-Sep-2018 tuexen <tuexen@FreeBSD.org> For changing the MTU on tun/tap devices, it should not matter whether it
is done via using ifconfig, which uses a SIOCSIFMTU ioctl() command, or
doing it using a TUNSIFINFO/TAPSIFINFO ioctl() command.
Without this patch, for IPv6 the new MTU is not used when creating routes.
Especially, when initiating TCP connections after increasing the MTU,
the old MTU is still used to compute the MSS.
Thanks to ae@ and bz@ for helping to improve the patch.

Reviewed by: ae@, bz@
Approved by: re (kib@)
MFC after: 1 week
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D17180
d09bac9dfb4d2f09f4b9350f29b5bc177f798c96 21-Sep-2018 mmacy <mmacy@FreeBSD.org> fix vlan locking to permit sx acquisition in ioctl calls

- update vlan(9) to handle changes earlier this year in multicast locking

Tested by: np@, darkfiberu at gmail.com

PR: 230510
Reviewed by: mjoras@, shurd@, sbruno@
Approved by: re (gjb@)
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16808
99cec0a00c69bf34c44b008e46eea9df8bd57395 15-Aug-2018 mmacy <mmacy@FreeBSD.org> Fix in6_multi double free

This is actually several different bugs:
- The code is not designed to handle inpcb deletion after interface deletion
- add reference for inpcb membership
- The multicast address has to be removed from interface lists when the refcount
goes to zero OR when the interface goes away
- decouple list disconnect from refcount (v6 only for now)
- ifmultiaddr can exist past being on interface lists
- add flag for tracking whether or not it's enqueued
- deferring freeing moptions makes the inpcb cleanup code simpler but opens the
door wider still to races
- call inp_gcmoptions synchronously after dropping the inpcb lock

Fundamentally multicast needs a rewrite - but keep applying band-aids for now.

Tested by: kp
Reported by: novel, kp, lwhsu
19e11c571f3c5a7674740b9dc200f51dba8c2467 09-Jul-2018 ae <ae@FreeBSD.org> Deduplicate the code.

Add generic function if_tunnel_check_nesting() that does check for
allowed nesting level for tunneling interfaces and also does loop
detection. Use it in gif(4), gre(4) and me(4) interfaces.
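
One way to picture a combined nesting-limit and loop check is the sketch below. It is an assumption-laden illustration, not the actual if_tunnel_check_nesting(): the per-packet trace structure, the result enum, and the function signature are all hypothetical.

```c
#include <stddef.h>

#define TRACE_MAX 16	/* capacity of the per-packet trace (assumption) */

enum nest_result { NEST_OK, NEST_LOOP, NEST_TOO_DEEP };

/* Hypothetical per-packet record of tunnel interfaces already traversed. */
struct pkt_trace {
	const void *seen[TRACE_MAX];
	size_t nseen;
};

static enum nest_result
tunnel_check_nesting(struct pkt_trace *t, const void *ifp, size_t max_nesting)
{
	size_t i;

	/* Loop detection: has this packet been through this interface before? */
	for (i = 0; i < t->nseen; i++)
		if (t->seen[i] == ifp)
			return (NEST_LOOP);
	/* Nesting limit: refuse once the allowed depth has been used up. */
	if (t->nseen >= max_nesting || t->nseen >= TRACE_MAX)
		return (NEST_TOO_DEEP);
	t->seen[t->nseen++] = ifp;
	return (NEST_OK);
}
```

Each tunnel driver would call the shared helper on output and drop the packet on any result other than NEST_OK, which is what makes the check worth deduplicating.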

Differential Revision: https://reviews.freebsd.org/D16162
14de8a2820efdf121114eefd291e6427fa353690 04-Jul-2018 mmacy <mmacy@FreeBSD.org> epoch(9): allow preemptible epochs to compose

- Add tracker argument to preemptible epochs
- Inline epoch read path in kernel and tied modules
- Change in_epoch to take an epoch as argument
- Simplify tfb_tcp_do_segment to not take a ti_locked argument,
there's no longer any benefit to dropping the pcbinfo lock
and trying to do so just adds an error prone branchfest to
these functions
- Remove cases of same function recursion on the epoch as
recursing is no longer free.
- Remove the TAILQ_ENTRY and epoch_section from struct
thread as the tracker field is now stack or heap allocated
as appropriate.

Tested by: pho and Limelight Networks
Reviewed by: kbowling at llnw dot com
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D16066
c937b516d8b592949105032489fe94e3b7589f23 24-May-2018 mmacy <mmacy@FreeBSD.org> CK: update consumers to use CK macros across the board

r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
ecd6e9d3074a783a743434c51dbfd16571c55fa2 23-May-2018 mmacy <mmacy@FreeBSD.org> UDP: further performance improvements on tx

Cumulative throughput while running 64
netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15409
8e6130804881614398e13582c3ce8be0cebda171 21-May-2018 mmacy <mmacy@FreeBSD.org> ck: simplify interface with libkvm consumers by defining ck_queue types
as their queue.h equivalents if !_KERNEL
7aeac9ef1893e0b29408213e3a320d9d1ef28357 18-May-2018 mmacy <mmacy@FreeBSD.org> ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

When the host is receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1 before the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  4.98  0.00  4.42 0.00 4235592      33 83.80  4720653 2149771 1235 247.32
  4.73  0.00  4.20 0.00 4025260      33 82.99  4724900 2139833 1204 247.32
  4.72  0.00  4.20 0.00 4035252      33 82.14  4719162 2132023 1264 247.32
  4.71  0.00  4.21 0.00 4073206      33 83.68  4744973 2123317 1347 247.32
  4.72  0.00  4.21 0.00 4061118      33 80.82  4713615 2188091 1490 247.32
  4.72  0.00  4.21 0.00 4051675      33 85.29  4727399 2109011 1205 247.32
  4.73  0.00  4.21 0.00 4039056      33 84.65  4724735 2102603 1053 247.32

After the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  5.43  0.00  4.20 0.00 3313143      33 84.96  5434214 1900162 2656 245.51
  5.43  0.00  4.20 0.00 3308527      33 85.24  5439695 1809382 2521 245.51
  5.42  0.00  4.19 0.00 3316778      33 87.54  5416028 1805835 2256 245.51
  5.42  0.00  4.19 0.00 3317673      33 90.44  5426044 1763056 2332 245.51
  5.42  0.00  4.19 0.00 3314839      33 88.11  5435732 1792218 2499 245.52
  5.44  0.00  4.19 0.00 3293228      33 91.84  5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366
a48d80f193bea83be8ef25275428954a76f0fe76 18-May-2018 mmacy <mmacy@FreeBSD.org> epoch(9): Make epochs non-preemptible by default

There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.

Reported by: markj
361b54f07a1705b83486ccd487f30e8ec82ccb86 11-May-2018 mmacy <mmacy@FreeBSD.org> Allow different bridge types to coexist

if_bridge has a lot of limitations that make it scale poorly to higher data
rates. In my projects/VPC branch I leverage the bridge interface between
layers for my high speed soft switch as well as for purposes of stacking
in general.

Reviewed by: sbruno@
Approved by: sbruno@
Differential Revision: https://reviews.freebsd.org/D15344
0f77b86d6407c59093f5fcd8dad78a594831ff3e 10-May-2018 mmacy <mmacy@FreeBSD.org> Allocate epoch for networking at startup

Additionally add CK to include paths for modules

Approved by: sbruno@
d3f138323c94bbb53f28aee7671a45db13e78ecb 06-May-2018 mmacy <mmacy@FreeBSD.org> r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl
to sleep on commands to the NIC when updating multicast filters. More generally this permitted
drivers to use an sx as a softc lock. Unfortunately this change introduced a race whereby
a multicast update would still be queued for deletion when ifconfig deleted the interface
thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses
on the interface.

Synchronously remove all external references to a multicast address before enqueueing for delete.

Reported by: lwhsu
Approved by: sbruno
071e927e22e1f0642a2f6d5fc06d39eec7946fb7 06-May-2018 markj <markj@FreeBSD.org> Import the netdump client code.

This is a component of a system which lets the kernel dump core to
a remote host after a panic, rather than to a local storage device.
The server component is available in the ports tree. netdump is
particularly useful on diskless systems.

The netdump(4) man page contains some details describing the protocol.
Support for configuring netdump will be added to dumpon(8) in a future
commit. To use netdump, the kernel must have been compiled with the
NETDUMP option.

The initial revision of netdump was written by Darrell Anderson and
was integrated into Sandvine's OS, from which this version was derived.

Reviewed by: bdrewery, cem (earlier versions), julian, sbruno
MFC after: 1 month
X-MFC note: use a spare field in struct ifnet
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D15253
5663cd283745b455337039060c755801fb47d95f 26-Apr-2018 hselasky <hselasky@FreeBSD.org> Add network device event for priority code point, PCP, changes.

When the PCP is changed for either a VLAN network interface or when
prio tagging is enabled for a regular ethernet network interface,
broadcast the IFNET_EVENT_PCP event so applications like ibcore can
update its GID tables accordingly.

MFC after: 3 days
Reviewed by: ae, kib
Differential Revision: https://reviews.freebsd.org/D15040
Sponsored by: Mellanox Technologies
ac0325b4db68e6658c0ca652e4ca905a15b6a026 30-Mar-2018 brooks <brooks@FreeBSD.org> Use an accessor function to access ifr_data.

This fixes 32-bit compat (no ioctl command definitions are required
as struct ifreq is the same size). This is believed to be sufficient to
fully support ifconfig on 32-bit systems.

Reviewed by: kib
Obtained from: CheriBSD
MFC after: 1 week
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14900
9de215608cfe3e871e92c6d6444063dd8be2b5c9 27-Mar-2018 kib <kib@FreeBSD.org> Allow to specify PCP on packets not belonging to any VLAN.

According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful. See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.

Make it possible to specify PCP value for outgoing packets on an
ethernet interface. When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value. The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.
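
For reference, a priority-tagged frame carries an 802.1Q tag whose 16-bit TCI holds the PCP in the top three bits, the DEI in the next bit, and a VLAN id of 0 in the low twelve bits. The sketch below builds such a tag in userland; it is not the if_ethersubr.c code, and make_tci() and priority_tag_frame() are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN	12	/* destination MAC + source MAC */
#define VLAN_TAG_LEN	4	/* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q	0x8100

/* Pack the 3-bit PCP, 1-bit DEI and 12-bit VLAN id into a 16-bit TCI. */
static uint16_t
make_tci(unsigned pcp, unsigned dei, unsigned vid)
{
	return ((uint16_t)(((pcp & 0x7u) << 13) | ((dei & 0x1u) << 12) |
	    (vid & 0xfffu)));
}

/*
 * Copy an untagged Ethernet frame into out, inserting a priority tag
 * (VLAN id 0) right after the two MAC addresses. Returns the new frame
 * length, or 0 if the input is too short or the output buffer too small.
 */
static size_t
priority_tag_frame(const uint8_t *frame, size_t len, unsigned pcp,
    uint8_t *out, size_t outlen)
{
	uint16_t tci = make_tci(pcp, 0, 0);

	if (len < ETH_ADDRS_LEN + 2 || outlen < len + VLAN_TAG_LEN)
		return (0);
	memcpy(out, frame, ETH_ADDRS_LEN);
	out[12] = TPID_8021Q >> 8;
	out[13] = TPID_8021Q & 0xff;
	out[14] = (uint8_t)(tci >> 8);
	out[15] = (uint8_t)(tci & 0xff);
	memcpy(out + 16, frame + ETH_ADDRS_LEN, len - ETH_ADDRS_LEN);
	return (len + VLAN_TAG_LEN);
}
```

The original EtherType ends up immediately after the inserted tag, so a receiver that treats VLAN id 0 as untagged can strip four bytes and process the frame as usual.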

Drivers might have issues with filtering VID 0 packets on
receive. This bug should be fixed for each driver.

Reviewed by: ae (previous version), hselasky, melifaro
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D14702
4736ccfd9c3411d50371d7f21f9450a47c19047e 20-Nov-2017 pfg <pfg@FreeBSD.org> sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
7c2ab1d9f6ff94b01027808fde17933a54ea8c82 06-Sep-2017 hselasky <hselasky@FreeBSD.org> Add support for generic backpressure indicator for ratelimited
transmit queues as well as non-ratelimited ones.

Add the required structure bits in order to support a backpressure
indication with ratelimited connections as well as non-ratelimited
ones. The backpressure indicator is a value between zero and 65535
inclusive, indicating if the destination transmit queue is empty or
full respectively. Applications can use this value as a decision point
for when to stop transmitting data to avoid endless ENOBUFS error
codes upon transmitting an mbuf. This indicator is also useful to
reduce the latency for ratelimited queues.
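
A minimal sketch of how such an indicator could be derived and consumed, assuming a linear mapping from queue occupancy to the 0..65535 range. The function names and the threshold policy are hypothetical; the commit message only specifies the value range and its meaning at the two extremes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BACKPRESSURE_MAX	65535u

/*
 * Hypothetical mapping: scale queue occupancy linearly so that an empty
 * queue yields 0 and a full queue yields 65535.
 */
static uint16_t
queue_backpressure(uint32_t used, uint32_t capacity)
{
	assert(capacity > 0 && used <= capacity);
	return ((uint16_t)(((uint64_t)used * BACKPRESSURE_MAX) / capacity));
}

/* Hypothetical application policy: pause once the level hits a threshold. */
static bool
should_pause_tx(uint16_t level, uint16_t threshold)
{
	return (level >= threshold);
}
```

An application that pauses before the queue is full avoids the ENOBUFS retries mentioned above; the right threshold depends on the workload.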

Reviewed by: gallatin, kib, gnn
Differential Revision: https://reviews.freebsd.org/D11518
Sponsored by: Mellanox Technologies
0cfdb3c3056d9ea5b7a14292e663a5a0afd1f5a4 10-May-2017 rpokala <rpokala@FreeBSD.org> Persistently store NIC's hardware MAC address, and add a way to retrieve it

The MAC address reported by `ifconfig ${nic} ether' does not always match
the address in the hardware, as reported by the driver during attach. In
particular, NICs which are components of a lagg(4) interface all report the
same MAC.

When attaching, the NIC driver passes the MAC address it read from the
hardware as an argument to ether_ifattach(). Keep a second copy of it, and
create ioctl(SIOCGHWADDR) to return it. Teach `ifconfig' to report it along
with the active MAC address.

PR: 194386
Reviewed by: glebius
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D10609
7e6cabd06e6caa6a02eeb86308dc0cb3f27e10da 28-Feb-2017 imp <imp@FreeBSD.org> Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
73279c7326bc6558fb6669f4f97ab47717126806 31-Jan-2017 stevek <stevek@FreeBSD.org> Add the following set accessor functions for recently-added members of ifnet
structure:

if_gethwtsomax(), if_sethwtsomax() - if_hw_tsomax
if_gethwtsomaxsegcount(), if_sethwtsomaxsegcount() - if_hw_tsomaxsegcount
if_gethwtsomaxsegsize(), if_sethwtsomaxsegsize() - if_hw_tsomaxsegsize

Update em and vnic drivers which had already been converted to use accessor
functions for the other ifnet structure members.

Reviewed by: erj
Approved by: sjg (mentor)
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D8544
a25116b2e3c1014bd7f49e338a36ba69d64189f3 28-Jan-2017 dexuan <dexuan@FreeBSD.org> ifnet: move the new ifnet_event EVENTHANDLER_DECLARE to net/if_var.h

Thanks to glebius for pointing this out:
"The network stuff shall not be added to sys/eventhandler.h"

Reviewed by: David_A_Bright_DELL.com, sephe, glebius
Approved by: sephe (mentor)
MFC after: 2 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D9345
efa6326974ec2cdb6721fec731bcd86758d0877c 18-Jan-2017 hselasky <hselasky@FreeBSD.org> Implement kernel support for hardware rate limited sockets.

- Add RATELIMIT kernel configuration keyword which must be set to
enable the new functionality.

- Add support for hardware driven, Receive Side Scaling, RSS aware, rate
limited sendqueues and expose the functionality through the already
established SO_MAX_PACING_RATE setsockopt(). The API supports rates in
the range from 1 to 4Gbytes/s which are suitable for regular TCP and
UDP streams. The setsockopt(2) manual page has been updated.

- Add rate limit function callback API to "struct ifnet" which supports
the following operations: if_snd_tag_alloc(), if_snd_tag_modify(),
if_snd_tag_query() and if_snd_tag_free().

- Add support to ifconfig to view, set and clear the IFCAP_TXRTLMT
flag, which tells if a network driver supports rate limiting or not.

- This patch also adds support for rate limiting through VLAN and LAGG
intermediate network devices.

- How rate limiting works:

1) The userspace application calls setsockopt() after accepting or
making a new connection to set the rate which is then stored in the
socket structure in the kernel. Later on when packets are transmitted
a check is made in the transmit path for rate changes. A rate change
implies a non-blocking ifp->if_snd_tag_alloc() call will be made to the
destination network interface, which then sets up a custom sendqueue
with the given rate limitation parameter. A "struct m_snd_tag" pointer is
returned which serves as a "snd_tag" hint in the m_pkthdr for the
subsequently transmitted mbufs.

2) When the network driver sees the "m->m_pkthdr.snd_tag" different
from NULL, it will move the packets into a designated rate limited sendqueue
given by the snd_tag pointer. It is up to the individual drivers how the rate
limited traffic will be rate limited.

3) Route changes are detected by the NIC drivers in the ifp->if_transmit()
routine when the ifnet pointer in the incoming snd_tag mismatches the
one of the network interface. The network adapter frees the mbuf and
returns EAGAIN which causes the ip_output() to release and clear the send
tag. On the next ip_output() call, allocation of a new "snd_tag" will be attempted.

4) When the PCB is detached the custom sendqueue will be released by a
non-blocking ifp->if_snd_tag_free() call to the currently bound network
interface.
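
Step 3 above can be sketched as follows. This is an illustration only: the sketch_* types and function are hypothetical stand-ins, not the kernel's struct ifnet or struct m_snd_tag, and the real drivers also free the mbuf before returning.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures named above. */
struct sketch_ifnet {
	int unit;
};

struct sketch_snd_tag {
	struct sketch_ifnet *ifp;	/* interface that allocated the tag */
};

/*
 * Return 0 when the packet may be queued on this interface, or EAGAIN
 * when its send tag was allocated by a different interface, which is
 * how a route change shows up in the transmit path.
 */
static int
sketch_check_snd_tag(const struct sketch_ifnet *ifp,
    const struct sketch_snd_tag *tag)
{
	assert(ifp != NULL);
	if (tag == NULL)
		return (0);		/* not rate limited */
	if (tag->ifp != ifp)
		return (EAGAIN);	/* caller drops the tag and retries */
	return (0);
}
```

Returning EAGAIN rather than silently transmitting lets the upper layer release the stale tag and allocate a fresh one on the new interface.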

Reviewed by: wblock (manpages), adrian, gallatin, scottl (network)
Differential Revision: https://reviews.freebsd.org/D3687
Sponsored by: Mellanox Technologies
MFC after: 3 months
c430547a157a39f3559c61424a7cae2d97360478 19-Oct-2016 kevlo <kevlo@FreeBSD.org> Fix typo in comment.
2d24f3537374fd5f878b4dd100de081baa45cedb 06-Oct-2016 kevlo <kevlo@FreeBSD.org> Remove an alias if_list, use if_link consistently.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D8075
c1c4efb9b3ed28721264961d0fe4a29cd3a87046 29-Sep-2016 kevlo <kevlo@FreeBSD.org> Remove the compatibility macro if_addrlist.

Since if_addrlist is used only for ipfilter(4), add a macro if_addrlist
in ip_compat.h.

Reviewed by: cy
Differential Revision: https://reviews.freebsd.org/D8059
6f20b23b0e247d59c442ebbfd520ac172ccba783 28-Sep-2016 kevlo <kevlo@FreeBSD.org> Remove ifa_list, use ifa_link (structure field) instead.

While here, prefer if_addrhead (FreeBSD) to if_addrlist (BSD compat) naming
for the interface address list in sctp_bsd_addr.c

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D8051
23c987c93ad828670c70494594277af32a107360 27-Sep-2016 kevlo <kevlo@FreeBSD.org> Remove a comment about the size of the ifnet structure.

Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D8036
af533198e352d1ee97e831497a7a004f6f1f2740 23-Jun-2016 np <np@FreeBSD.org> Add spares to struct ifnet and socket for packet pacing and/or general
use. Update comments regarding the spare fields in struct inpcb.

Bump __FreeBSD_version for the changes to the size of the structures.

Reviewed by: gnn@
Approved by: re@ (gjb@)
Sponsored by: Chelsio Communications
6bb446b900c1a249fac0ca1fa639ca5c3e5b3417 15-Jun-2016 sephe <sephe@FreeBSD.org> MFC 296178

buf_ring/drbr: Add buf_ring_peek_clear_sc and use it in drbr_peek

Unlike buf_ring_peek, it only supports single consumer mode, and it
clears the cons_head if DEBUG_BUFRING/INVARIANTS is defined.

The normal use case of drbr_peek for network drivers is:

    m = drbr_peek(br);
    err = hw_spec_encap(&m); /* could m_defrag/m_collapse */
    (*)
    if (err) {
        if (m == NULL)
            drbr_advance(br);
        else
            drbr_putback(br, m);
        /* break the loop */
    }
    drbr_advance(br);

The race is:
If hw_spec_encap() calls m_defrag or m_collapse on the mbuf, i.e. the old mbuf
was freed, or, as in Hyper-V's network driver, transmission-done
does not even require the TX lock, then on the other CPU at the
(*) time, the freed mbuf could be recycled and passed to drbr_enqueue even
before the current CPU had the chance to call drbr_{advance,putback}.
This triggers a panic in drbr_enqueue duplicated element check, if
DEBUG_BUFRING/INVARIANTS is defined.

Use buf_ring_peek_clear_sc() in drbr_peek() to fix the above race.

This change is a NO-OP if neither DEBUG_BUFRING nor INVARIANTS is
defined.

MFC after: 1 week
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D5416
00d578928eca75be320b36d37543a7e2a4f9fbdb 27-May-2016 grehan <grehan@FreeBSD.org> Create branch for bhyve graphics import.
47d341d05f92ffc768b2ba8feef940df6eb70a87 18-May-2016 bz <bz@FreeBSD.org> Rather than having the if_vmove() code intermixed in the vnet_destroy()
function in vnet.c move it to if.c where it logically belongs and put
it under a VNET_SYSUNINIT() call.
To not change the current behaviour make sure it runs first thing
during teardown. In the future this will allow us more flexibility
on changing the order on when we want to get rid of interfaces.

Stop exporting if_vmove() and make it file static.

Reviewed by: gnn
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6438
596f2a146d0b80c29929db3873e8d0f95be58796 18-May-2016 scottl <scottl@FreeBSD.org> Import the 'iflib' API library for network drivers. From the author:

"iflib is a library to eliminate the need for frequently duplicated device
independent logic propagated (poorly) across many network drivers."

Participation is purely optional. The IFLIB kernel config option is
provided for drivers that want to transition between legacy and iflib
modes of operation. ixl and ixgbe driver conversions will be committed
shortly. We hope to see participation from the Broadcom and maybe
Chelsio drivers in the near future.

Submitted by: mmacy@nextbsd.org
Reviewed by: gallatin
Differential Revision: D5211
93152c67c93acd0eca913cc1939a3393129c2c4d 31-Dec-2015 melifaro <melifaro@FreeBSD.org> Implement interface link header precomputation API.

Add if_requestencap() interface method which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with its length)
to prepend to mbuf.

These two changes permit routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make arp/ndp stack pre-calculate link header upon installing/updating lle
record. Interface link address changes are handled by re-calculating
headers for all lles based on if_lladdr event. After these changes,
arpresolve()/nd6_resolve() returns full pre-calculated header for
supported interfaces thus simplifying if_output().
Move these lookups to separate ether_resolve_addr() function which either
returns error or fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of these have their header "complete". The only
difference is that interface source mac has to be filled by OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.

Differential Revision: https://reviews.freebsd.org/D4102
45d5617154226a3aec179038a1be774c890a4f26 17-Dec-2015 smh <smh@FreeBSD.org> Revert r292275 & r292379

glebius has concerns about these changes, so revert them until those concerns
can be discussed and addressed.

Sponsored by: Multiplay
864cf1812819836284d12030ce553ee743ca10f0 15-Dec-2015 smh <smh@FreeBSD.org> Fix lagg failover due to missing notifications

When using lagg failover mode neither Gratuitous ARP (IPv4) nor Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results in slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire a
ifnet_link_event which are now monitored by rip and nd6.

Upon receiving these events each protocol triggers the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behaviour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111
d81208c9488e0efbf99f327d11bbd7bc055c5b1b 25-Nov-2015 ae <ae@FreeBSD.org> Overhaul if_enc(4) and make it loadable in run-time.

Use hhook(9) framework to achieve ability of loading and unloading
if_enc(4) kernel module. INET and INET6 code on initialization registers
two helper hooks points in the kernel. if_enc(4) module uses these helper
hook points and registers its hooks. IPSEC code uses these hhook points
to call helper hooks implemented in if_enc(4).
7ff6dd508c4b1f7e015e548ec82f1bf936b529c0 08-Oct-2015 hselasky <hselasky@FreeBSD.org> MFC r287775:
Update TSO limits to include all headers.

To make driver programming easier the TSO limits are changed to
reflect the values used in the BUSDMA tag a network adapter driver is
using. The TCP/IP network stack will subtract space for all linklevel
and protocol level headers and ensure that the full mbuf chain passed
to the network adapter fits within the given limits. See r287775
for a more detailed description.

Differential Revision: https://reviews.freebsd.org/D3477
Reviewed by: rmacklem
44be0cdaa84fb8686a0c6d5983075b09672c1e93 16-Sep-2015 melifaro <melifaro@FreeBSD.org> Unify loopback route switching:
* prepare gateway before insertion
* use RTM_CHANGE instead of explicit find/change route
* Remove fib argument from ifa_switch_loopback_route added in r264887:
if old ifp fib differs from new one, then the caller
is doing something wrong
* Make ifa_*_loopback_route call single ifa_maintain_loopback_route().
ac0c211c77e6cbd81678b59bd8036900dffcf626 14-Sep-2015 hselasky <hselasky@FreeBSD.org> Update TSO limits to include all headers.

To make driver programming easier the TSO limits are changed to
reflect the values used in the BUSDMA tag a network adapter driver is
using. The TCP/IP network stack will subtract space for all linklevel
and protocol level headers and ensure that the full mbuf chain passed
to the network adapter fits within the given limits.

Implementation notes:

If a network adapter driver needs to fixup the first mbuf in order to
support VLAN tag insertion, the size of the VLAN tag should be
subtracted from the TSO limit. Else not.

Network adapters which typically inline the complete header mbuf could
technically transmit one more segment. This patch does not implement a
mechanism to recover the last segment for data transmission. It is
believed when sufficiently large mbuf clusters are used, the segment
limit will not be reached and recovering the last segment will not
have any effect.

The current TSO algorithm tries to send MTU-sized packets, where the
MTU typically is 1500 bytes, which gives 1448 bytes of TCP data
payload per packet for IPv4. That means if the TSO length limitation
is set to 65536 bytes, there will be a data payload remainder of
(65536 - 1500) mod 1448 bytes which is equal to 324 bytes. Trying to
recover total TSO length due to inlining mbuf header data will not
have any effect, because adding or removing the ETH/IP/TCP headers
to or from 324 bytes will not cause more or less TCP payload to be
TSO'ed.
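
The remainder quoted above can be checked with a few lines of C. This is a minimal sketch: the function name is hypothetical, and the 1448-byte payload is taken from the text (it matches a 1500-byte MTU minus 20-byte IPv4 and TCP headers and, as an assumption, 12 bytes of TCP timestamp option).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes left over after filling a TSO burst of
 * tso_max bytes with one MTU-sized first packet followed by as many
 * full payload-sized segments as fit.
 */
static uint32_t
tso_payload_remainder(uint32_t tso_max, uint32_t mtu, uint32_t payload)
{
	assert(payload > 0 && tso_max >= mtu);
	return ((tso_max - mtu) % payload);
}
```

With the values from the text, (65536 - 1500) = 64036 = 44 * 1448 + 324, so the remainder is 324 bytes, smaller than one full segment.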

Existing network adapter limits will be updated separately.

Differential Revision: https://reviews.freebsd.org/D3458
Reviewed by: rmacklem
MFC after: 2 weeks
bbab60824360d4c8995444c35b2fe6947b08d15a 05-Sep-2015 melifaro <melifaro@FreeBSD.org> Constantify lookup key in ifa_ifwith* functions.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).

Code using ifa_ifwith (and similar functions) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3464
a29f5e7ca80965b7551a01382485f2e7cba838ec 16-Apr-2015 glebius <glebius@FreeBSD.org> Move ALTQ from contrib to net/altq. The ALTQ code was discontinued
by its initial authors many years ago. In FreeBSD the code was already
slightly edited during the pf(4) SMP project. It is about to be edited
more in the projects/ifnet. Moving out of contrib also allows to remove
several hacks to the make glue.

Reviewed by: net@
6c212ad708bd62ba031604b1f98397ab5464cd29 27-Feb-2015 glebius <glebius@FreeBSD.org> Hide struct ifmultiaddr under _KERNEL, too.
90eb9ef3d2a7e3442f2fe0a5a49a823f00dc0689 19-Feb-2015 glebius <glebius@FreeBSD.org> Now that all users of _WANT_IFADDR are fixed, remove this crutch and
hide ifaddr, in_ifaddr and in6_ifaddr under _KERNEL.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
57be9990bda3e6f3bc0fd4d3b67627241f0b0d75 23-Dec-2014 ae <ae@FreeBSD.org> Add if_inc_counter() and if_get_counter_default() functions that do
access to ifnet counters for code compatibility with FreeBSD 11.

This is direct commit to stable/10.

Discussed with: glebius@, arch@
9fcf944d2ab177691803626c24a94fa931a59d01 19-Nov-2014 hselasky <hselasky@FreeBSD.org> MFC r274376:
Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove a not needed KASSERT()
- Remove some not needed variable casts.

Sponsored by: Mellanox Technologies
4d1b5f70ee0bf0a60ac9e1a7806fe21a1e3c7556 11-Nov-2014 hselasky <hselasky@FreeBSD.org> Fix some minor TSO issues:
- Improve description of TSO limits.
- Remove a not needed KASSERT()
- Remove some not needed variable casts.

Sponsored by: Mellanox Technologies
Discussed with: lstewart @
MFC after: 1 week
83e84205ec887b0e6deeded16265050f3237bd1f 09-Nov-2014 glebius <glebius@FreeBSD.org> Use standard mtx(9), rwlock(9), sx(9) system initialization macros
instead of doing initialization manually.

Sponsored by: Nginx, Inc.
Sponsored by: Netflix
11af63037f17d7b85036d03dc07687f77171b4b2 06-Nov-2014 melifaro <melifaro@FreeBSD.org> Make checks for rt_mtu generic:

Some virtual if drivers have (ab)used the ifa_rtrequest hook to enforce
route MTU to be not bigger than interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currently have 3 places where MTU is altered:

1) route addition.
In this case domain overrides radix _addroute callback (in[6]_addroute)
and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
In this case, there are no explicit per-domain calls, but one can
override rte by setting ifa_rtrequest hook to domain handler
(inet6 does this).

3) ifconfig ifaceX mtu YYYY
In this case, we have no callbacks, but ip[6]_output performs runtime
checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
control plane, not in runtime part, and properly deal with increased
interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-domain MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
if_mtu denotes L3 interface MTU, e.g. maximum transmitted _packet_ size.
However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
for some cases, so we need an abstract way to know maximum MTU size
for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
use these functions on new non-inserted rte.

More changes will follow soon.

MFC after: 1 month
Sponsored by: Yandex LLC
a8147b2f482df00c3d01addaa0681face98b9b88 03-Nov-2014 hselasky <hselasky@FreeBSD.org> Clarify TSO segment limit comment and remove two TABs to make lines a
bit shorter.

Sponsored by: Mellanox Technologies
fa183f01741aa54ff3ba0fcf31b7b1404b7a7e53 03-Nov-2014 hselasky <hselasky@FreeBSD.org> MFC r271946 and r272595:
Improve transmit sending offload, TSO, algorithm in general. This
change allows all HCAs from Mellanox Technologies to function properly
when TSO is enabled. See r271946 and r272595 for more details about
this commit.

Sponsored by: Mellanox Technologies
aab771d81275210e947cce80bbc0529e88b67337 28-Sep-2014 bz <bz@FreeBSD.org> Move the unconditional #include of net/ifq.h to the very end of file.
This seems to allow us to pass a universe with either clang or gcc
after r272244 (and r272260) and probably makes it easier to untangle
these chained #includes in the future.
0f9d61b26bf6208f30965550c504651a3bb13e0d 28-Sep-2014 glebius <glebius@FreeBSD.org> - Remove empty wrappers ether_poll_[de]register_drv(). [1]
- Move polling(9) declarations out of ifq.h back to if_var.h
they are absolutely unrelated to queues.

Submitted by: Mikhail <mp lenta.ru> [1]
2cb60789396436bccdbe1d9265655c2fadac9697 28-Sep-2014 glebius <glebius@FreeBSD.org> Finally, convert counters in struct ifnet to counter(9).

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
7d70b89c51d6f8dff25417542c5ec6bf48da64e4 27-Sep-2014 melifaro <melifaro@FreeBSD.org> Use underlying ports counters to get lagg statistics instead of
per-packet accounting.
This introduces user-visible changes like aggregating error counters.

Reviewed by: asomers (prev.version), glebius
CR: D781
MFC after: 2 weeks
Sponsored by: Yandex LLC
bdacf9ba4dd91cc0515199d29fd6debec10cee6a 22-Sep-2014 hselasky <hselasky@FreeBSD.org> Improve transmit sending offload, TSO, algorithm in general.

The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware are limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot be loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible per input from other FreeBSD developers and will use
defaults for values not set.
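
A minimal sketch of the combined check described above, assuming three limits modelled on the if_hw_tsomax, if_hw_tsomaxsegcount and if_hw_tsomaxsegsize fields. The struct and function are hypothetical and operate on a plain array of fragment lengths rather than a real mbuf chain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limits, modelled on the three ifnet TSO fields. */
struct sketch_tso_limits {
	uint32_t max_bytes;	/* cf. if_hw_tsomax */
	uint32_t max_segcount;	/* cf. if_hw_tsomaxsegcount */
	uint32_t max_segsize;	/* cf. if_hw_tsomaxsegsize */
};

/*
 * Return true when a chain of nsegs fragments respects the total byte
 * limit, the fragment count limit and the per-fragment length limit.
 */
static bool
sketch_tso_chain_fits(const struct sketch_tso_limits *lim,
    const uint32_t *seglen, size_t nsegs)
{
	uint64_t total = 0;
	size_t i;

	assert(lim != NULL);
	if (nsegs > lim->max_segcount)
		return (false);
	for (i = 0; i < nsegs; i++) {
		if (seglen[i] > lim->max_segsize)
			return (false);
		total += seglen[i];
	}
	return (total <= lim->max_bytes);
}
```

Checking only the byte total, as the older code did, would accept the four-fragment and oversized-fragment chains that this sketch rejects.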

Reviewed by: adrian, rmacklem
Sponsored by: Mellanox Technologies
MFC after: 1 week
de25153d5914b46ee65ffbb937b9c6bea65ae6fd 18-Sep-2014 glebius <glebius@FreeBSD.org> Remove a bunch of methods that are superseded by if_inc_counter().

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
c2d27a81fe2d1f4a607c723cc463421f29501395 18-Sep-2014 glebius <glebius@FreeBSD.org> While not too late rename 'ifnet_counter' to 'ift_counter'. One of the
important moments that we discussed with Marcel and Anuranjan was that
a converted driver should return false for 'grep ifnet if_driver.c' :)

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
0917a065ca1381c4aa57563ea3ebf5092d237a14 18-Sep-2014 glebius <glebius@FreeBSD.org> Add a function to set if_get_counter method for an ifnet. To be used
in the drivers that are already converted to "Juniper drvapi". This
can be revisited in future.
bf71125f67007cbce166cbf5bb6459ec0bbf5770 18-Sep-2014 glebius <glebius@FreeBSD.org> While not too late rename if_get_counter_compat() to if_get_counter_default().
The compat counters will go away, but the function will remain in its place,
and in all places where it is going to be called.

Discussed with: melifaro
f76e492f6dfbd579c3522f6e324fde5b1c468b40 18-Sep-2014 glebius <glebius@FreeBSD.org> Add if_inc_counter(), a generic method to update ifnet(9) counter
w/o dereferencing the struct.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
727760a4e452031c6ff1856c341de106c671838e 13-Sep-2014 hselasky <hselasky@FreeBSD.org> Revert r271504. A new patch to solve this issue will be made.

Suggested by: adrian @
3d04a989df2ae6c0fb4728b68f2274714b25fd72 13-Sep-2014 hselasky <hselasky@FreeBSD.org> Improve transmit sending offload, TSO, algorithm in general.

The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware are limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot be loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible per input from other FreeBSD developers and will use
defaults for values not set.

MFC after: 1 week
Sponsored by: Mellanox Technologies
081aa8a15cafb217da990350e158841f186feab9 11-Sep-2014 asomers <asomers@FreeBSD.org> Revisions 264905 and 266860 added a "int fib" argument to ifa_ifwithnet and
ifa_ifwithdstaddr. For the sake of backwards compatibility, the new
arguments were added to new functions named ifa_ifwithnet_fib and
ifa_ifwithdstaddr_fib, while the old functions became wrappers around the
new ones that passed RT_ALL_FIBS for the fib argument. However, the
backwards compatibility is not desired for FreeBSD 11, because there are
numerous other incompatible changes to the ifnet(9) API. We therefore
decided to remove it from head but leave it in place for stable/9 and
stable/10. In addition, this commit adds the fib argument to
ifa_ifwithbroadaddr for consistency's sake.

sys/sys/param.h
Increment __FreeBSD_version

sys/net/if.c
sys/net/if_var.h
sys/net/route.c
Add fibnum argument to ifa_ifwithbroadaddr, and remove the _fib
versions of ifa_ifwithdstaddr, ifa_ifwithnet, and ifa_ifwithroute.

sys/net/route.c
sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_options.c
sys/netinet/ip_output.c
sys/netinet6/nd6.c
Fixup calls of modified functions.

share/man/man9/ifnet.9
Document changed API.

CR: https://reviews.freebsd.org/D458
MFC after: Never
Sponsored by: Spectra Logic
9dfcf3eeb1336e54d393a9c0cbb34a354ed8cc5d 31-Aug-2014 glebius <glebius@FreeBSD.org> Toss fields so that no padding field is required to achieve alignment.
833eb3c331411d6a1706369c576e8f7abcaa6234 31-Aug-2014 glebius <glebius@FreeBSD.org> It is actually possible to have if_t a typedef to non-void type,
and keep both converted to drvapi and non-converted drivers
compilable.

o Make if_t typedef to struct ifnet *.
o Remove shim functions.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
3b5ede57e944576392dbde8e9ef322c6bd7f146a 31-Aug-2014 glebius <glebius@FreeBSD.org> Provide pointer from struct ifnet to struct netmap_adapter,
instead of abusing spare field.
70b7c46209e1ad46231f23b2cda4ef4e42bd14ba 31-Aug-2014 glebius <glebius@FreeBSD.org> o Remove struct if_data from struct ifnet. Now it is merely API structure
for route(4) socket and ifmib(4) sysctl.
o Move fields from if_data to ifnet, but keep all statistic counters
separate, since they should disappear later.
o Provide function if_data_copy() to fill if_data, utilize it in routing
socket and ifmib handler.
o Provide overridable ifnet(9) method to fetch counters. If none is provided,
if_get_counters_compat() is used, which returns old counters.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
d32e428cc37439544fc5159604ba10ed560a88c1 29-Jul-2014 glebius <glebius@FreeBSD.org> Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
a8aa481895641687bc168b6283d7521e52a48280 06-Jun-2014 asomers <asomers@FreeBSD.org> MFC changes relating to running multiple interfaces on different fibs but
with addresses on the same subnet.

MFC r266860

Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr(). The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.
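
The wrapper pattern described above can be sketched as follows. All sketch_* names are hypothetical, the address table is made up for illustration, and SKETCH_ALL_FIBS stands in for RT_ALL_FIBS (its value of -1 here is an assumption).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ALL_FIBS	(-1)	/* stand-in for RT_ALL_FIBS (assumed value) */

/* Hypothetical, simplified interface address record. */
struct sketch_ifaddr {
	uint32_t net;	/* network prefix, host byte order */
	uint32_t mask;	/* netmask, host byte order */
	int fib;	/* fib of the owning interface */
};

/* Two interfaces with addresses on the same subnet but different fibs. */
static const struct sketch_ifaddr sketch_addrs[] = {
	{ 0xc0a80000u, 0xffffff00u, 0 },	/* 192.168.0.0/24, fib 0 */
	{ 0xc0a80000u, 0xffffff00u, 1 },	/* 192.168.0.0/24, fib 1 */
};

/* fib-aware lookup: SKETCH_ALL_FIBS matches an address on any fib. */
static const struct sketch_ifaddr *
sketch_ifwithnet_fib(uint32_t addr, int fib)
{
	size_t i;

	assert(fib >= SKETCH_ALL_FIBS);
	for (i = 0; i < sizeof(sketch_addrs) / sizeof(sketch_addrs[0]); i++) {
		const struct sketch_ifaddr *ifa = &sketch_addrs[i];

		if (fib != SKETCH_ALL_FIBS && ifa->fib != fib)
			continue;
		if ((addr & ifa->mask) == ifa->net)
			return (ifa);
	}
	return (NULL);
}

/* Legacy entry point: same signature as before, any fib is acceptable. */
static const struct sketch_ifaddr *
sketch_ifwithnet(uint32_t addr)
{
	return (sketch_ifwithnet_fib(addr, SKETCH_ALL_FIBS));
}
```

Keeping the old name as a thin wrapper is what preserves the KBI: existing callers keep their signature, while fib-aware callers opt in to the new function.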

sys/net/if_var.h
sys/net/if.c
Add legacy-compatible functions as described above. Ensure legacy
behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
Call with _fib() functions if we must use a specific fib, or the
legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
Improve the udp_dontroute test. The bug that this test exercises is
that ifa_ifwithnet() will return the wrong address, if multiple
interfaces have addresses on the same subnet but with different
fibs. The previous version of the test only considered one possible
failure mode: that ifa_ifwithnet_fib() might fail to find any
suitable address at all. The new version also checks whether
ifa_ifwithnet_fib() finds the correct address by checking where the
ARP request goes.

MFC r264917

Style fixes, mostly trailing whitespace elimination. No functional change.

MFC r264905

Fix subnet and default routes on different FIBs on the same subnet.

These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those
functions will only return an address whose interface fib equals the
argument.

sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.

sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.

Clear expected failure messages for kern/187550 and kern/187552.

MFC r263738

tests/sys/netinet/Makefile
tests/sys/netinet/fibs.sh
Replace fibs:udp_dontroute with fibs:src_addr_selection_by_subnet.
The original test was poorly written; it was actually testing
kern/167947 instead of the desired kern/187553. The root cause of the
bug is that ifa_ifwithnet did not have a fib argument. The new test
more directly targets that behavior.

tests/sys/netinet/udp_dontroute.c
Delete the auxiliary binary used by the old test.
916c7006f53cf6a21d270b2c41727f786f65e066 02-Jun-2014 marcel <marcel@FreeBSD.org> Introduce a procedural interface to the ifnet structure. The new
interface allows the ifnet structure to be defined as an opaque
type in NIC drivers. This then allows the ifnet structure to be
changed without a need to change or recompile NIC drivers.

Put differently, NIC drivers can be written and compiled once and
be used with different network stack implementations, provided of
course that those network stack implementations have an API and
ABI compatible interface.

This commit introduces the 'if_t' type to replace 'struct ifnet *'
as the type of a network interface. The 'if_t' type is defined as
'void *' to enable the compiler to perform type conversion to
'struct ifnet *' and vice versa where needed and without warnings.
The functions that implement the API are the only functions that
need to have an explicit cast.

The MII code has been converted to use the driver API to avoid
unnecessary code churn. Code churn comes from having to work with
both converted and unconverted drivers in correlation with having
callback functions that take an interface. By converting the MII
code first, the callback functions can be defined so that the
compiler will perform the typecasts automatically.

As soon as all drivers have been converted, the if_t type can be
redefined as needed and the API functions can be fixed to not need
an explicit cast.

The immediate beneficiaries of this change are:
1. Juniper Networks - The network stack implementation in Junos
is entirely different from FreeBSD's one and this change
allows Juniper to build "stock" NIC drivers that can be used
in combination with both the FreeBSD and Junos stacks.
2. FreeBSD - This change opens the door towards changing ifnet
and implementing new features and optimizations in the network
stack without it requiring a change in the many NIC drivers
FreeBSD has.

Submitted by: Anuranjan Shukla <anshukla@juniper.net>
Reviewed by: glebius@
Obtained from: Juniper Networks, Inc.
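
The opaque-handle idea described in this commit can be sketched in plain C.
This is a minimal user-space illustration under stated assumptions, not the
kernel API: toy_if_t, struct toy_ifnet and the toy_if_*() accessors are
hypothetical names invented for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the stack-private interface structure. */
struct toy_ifnet {
	uint32_t mtu;
	uint64_t baudrate;
};

/* Drivers see only an opaque handle, in the spirit of if_t. */
typedef void *toy_if_t;

/* Only the accessor functions know the layout and cast the handle. */
static int
toy_if_setmtu(toy_if_t ifp, uint32_t mtu)
{
	((struct toy_ifnet *)ifp)->mtu = mtu;
	return (0);
}

static uint32_t
toy_if_getmtu(toy_if_t ifp)
{
	return (((struct toy_ifnet *)ifp)->mtu);
}
```

A driver written against such accessors never dereferences the structure
itself, so the layout can change without recompiling the driver.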
7ca8bf0f2c27a73f51aeaa039bf10ab3c6af198b 29-May-2014 asomers <asomers@FreeBSD.org> Fix unintended KBI change from r264905. Add _fib versions of
ifa_ifwithnet() and ifa_ifwithdstaddr(). The legacy functions will call the
_fib() versions with RT_ALL_FIBS, preserving legacy behavior.

sys/net/if_var.h
sys/net/if.c
Add legacy-compatible functions as described above. Ensure legacy
behavior when RT_ALL_FIBS is passed as fibnum.

sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/net/route.c
sys/net/rtsock.c
sys/netinet6/nd6.c
Call with _fib() functions if we must use a specific fib, or the
legacy functions otherwise.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
Improve the udp_dontroute test. The bug that this test exercises is
that ifa_ifwithnet() will return the wrong address, if multiple
interfaces have addresses on the same subnet but with different
fibs. The previous version of the test only considered one possible
failure mode: that ifa_ifwithnet_fib() might fail to find any
suitable address at all. The new version also checks whether
ifa_ifwithnet_fib() finds the correct address by checking where the
ARP request goes.

Reported by: bz, hrs
Reviewed by: hrs
MFC after: 1 week
X-MFC-with: 264905
Sponsored by: Spectra Logic
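
The wrapper arrangement described in this commit can be sketched as follows.
This is a simplified user-space model, not the kernel code: struct toy_ifaddr,
the toy_ifwithnet*() functions and the value of TOY_RT_ALL_FIBS are
assumptions made for illustration, and the lookup returns the first matching
subnet rather than applying the kernel's real selection rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sentinel meaning "match any FIB"; the value is illustrative. */
#define TOY_RT_ALL_FIBS	(-1)

/* Hypothetical, simplified interface address: an IPv4 subnet plus a FIB. */
struct toy_ifaddr {
	uint32_t net;	/* network address, host byte order */
	uint32_t mask;	/* netmask, host byte order */
	int fib;	/* FIB of the owning interface */
};

/* Return the first address on dst's subnet whose FIB matches fibnum. */
static const struct toy_ifaddr *
toy_ifwithnet_fib(const struct toy_ifaddr *tbl, size_t n, uint32_t dst,
    int fibnum)
{
	for (size_t i = 0; i < n; i++) {
		if (fibnum != TOY_RT_ALL_FIBS && tbl[i].fib != fibnum)
			continue;
		if ((dst & tbl[i].mask) == tbl[i].net)
			return (&tbl[i]);
	}
	return (NULL);
}

/* Legacy entry point: unchanged signature, FIBs are not considered. */
static const struct toy_ifaddr *
toy_ifwithnet(const struct toy_ifaddr *tbl, size_t n, uint32_t dst)
{
	return (toy_ifwithnet_fib(tbl, n, dst, TOY_RT_ALL_FIBS));
}
```

Keeping the old function as a thin wrapper means existing callers keep their
behavior, while FIB-aware callers opt in by calling the _fib variant.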
f8a34b6f4917582dfd787f47a09088f66e0ac509 24-Apr-2014 asomers <asomers@FreeBSD.org> Fix subnet and default routes on different FIBs on the same subnet.

These two bugs are closely related. The root cause is that ifa_ifwithnet
does not consider FIBs when searching for an interface address.

sys/net/if_var.h
sys/net/if.c
Add a fib argument to ifa_ifwithnet and ifa_ifwithdstaddr. Those
functions will only return an address whose interface fib equals the
argument.

sys/net/route.c
Update calls to ifa_ifwithnet and ifa_ifwithdstaddr with fib
arguments.

sys/netinet/in.c
Update in_addprefix to consider the interface fib when adding
prefixes. This will prevent it from not adding a subnet route when
one already exists on a different fib.

sys/net/rtsock.c
sys/netinet/in_pcb.c
sys/netinet/ip_output.c
sys/netinet/ip_options.c
sys/netinet6/nd6.c
Add RT_DEFAULT_FIB arguments to ifa_ifwithdstaddr and ifa_ifwithnet.
In some cases there wasn't a clear specific fib number to use.
In others, I was unable to test those functions so I chose
RT_DEFAULT_FIB to minimize divergence from current behavior. I will
fix some of the latter changes along with PR kern/187553.

tests/sys/netinet/fibs_test.sh
tests/sys/netinet/udp_dontroute.c
tests/sys/netinet/Makefile
Revert r263738. The udp_dontroute test was right all along.
However, bugs kern/187550 and kern/187553 cancelled each other out
when it came to this test. Because of kern/187553, ifa_ifwithnet
searched the default fib instead of the requested one, but because
of kern/187550, there was an applicable subnet route on the default
fib. The new test added in r263738 doesn't work right, however. I
can verify with dtrace that ifa_ifwithnet returned the wrong address
before I applied this commit, but route(8) miraculously found the
correct interface to use anyway. I don't know how.

Clear expected failure messages for kern/187550 and kern/187552.

PR: kern/187550
PR: kern/187552
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic
6e7494c7e1c3c8d00265fe530a28384c7ccdee68 24-Apr-2014 asomers <asomers@FreeBSD.org> Fix host and network routes for new interfaces when net.add_addr_allfibs=0

sys/net/route.c
In rtinit1, use the interface fib instead of the process fib. The
latter wasn't very useful because ifconfig(8) is usually invoked
with the default process fib. Changing ifconfig(8) to use setfib(2)
would be redundant, because it already sets the interface fib.

tests/sys/netinet/fibs_test.sh
Clear the expected ATF failure

sys/net/if.c
Pass the interface fib in calls to rtrequest1_fib and rtalloc1_fib

sys/netinet/in.c
sys/net/if_var.h
Add a fibnum argument to ifa_switch_loopback_route, a subroutine of
in_scrubprefix. Pass it the interface fib.

PR: kern/187549
Reviewed by: melifaro
MFC after: 3 weeks
Sponsored by: Spectra Logic Corporation
26117c8d4672944ae8818ee2f233ef5ab642e437 20-Mar-2014 np <np@FreeBSD.org> Add a shorter alias for if_data.ifi_oqdrops.
b38edcd355dfe9c2ac4080b8837687b0dba7dd41 13-Mar-2014 glebius <glebius@FreeBSD.org> Since 32-bit if_baudrate isn't enough to describe a baud rate of a 10 Gbit
interface, in r241616 a crutch was provided. It didn't work well, and
finally we decided that it is time to break ABI and simply make if_baudrate
a 64-bit value. Meanwhile, the entire struct if_data was reviewed.

o Remove the if_baudrate_pf crutch.

o Make all fields of struct if_data a fixed, machine-independent size. The
notion of data (packet counters, etc) is by no means MD. And it is a
bug that on amd64 we've got 64-bit counters, while on i386 32-bit,
which at modern speeds overflow within a second.

This also removes quite a lot of COMPAT_FREEBSD32 code.

o Give 16 bit for the ifi_datalen field. This field was provided to
make future changes to if_data less ABI breaking. Unfortunately the
8 bit size of it had effectively limited sizeof if_data to 256 bytes.

o Give 32 bits to ifi_mtu and ifi_metric.
o Give 64 bits to the rest of fields, since they are counters.

__FreeBSD_version bumped.

Discussed with: emax
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
eb1a5f8de9f7ea602c373a710f531abbf81141c4 21-Feb-2014 gjb <gjb@FreeBSD.org> Move ^/user/gjb/hacking/release-embedded up one directory, and remove
^/user/gjb/hacking since this is likely to be merged to head/ soon.

Sponsored by: The FreeBSD Foundation
6b01bbf146ab195243a8e7d43bb11f8835c76af8 27-Dec-2013 gjb <gjb@FreeBSD.org> Copy head@r259933 -> user/gjb/hacking/release-embedded for initial
inclusion of (at least) arm builds with the release.

Sponsored by: The FreeBSD Foundation
592c1d7a8e9590ce76786fe68d0cac491b6b23e6 05-Nov-2013 glebius <glebius@FreeBSD.org> To complement ifa_add_loopback_route() and ifa_del_loopback_route()
provide function ifa_switch_loopback_route() that will be used in case when
an interface address used for a loopback route goes away, but we have another
interface address with same address value and want to preserve loopback
route.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
bce78dfe179929ceecbfdee04dfb909517d70d13 05-Nov-2013 glebius <glebius@FreeBSD.org> Remove net.link.ether.inet.useloopback sysctl tunable. It was always on by
default from the very beginning. It was placed in wrong namespace
net.link.ether, originally it had been at another wrong namespace. It was
incorrectly documented at incorrect manual page arp(8). Since new-ARP commit,
the tunable has been consulted only on route addition, and ignored on route
deletion. Behaviour of a system with tunable turned off is not fully correct,
and has no advantages compared to normal behavior.
d480a68d35317e8162d1287994788d4c3a688605 31-Oct-2013 andre <andre@FreeBSD.org> Make struct ifnet readable and comprehensible again by grouping
and ordering related variables, fields and locks next to each
other. Add more comments to variables.

Over time 'ifnet' has accumulated a lot of additional pointers and
functionality in an unstructured way making it quite hard to read
and understand while obfuscating relationships between fields and
variables.

Quantify the structure size and how bloated it has become.

This is only a mechanical change in preparation for upcoming
work to make ifnet opaque to drivers and to separate out the
interface queuing.

Sponsored by: The FreeBSD Foundation
1dfe65e51e3c86e975ff1f3a6fab9325c099909b 29-Oct-2013 andre <andre@FreeBSD.org> Move all interface queue related structures, macros and definitions
from net/if_var.h to its own new net/ifq.h.

For now net/ifq.h is unconditionally included through net/if_var.h.

This is a mechanical change in preparation to make struct ifnet and
the individual interface queue mechanisms opaque.

Discussed with: glebius
Sponsored by: The FreeBSD Foundation
67835727b43f038edc5165ef54ae93ae77959d5d 28-Oct-2013 glebius <glebius@FreeBSD.org> Style: s/SYS_EVENTHANDLER_H/_SYS_EVENTHANDLER_H_/g

Submitted by: bde
c829949efacb31ac7d8dc8d4df99d801a43a2fec 28-Oct-2013 glebius <glebius@FreeBSD.org> - Make the prophecy from 1997 happen and remove if_var.h inclusion
from if.h.
- Remove unnecessary includes and declarations from if.h
- Remove unnecessary includes and declarations from if_var.h [1]
- Mark some declarations that are about to be removed in near
future with comments, explaining why this declaration is still
necessary.
- Protect eventhandler declarations with #ifdef SYS_EVENTHANDLER_H.

Obtained from: bdeBSD [1]
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
790225cfbce40af294b0962bcba0febee9125487 15-Oct-2013 glebius <glebius@FreeBSD.org> - Utilize counter(9) to accumulate statistics on interface addresses. Add
four counters to struct ifaddr. This kills '+=' on variables shared
between processors for every packet.
- Nuke struct if_data from struct ifaddr.
- In ip_input() do not put a reference on ifaddr, instead update statistics
right now in place and do IN_IFADDR_RUNLOCK(). This removes atomic(9)
for every packet. [1]
- To properly support NET_RT_IFLISTL sysctl used by getifaddrs(3), in
rtsock.c fill if_data fields using counter_u64_fetch().
- Accidentally fix bug in COMPAT_32 version of NET_RT_IFLISTL, which
took if_data not from the ifaddr, but from ifaddr's ifnet. [2]

Submitted by: melifaro [1], pluknet[2]
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
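
The reason per-CPU counters avoid a shared '+=' can be sketched with a toy
model. This is not the counter(9) implementation: struct toy_counter,
TOY_NCPU and both functions are hypothetical, and the caller passes the CPU
index explicitly, whereas a real kernel would determine it itself.

```c
#include <stdint.h>

#define TOY_NCPU 4	/* assumed CPU count for the illustration */

/* One slot per CPU, so an update never writes a variable shared by all. */
struct toy_counter {
	uint64_t slot[TOY_NCPU];
};

/* Hot path: touch only the slot that belongs to the current CPU. */
static void
toy_counter_add(struct toy_counter *c, int cpu, uint64_t inc)
{
	c->slot[cpu] += inc;
}

/* Slow path (e.g. a sysctl handler): sum all slots on demand. */
static uint64_t
toy_counter_fetch(const struct toy_counter *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < TOY_NCPU; i++)
		sum += c->slot[i];
	return (sum);
}
```

The cost moves from every packet to the rare reader, which matches the
trade-off the commit describes for filling if_data in rtsock.c.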
bc71d67cbbd8f45860c1763eb82f7de40d07538c 15-Oct-2013 glebius <glebius@FreeBSD.org> Push some defines under _KERNEL, improve styling and comments.
cb3115eac5997db6088dcc2e232b63bca7d19696 15-Oct-2013 glebius <glebius@FreeBSD.org> Remove ifa_mtx. It was used only in one place in kernel, and ifnet's
ifaddr lock can substitute it there.

Discussed with: melifaro, ae
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
564d02b3040edd78c448583a6b0821a509487497 15-Oct-2013 glebius <glebius@FreeBSD.org> Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
1c87562bdb8818f73d1face0bb6c8f2d02fb4ce4 15-Oct-2013 glebius <glebius@FreeBSD.org> Hide 'struct ifaddr' definition from userland. Two tools left that use it,
namely ipftest(1) and ifmcstat(1). These sniff structure definition using
_WANT_IFADDR define.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
c6a6dc71e90012e4be1c261a84301910796ee2d0 05-Jul-2013 cperciva <cperciva@FreeBSD.org> Fix typo: minmum -> minimum.

Submitted by: @z3ndrag0n
b706ceb4abd7f12c0ad38c0da053f4523814bcc4 03-Jun-2013 andre <andre@FreeBSD.org> Allow drivers to specify a maximum TSO length in bytes if they are
limited in the amount of data they can handle at once.

Drivers can set ifp->if_hw_tsomax before calling ether_ifattach() to
change the limit.

The lowest allowable size is IP_MAXPACKET / 8 (8192 bytes) as anything
less wouldn't be very useful anymore. The upper limit is still at
IP_MAXPACKET (65536 bytes). Raising it requires further auditing of
the IPv4/v6 code paths, as the length field in the IP header would
overflow, leading to confusion in firewalls and other packet handlers about
the real size of the packet.

The placement into "struct ifnet" is a bit hackish but the best place
that was found. When the stack/driver boundary is updated it should
be handled in a better way.

Submitted by: cperciva (earlier version)
Reviewed by: cperciva
Tested by: cperciva
MFC after: 1 week (using spare struct members to preserve ABI)
cc8c6e4d0185c640c9d03ed2804e3020ff84fed0 06-May-2013 andre <andre@FreeBSD.org> Back out r249318, r249320 and r249327 due to a heisenbug most
likely related to a race condition in the ipi_hash_lock with
the exact cause currently unknown but under investigation.
b4bc270e8f6757fa385861750ab22ba0ca4978ed 26-Apr-2013 glebius <glebius@FreeBSD.org> Add const qualifier to the dst parameter of the ifnet if_output method.
e79bb9704b0801e384d3cf9953bd20277e72961d 10-Apr-2013 glebius <glebius@FreeBSD.org> Fix build.
306fddaf7801d7fae1206025486e9d9a97f52ad4 09-Apr-2013 andre <andre@FreeBSD.org> Change certain heavily used network related mutexes and rwlocks to
reside on their own cache line to prevent false sharing with other
nearby structures, especially for those in the .bss segment.

NB: Those mutexes and rwlocks with variables next to them that get
changed on every invocation do not benefit from their own cache line.
Actually it may be net negative because two cache misses would be
incurred in those cases.
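
The effect of giving a lock its own cache line can be illustrated with C11
alignment specifiers. This is a sketch under assumptions, not the kernel's
definitions: the 64-byte line size, struct toy_lock and struct
toy_padded_lock are all made up for the example.

```c
#include <stdalign.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64	/* assumed cache line size in bytes */

/* Hypothetical lock; only its size and placement matter here. */
struct toy_lock {
	volatile int locked;
};

/*
 * Aligning the member raises the alignment of the whole structure, and
 * the compiler then pads its size to a multiple of that alignment, so
 * no unrelated variable can share the line.
 */
struct toy_padded_lock {
	alignas(TOY_CACHE_LINE) struct toy_lock lock;
};
```

As the commit notes, this only pays off when the neighbours of the lock are
not themselves modified on every invocation.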
a47c0295c5e9d047b28372a33fcabe99a178107f 11-Feb-2013 glebius <glebius@FreeBSD.org> Resolve source address selection in the presence of CARP. Add a couple
of helper functions:

- carp_master() - boolean function which is true if an address
is in the MASTER state.
- ifa_preferred() - boolean function that compares two addresses,
and is aware of CARP.

Utilize ifa_preferred() in ifa_ifwithnet().

The previous version of patch also changed source address selection
logic in jails using carp_master(), but we failed to negotiate this part
with Bjoern. Maybe we will approach this problem again later.

Reported & tested by: Anton Yuzhaninov <citrin citrin.ru>
Sponsored by: Nginx, Inc
450eca0c4dcf45d80e1b47ef0f8f534027e67fd3 08-Feb-2013 attilio <attilio@FreeBSD.org> Merge from vmcontention
8eff194a94cd4b457472d64ddfad18bdc706f3e4 08-Feb-2013 attilio <attilio@FreeBSD.org> MFC
75ad250e9798bf02f5b9ad805d88a8852c3e9545 07-Feb-2013 rrs <rrs@FreeBSD.org> This fixes an out-of-order problem with several
of the newer drivers. The basic problem was
that the driver was pulling the mbuf off the
drbr ring and then, when sending with xmit(), encountering
a full transmit ring. Thus the lower layer
xmit() function would return an error, and the
drivers would then append the data back on to the ring.
For TCP this is a horrible scenario sure to bring
on a fast-retransmit.

The fix is to use drbr_peek() to pull the data pointer
but not remove it from the ring. If it fails then
we either call the new drbr_putback or drbr_advance
method. Advance moves it forward (we do this sometimes
when the xmit() function frees the mbuf). When
we succeed we always call advance. The
putback will always copy the mbuf back to the top
of the ring. Note that the putback *cannot* be used
with a drbr_dequeue(), only with drbr_peek(). Most
of the time, in putback, we would not need to copy it
back since most likely the mbuf is still the same, but
sometimes xmit() functions will change the mbuf via
a pullup or other call. So the optimal case for
the single consumer is to always copy it back. If
we ever do a multiple_consumer (for lagg?) we
will need a test and atomic in the put back, possibly
a separate putback_mc() in the ring buf.

Reviewed by: jhb@freebsd.org, jlv@freebsd.org
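
The peek/advance/putback pattern described in this commit can be sketched
with a toy single-consumer ring. Nothing here is the drbr(9) or buf_ring(9)
API: struct toy_ring, the toy_ring_*() helpers, toy_xmit() and toy_budget are
hypothetical names, and the convention that a failing xmit sets the pointer
to NULL when it has freed the packet is an assumption of the example.

```c
#include <stddef.h>

#define TOY_RING_SIZE 8

struct toy_ring {
	void *slot[TOY_RING_SIZE];
	size_t head;	/* consumer index */
	size_t tail;	/* producer index */
};

static int
toy_ring_enqueue(struct toy_ring *r, void *m)
{
	if (r->tail - r->head == TOY_RING_SIZE)
		return (-1);	/* ring full */
	r->slot[r->tail % TOY_RING_SIZE] = m;
	r->tail++;
	return (0);
}

/* Look at the head entry without removing it. */
static void *
toy_ring_peek(struct toy_ring *r)
{
	return (r->head == r->tail ? NULL : r->slot[r->head % TOY_RING_SIZE]);
}

/* Consume the head entry. */
static void
toy_ring_advance(struct toy_ring *r)
{
	r->head++;
}

/* Store a possibly changed pointer back into the head slot. */
static void
toy_ring_putback(struct toy_ring *r, void *m)
{
	r->slot[r->head % TOY_RING_SIZE] = m;
}

/* Hypothetical transmit routine: accepts packets while budget remains. */
static int toy_budget;

static int
toy_xmit(void **mp)
{
	(void)mp;
	if (toy_budget == 0)
		return (-1);	/* hardware ring full, *mp left untouched */
	toy_budget--;
	return (0);
}

/* Drain loop: a failed packet stays at the head, so order is preserved. */
static int
toy_drain(struct toy_ring *r, int (*xmit)(void **mp))
{
	void *m;
	int sent = 0;

	while ((m = toy_ring_peek(r)) != NULL) {
		if (xmit(&m) != 0) {
			if (m == NULL)
				toy_ring_advance(r);	/* xmit freed it */
			else
				toy_ring_putback(r, m);	/* retry later */
			break;
		}
		toy_ring_advance(r);
		sent++;
	}
	return (sent);
}
```

Because the packet is never removed before the transmit succeeds, a full
hardware ring cannot reorder it behind packets queued later.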
34a9a386cb4df8844bca8e43dae20e4a15710fcc 18-Oct-2012 andre <andre@FreeBSD.org> Mechanically remove the last stray remains of spl* calls from net*/*.
They have been no-ops for a long time now.
0dfb309a1fc65341261b94a9852bbd1ee0b58577 17-Oct-2012 emax <emax@FreeBSD.org> provide helper if_initbaudrate() to set if_baudrate and if_baudrate_pf.
again, use ixgbe(4) as an example of how to use new helper function.

Reviewed by: jhb
MFC after: 1 week
214df82afacb6e4f782e0d8090c25db6a7230fdf 16-Oct-2012 emax <emax@FreeBSD.org> introduce concept of ifi_baudrate power factor. the idea is to work
around the problem where high speed interfaces (such as ixgbe(4))
are not able to report real ifi_baudrate. basically, take a spare
byte from struct if_data and use it to store ifi_baudrate power
factor. in other words,

real ifi_baudrate = ifi_baudrate * 10 ^ ifi_baudrate power factor

this should be backwards compatible with old binaries. use ixgbe(4)
as an example on how drivers would set ifi_baudrate power factor

Discussed with: kib, scottl, glebius
MFC after: 1 week
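
The power factor formula in this commit can be worked through in a few lines
of C. This is a sketch of the arithmetic only, not the kernel helper:
toy_real_baudrate() and toy_init_baudrate() are hypothetical names, and the
encoding loses precision for rates that are not exact multiples of the
chosen power of ten.

```c
#include <stdint.h>

/* real = baudrate * 10^pf; large pf values can overflow 64 bits. */
static uint64_t
toy_real_baudrate(uint32_t baudrate, uint8_t pf)
{
	uint64_t real = baudrate;

	while (pf-- > 0)
		real *= 10;
	return (real);
}

/* Split a 64-bit rate into a 32-bit value and a power factor. */
static void
toy_init_baudrate(uint64_t rate, uint32_t *baudrate, uint8_t *pf)
{
	uint8_t e = 0;

	while (rate > UINT32_MAX) {
		rate /= 10;
		e++;
	}
	*baudrate = (uint32_t)rate;
	*pf = e;
}
```

For a 10 Gbit/s interface the rate 10000000000 does not fit in 32 bits, so
it is stored as 1000000000 with a power factor of 1.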
4b29d585cfcde7e84697e4af66a48144fb111e27 28-Sep-2012 glebius <glebius@FreeBSD.org> The drbr(9) API appeared to be so unclear, that most drivers in
tree used it incorrectly, which led to inaccurate, overrated
if_obytes accounting. The drbr(9) used to update ifnet stats on
drbr_enqueue(), which is not accurate since enqueuing doesn't
imply successful processing by driver. Dequeuing doesn't mean
that either. Most drivers also called drbr_stats_update() which did
accounting again, leading to doubled if_obytes statistics. And
in case of severe transmitting, when a packet could be several
times enqueued and dequeued it could have been accounted several
times.

o Thus, make drbr(9) API thinner. Now drbr(9) merely chooses between
ALTQ queueing or buf_ring(9) queueing.
- It doesn't touch the buf_ring stats any more.
- It doesn't touch ifnet stats anymore.
- drbr_stats_update() no longer exists.

o buf_ring(9) handles its stats itself:
- It handles br_drops itself.
- br_prod_bytes stats are dropped. Rationale: no one ever
reads them but update of a common counter on every packet
negatively affects performance due to excessive cache
invalidation.
- buf_ring_enqueue_bytes() reduced to buf_ring_enqueue(), since
we no longer account bytes.

o Drivers handle their stats themselves: if_obytes, if_omcasts.

o mlx4(4), igb(4), em(4), vxge(4), oce(4) and ixv(4) no longer
use drbr_stats_update(), and update ifnet stats themselves.

o bxe(4) was the most correct driver, it didn't call
drbr_stats_update(), thus it was the only driver accurate under
moderate load. Now it also maintains stats itself.

o ixgbe(4) had already taken stats from hardware, so just
- drop software stats updating.
- take multicast packet count from hardware as well.

o mxge(4) just no longer needs NO_SLOW_STATS define.

o cxgb(4), cxgbe(4) need no change, since they obtain stats
from hardware.

Reviewed by: jfv, gnn
b490e053cd9aa2ed0142ab95406c9e7f92728c62 04-Sep-2012 melifaro <melifaro@FreeBSD.org> Fix the build broken by r240099.
Hide link_pfil_hook under _KERNEL macro.

MFC after: 3 weeks
1fbae66b6e67117d899f9c10f12c000c4584d32c 04-Sep-2012 melifaro <melifaro@FreeBSD.org> Introduce new link-layer PFIL hook V_link_pfil_hook.
Merge ether_ipfw_chk() and part of bridge_pfil() into
unified ipfw_check_frame() function called by PFIL.
This change was suggested by rwatson? @ DevSummit.

Remove ipfw headers from ether/bridge code since they are unneeded now.

Note this change introduces some (temporary) performance penalty since
PFIL read lock has to be acquired for every link-level packet.

MFC after: 3 weeks
abf245020a075c487a1ac4e60c7069e2d8c9c7c3 02-Aug-2012 glebius <glebius@FreeBSD.org> Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeout, this allows us to safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via subsequent
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation instead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>
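
The two flag-related points in this commit (OR-ing in the deleted flag, and
using a linked flag to avoid double unlinking) can be sketched as follows.
The bit values, struct toy_lle and both helpers are illustrative assumptions
and are not taken from the kernel headers.

```c
/* Illustrative bit values, not the kernel's definitions. */
#define TOY_LLE_DELETED	0x0002u
#define TOY_LLE_LINKED	0x0040u

struct toy_lle {
	unsigned int flags;
};

/* '|=' sets one bit; plain '=' would wipe every other flag. */
static void
toy_mark_deleted(struct toy_lle *lle)
{
	lle->flags |= TOY_LLE_DELETED;
}

/* Returns 1 if this call unlinked the entry, 0 if it already was. */
static int
toy_unlink(struct toy_lle *lle)
{
	if ((lle->flags & TOY_LLE_LINKED) == 0)
		return (0);
	lle->flags &= ~TOY_LLE_LINKED;
	return (1);
}
```

With the linked bit preserved across toy_mark_deleted(), a second call to
toy_unlink() becomes a harmless no-op.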
67d5f1a727273d8e141e96c429114dff9fb06ec3 19-Jun-2012 np <np@FreeBSD.org> - Updated TOE support in the kernel.

- Stateful TCP offload drivers for Terminator 3 and 4 (T3 and T4) ASICs.
These are available as t3_tom and t4_tom modules that augment cxgb(4)
and cxgbe(4) respectively. The cxgb/cxgbe drivers continue to work as
usual with or without these extra features.

- iWARP driver for Terminator 3 ASIC (kernel verbs). T4 iWARP is in the
works and will follow soon.

Build-tested with make universe.

30s overview
============
What interfaces support TCP offload? Look for TOE4 and/or TOE6 in the
capabilities of an interface:
# ifconfig -m | grep TOE

Enable/disable TCP offload on an interface (just like any other ifnet
capability):
# ifconfig cxgbe0 toe
# ifconfig cxgbe0 -toe

Which connections are offloaded? Look for toe4 and/or toe6 in the
output of netstat and sockstat:
# netstat -np tcp | grep toe
# sockstat -46c | grep toe

Reviewed by: bz, gnn
Sponsored by: Chelsio communications.
MFC after: ~3 months (after 9.1, and after ensuring MFC is feasible)
e86f714c838f0ea47586ee0b771dc80bb182dfc8 19-Mar-2012 jhb <jhb@FreeBSD.org> Retire the IF_ADDR_LOCK() and IF_ADDR_UNLOCK() compat macros from HEAD.
The new [RW]LOCK macros are merged back to 8.x so should be suitable for
new code in HEAD even if it is to be MFC'd.
322f330b9214ae7d4356e7d66d3f1970ab7c74cd 08-Feb-2012 pluknet <pluknet@FreeBSD.org> g/c last bit of old ipv6 prefix management.

Reviewed by: bz
Obtained from: NetBSD, net/if.h, rev 1.80
0577b44f7395fee798c50b2e4f6f24d717dabeb7 09-Jan-2012 jhb <jhb@FreeBSD.org> Convert the per-interface address list lock from a mutex to a reader/writer
lock.

Reviewed by: bz
219e62f17ed5852c0d61721758ef94a0587939f4 05-Jan-2012 jhb <jhb@FreeBSD.org> Add new variants of the IF_ADDR_*LOCK*() macros used for protecting
interface address lists that distinguish read locks from write locks.
To preserve the KPI, the previous operations are mapped to the write
lock macros. The lock is still kept as a mutex for now.

Reviewed by: bz
MFC after: 2 weeks
27a36f6ac8242750daa092abd7180b10d16f4508 16-Dec-2011 glebius <glebius@FreeBSD.org> A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implementation
on demand, with a thorough review of all code. The main change is that
interface layer has been removed from the CARP. Now redundant addresses
are configured directly on the interfaces they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics have been changed too: one no longer needs
to clone a carpXX interface, but should configure a vhid directly
on an Ethernet interface.

To supply vhid data from the kernel to an application the getifaddrs(3)
function had been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
to run a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with inet6 part of patch, for
idea on using ifam_data and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
9029bb4f3b3dcc9ee7c165dafa255c0a1be9dff4 09-Dec-2011 brooks <brooks@FreeBSD.org> Remove the unused if_free_type() function.

X-MFC after: never
fb9b0b1cbedce705bd12afd3a921ffdbc1f299fa 27-Oct-2011 glebius <glebius@FreeBSD.org> Add macro IF_DEQUEUE_ALL(ifq, m), that takes the entire mbuf chain off
the queue. It can be utilized in queue processing to avoid multiple
locking/unlocking.
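
The benefit of taking the whole chain at once can be sketched with a toy
queue. This is not the IF_DEQUEUE_ALL() macro: struct toy_pkt, struct
toy_ifq and the two functions are hypothetical, and locking is left out, with
the assumption that the caller would hold the queue lock only around
toy_dequeue_all().

```c
#include <stddef.h>

struct toy_pkt {
	struct toy_pkt *next;
	int id;
};

struct toy_ifq {
	struct toy_pkt *head;
	struct toy_pkt *tail;
	int len;
};

static void
toy_enqueue(struct toy_ifq *q, struct toy_pkt *p)
{
	p->next = NULL;
	if (q->tail == NULL)
		q->head = p;
	else
		q->tail->next = p;
	q->tail = p;
	q->len++;
}

/*
 * Detach the entire chain in one step. The caller can then walk the
 * chain privately instead of locking and unlocking once per packet.
 */
static struct toy_pkt *
toy_dequeue_all(struct toy_ifq *q)
{
	struct toy_pkt *chain = q->head;

	q->head = q->tail = NULL;
	q->len = 0;
	return (chain);
}
```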
352be4e985c0df0cf92cf64d89515b0b32bd1bf4 17-Jul-2011 bz <bz@FreeBSD.org> Add spares to the network stack for FreeBSD-9:
- TCP keep* timers
- TCP UTO (adjust from what was there already)
- netmap
- route caching
- user cookie (temporary to allow for the real fix)

Slightly re-shuffle struct ifnet moving fields out of the middle
of spares and to better align.

Discussed with: rwatson (slightly earlier version)
cf260e73d64c6f9645f9836a1a5e0f93061ae882 03-Jul-2011 bz <bz@FreeBSD.org> Remove extra white space to comply with style for the rest of the struct.

MFC after: 2 weeks
9cad5bfef3ce97c030d30e66deb6371458c2281b 03-Jul-2011 bz <bz@FreeBSD.org> Add infrastructure to allow all frames/packets received on an interface
to be assigned to a non-default FIB instance.

You may need to recompile world or ports due to the change of struct ifnet.

Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet
2d7d8c05e7404fbebf1f0fe24c13bc5bb58d2338 21-Mar-2011 jeff <jeff@FreeBSD.org> - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.
09f9c897d33c41618ada06fbbcf1a9b3812dee53 19-Oct-2010 jamie <jamie@FreeBSD.org> A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.
32a9bf8a034f839ea78a80d8f68a2d101a102afd 25-Jun-2010 qingli <qingli@FreeBSD.org> MFC r208553

This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

Approved by: re (bz)
f6ab4a681092467819a08db78ce8d607027932f3 25-May-2010 qingli <qingli@FreeBSD.org> This patch fixes the problem where proxy ARP entries cannot be added
over the if_ng interface.

MFC after: 3 days
689fe828644e04fd49fbcf141c59eb2a0cf8f553 25-May-2010 thompsa <thompsa@FreeBSD.org> MFC r202588

Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev

MFC r202611

Do not hold the lock over if_setlladdr() as it calls into the interface driver
init routine.
f1216d1f0ade038907195fc114b7e630623b402c 19-Mar-2010 delphij <delphij@FreeBSD.org> Create a custom branch where I will be able to do the merge.
868d4ad7333a0afc69984b5739d67c20040b516b 18-Mar-2010 mlaier <mlaier@FreeBSD.org> MFC r203834 and r205197: Make ALTQ work for drbr consumers.
d62719cf37edd243761ae9dbc2adb07004e7182d 15-Mar-2010 mlaier <mlaier@FreeBSD.org> Fix a small bug in drbr_dequeue_cond spotted while preparing MFC of r203834.

MFC after: 3 days
2c255a85f1a94490d53a0d50a11a7292322fdebb 26-Feb-2010 delphij <delphij@FreeBSD.org> MFC 203052:

Add interface description capability as inspired by OpenBSD. Thanks to
rwatson@, jhb@, brooks@ and others for feedback to the old implementation!

Sponsored by: iXsystems, Inc.
b056acb8640e378c0d878874ce7f56a67b32ac83 13-Feb-2010 mlaier <mlaier@FreeBSD.org> Fix drbr and altq interaction:
- introduce drbr_needs_enqueue that returns whether the interface/br needs
an enqueue operation: returns true if altq is enabled or there are
already packets in the ring (as we need to maintain packet order)
- update all drbr consumers
- fix drbr_flush
- avoid using the driver queue (IFQ_DRV_*) in the altq case as the
multiqueue consumer does not provide enough protection, serialize altq
interaction with the main queue lock
- make drbr_dequeue_cond work with altq

Discussed with: kmacy, yongari, jfv
MFC after: 4 weeks
d0d2d29daf090b9ec7d5ac61985fb32a0ec76bc0 31-Jan-2010 syrinx <syrinx@FreeBSD.org> MFC r202935:
While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR: kern/142391, kern/142392
Reviewed by: sam, rpaulo, bms
d9a0cd0982402f9faf826972323ba7e2c92d4da2 27-Jan-2010 delphij <delphij@FreeBSD.org> Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by: iXsystems, Inc.
MFC after: 1 month
40d92428fb5aa102b7ab0489b9b39e999c4bab8e 24-Jan-2010 syrinx <syrinx@FreeBSD.org> While flushing the multicast filter of an interface, do not zero the relevant
ifmultiaddr structures' reference to the parent interface, unless the parent
interface is really detaching. While here, program only link layer multicast
filters to a wlan's hardware parent interface.

PR: kern/142391, kern/142392
Reviewed by: sam, rpaulo, bms
MFC after: 1 week
5056e27c2d94cd3dce26860a82a784814b55b3e9 18-Jan-2010 thompsa <thompsa@FreeBSD.org> Declare a new EVENTHANDLER called iflladdr_event which signals that the L2
address on an interface has changed. This lets stacked interfaces such as
vlan(4) detect that their lower interface has changed and adjust things in
order to keep working. Previously this situation broke at least vlan(4) and
lagg(4) configurations.

The EVENTHANDLER_INVOKE call was not placed within if_setlladdr() due to the
risk of a loop.

PR: kern/142927
Submitted by: Nikolay Denev
b738408ac27336e956e752c61e1a2b2939ec7506 05-Jan-2010 qingli <qingli@FreeBSD.org> MFC r201319

Remove a deleted comment line that was brought back by
my previous commit.
ea5192e625ebf087e54f747a66b38f4034431708 05-Jan-2010 qingli <qingli@FreeBSD.org> MFC r201282, r201543

r201282
-------
The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

r201543
-------
The IFA_RTSELF address flag marks that a loopback route has been installed
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion caused the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.

PR: ports/141342, kern/141134
c44d3e680a9bf89f498ebc30c778a4517e6a7b61 31-Dec-2009 qingli <qingli@FreeBSD.org> Remove a deleted comment line that was brought back by
my previous commit.

MFC after: 5 days
ed965a92bc17f25c5049fbd529d10a9e94f8a3a7 30-Dec-2009 qingli <qingli@FreeBSD.org> The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was due to the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after: 5 days
70c567a7537653168b0f86d33547b306938d324c 21-Dec-2009 jhb <jhb@FreeBSD.org> Remove commented out prototype for ifinit(). This prototype has been
commented out since 1.1 and has not been present in <sys/systm.h> since at
least 1.1 of that file. It is also not needed in FreeBSD due to SYSINIT().
1f48c677b57a5904d8da8f832f04e5037fb047bb 30-Nov-2009 jhb <jhb@FreeBSD.org> Remove if_timer/if_watchdog now that they are no longer used. The space
used by if_timer is reserved for expanding if_index to an int in the
future.

Reviewed by: rwatson, brooks
8fed657163fb373990aaa15c79b58a7c963373b2 12-Nov-2009 delphij <delphij@FreeBSD.org> Revert revision 199201 for now as it has introduced a kernel vulnerability
and requires more polishing.
13a19ef806aacb68fca8a06969fe760e790cf191 11-Nov-2009 delphij <delphij@FreeBSD.org> Add interface description capability as inspired by OpenBSD.

MFC after: 3 months
ceec1be0ff52ee7829036be5125f8e0795e26acd 15-Sep-2009 qingli <qingli@FreeBSD.org> MFC r197227

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
Approved by: re
3a82e44273f4a5c05d848c2959b2a9d8188b1ba0 15-Sep-2009 qingli <qingli@FreeBSD.org> Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately
7ada92d3753b9dcbdd54b669d00046d10867c4e8 28-Aug-2009 rwatson <rwatson@FreeBSD.org> Merge r196510 from head to stable/8:

Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.

Approved by: re (kib)
464ba339f01ef15c3bef8d990c268431fe769b42 28-Aug-2009 rwatson <rwatson@FreeBSD.org> Merge r196481 from head to stable/8:

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stabilize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian

Approved by: re (kib)
260dfcf9e90c1d463bc98f1adb301db35d139e33 24-Aug-2009 rwatson <rwatson@FreeBSD.org> Make if_grow static -- it's not used outside of if.c, and with the
internals destined to change, it's better if it remains that way.

MFC after: 3 days
ef8d755d4df716bf13f8a1833f7dd1db0b78c569 23-Aug-2009 rwatson <rwatson@FreeBSD.org> Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stabilize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days
e0ea45278f73df92d45a90c54860d19b1c7073f5 20-Aug-2009 rwatson <rwatson@FreeBSD.org> Merge r196263 from head to stable/8:

Remove unused if_rawoutput() macro; it has been unused since at least
FreeBSD 2.

Approved by: re (kib)
e905082a5a59cce82377b1bf910e37f03c0eb91c 15-Aug-2009 rwatson <rwatson@FreeBSD.org> Remove unused if_rawoutput() macro; it has been unused since at least
FreeBSD 2.

Approved by: re (kib)
8c1899d9347988f10f8675cc62ab08f26bb9f2d7 27-Jul-2009 qingli <qingli@FreeBSD.org> This patch does the following:

- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface address when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.

Reviewed by: bz
Approved by: re
88f8de4d4001c74946458579ca0710df70161c90 16-Jul-2009 rwatson <rwatson@FreeBSD.org> Remove unused VNET_SET() and related macros; only VNET_GET() is
ever actually used. Rename VNET_GET() to VNET() to shorten
variable references.

Discussed with: bz, julian
Reviewed by: bz
Approved by: re (kensmith, kib)
57ca4583e728cab422fba8f15de10bd0b637b3dd 14-Jul-2009 rwatson <rwatson@FreeBSD.org> Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
0cabaf8791593503b5c5aee1394849a275c48ef9 29-Jun-2009 brooks <brooks@FreeBSD.org> Remove support for the /dev/net/* per-interface devices. They serve
little purpose and are unused in the base system.

The IOCTL functionality is entirely duplicated and routing sockets
provide a richer interface than the kqueue functionality.

Further, it is not practical for these devices to be made sensible in
the face of VIMAGE.

Bump __FreeBSD_version on the off chance that there is any code out
there that actually uses this stuff.

Reviewed by: rwatson
Discussed with: bz, zec
Approved by: re@ (kensmith)
c4ac6ab020126b381316fa8dcb627571af2683c8 26-Jun-2009 rwatson <rwatson@FreeBSD.org> Define four wrapper functions for interface address locking,
if_addr_rlock() and if_addr_runlock() for regular address lists, and
if_maddr_rlock() and if_maddr_runlock() for multicast address lists.

We will use these in various kernel modules to avoid encoding specific
type and locking strategy information into modules that currently use
IF_ADDR_LOCK() and IF_ADDR_UNLOCK() directly.

MFC after: 6 weeks
3c02410d55d9cfe0cb4dcbfad5eef376463d8e08 22-Jun-2009 rwatson <rwatson@FreeBSD.org> Add a new function, ifa_ifwithaddr_check(), which rather than returning
a pointer to an ifaddr matching the passed socket address, returns a
boolean indicating whether one was present. In the (near) future,
ifa_ifwithaddr() will return a referenced ifaddr rather than a raw
ifaddr pointer, and the new wrapper will allow callers that care only
about the boolean condition to avoid having to free that reference.

MFC after: 3 weeks
1f7e54e8c51edb13935d195e0c1f2ec68c672794 21-Jun-2009 rwatson <rwatson@FreeBSD.org> Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Use refcount(9) instead of a u_int protected by a mutex for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks
6154623e0c7a2a355870e4a5ffacd4ec8e4ce8f9 19-Jun-2009 kmacy <kmacy@FreeBSD.org> add helper function for flushing software queues
df30f1c9f2ef0498e2aa537d4aec3aab759dc6c3 15-Jun-2009 sam <sam@FreeBSD.org> r193336 moved ifq_detach to if_free which broke if_alloc followed
by if_free (w/o doing if_attach); move ifq_attach to if_alloc and
rename ifq_attach/detach to ifq_init/ifq_delete to better identify
their purpose

Reviewed by: jhb, kmacy
3f394b4e7809f55e44ba6dbd134af82530dc0479 09-Jun-2009 kmacy <kmacy@FreeBSD.org> - add drbr routines for accessing #qentries and conditionally dequeueing
- track bytes enqueued in buf_ring
8b1f38241aaf07621c062901b7946145be2862b6 08-Jun-2009 zec <zec@FreeBSD.org> Introduce an infrastructure for dismantling vnet instances.

Vnet modules and protocol domains may now register destructor
functions to clean up and release per-module state. The destructor
mechanisms can be triggered by invoking "vimage -d", or a future
equivalent command which will be provided via the new jail framework.

While this patch introduces numerous placeholder destructor functions,
many of those are currently incomplete, thus leaking memory or (even
worse) failing to stop all running timers. Many of such issues are
already known and will be fixed over the next weeks in
smaller incremental commits.

Apart from introducing new fields in structs ifnet, domain, protosw
and vnet_net, which requires the kernel and modules to be rebuilt, this
change should have no impact on nooptions VIMAGE builds, since vnet
destructors can only be called in VIMAGE kernels. Moreover,
destructor functions should be in general compiled in only in
options VIMAGE builds, except for kernel modules which can be safely
kldunloaded at run time.

Bump __FreeBSD_version to 800097.
Reviewed by: bz, julian
Approved by: rwatson, kib (re), julian (mentor)
b523608331b881784ac18a7dfcb65c7a679130b0 30-May-2009 attilio <attilio@FreeBSD.org> When user_frac in the polling subsystem is low it is going to busy the
CPU for too long period than necessary. Additively, interfaces are kept
polled (in the tick) even if no more packets are available.
In order to avoid such situations a new generic mechanism can be
implemented in proactive way, keeping track of the time spent on any
packet and fragmenting the time for any tick, stopping the processing
as soon as possible.

In order to implement such mechanism, the polling handler needs to
change, returning the number of packets processed.
While the intended logic is not part of this patch, the polling KPI is
broken by this commit, adding an int return value and the new flag
IFCAP_POLLING_NOCOUNT (which will signal that the return value is
meaningless for the installed handler and checking should be skipped).

Bump __FreeBSD_version in order to signal such situation.

Reviewed by: emaste
Sponsored by: Sandvine Incorporated
363a644ce641f813d42c7c9f07a00a8b85f64c6c 22-May-2009 zec <zec@FreeBSD.org> Introduce the if_vmove() function, which will be used in the future
for reassigning ifnets from one vnet to another.

if_vmove() works by calling a restricted subset of actions normally
executed by if_detach() on an ifnet in the current vnet, and then
switches to the target vnet and executes an appropriate subset of
if_attach() actions there.

if_attach() and if_detach() have become wrapper functions around
if_attach_internal() and if_detach_internal(), where the latter
variants have an additional argument, a flag indicating whether a
full attach or detach sequence is to be executed, or only a
restricted subset suitable for moving an ifnet from one vnet to
another. Hence, if_vmove() will not call if_detach() and if_attach()
directly, but will call the if_detach_internal() and
if_attach_internal() variants instead, with the vmove flag set.

While here, staticize ifnet_setbyindex() since it is not referenced
from outside of sys/net/if.c.

Also rename ifccnt field in struct vimage to ifcnt, and do some minor
whitespace garbage collection where appropriate.

This change should have no functional impact on nooptions VIMAGE kernel
builds.

Reviewed by: bz, rwatson, brooks?
Approved by: julian (mentor)
d78a1b1a824c4f5eb8cb3583bb5265f73dcc24dd 05-May-2009 zec <zec@FreeBSD.org> Change the curvnet variable from a global const struct vnet *,
previously always pointing to the default vnet context, to a
dynamically changing thread-local one. The curvnet context
should be set on entry to networking code via CURVNET_SET() macros,
and reverted to previous state via CURVNET_RESTORE(). Recursions
on curvnet are permitted, though strongly discouraged.

This change should have no functional impact on nooptions VIMAGE
kernel builds, where CURVNET_* macros expand to whitespace.

The curthread->td_vnet (aka curvnet) variable's purpose is to be an
indicator of the vnet context in which the current network-related
operation takes place, in case we cannot deduce the current vnet
context from any other source, such as by looking at mbuf's
m->m_pkthdr.rcvif->if_vnet, socket's so->so_vnet etc. Moreover, so
far curvnet has turned out to be an invaluable consistency checking
aid: it helps to catch cases when sockets, ifnets or any other
vnet-aware structures may have leaked from one vnet to another.

The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros
was a result of an empirical iterative process, with an aim to
reduce recursions on CURVNET_SET() to a minimum, while still reducing
the scope of CURVNET_SET() to networking only operations - the
alternative would be calling CURVNET_SET() on each system call entry.
In general, curvnet has to be set in three typical cases: when
processing socket-related requests from userspace or from within the
kernel; when processing inbound traffic flowing from device drivers
to upper layers of the networking stack, and when executing
timer-driven networking functions.

This change also introduces a DDB subcommand to show the list of all
vnet instances.

Approved by: julian (mentor)
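
As an illustration only (not code from this commit), the set/restore
discipline described above can be sketched in userland C. Everything named
*_sketch or SKETCH_* is hypothetical; the sketch assumes that setting the
context saves the previous value in a local variable and that restoring puts
it back, which is what makes recursion work:

```c
#include <assert.h>
#include <stddef.h>

struct vnet_sketch {
    int id;
};

/* Thread-local "current vnet" context, NULL when none is set. */
static _Thread_local struct vnet_sketch *cur_vnet_sketch;

/* Save the previous context in a local variable, then switch. */
#define SKETCH_CURVNET_SET(arg)                                  \
    struct vnet_sketch *saved_vnet_sketch = cur_vnet_sketch;     \
    cur_vnet_sketch = (arg)

/* Revert to the context that was current on entry. */
#define SKETCH_CURVNET_RESTORE()                                 \
    cur_vnet_sketch = saved_vnet_sketch

static int sketch_current_id(void)
{
    return cur_vnet_sketch != NULL ? cur_vnet_sketch->id : -1;
}

/* Networking entry point: sets the context for its own duration. */
static int sketch_inner(struct vnet_sketch *v)
{
    SKETCH_CURVNET_SET(v);
    int id = sketch_current_id();
    SKETCH_CURVNET_RESTORE();
    return id;
}

/*
 * Recursion on the context: the nested call switches to b, and the
 * caller's context a is current again once the nested call returns.
 */
static int sketch_outer(struct vnet_sketch *a, struct vnet_sketch *b,
    int *inner_id)
{
    SKETCH_CURVNET_SET(a);
    *inner_id = sketch_inner(b);
    int id = sketch_current_id();
    SKETCH_CURVNET_RESTORE();
    return id;
}
```

Keeping the saved value in a local variable means each nesting level restores
exactly the context it found on entry.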
39b6dc8ba2de1c81754454858aae4fc4b706bdbf 30-Apr-2009 zec <zec@FreeBSD.org> Permit building kernels with options VIMAGE, restricted to only a single
active network stack instance. Turning on options VIMAGE at compile
time yields the following changes relative to default kernel build:

1) V_ accessor macros for virtualized variables resolve to structure
fields via base pointers, instead of being resolved as fields in global
structs or plain global variables. As an example, V_ifnet becomes:

options VIMAGE: ((struct vnet_net *) vnet_net)->_ifnet
default build: vnet_net_0._ifnet
options VIMAGE_GLOBALS: ifnet

2) INIT_VNET_* macros will declare and set up base pointers to be used
by V_ accessor macros, instead of resolving to whitespace:

INIT_VNET_NET(ifp->if_vnet); becomes

struct vnet_net *vnet_net = (ifp->if_vnet)->mod_data[VNET_MOD_NET];

3) Memory for vnet modules registered via vnet_mod_register() is now
allocated at run time in sys/kern/kern_vimage.c, instead of per vnet
module structs being declared as globals. If required, vnet modules
can now request the framework to provide them with allocated bzeroed
memory by filling in the vmi_size field in their vmi_modinfo structures.

4) structs socket, ifnet, inpcbinfo, tcpcb and syncache_head are
extended to hold a pointer to the parent vnet. options VIMAGE builds
will fill in those fields as required.

5) curvnet is introduced as a new global variable in options VIMAGE
builds, always pointing to the default and only struct vnet.

6) struct sysctl_oid has been extended with additional two fields to
store major and minor virtualization module identifiers, oid_v_subs and
oid_v_mod. SYSCTL_V_* family of macros will fill in those fields
accordingly, and store the offset in the appropriate vnet container
struct in oid_arg1.
In sysctl handlers dealing with virtualized sysctls, the
SYSCTL_RESOLVE_V_ARG1() macro will compute the address of the target
variable and make it available in arg1 variable for further processing.

Unused fields in structs vnet_inet, vnet_inet6 and vnet_ipfw have
been deleted.

Reviewed by: bz, rwatson
Approved by: julian (mentor)
471539dc8f7f8952bce906c2f71708c614007fcb 23-Apr-2009 rwatson <rwatson@FreeBSD.org> Add ifunit_ref(), a version of ifunit(), that returns not just an
interface pointer, but also a reference to it.

Modify ifioctl() to use ifunit_ref(), holding the reference until
all ioctls, etc, have completed.

This closes a class of reader-writer races in which interfaces
could be removed during long-running ioctls, leading to crashes.
Many other consumers of ifunit() should now use ifunit_ref() to
avoid similar races.

MFC after: 3 weeks
ccc05d4c7fc358ca3cc8339274b58835f1ba153b 23-Apr-2009 rwatson <rwatson@FreeBSD.org> During if_detach(), invoke if_dead() to set the ifnet's function
pointers to "dead" implementations that no-op rather than invoking
the device driver. This would generally be unexpected and
possibly quite badly handled by most device drivers after
if_detach() has completed.

Reviewed by: bms
MFC after: 3 weeks
6b19bec016f3ac4388638851255a7bec97b3a4ce 21-Apr-2009 rwatson <rwatson@FreeBSD.org> Start to address a number of races relating to use of ifnet pointers
after the corresponding interface has been destroyed:

(1) Add an ifnet refcount, ifp->if_refcount. Initialize it to 1 in
if_alloc(), and modify if_free_type() to decrement and check the
refcount.

(2) Add new if_ref() and if_rele() interfaces to allow kernel code
walking global interface lists to release IFNET_[RW]LOCK() yet
keep the ifnet stable. Currently, if_rele() is a no-op wrapper
around if_free(), but this may change in the future.

(3) Add new ifnet field, if_alloctype, which caches the type passed
to if_alloc(), but unlike if_type, won't be changed by drivers.
This allows asynchronous free's of the interface after the
driver has released it to still use the right type. Use that
instead of the type passed to if_free_type(), but assert that
they are the same (might have to rethink this if that doesn't
work out).

(4) Add a new ifnet_byindex_ref(), which looks up an interface by
index and returns a reference rather than a pointer to it.

(5) Fix if_alloc() to fully initialize the if_addr_mtx before hooking
up the ifnet to global lists.

(6) Modify sysctls in if_mib.c to use ifnet_byindex_ref() and release
the ifnet when done.

When this change is MFC'd, it will need to replace if_ispare fields
rather than adding new fields in order to avoid breaking the binary
interface. Once this change is MFC'd, if_free_type() should be
removed, as its 'type' argument is now optional.

This refcount is not appropriate for counting mbuf pkthdr references,
and also not for counting entry into the device driver via ifnet
function pointers. An rmlock may be appropriate for the latter.
Rather, this is about ensuring data structure stability when reaching
an ifnet via global ifnet lists and tables followed by copy in or out
of userspace.

MFC after: 3 weeks
Reported by: mdtancsa
Reviewed by: brooks
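
As an illustration only (not code from this commit), the refcount lifecycle
from items (1) to (3) can be sketched in userland C11. The struct and the
sketch_* functions are hypothetical; the sketch assumes the count starts at 1
on allocation, that each lookup by reference bumps it, and that the object is
freed when the last reference is dropped:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct ifnet_sketch {
    atomic_uint refcount; /* starts at 1: the allocator's reference */
    int alloctype;        /* type cached at allocation, never changed */
};

static struct ifnet_sketch *sketch_alloc(int type)
{
    struct ifnet_sketch *ifp = calloc(1, sizeof(*ifp));

    if (ifp == NULL)
        return NULL;
    atomic_init(&ifp->refcount, 1);
    ifp->alloctype = type;
    return ifp;
}

/* Take an extra reference so the object outlives a dropped list lock. */
static void sketch_ref(struct ifnet_sketch *ifp)
{
    atomic_fetch_add(&ifp->refcount, 1);
}

/* Drop a reference; returns true if this was the last one and ifp was freed. */
static bool sketch_rele(struct ifnet_sketch *ifp)
{
    unsigned int old = atomic_fetch_sub(&ifp->refcount, 1);

    assert(old > 0); /* releasing more often than referencing is a bug */
    if (old == 1) {
        free(ifp);
        return true;
    }
    return false;
}
```

A caller that looked the object up by reference can keep using it after the
owner has released its own reference; the memory goes away only on the final
release.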
7370d77f7808c519f45f5b97005d10d654fa0275 16-Apr-2009 kmacy <kmacy@FreeBSD.org> export if_qflush for use by driver if_qflush routines
only set ifp->if_{transmit, qflush} if not already set
KASSERT that neither or both are set
24b38efdce5f73d92ac948039ef4966d9502b484 16-Apr-2009 kmacy <kmacy@FreeBSD.org> Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by: rwatson
81fc29cb4ec033baae01089e70eafecd3a2578d3 14-Apr-2009 kmacy <kmacy@FreeBSD.org> Adapt buf_ring abstraction interface to allow consumers to interoperate with ALTQ
70b6a8119c02ed07bc12918814c950d358cb1885 15-Mar-2009 rwatson <rwatson@FreeBSD.org> Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@
16815243808e39728fb36990d4b900962c89c2f3 01-Mar-2009 rwatson <rwatson@FreeBSD.org> Do a bit of struct ifnet cleanup in preparation for 8.0: group function
pointers together, move padding to the bottom of the structure, and add
two new integer spares due to attrition over time. Remove unused spare
"flags" field, we can use one of the spare ints if we need it later.

This change requires a rebuild of device driver modules that depend on
the layout of ifnet for binary compatibility reasons.

Discussed with: kmacy
7efe2ccd487653a48a680daec35810cb43bdb1f2 17-Dec-2008 kmacy <kmacy@FreeBSD.org> Keep stats in drbr_enqueue

Discussed with: ps
4b4aad01dc1f9b4b7b3a5e843c8a34629fb6fd67 17-Dec-2008 kmacy <kmacy@FreeBSD.org> merge in 2 buf_ring helper routines for enqueueing and freeing buf_rings
d0147f27c78887e74c73b5157a363120412f0c73 17-Dec-2008 kmacy <kmacy@FreeBSD.org> convert ifnet and afdata locks from mutexes to rwlocks
ec826ad5c7f97de814529d3b3bae7950f91d9a5d 15-Dec-2008 qingli <qingli@FreeBSD.org> The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code

The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle; Kip has also been conducting
active functional testing
- Sam Leffler has helped me improve/refactor the code, and
provided valuable reviews
- Julian Elischer set up the Perforce tree for me and has helped
me maintain that branch before the svn conversion
98e7fe0e6a4b7ee298fffadcf232867ffeecbad6 13-Dec-2008 bz <bz@FreeBSD.org> Second round of putting global variables, which were virtualized
but formerly missed under VIMAGE_GLOBAL.

Put the extern declarations of the virtualized globals
under VIMAGE_GLOBAL as the globals themselves are already.
This will help when we remove the globals
entirely.

Sponsored by: The FreeBSD Foundation
604d89458ab94ec81eaefa2d55ef219cba461e31 02-Dec-2008 bz <bz@FreeBSD.org> Rather than using hidden includes (with circular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation
19b6af98ec71398e77874582eb84ec5310c7156f 22-Nov-2008 dfr <dfr@FreeBSD.org> Clone Kip's Xen on stable/6 tree so that I can work on improving FreeBSD/amd64
performance in Xen's HVM mode.
9d3bb599b193495af5419ee85be4afe9a18b6091 22-Nov-2008 kmacy <kmacy@FreeBSD.org> - bump __FreeBSD version to reflect added buf_ring, memory barriers,
and ifnet functions

- add memory barriers to <machine/atomic.h>
- update drivers to only conditionally define their own

- add lockless producer / consumer ring buffer
- remove ring buffer implementation from cxgb and update its callers

- add if_transmit(struct ifnet *ifp, struct mbuf *m) to ifnet to
allow drivers to efficiently manage multiple hardware queues
(i.e. not serialize all packets through one ifq)
- expose if_qflush to allow drivers to flush any driver managed queues

This work was supported by Bitgravity Inc. and Chelsio Inc.
cf5320822f93810742e3d4a1ac8202db8482e633 19-Oct-2008 lulf <lulf@FreeBSD.org> - Import the HEAD csup code which is the basis for the cvsmode work.
8797d4caecd5881e312923ee1d07be3de68755dc 02-Oct-2008 zec <zec@FreeBSD.org> Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
fb39793d413da2a62c1fa0f3126c3b01e5c41e8b 20-Aug-2008 thompsa <thompsa@FreeBSD.org> ifnet_setbyindex() is only used locally, go back to being static.
01940f4e651c7f9086a04458e21966b4a28b7bfc 20-Aug-2008 kmacy <kmacy@FreeBSD.org> Fix build
331f5de14e1c693f1ee0eeb4230ca2199195d877 03-Aug-2008 rwatson <rwatson@FreeBSD.org> Merge r180042 from head to stable/7:

Introduce locking around use of ifindex_table, whose use was previously
unsynchronized. While races were extremely rare, we've now had a
couple of reports of panics in environments involving large numbers of
IPSEC tunnels being added very quickly on an active system.

- Add accessor functions ifnet_byindex(), ifaddr_byindex(),
ifdev_byindex() to replace existing accessor macros. These functions
now acquire the ifnet lock before derefencing the table.
- Add IFNET_WLOCK_ASSERT().
- Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(),
which set values in the table either asserting or acquiring the ifnet
lock.
- Use accessor functions throughout if.c to modify and read
ifindex_table.
- Rework ifnet attach/detach to lock around ifindex_table modification.

Note that these changes simply close races around use of ifindex_table,
and make no attempt to solve the problem of disappearing ifnets. Further
refinement of this work, including with respect to ifindex_table
resizing, is still required.

In a future change, the ifnet lock should be converted from a mutex to an
rwlock in order to reduce contention.

Reviewed and tested by: brooks
dc8d54c205784683ec1aae7ecf1f24fe1f6cb2c0 24-Jul-2008 julian <julian@FreeBSD.org> MFC an ABI compatible implementation of Multiple routing tables.
See the commit message for
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/net/route.c
version 1.129 (svn change # 178888) for more info.

Obtained from: Ironport (Cisco Systems)
46dd6e44fc7f70ee8d82d41fb83bedfb2c7829c8 26-Jun-2008 rwatson <rwatson@FreeBSD.org> Introduce locking around use of ifindex_table, whose use was previously
unsynchronized. While races were extremely rare, we've now had a
couple of reports of panics in environments involving large numbers of
IPSEC tunnels being added very quickly on an active system.

- Add accessor functions ifnet_byindex(), ifaddr_byindex(),
ifdev_byindex() to replace existing accessor macros. These functions
now acquire the ifnet lock before dereferencing the table.
- Add IFNET_WLOCK_ASSERT().
- Add static accessor functions ifnet_setbyindex(), ifdev_setbyindex(),
which set values in the table either asserting or acquiring the ifnet
lock.
- Use accessor functions throughout if.c to modify and read
ifindex_table.
- Rework ifnet attach/detach to lock around ifindex_table modification.
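
A minimal user-space sketch of the accessor pattern listed above, assuming a
fixed-size table and a pthread mutex standing in for the ifnet lock. Every
name ending in _sketch is hypothetical and is not the kernel's code; the
setter here always acquires the lock, whereas the commit says the real ones
either assert or acquire it.

```c
#include <pthread.h>
#include <stddef.h>

#define IF_INDEX_MAX_SKETCH 64	/* hypothetical fixed table size */

/* Stand-in for struct ifnet; only the index is modelled. */
struct ifnet_sketch {
	int if_index;
};

static struct ifnet_sketch *ifindex_table_sketch[IF_INDEX_MAX_SKETCH];
static pthread_mutex_t ifnet_lock_sketch = PTHREAD_MUTEX_INITIALIZER;

/* Read accessor: take the lock before dereferencing the table. */
struct ifnet_sketch *
ifnet_byindex_sketch(unsigned int idx)
{
	struct ifnet_sketch *ifp = NULL;

	pthread_mutex_lock(&ifnet_lock_sketch);
	if (idx < IF_INDEX_MAX_SKETCH)
		ifp = ifindex_table_sketch[idx];
	pthread_mutex_unlock(&ifnet_lock_sketch);
	return (ifp);
}

/* Write accessor: returns 0 on success, -1 if idx is out of range. */
int
ifnet_setbyindex_sketch(unsigned int idx, struct ifnet_sketch *ifp)
{
	if (idx >= IF_INDEX_MAX_SKETCH)
		return (-1);
	pthread_mutex_lock(&ifnet_lock_sketch);
	ifindex_table_sketch[idx] = ifp;
	pthread_mutex_unlock(&ifnet_lock_sketch);
	return (0);
}
```

The pointer returned by the read accessor is no longer protected once the
lock is dropped, which is the "disappearing ifnets" gap the message says
remains open.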

Note that these changes simply close races around use of ifindex_table,
and make no attempt to solve the problem of disappearing ifnets. Further
refinement of this work, including with respect to ifindex_table
resizing, is still required.

In a future change, the ifnet lock should be converted from a mutex to an
rwlock in order to reduce contention.

Reviewed and tested by: brooks
1dfc5c98a4f7c32163dfdc61e390ccf805385108 09-May-2008 julian <julian@FreeBSD.org> Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on is "policy based routing", which allows different
packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x) , but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled in the timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.

Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where for all
protocol families except ipv4 Y is always 0.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.
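
The two dimensional lookup described above can be sketched as follows. This
is an illustration only: the names ending in _sketch, the table sizes, and
the choice to return NULL for an out-of-range FIB are assumptions, not the
code from this commit.

```c
#include <stddef.h>
#include <sys/socket.h>		/* AF_INET, AF_INET6 */

#define NUM_FIBS_SKETCH	16	/* the first commit is limited to 16 tables */
#define AF_COUNT_SKETCH	64	/* hypothetical upper bound on family numbers */

/* Stand-in for the per-family FIB head structure. */
struct fib_head_sketch {
	int unused;
};

/* Row = FIB number (Y), column = protocol family (X). */
struct fib_head_sketch *rt_tables_sketch[NUM_FIBS_SKETCH][AF_COUNT_SKETCH];

/*
 * Return the table for (fib, af). Every family except IPv4 is forced
 * to row 0, so code that is unaware of the extra rows keeps seeing
 * what looks like the old one dimensional array.
 */
struct fib_head_sketch *
rt_table_lookup_sketch(unsigned int fib, unsigned int af)
{
	if (af >= AF_COUNT_SKETCH)
		return (NULL);
	if (af != AF_INET)
		fib = 0;
	if (fib >= NUM_FIBS_SKETCH)
		return (NULL);
	return (rt_tables_sketch[fib][af]);
}
```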

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us as to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0,
(or possibly a number settable in a sysctl(not yet)).
but prior to routing the firewall can inspect them (see below).
(possibly in the future you may be able to associate a FIB
with packets received on an interface.. An ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.
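
The precedence among the classes above (FIB 0 by default, then the FIB of
the originating socket, interface or tunnel, then any FIB assigned by a
packet classifier) can be sketched as below. The function name and the
FIB_NONE_SKETCH sentinel are hypothetical and exist only for illustration.

```c
#define FIB_NONE_SKETCH	(-1)	/* hypothetical "no FIB assigned" marker */

/*
 * Pick the FIB for one packet.
 * origin_fib:     FIB inherited from the socket/PCB, receiving path or
 *                 tunnel creator, or FIB_NONE_SKETCH if nothing set it.
 * classifier_fib: FIB assigned by a packet classifier such as ipfw,
 *                 or FIB_NONE_SKETCH if no rule matched.
 */
int
select_fib_sketch(int origin_fib, int classifier_fib)
{
	int fib = 0;			/* default: FIB 0 */

	if (origin_fib != FIB_NONE_SKETCH)
		fib = origin_fib;	/* cases 1, 2, 4, 5 and 6 */
	if (classifier_fib != FIB_NONE_SKETCH)
		fib = classifier_fib;	/* case 3 overrides the rest */
	return (fib);
}
```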

Routing messages would be associated with their
process, and thus select one FIB or another.
messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from any to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP has interestingly enough built in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlaier (parts each)
Obtained from: Ironport systems/Cisco
d5c642ca443f41542b578374557a4935625a9ecf 25-Mar-2008 sam <sam@FreeBSD.org> expose if_purgemaddrs, it will be used by the vap code unless someone
redesigns the mcast support code in the next few weeks

MFC after: 3 weeks
a5002e6b8597d3911ca2106e12147c870fd0fbbc 07-Dec-2007 kmacy <kmacy@FreeBSD.org> Add padding for anticipated functionality
- vimage
- TOE
- multiq
- host rtentry caching

Rename spare used by 80211 to if_llsoftc

Reviewed by: rwatson, gnn
Approved by: re(gnn)
12b5f9c8c99a01b1d40e88aaa1a58ce757e68d5e 07-Dec-2007 kmacy <kmacy@FreeBSD.org> Add padding for anticipated functionality
- vimage
- TOE
- multiq
- host rtentry caching

Rename spare used by 80211 to if_llsoftc

Reviewed by: rwatson, gnn
MFC after: 1 day
77b1e42295c59b8ca8b893bd38c73ef40ffb574c 07-Jul-2007 brian <brian@FreeBSD.org> Fix a problem introduced in netinet/in.c 1.85.2.7 where
in_ifdetach() calls in_delmulti_ifp().

The code now *really* deletes the elements in in_multihead
for the ifp that's going away (rather than just decrementing
the reference count). Previously we were left with inm and
ifma structures containing bogus ifnet pointers after
destroying an interface that had more than one IP4 assignment
made to it in its lifetime.

I've also added a if_delmulti_ent() to make deleting known
ifma structures possible rather than depending on
if_findmulti() to end up finding the same thing. It
will in fact always find the correct ifma *unless* the
passed sockaddr has a bogus sa_len of zero.

Finally, when adding a multicast address, we no longer
increment the refcount (well, we do, but then we decrement
it again). The refcount here is in fact bogus so hopefully
readers will see that now.

This code is going directly into -stable as it has been
rewritten in -current and those changes are deemed too
intrusive for -stable consumption right now.

Reviewed by: bms
1a4d0035866239e2e48cc9303290fa20c54b7c00 17-May-2007 thompsa <thompsa@FreeBSD.org> MFC the lagg(4) driver which provides link aggregation, failover and fault
tolerance.
ec3f5ae79f183ed96738b13d970bdf0219a1eca5 16-May-2007 brooks <brooks@FreeBSD.org> The struct if_data members ifi_recvquota and ifi_xmitquota have been
unused for ages. Rename them to ifi_spare_char1 and ifi_spare_char2
respectively to indicate this fact.
5fc175b7b49fb508d186e2eadee5104c2f774e24 17-Apr-2007 thompsa <thompsa@FreeBSD.org> Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@
0f00c64853f65a32f7ca644870fd6ceba5b3ce7d 10-Apr-2007 thompsa <thompsa@FreeBSD.org> Add the trunk(4) driver for providing link aggregation, failover and fault
tolerance. This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.

failover - Sends traffic through the secondary port if the master becomes
inactive.
fec - Supports Cisco Fast EtherChannel.
lacp - Supports the IEEE 802.3ad Link Aggregation Control Protocol
(LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin - Distributes outgoing traffic using a round-robin scheduler
through all active ports.

This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.
a5925f917ca324e75df91e49409a06e1617c7ae7 20-Mar-2007 bms <bms@FreeBSD.org> Fix tinderbox; ng_ether needs to see if_findmulti().
4ffc00490175f1ea8b4a87149bed2b0076df6f3b 20-Mar-2007 bms <bms@FreeBSD.org> Implement reference counting for ifmultiaddr, in_multi, and in6_multi
structures. Detect when ifnet instances are detached from the network
stack and perform appropriate cleanup to prevent memory leaks.

This has been implemented in such a way as to be backwards ABI compatible.
Kernel consumers are changed to use if_delmulti_ifma(); in_delmulti()
is unable to detect interface removal by design, as it performs searches
on structures which are removed with the interface.

With this architectural change, the panics FreeBSD users have experienced
with carp and pfsync should be resolved.

Obtained from: p4 branch bms_netdev
Reviewed by: andre
Sponsored by: Garance A Drosehn
Idea from: NetBSD
MFC after: 1 month
2446ee5a540b3ca863f7b699c410b4a671c6d64f 06-Oct-2006 andre <andre@FreeBSD.org> MFC:
- Fix the socket option IP_ONESBCAST by giving it its own case in ip_output()
and skip over the normal IP processing.
- Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.
- Check inp_flags instead of inp_vflag for INP_ONESBCAST flag.

PR: kern/99558
Approved by: re (kensmith)
ae5965062b9bbb95c18a0e4e157f2a1c1247263f 06-Sep-2006 andre <andre@FreeBSD.org> Improve description of if_capabilities, if_capenable and ifi_hwassist.

Sponsored by: TCP/IP Optimization Fundraise 2005
f044a1949bf52ae215c04b5885db0a0fa58680cf 06-Sep-2006 andre <andre@FreeBSD.org> Fix the socket option IP_ONESBCAST by giving it its own case in ip_output()
and skip over the normal IP processing.

Add a supporting function ifa_ifwithbroadaddr() to verify and validate the
supplied subnet broadcast address.

PR: kern/99558
Tested by: Andrey V. Elsukov <bu7cher-at-yandex.ru>
Sponsored by: TCP/IP Optimization Fundraise 2005
MFC after: 3 days
bc6ab54808cf20a40cd7ba44043d40db1ec2e78e 04-Aug-2006 brooks <brooks@FreeBSD.org> With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entirely since all modern
BSDs have if_xname variables rendering it unnecessary.
f5cde2819f76cb3f86ff02a0c422b289ce94a096 19-Jun-2006 mlaier <mlaier@FreeBSD.org> Import interface groups from OpenBSD. This allows interfaces to be grouped
in order to, for example, apply firewall rules to a whole group of
interfaces. This is required for importing pf from OpenBSD 3.9

Obtained from: OpenBSD (with changes)
Discussed on: -net (back in April)
19f8b36e662bea1b79c01dab7717540417040328 30-Jan-2006 glebius <glebius@FreeBSD.org> Merge the //depot/user/yar/vlan branch into CVS. It contains some collective
work by yar, thompsa and myself. The checksum offloading part also involves
work done by Mihail Balikov.

The most important changes:

o Instead of global linked list of all vlan softc use a per-trunk
hash. The size of hash is dynamically adjusted, depending on
number of entries. This changes struct ifnet, replacing counter
of vlans with a pointer to trunk structure. This change is an
improvement for setups with big number of VLANs, several interfaces
and several CPUs. It is a small regression for a setup with a single
VLAN interface.
An alternative to dynamic hash is a per-trunk static array with
4096 entries, which is a compile time option - VLAN_ARRAY. In my
experiments the array is not an improvement, probably because such
a big trunk structure doesn't fit into CPU cache.
o Introduce an UMA zone for VLAN tags. Since drivers depend on it,
the zone is declared in kern_mbuf.c, not in optional vlan(4) driver.
This change is a big improvement for any setup utilizing vlan(4).
o Use rwlock(9) instead of mutex(9) for locking. We are the first
ones to do this! :)
o Some drivers can do hardware VLAN tagging + hardware checksum
offloading. Add an infrastructure for this. Whenever vlan(4) is
attached to a parent or parent configuration is changed, the flags
on vlan(4) interface are updated.

In collaboration with: yar, thompsa
In collaboration with: Mihail Balikov <mihail.balikov interbgc.com>
f70f525b491a7d4a0a4f60eb7d69095f6f6e12e4 11-Nov-2005 ru <ru@FreeBSD.org> - Store pointer to the link-level address right in "struct ifnet"
rather than in ifindex_table[]; all (except one) accesses are
through ifp anyway. IF_LLADDR() works faster, and all (except
one) ifaddr_byindex() users were converted to use ifp->if_addr.

- Stop storing a (pointer to) Ethernet address in "struct arpcom",
and drop the IFP2ENADDR() macro; all users have been converted
to use IF_LLADDR() instead.
48c0bcb5c218c021c1da0e53702d2e83708a471a 08-Nov-2005 thompsa <thompsa@FreeBSD.org> Move the cloned interface list management in to if_clone. For some drivers the
softc lists and associated mutex are now unused so these have been removed.

Calling if_clone_detach() will now destroy all the cloned interfaces for the
driver and in most cases is all that's needed to unload.

Idea by: brooks
Reviewed by: brooks
97d261903e19a9ccc27633a4c5757894fc682cb5 07-Oct-2005 glebius <glebius@FreeBSD.org> Big overall MFC of polling(4) cleanup:

o First attempt on removing Giant from polling. Details:
http://lists.freebsd.org/pipermail/cvs-src/2005-September/051848.html
o Second attempt, and big polling cleanup including:
- Functional approach to turning polling on/off
- Deprecating of poll_in_trap
- Removal of ifnet knowledge from kern_poll.c
Details:
http://lists.freebsd.org/pipermail/cvs-src/2005-October/053267.html
o Improved checking of user configurable sysctls. Details:
http://lists.freebsd.org/pipermail/cvs-src/2005-October/053351.html
o Moving DEVICE_POLLING from opt_global.h to opt_device_polling.h:
http://lists.freebsd.org/pipermail/cvs-src/2005-October/053479.html

o All related documentation fixes.

Approved by: re (kensmith)
Thanks to: everyone, who helped with testing
f41a83bf429b15386f43f43f3f5326d4ece7bfce 01-Oct-2005 glebius <glebius@FreeBSD.org> Big polling(4) cleanup.

o Axe poll in trap.

o Axe IFF_POLLING flag from if_flags.

o Rework revision 1.21 (Giant removal), in such a way that
poll_mtx is not dropped during call to polling handler.
This fixes problem with idle polling.

o Make registration and deregistration from polling in a
functional way, instead of next tick/interrupt.

o Obsolete kern.polling.enable. Polling is turned on/off
with ifconfig.

Detailed kern_poll.c changes:
- Remove polling handler flags, introduced in 1.21. They are not
needed now.
- Forget and do not check if_flags, if_capenable and if_drv_flags.
- Call all registered polling handlers unconditionally.
- Do not drop poll_mtx, when entering polling handlers.
- In ether_poll() NET_LOCK_GIANT prior to locking poll_mtx.
- In netisr_poll() axe the block, where polling code asks drivers
to unregister.
- In netisr_poll() and ether_poll() do polling always, if any
handlers are present.
- In ether_poll_[de]register() remove a lot of error hiding code. Assert
that arguments are correct, instead.
- In ether_poll_[de]register() use standard return values in case of
error or success.
- Introduce poll_switch() that is a sysctl handler for kern.polling.enable.
poll_switch() goes through interface list and enables/disables polling.
A message that kern.polling.enable is deprecated is printed.

Detailed driver changes:
- On attach driver announces IFCAP_POLLING in if_capabilities, but
not in if_capenable.
- On detach driver calls ether_poll_deregister() if polling is enabled.
- In polling handler driver obtains its lock and checks IFF_DRV_RUNNING
flag. If it is not set, the driver unlocks and returns.
- In ioctl handler driver checks for IFCAP_POLLING flag requested to
be set or cleared. Driver first calls ether_poll_[de]register(), then
obtains driver lock and [dis/en]ables interrupts.
- In interrupt handler driver checks IFCAP_POLLING flag in if_capenable.
If present, then returns. This is important to protect from spurious
interrupts.
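
A rough user-space sketch of the handler checks in the driver list above,
assuming a pthread mutex as the driver lock and a counter in place of real
packet processing. The softc layout, the function names and the flag values
are all illustrative assumptions, not taken from any real driver.

```c
#include <pthread.h>

#define IFCAP_POLLING_SKETCH	0x40	/* illustrative capability bit */
#define IFF_DRV_RUNNING_SKETCH	0x40	/* illustrative driver flag bit */

/* Hypothetical driver softc with only the fields the sketch needs. */
struct drv_softc_sketch {
	pthread_mutex_t	lock;		/* driver lock */
	int		if_capenable;	/* enabled capabilities */
	int		if_drv_flags;	/* driver-owned flags */
	int		rx_handled;	/* stands in for real rx work */
};

/* Interrupt handler: ignore (possibly spurious) interrupts while polling. */
void
drv_intr_sketch(struct drv_softc_sketch *sc)
{
	if (sc->if_capenable & IFCAP_POLLING_SKETCH)
		return;
	pthread_mutex_lock(&sc->lock);
	sc->rx_handled++;
	pthread_mutex_unlock(&sc->lock);
}

/* Polling handler: take the lock, then bail out if not running. */
void
drv_poll_sketch(struct drv_softc_sketch *sc, int count)
{
	pthread_mutex_lock(&sc->lock);
	if ((sc->if_drv_flags & IFF_DRV_RUNNING_SKETCH) == 0) {
		pthread_mutex_unlock(&sc->lock);
		return;
	}
	sc->rx_handled += count;
	pthread_mutex_unlock(&sc->lock);
}
```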

Reviewed by: ru, sam, jhb
8f8fa61d9b908491aa25431ee2d79e509e637f4a 25-Aug-2005 rwatson <rwatson@FreeBSD.org> Merge if.c:1.242, if.h:1.97, if_var.h:1.102, rtsock.c:1.125 from HEAD
to RELENG_6:

Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE,
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.

Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.

When exposing the interface flags to user space, via interface ioctls
or routing sockets, OR the two fields together. Since the driver flags
cannot currently be set from user space, no new logic is currently
required to handle this case.

Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.

With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.

Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.

Reviewed by: pjd, bz

Approved by: re (scottl)
c5a05437c6ae77ab3ccbf7f1215e660794cf42f2 24-Aug-2005 rwatson <rwatson@FreeBSD.org> Merge if_var.h:1.101 from HEAD to RELENG_6:

- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
Staticize.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>

Approved by: re (scottl)
487c3ba9fb098aab522853db4c99348ef9f88a6d 20-Aug-2005 rwatson <rwatson@FreeBSD.org> Merge if_var.h:1.99 from HEAD to RELENG_6:

Allocate one of the spare ifnet integer fields to hold if_drv_flags,
which in the future will hold IFF_OACTIVE and IFF_RUNNING, and have
its access synchronized by the device driver rather than the
protocol stack. This will avoid potential races in the management
of flags in if_flags.

Discussed with: various (scottl, jhb, ...)

Approved by: re (kensmith)
57631d747449a0656dabcdeb9d949a7e4c278413 16-Aug-2005 rwatson <rwatson@FreeBSD.org> Merge if_var.h:1.100 from HEAD to RELENG_6:

Add if_addr_mtx to struct ifnet, a mutex to protect ifnet-related address
lists. Add accessor macros.

This changes the size of struct ifnet, but ideally, all ifnet consumers
are now using if_alloc() to allocate these structures rather than
embedding them into device driver softc's, so this won't modify the
network device driver ABI.

Approved by: re (hrs)
74759aaa78777146f23aa05c856f574efdfb41d9 09-Aug-2005 rwatson <rwatson@FreeBSD.org> Rename IFF_RUNNING to IFF_DRV_RUNNING, IFF_OACTIVE to IFF_DRV_OACTIVE,
and move both flags from ifnet.if_flags to ifnet.if_drv_flags, making
and documenting the locking of these flags the responsibility of the
device driver, not the network stack. The flags for these two fields
will be mutually exclusive so that they can be exposed to user space as
though they were stored in the same variable.

Provide #defines to provide the old names #ifndef _KERNEL, so that user
applications (such as ifconfig) can use the old flag names. Using the
old names in a device driver will result in a compile error in order to
help device driver writers adopt the new model.

When exposing the interface flags to user space, via interface ioctls
or routing sockets, OR the two fields together. Since the driver flags
cannot currently be set from user space, no new logic is currently
required to handle this case.
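
A minimal sketch of the split described above, assuming two separate flag
words whose bits never overlap. The structure, the function names and the
bit values are illustrative; where the commit adds assertions, this sketch
returns an error so the behaviour can be checked without aborting.

```c
#define IFF_UP_SKETCH		0x1	/* stack-owned, lives in if_flags */
#define IFF_DRV_RUNNING_SKETCH	0x40	/* driver-owned, lives in if_drv_flags */
#define IFF_DRV_OACTIVE_SKETCH	0x400	/* driver-owned, lives in if_drv_flags */
#define IFF_DRV_MASK_SKETCH \
	(IFF_DRV_RUNNING_SKETCH | IFF_DRV_OACTIVE_SKETCH)

/* Only the two flag words of struct ifnet are modelled here. */
struct ifnet_flags_sketch {
	int if_flags;		/* locked by the network stack */
	int if_drv_flags;	/* locked by the device driver */
};

/* User space view: the two words are disjoint, so OR them together. */
int
if_userflags_sketch(const struct ifnet_flags_sketch *ifp)
{
	return (ifp->if_flags | ifp->if_drv_flags);
}

/* Stack-side setter: refuse driver-owned bits. Returns 0 or -1. */
int
if_setflags_sketch(struct ifnet_flags_sketch *ifp, int flags)
{
	if (flags & IFF_DRV_MASK_SKETCH)
		return (-1);
	ifp->if_flags |= flags;
	return (0);
}
```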

Add some assertions that general purpose network stack routines, such
as if_setflags(), are not improperly used on driver-owned flags.

With this change, a large number of very minor network stack races are
closed, subject to correct device driver locking. Most were likely
never triggered.

Driver sweep to follow; many thanks to pjd and bz for the line-by-line
review they gave this patch.

Reviewed by: pjd, bz
MFC after: 7 days
127682bc8cc0193fbd197a841ceae23e224bfe24 02-Aug-2005 rwatson <rwatson@FreeBSD.org> Protect link layer network interface multicast address list manipulation
using ifp->if_addr_mtx:

- Initialize if_addr_mtx when ifnet is initialized.

- Destroy if_addr_mtx when ifnet is torn down.

- Rename ifmaof_ifpforaddr() to if_findmulti(); assert if_addr_mtx.
Staticize.

- Extract ifmultiaddr allocation and initialization into if_allocmulti();
accept a 'mflags' argument to indicate whether or not sleeping is
permitted. This centralizes error handling and address duplication.

- Extract ifmultiaddr tear-down and deallocation in if_freemulti().

- Re-structure if_addmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Make use of non-sleeping allocations. Annotate the fact that we only
generate routing socket events for explicit address addition, not
implicit link layer address addition.

- Re-structure if_delmulti() to hold if_addr_mtx around manipulation of
the ifnet multicast address list and reference count manipulation.
Annotate the lack of a routing socket event for implicit link layer
address removal.

- De-spl all and sundry.

Problem reported by: Ed Maste <emaste at phaedrus dot sandvine dot ca>
MFC after: 1 week
a3335c93b9a924f9255eabefe20044ffee3ea74d 02-Aug-2005 rwatson <rwatson@FreeBSD.org> Add if_addr_mtx to struct ifnet, a mutex to protect ifnet-related address
lists. Add accessor macros.

This changes the size of struct ifnet, but ideally, all ifnet consumers
are now using if_alloc() to allocate these structures rather than
embedding them into device driver softc's, so this won't modify the
network device driver ABI.

MFC after: 1 week
2a95094158e7b7d7e48a7f1a912f5d34d40026b0 21-Jul-2005 rwatson <rwatson@FreeBSD.org> Allocate one of the spare ifnet integer fields to hold if_drv_flags,
which in the future will hold IFF_OACTIVE and IFF_RUNNING, and have
its access synchronized by the device driver rather than the
protocol stack. This will avoid potential races in the management
of flags in if_flags.

Discussed with: various (scottl, jhb, ...)
MFC after: 1 week
567ba9b00a248431e7c1147c4e079fd7a11b9ecf 10-Jun-2005 brooks <brooks@FreeBSD.org> Stop embedding struct ifnet at the top of driver softcs. Instead the
struct ifnet or the layer 2 common structure it was embedded in have
been replaced with a struct ifnet pointer to be filled by a call to the
new function, if_alloc(). The layer 2 common structure is also allocated
via if_alloc() based on the interface type. It is hung off the new
struct ifnet member, if_l2com.

This change removes the size of these structures from the kernel ABI and
will allow us to better manage them as interfaces come and go.

Other changes of note:
- Struct arpcom is no longer referenced in normal interface code.
Instead the Ethernet address is accessed via the IFP2ENADDR() macro.
To enforce this ac_enaddr has been renamed to _ac_enaddr.
- The second argument to ether_ifattach is now always the mac address
from driver private storage rather than sometimes being ac_enaddr.

Reviewed by: sobomax, sam
6c5bdda300f45e4abacd6f3dbf4663bbfdfefa35 05-Jun-2005 thompsa <thompsa@FreeBSD.org> Add hooks into the networking layer to support if_bridge. This changes struct
ifnet so a buildworld is necessary.

Approved by: mlaier (mentor)
Obtained from: NetBSD
5a3d27ed916b9040827bc3a4fc4941a367cecb7a 25-May-2005 peadar <peadar@FreeBSD.org> Separate out address-detaching part of if_detach into if_purgeaddrs,
so if_tap doesn't need to rely on locally-rolled code to do same.

The observable symptom of if_tap's bzero'ing the address details
was a crash in "ifconfig tap0" after an if_tap device was closed.

Reported By: Matti Saarinen (mjsaarin at cc dot helsinki dot fi)
5f725a70e06f9948ad77ff806c8a2a993fdefec5 20-Apr-2005 glebius <glebius@FreeBSD.org> Do not call all link state callbacks directly, but schedule
a taskqueue(9) task. This fixes LORs and adds possibility
to serve such events pseudorecursively, when link state
change of interface causes subsequent change on other
interfaces.

Sponsored by: Rambler
Reviewed by: sam, brooks, mux
ea3bf9bbdddea0177c61e7e86335bf19e319dc7d 01-Mar-2005 glebius <glebius@FreeBSD.org> Revert change to struct ifnet. Use ifnet pointer in softc. Embedding
ifnet into other structures will soon be removed.

Requested by: brooks
e553dfbef0f3fb12062d3d02c0380662d43abe11 26-Feb-2005 glebius <glebius@FreeBSD.org> Remove carp_softc.sc_ifp member in favor of union pointers in struct ifnet.

Obtained from: OpenBSD
e1d22638d0a8257ed01b7f95d1b6d5cef74ebd07 22-Feb-2005 glebius <glebius@FreeBSD.org> Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)
a50ffc29129a52835a39bf4868cd5facdc7dce30 07-Jan-2005 imp <imp@FreeBSD.org> /* -> /*- for license, minor formatting changes
2c929f635e2ee240aed007e68c7125eeb4f0426f 08-Dec-2004 sam <sam@FreeBSD.org> Cleanup link state change notification:
o add new if_link_state_change routine that deals with link state changes
o change mii to use if_link_state_change
b188666781eb6b720e2adf60f48518a8cc9fd34c 09-Nov-2004 mlaier <mlaier@FreeBSD.org> Remove the #if 0 wrapping around !ALTQ stuff that can't be used due to ABI
stability anyway.
f71b496ed7e5680b0f0e4a9da173613f1e0ab32c 30-Oct-2004 rwatson <rwatson@FreeBSD.org> Move if_handoff() from an inline in if_var.h to a function in if.c
in order to harden the ABI for 5.x; this will permit us to modify
the locking in the ifnet packet dispatch without requiring drivers
to be recompiled.

MFC after: 3 days
Discussed at: EuroBSDCon Developer's Summit
a9f55430f9865dec07d8070795006e65ce506868 30-Oct-2004 rwatson <rwatson@FreeBSD.org> Add additional "spare" fields to 'struct ifnet' in order to improve
the resistance of the network driver ABI to changes that will be
required as we optimize locking.

MFC after: 3 days
Discussed at: Developer Summit
6cd4381f71c46ead3be7a5abd3665469815fdb28 25-Oct-2004 jmg <jmg@FreeBSD.org> use NULL instead of 0 when casting/comparing w/ a pointer...
2496b0e6308e3546b3700f250b41eef9319b8715 19-Oct-2004 rwatson <rwatson@FreeBSD.org> Define IFF_LOCKGIANT() and IFF_UNLOCKGIANT() macros, which conditionally
acquire Giant if the passed interface has IFF_NEEDSGIANT set on it.
Modify calls into (ifp)->if_ioctl() in if.c to use these macros in order
to ensure that Giant is held.

MFC after: 3 days
Bumped into by: jmg
bc1805c6e871c178d0b6516c3baa774ffd77224a 15-Aug-2004 jmg <jmg@FreeBSD.org> Add locking to the kqueue subsystem. This also makes the kqueue subsystem
a more complete subsystem, and removes the knowledge of how things are
implemented from the drivers. Include locking around filter ops, so a
module like aio will know when not to be unloaded if there are outstanding
knotes using its filter ops.

Currently, it uses the MTX_DUPOK even though it is not always safe to
acquire duplicate locks. Witness currently doesn't support the ability
to discover if a dup lock is ok (in some cases).

Reviewed by: green, rwatson (both earlier versions)
00ecbb6a923cd0df68d4335aa53ffe483b495c39 07-Aug-2004 mlaier <mlaier@FreeBSD.org> Add a "void *if_carp" placeholder to struct ifnet with prospect to bring in
the "Common address redundancy protocol" (CARP) during the 5-STABLE cycle.
Hence doing the ABI break now.

Approved by: re (scottl)
b463bc6c336f88c5c53b54a13c72ffd11be29e4e 27-Jul-2004 rwatson <rwatson@FreeBSD.org> Add a new network interface flag, IFF_NEEDSGIANT, which will allow
device drivers to declare that the ifp->if_start() method implemented
by the driver requires Giant in order to operate correctly.

Add a 'struct task' to 'struct ifnet' that can be used to execute a
deferred ifp->if_start() in the event that if_start needs to be called
in a Giant-free environment. To do this, introduce if_start(), a
wrapper function for ifp->if_start(). If the interface can run MPSAFE,
it directly dispatches into the interface start routine. If it can't
run MPSAFE, we're running with debug.mpsafenet != 0, and Giant isn't
currently held, the task is queued to execute in a swi holding Giant
via if_start_deferred().

Modify if_handoff() to use if_start() instead of direct dispatch.
Modify 802.11 to use if_start() instead of direct dispatch.

This is intended to provide increased compatibility for non-MPSAFE
network device drivers in the presence of Giant-free operation via
asynchronous dispatch. However, this commit does not mark any network
interfaces as IFF_NEEDSGIANT.
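
The dispatch rule described in the message above can be sketched in user-space C roughly as follows. This is only an illustration of the decision, not the kernel code: the flag value, the enum, and the function name are hypothetical, and the fall-through to a direct call when Giant is already held (or when debug.mpsafenet is 0) is inferred from the message rather than quoted from it.

```c
#include <stdbool.h>

/* Hypothetical flag bit for this sketch; the real value lives in sys/net/if.h. */
#define SK_IFF_NEEDSGIANT 0x1

enum start_path { START_DIRECT, START_DEFERRED };

/*
 * Decide how a start request would be dispatched.
 * mpsafenet stands in for the debug.mpsafenet setting; giant_held says
 * whether the caller already owns Giant.
 */
static enum start_path
choose_start_path(int if_flags, int mpsafenet, bool giant_held)
{
	/* Drivers that do not need Giant are called directly. */
	if ((if_flags & SK_IFF_NEEDSGIANT) == 0)
		return START_DIRECT;
	/* Giant-free stack and Giant not held: queue a task that takes Giant. */
	if (mpsafenet != 0 && !giant_held)
		return START_DEFERRED;
	/* Assumed: Giant already protects the caller, so call directly. */
	return START_DIRECT;
}
```

Only the combination of a Giant-requiring driver, a Giant-free stack, and a caller without Giant takes the deferred path.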
d42002971fd9756c47034ab48dba6cd4b93e7651 14-Jul-2004 mlaier <mlaier@FreeBSD.org> Fix a copy-and-paste-o in IFQ_DRV_PREPEND - all pointyhats to me.
While here also fix a (not less stupid) braino in IFQ_DRV_PURGE.

Reported-by: clement
Tested-by: clement (_PREPEND in sis(4))
e1dd867b5532da103ae1459a89ca3df2b8b6f0f6 22-Jun-2004 brooks <brooks@FreeBSD.org> Major overhaul of pseudo-interface cloning. Highlights include:

- Split the code out into if_clone.[ch].
- Locked struct if_clone. [1]
- Add a per-cloner match function rather than simply matching names of
the form <name><unit> and <name>.
- Use the match function to allow creation of <interface>.<tag>
vlan interfaces. The old way is preserved unchanged!
- Also use the match function to allow creation of stf(4) interfaces named
stf0, stf, or 6to4. This is the only major user visible change in
that "ifconfig stf" creates the interface stf rather than stf0 and
does not print "stf0" to stdout.
- Allow destroy functions to fail so they can refuse to delete
interfaces. Currently, we forbid the deletion of interfaces which
were created in the init function, particularly lo0, pflog0, and
pfsync0. In the case of lo0 this was a panic implementation so it
does not count as a user visible change. :-)
- Since most interfaces do not need the new functionality, a family of
wrapper functions, ifc_simple_*(), were created to wrap old style
cloner functions.
- The IF_CLONE_INITIALIZER macro is replaced with a new incompatible
IFC_CLONE_INITIALIZER and ifc_simple consumers use IFC_SIMPLE_DECLARE
instead.

Submitted by: Maurycy Pawlowski-Wieronski <maurycy at fouk.org> [1]
Reviewed by: andre, mlaier
Discussed on: net
dfd1f7fd50fffaf75541921fcf86454cd8eb3614 16-Jun-2004 phk <phk@FreeBSD.org> Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.
586342bb6a2e371cab9eccaa57393f6aecbe3c3c 15-Jun-2004 mlaier <mlaier@FreeBSD.org> Fix a typo in IFQ_HANDOFF.
de92edb6b40acca5f92bbd5dd5fc05f52923f0cd 15-Jun-2004 mlaier <mlaier@FreeBSD.org> Transform tbr_dequeue into a function pointer in order to build drivers with
ALTQ enabled versions of IFQ_* macros by default, as requested by several
others. This is a follow-up to the quick fix I committed yesterday which
turned off the ALTQ checks for non-ALTQ kernels.
131fb63c625cc6708ab771b6e0975a3b44e908c1 14-Jun-2004 mlaier <mlaier@FreeBSD.org> Unbreak non-ALTQ kernel linking. I forgot about tbr_dequeue.

In the end drivers should be building with ALTQ checks by default, but for
now build them with the old macros for non-ALTQ kernels.

Note: Check new features w/ LINT *and* w/ LINT minus the new feature.

Found-by: rwatson
977d97b004a1ae5bbd9d42eae28386f8e2372068 13-Jun-2004 mlaier <mlaier@FreeBSD.org> Link ALTQ to the build and break with ABI for struct ifnet. Please recompile
your (network) modules as well as any userland that might make sense of
sizeof(struct ifnet).
This does not change the queueing yet. These changes will follow in a
separate commit. Same with the driver changes, which need case by case
evaluation.

__FreeBSD_version bump will follow.

Tested-by: (i386)LINT
25ae331e1251bd9562a10f53fdb2f054f7b21dfe 03-May-2004 andre <andre@FreeBSD.org> Link state change notification of ethernet media to the routing socket.

o Extend the if_data structure with an ifi_link_state field and
provide the corresponding defines for the valid states.

o The mii_linkchg() callback updates the ifi_link_state field
and calls rt_ifmsg() to notify listeners on the routing socket
in addition to the kqueue KNOTE.

o If vlans are configured on a physical interface notify and update
all vlan pseudo devices as well with the vlan_link_state() callback.

No objections by: sam, wpaul, ru, bms
Brucification by: bde
22ff34b571813d64c42e2ad042f2972a997d0b0d 18-Apr-2004 mlaier <mlaier@FreeBSD.org> Make if_(un)route static in if.c as they are called from if_up/if_down only.
This is also cleanup to make locking easier.

Reviewed by: luigi
Approved by: bms(mentor)
9cffdfc5cab4dbd32d0b1fe590e1d003682a5359 18-Apr-2004 luigi <luigi@FreeBSD.org> + rename and document an unused field in struct arpcom (field is still
there so there are no ABI changes);
+ replace 5 redefinitions of the IPF2AC macro with one in if_arp.h

Eventually (but before freezing the ABI) we need to get rid of
struct arpcom (initially with the help of some smart #defines
to avoid having to touch each and every driver, see below).

Apart from the struct ifnet, struct arpcom now only stores a copy
of the MAC address (ac_enaddr, but we already have another copy in
the struct ifnet -- if_addrhead), and a netgraph-specific field
which is _always_ accessed through the ifp, so it might well go
into the struct ifnet too (where, besides, there is already an entry
for AF_NETGRAPH data...)

Too bad ac_enaddr is widely referenced by all drivers. But
this can be fixed as follows:

#define ac_enaddr ac_if.the_original_ac_enaddr_in_struct_ifnet

(note that the right hand side would likely be a pointer rather than
the base address of an array.)
ea6500e14f579d261f82752639bdf3e577fc5c30 16-Apr-2004 luigi <luigi@FreeBSD.org> Documented the intended usage of if_addrhead and ifaddr_byindex()
This commit only changes comments. Nothing to recompile.
fa9222585e2ae94c9da702e4b6a4fd959416bd33 15-Apr-2004 luigi <luigi@FreeBSD.org> Document the way if_addrhead and struct ifaddr are used.
Remove a member from 'struct ifaddr' which has been in an
#ifdef notdef block since rev 1.1

No ABI changes -- no need to recompile anything.
14233a5a33b48cba3a578f53a32ee727e8f622a4 12-Apr-2004 ru <ru@FreeBSD.org> Count outgoing link-level broadcast packets in if_omcasts.
I'm not sure this is completely correct but at least this
is consistent with the accounting of incoming broadcasts.

PR: kern/65273
Submitted by: David J Duchscher <daved@tamu.edu>
f8b24f78ad8e651688702e1a60029ec67bf54396 11-Apr-2004 rwatson <rwatson@FreeBSD.org> In 4.x, if_ipending is used to track network interrupt state. In 5.x,
it is no longer used, so GC the ifnet.if_ipending field.
b49b7fe7994689a25dfc2162fe02f1d030360089 07-Apr-2004 imp <imp@FreeBSD.org> Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
c54de1f76f2070447f83f4b5cf8c2d046a55f032 04-Apr-2004 luigi <luigi@FreeBSD.org> + arpresolve(): remove an unused argument
+ struct ifnet: remove unused fields, move ipv6-related field close
to each other, add a pointer to l3<->l2 translation tables (arp,nd6,
etc.) for future use.

+ struct route: remove an unused field, move close to each
other some fields that might likely go away in the future
cb7aea29b8d913b12052d75a14a11b0ac819476c 13-Mar-2004 brooks <brooks@FreeBSD.org> Remove if_withname. It came in with the KAME import, but never got
used. Should someone need its functionality, it's a really expensive
implementation of:
ifnet_byindex(sdl->sdl_index)

Reviewed by: bde, ume
d937176b3481cd970c24fba3eddbe098a1fe564f 26-Feb-2004 mlaier <mlaier@FreeBSD.org> Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dialup) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)
072feb7e76a2c3f2fa0605e3f50bddf760b5aa8a 07-Dec-2003 imp <imp@FreeBSD.org> Make the if_broadcastaddr const. All the drivers in the tree which
violated the constness were corrected before the freeze. This was
suggested by mdodd@, I think, and sam@ and others have signed off on
this if I recall my conversations with them correctly.
77ed6e2d1cbbf9a46dd5ae6d089eeb45ab81fbcb 12-Nov-2003 rwatson <rwatson@FreeBSD.org> Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) due to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories
f1e94c6f29b079e4ad9d9305ef3e90a719bcbbda 31-Oct-2003 brooks <brooks@FreeBSD.org> Replace the if_name and if_unit members of struct ifnet with new members
if_xname, if_dname, and if_dunit. if_xname is the name of the interface
and if_dname/unit are the driver name and instance.

This change paves the way for interface renaming and enhanced pseudo
device creation and configuration semantics.

Approved By: re (in principle)
Reviewed By: njl, imp
Tested On: i386, amd64, sparc64
Obtained From: NetBSD (if_xname)
eae980f878292d49d725e3ef2b1b3b15c2914013 24-Oct-2003 ume <ume@FreeBSD.org> Since dp->dom_ifattach calls malloc() with M_WAITOK, we cannot
use mutex lock directly here. Protect ifp->if_afdata instead.

Reported by: grehan
babf2c3ec01f429fc11fe95261ac8db6488c3788 17-Oct-2003 ume <ume@FreeBSD.org> - add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME
d3367c5f5d3ddcc6824d8f41c4cf179f9a5588f8 01-Jan-2003 schweikh <schweikh@FreeBSD.org> Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,
especially in troff files.
86f7487fb6a0b8dd9e3a699ad48d6e99504a67ff 30-Dec-2002 schweikh <schweikh@FreeBSD.org> Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/
Add FreeBSD Id tag where missing.
da4baeac50574ca93ca24b64d5a9978e22c7c7d0 27-Dec-2002 hsu <hsu@FreeBSD.org> Long chain of calls starting with bridge_on(), going through IPv6, and
ending up at ifa_ifwithdstaddr() could lead to a recursive lock of
the ifnet list mutex.
82e1e3bab0d3abe1018a0b56559c154485f2f676 22-Dec-2002 hsu <hsu@FreeBSD.org> SMP locking for ifnet list.
160081aef68b0dc49b79d3b7702b75671d9c06ff 18-Dec-2002 hsu <hsu@FreeBSD.org> Switch to the conventional reference counting scheme.
c3153934cb24d911042c92eedf9e5dd6d7be07e1 18-Dec-2002 hsu <hsu@FreeBSD.org> Lock up ifaddr reference counts.
f868f190bf1238315b629c408ba562297d32c04a 14-Nov-2002 sam <sam@FreeBSD.org> o add if_nvlans member to track the number of vlans active on an interface
o add if_input member for interface drivers to call through to pass packets "up"
o remove ethernet-specific function decls (moved to ethernet.h)

Reviewed by: many
Approved by: re
1d25e6987dc3ab0f981ceb350eb3bf943ce786b8 29-Sep-2002 bde <bde@FreeBSD.org> Fixed some of the namespace pollution in rev.1.33. <sys/systm.h> was
included here because it was once a prerequisite of <sys/mutex.h>
although that bug was fixed long ago.
d039f38d0d117aa7a539cbc6c6b1fb592050eb12 24-Sep-2002 brooks <brooks@FreeBSD.org> Add a new helper function if_printf() modeled on device_printf(). The
function takes a struct ifnet pointer followed by the usual printf
arguments and prints "<interfacename>: " before the results of printf.
Since this is the primary form of printf calls in network device drivers
and accounts for most uses of the ifnet member if_unit, this
significantly simplifies many printf()s.
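
As a rough user-space illustration of the helper described above, the sketch below formats into a buffer instead of printing, so the result can be inspected. The struct, its single name field, and the function name are hypothetical stand-ins; the kernel function itself takes a struct ifnet pointer and prints directly.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the part of struct ifnet that holds the name. */
struct ifnet_named {
	const char *name;	/* e.g. "fxp0" */
};

/*
 * Write "<interfacename>: " followed by the formatted message into buf.
 * Returns the number of characters written, or a negative value on error.
 */
static int
if_snprintf_sketch(char *buf, size_t len, const struct ifnet_named *ifp,
    const char *fmt, ...)
{
	va_list ap;
	int n, m;

	n = snprintf(buf, len, "%s: ", ifp->name);
	if (n < 0 || (size_t)n >= len)
		return (n);	/* error, or no room left for the message */

	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - (size_t)n, fmt, ap);
	va_end(ap);

	return (m < 0 ? m : n + m);
}
```

A caller would pass the interface and the usual printf arguments, so each driver no longer has to build the prefix by hand.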
f6cebc060671b6c67f52080c35a0e55d5498cbf0 18-Aug-2002 sobomax <sobomax@FreeBSD.org> Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid
breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in
SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.

Reviewed by: -hackers, -net
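
The split described above, with the low 16 bits in the existing slot and the upper 16 bits in the previously unused one, can be sketched as follows. The struct and function names are hypothetical, and the slots are modelled as uint16_t for simplicity; this is an illustration of the bit arithmetic only, not the ioctl code.

```c
#include <stdint.h>

/* Hypothetical model of the two 16-bit slots of ifreq.ifru_flags. */
struct flag_slots {
	uint16_t low;	/* ifru_flags[0]: the original 16 flag bits */
	uint16_t high;	/* ifru_flags[1]: the upper 16 bits */
};

/* Split a 32-bit flag word across the two slots. */
static struct flag_slots
split_flags(uint32_t flags)
{
	struct flag_slots s;

	s.low = (uint16_t)(flags & 0xffff);
	s.high = (uint16_t)(flags >> 16);
	return (s);
}

/* Rebuild the 32-bit flag word from the two slots. */
static uint32_t
join_flags(struct flag_slots s)
{
	return ((uint32_t)s.low | ((uint32_t)s.high << 16));
}
```

An application that only reads the first slot keeps seeing the same low 16 bits, which is why the layout does not break the existing ABI.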
f183894893c95323406ff434337f2d2ca502be5b 14-Aug-2002 rwatson <rwatson@FreeBSD.org> Move to nested include of _label.h instead of mac.h, reducing namespace
pollution.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
Suggested by: bde
f476cee6025c9a56bad1af5eefc46de8cae24547 30-Jul-2002 rwatson <rwatson@FreeBSD.org> Introduce support for Mandatory Access Control and extensible
kernel access control.

Label network interface structures, permitting security features to
be maintained on those objects. if_label will be used to authorize
data flow using the network interface. if_label will be protected
using the same synchronization primitives as other mutable entries
in struct ifnet.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
186864a31c71ad217e013a85742d2b83f95f278b 07-May-2002 imp <imp@FreeBSD.org> Minor style nit
c9985516e46bc6cccc11eac067da81d7968b7700 19-Mar-2002 alfred <alfred@FreeBSD.org> Remove __P.
0a6314db1d60a180eaf619a8da115c45587cb4ae 14-Dec-2001 jlemon <jlemon@FreeBSD.org> whitespace fixes.
f8ad22919e217e5aa0f3f7a246fc37aaee182364 14-Dec-2001 luigi <luigi@FreeBSD.org> Device Polling code for -current.

Non-SMP, i386-only, no polling in the idle loop at the moment.

To use this code you must compile a kernel with

options DEVICE_POLLING

and at runtime enable polling with

sysctl kern.polling.enable=1

The percentage of CPU reserved to userland can be set with

sysctl kern.polling.user_frac=NN (default is 50)

while the remainder is used by polling device drivers and netisr's.
These are the only two variables that you should need to touch. There
are a few more parameters in kern.polling but the default values
are adequate for all purposes. See the code in kern_poll.c for
more details on them.

Polling in the idle loop will be implemented shortly by introducing
a kernel thread which does the job. Until then, the amount of CPU
dedicated to polling will never exceed (100-user_frac).
The equivalent (actually, better) code for -stable is at

http://info.iet.unipi.it/~luigi/polling/

and also supports polling in the idle loop.

NOTE to Alpha developers:
There is really nothing in this code that is i386-specific.
If you move the 2 lines supporting the new option from
sys/conf/{files,options}.i386 to sys/conf/{files,options} I am
pretty sure that this should work on the Alpha as well, just that
I do not have a suitable test box to try it. If someone feels like
trying it, I would appreciate it.

NOTE to other developers:
sure some things could be done better, and as always I am open to
constructive criticism, which a few of you have already given and
I greatly appreciated.
However, before proposing radical architectural changes, please
take some time to possibly try out this code, or at the very least
read the comments in kern_poll.c, especially re. the reason why I
am using a soft netisr and cannot (I believe) replace it with a
simple timeout.

Quick description of files touched by this commit:

sys/conf/files.i386
new file kern/kern_poll.c
sys/conf/options.i386
new option
sys/i386/i386/trap.c
poll in trap (disabled by default)
sys/kern/kern_clock.c
initialization and hardclock hooks.
sys/kern/kern_intr.c
minor swi_net changes
sys/kern/kern_poll.c
the bulk of the code.
sys/net/if.h
new flag
sys/net/if_var.h
declaration for functions used in device drivers.
sys/net/netisr.h
NETISR_POLL
sys/dev/fxp/if_fxp.c
sys/dev/fxp/if_fxpvar.h
sys/pci/if_dc.c
sys/pci/if_dcreg.h
sys/pci/if_sis.c
sys/pci/if_sisreg.h
device driver modifications
f5781681dfcb479518be4dd0ef3747bdcb2a27fa 22-Nov-2001 luigi <luigi@FreeBSD.org> Expand the comment on the layout of softc, arpcom and ifnet structures,
and list the places where the assumption is used.
866e8e774b05729a229b02750f88c34a58caac8e 14-Nov-2001 jhb <jhb@FreeBSD.org> Remove ifnet.if_mpsafe for now. If this is needed, it won't be needed
until much later when the network stack locking is farther along.

Approved by: jlemon
ecb4d3d05f89eabc8020bb6563d903164e3002a1 17-Oct-2001 ru <ru@FreeBSD.org> Pull post-4.4BSD change to sys/net/route.c from BSD/OS 4.2.

Have sys/net/route.c:rtrequest1(), which takes ``rt_addrinfo *''
as the argument. Pass rt_addrinfo all the way down to rtrequest1
and ifa->ifa_rtrequest. 3rd argument of ifa->ifa_rtrequest is now
``rt_addrinfo *'' instead of ``sockaddr *'' (almost no one is
using it anyways).

Benefit: the following command now works. Previously we needed
two route(8) invocations, "add" then "change".
# route add -inet6 default ::1 -ifp gif0

Remove unsafe typecast in rtrequest(), from ``rtentry *'' to
``sockaddr *''. It was introduced by 4.3BSD-Reno and never
corrected.

Obtained from: BSD/OS, NetBSD
MFC after: 1 month
PR: kern/28360
8ef8a1b13f436d2f49044e9cab13f39e36047b92 14-Oct-2001 fjoe <fjoe@FreeBSD.org> bring in ARP support for variable length link level addresses

Reviewed by: jdp
Approved by: jdp
Obtained from: NetBSD
MFC after: 6 weeks
531fdd5ce2b53bc6d651e40ea25ece37c4abbe42 02-Oct-2001 mjacob <mjacob@FreeBSD.org> Documentation comment: note that each NIC's softc is assumed to start
with an ifnet structure.

MFC after: 1 week
131e3ad4ce7736c260d62c4f5f2ff41a46dc7de0 18-Sep-2001 jlemon <jlemon@FreeBSD.org> Add two fields to the ifnet structure indicating what extra capabilities
a network device has, and which ones are enabled.
5596676e6c6c1e81e899cd0531f9b1c28a292669 12-Sep-2001 julian <julian@FreeBSD.org> KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha
f729fe0a4a07f77cf2a60a88614a01b6bd649256 06-Sep-2001 jlemon <jlemon@FreeBSD.org> Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.
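
A macro that expands to the array element itself can appear on either side of an assignment, which is what lets callers stop naming the array. The sketch below shows the idea with hypothetical names and a fixed-size table; the real tables and their sizing are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down interface record. */
struct ifnet_slot {
	int if_index;
};

#define SK_MAX_IFS 8
static struct ifnet_slot *sk_ifindex2ifnet[SK_MAX_IFS];

/* Expands to the array element, so it is usable as an lvalue. */
#define sk_ifnet_byindex(i)	(sk_ifindex2ifnet[(i)])

/* Store through the macro (assignment side). */
static void
sk_register_if(struct ifnet_slot *ifp)
{
	sk_ifnet_byindex(ifp->if_index) = ifp;
}

/* Read through the same macro (value side). */
static struct ifnet_slot *
sk_lookup_if(int idx)
{
	return (sk_ifnet_byindex(idx));
}
```

Because every access goes through the macro, the backing store could later be changed or locked in one place.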
5da97d80e2d7042b9d86959519aca3d58066ca21 02-Jul-2001 brooks <brooks@FreeBSD.org> Add kernel infrastructure for network device cloning.

Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week
b47bfbe544d34ff21bc24b57c556621eb2355e45 28-Mar-2001 jhb <jhb@FreeBSD.org> Catch up to header include changes:
- <sys/mutex.h> now requires <sys/systm.h>
- <sys/mutex.h> and <sys/sx.h> now require <sys/lock.h>
f364d4ac3621ae2689a3cc1b82c73eb491475a24 09-Feb-2001 bmilekic <bmilekic@FreeBSD.org> Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

Similarly, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)
d214ae21714540c7d954bd861ed084fb157abeb9 06-Feb-2001 phk <phk@FreeBSD.org> Convert if_multiaddrs from LIST to TAILQ so that it can be traversed
backwards in the three drivers which want to do that.

Reviewed by: mikeh
20862096d0755befb9dccbb3c923851266bd3598 29-Jan-2001 peter <peter@FreeBSD.org> Make the number of loopback interfaces dynamically tunable. Why one
would *want* to is a different story, but it used to be able to be done
statically. Get rid of #include "loop.h" and struct ifnet loif[NLOOP];
This could be used as an example of how to do this in other drivers,
for example: ccd.
55440769df15d5d23396bda182c424daec15df24 26-Nov-2000 jlemon <jlemon@FreeBSD.org> Unbreak world; #include <sys/mutex.h> instead of <machine/mutex.h>
Only include <sys/mbuf.h> when building kernel sources. This should
probably be changed to require callers to include it themselves.
954e1d2ccdb661d5c8b7f69340d118fa7ba7fb85 25-Nov-2000 jlemon <jlemon@FreeBSD.org> Lock down the network interface queues. The queue mutex must be obtained
before adding/removing packets from the queue. Also, the if_obytes and
if_omcasts fields should only be manipulated under protection of the mutex.

IF_ENQUEUE, IF_PREPEND, and IF_DEQUEUE perform all necessary locking on
the queue. An IF_LOCK macro is provided, as well as the old (mutex-less)
versions of the macros in the form _IF_ENQUEUE, _IF_QFULL, for code which
needs them, but their use is discouraged.

Two new macros are introduced: IF_DRAIN() to drain a queue, and IF_HANDOFF,
which takes care of locking/enqueue, and also statistics updating/start
if necessary.
c6d8349444659c4a281b31a709377e8c71a90ad9 19-Oct-2000 joe <joe@FreeBSD.org> Augment the 'ifaddr' structure with a 'struct if_data' to keep
statistics on a per network address basis.

Teach the IPv4 and IPv6 input/output routines to log packets/bytes
against the network address connected to the flow.

Teach netstat to display the per-address stats for IP protocols
when 'netstat -i' is invoked, instead of displaying the per-interface
stats.
63e7d24009d010dea2cd23b7a1b4d2ebd2a1aadd 15-Aug-2000 archie <archie@FreeBSD.org> Export the functionality of SIOCSIFLLADDR with if_setlladdr()
and add some more rigorous sanity checking in the process.

Reviewed by: freebsd-net
7357df6b4854f9914c605ad7c7cf3c01ea7700fd 13-Jul-2000 archie <archie@FreeBSD.org> Make all Ethernet drivers attach using ether_ifattach() and detach using
ether_ifdetach().

The former consolidates the operations of if_attach(), ng_ether_attach(),
and bpfattach(). The latter consolidates the corresponding detach operations.

Reviewed by: julian, freebsd-net
f22c6e6da07f7a579a5cc1736bb33635fc191f16 29-Jun-2000 archie <archie@FreeBSD.org> Fix kernel build breakage when 'device ether' was not included.
9af816e94656f88812336df4be5d2ddac233080f 26-Jun-2000 archie <archie@FreeBSD.org> Make the ng_ether(4) node type dynamically loadable like the rest.
This means 'options NETGRAPH' is no longer necessary in order to get
netgraph-enabled Ethernet interfaces. This supports loading/unloading
the ng_ether.ko and attaching/detaching the Ethernet interface in any
order.

Add two new hooks 'upper' and 'lower' to allow access to the protocol
demux engine and the raw device, respectively. This enables bridging
to be defined as a netgraph node, if so desired.

Reviewed by: freebsd-net@freebsd.org
961b97d43458f3c57241940cabebb3bedf7e4c00 26-May-2000 jake <jake@FreeBSD.org> Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others
4ba625d0ce34b70f741bf77f6ca791a411865fc2 24-May-2000 archie <archie@FreeBSD.org> Just need to pass the address family to if_simloop(), not the whole sockaddr.
d93fbc99166053b75c2eeb69b5cb603cfaf79ec0 23-May-2000 jake <jake@FreeBSD.org> Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd
b42951578188c5aab5c9f8cbcde4a743f8092cdc 02-Apr-2000 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'ALSA'.
0dcc5bc0d1cca22e0204f9b9da39474b95100992 27-Mar-2000 jlemon <jlemon@FreeBSD.org> Add support for offloading IP/TCP/UDP checksums to NIC hardware which
supports them.
15b9bcb121e1f3735a2c98a11afdb52a03301d7e 29-Dec-1999 peter <peter@FreeBSD.org> Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistent with the other
BSDs, which made this change quite some time ago. More commits to come.
cad2014b2749528351ec5180e88a5929efebbfc4 22-Nov-1999 shin <shin@FreeBSD.org> KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assign an IPv6 address automatically and reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
7efc91cadcfeb421fc4d02ba94db784616f3714c 05-Nov-1999 shin <shin@FreeBSD.org> KAME related header files additions and merges.
(only those which don't affect c source files so much)

Reviewed by: cvs-committers
Obtained from: KAME project
3b842d34e82312a8004a7ecd65ccdb837ef72ac1 28-Aug-1999 peter <peter@FreeBSD.org> $Id$ -> $FreeBSD$
29c67703e3751c283a1bdfe7764effe015c13b83 06-Aug-1999 brian <brian@FreeBSD.org> Define IF_MAXMTU and IF_MINMTU and don't allow MTUs with
out-of-range values.

``comparison is always 0'' warnings are silly !

Ok'd by: wollman, dg
Advised by: bde
695feac96a4d019c8514170c1313ed9bfe9bbefb 16-May-1999 pb <pb@FreeBSD.org> PR: kern/10570
Submitted by: adrian@freebsd.org

Change reference count in struct ifaddr to a u_int, to be able
to handle more than 2^16 routes to the same interface.

Fix suggested by Andrew Bangs <andrewb@demon.net> in PR kern/10570.
Tested by <adrian@freebsd.org> and me under -current.
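
The failure mode behind this change can be shown with a small sketch, assuming the earlier counter was a 16-bit unsigned type (the message implies this but does not name the old type). The function names are hypothetical.

```c
#include <stdint.h>

/* Take n references on a 16-bit counter and return the final count. */
static uint16_t
take_refs16(uint32_t n)
{
	uint16_t refcnt = 0;
	uint32_t i;

	for (i = 0; i < n; i++)
		refcnt++;	/* wraps to 0 after 65535 */
	return (refcnt);
}

/* Same, with an unsigned int counter as introduced by the change. */
static unsigned int
take_refs(uint32_t n)
{
	unsigned int refcnt = 0;
	uint32_t i;

	for (i = 0; i < n; i++)
		refcnt++;
	return (refcnt);
}
```

With 2^16 references the narrow counter reads zero again, so the address could look unreferenced while routes still point at it.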
73556bfee1b1d6dfc2a2f5d400228ca90bb34fc9 06-May-1999 peter <peter@FreeBSD.org> Add sufficient braces to keep egcs happy about potentially ambiguous
if/else nesting.
087d4857e56f150a8f549600150404f273efb895 16-Apr-1999 peter <peter@FreeBSD.org> Bring the 'new-bus' to the i386. This extensively changes the way the
i386 platform boots, it is no longer ISA-centric, and is fully dynamic.
Most old drivers compile and run without modification via 'compatibility
shims' to enable a smoother transition. eisa, isapnp and pccard* are
not yet using the new resource manager. Once fully converted, all drivers
will be loadable, including PCI and ISA.

(Some other changes appear to have snuck in, including a port of Soren's
ATA driver to the Alpha. Soren, back this out if you need to.)

This is a checkpoint of work-in-progress, but is quite functional.

The bulk of the work was done over the last few years by Doug Rabson and
Garrett Wollman.

Approved by: core
38464a3bbc7ca1a7686cd5fd5f56c8def4aa5ed8 16-Dec-1998 phk <phk@FreeBSD.org> Generalize the if_up() and if_down() functions under the names
if_route() and if_unroute().

This is the first step towards sanitizing IFF_UP and IFF_RUNNING.
1ee51dd89f8fa0c587541ec9a5d0e0a307a1319e 12-Jun-1998 julian <julian@FreeBSD.org> Go through the loopback code with a broom..
Remove lots'o'hacks.
looutput is now static.

Other callers who want to use loopback to allow shortcutting
should call the special entrypoint for this, if_simloop(), which is
specifically designed for this purpose. Using looutput for this purpose
was problematic, particularly with bpf and trying to keep track
of whether one should be using the characteristics of the loopback interface
or the interface (e.g. if_ethersubr.c) that was requesting the loopback.
There was a whole class of errors due to this misuse, each of which had
hacks to cover them up.

Consists largely of hack removal :-)
1d5f38ac2264102518a09c66a7b285f57e81e67e 07-Jun-1998 dfr <dfr@FreeBSD.org> This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
in line with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
b598f559b2947bb9582b53221185bb27d86cd68f 15-Apr-1998 bde <bde@FreeBSD.org> Support compiling with `gcc -ansi'.
0506343883d62f6649f7bbaf1a436133cef6261d 11-Jan-1998 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'jb'.
7c6e96080c4fb49bf912942804477d202a53396c 10-Jan-1998 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'JB'.
6c90e3528c75ea68f2d3c64aa6029c46fa9a701e 28-Aug-1997 julian <julian@FreeBSD.org> Add a per-interface-address pointer to a function that can be supplied
by a protocol, to determine if an address matches the net this address
is part of. This is needed by protocols for which netmasks
"just don't work", for example appletalk.

Also add the code in appletalk to make use of this new feature.
This fixes one of the longest standing bugs in appletalk:
the inability to talk to machines to which the path is via a router
which is on a different net, but the same netrange, as your interface.
Protocols that do not supply this function (e.g. IP) should not be affected.
94b6d727947e1242356988da003ea702d41a97de 22-Feb-1997 peter <peter@FreeBSD.org> Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
808a36ef658c1810327b5d329469bcf5dad24b28 14-Jan-1997 jkh <jkh@FreeBSD.org> Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
18b2a70a7d49403de96eece39647adc7bede4c58 08-Jan-1997 wollman <wollman@FreeBSD.org> Fix a few oversights in the new multicast membership interface.
46b285b874270033b54c235b591d75d3042191ab 07-Jan-1997 wollman <wollman@FreeBSD.org> Checkpoint the beginnings of the new kernel interface for
multicast group memberships. This is not actually operative
at the moment (a lot of other code still needs to be changed), but
this seemed like a useful reference point to check in so that
others (i.e. Bill Fenner) have fair warning of where we are going.
4d20b5bdb1ab98e8e30b49f4823ca47016929850 03-Jan-1997 wollman <wollman@FreeBSD.org> Separate kernel-internal data structures from exposed user interface
to interfaces. (Amazing nobody had done this!)

More commits to fix up user-land to follow.