History log of /freebsd-head/sys/netinet6/in6.c
Revision Date Author Comments
5057bc60d0d2e5cc09b6426236f76c1f8714a7ec 15-Jan-2020 glebius <glebius@FreeBSD.org> Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
data based on the network epoch. The macro reverses the argument
order of epoch_call(9) - first function, then its argument. NFC
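A minimal userland sketch of the argument reversal. Everything here is a hypothetical stand-in. `stub_epoch_call()` takes (epoch, context, callback), the context-first order the message attributes to epoch_call(9). Unlike the real epoch_call(9), which defers the callback until all readers have left the epoch, the stub runs it immediately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real <sys/epoch.h> definitions. */
struct epoch_context {
	void	*unused;
};
typedef void (*epoch_cb_t)(struct epoch_context *);

static int net_epoch_stub;	/* stands in for the network epoch */
static int freed_count;		/* counts callbacks that have run */

/*
 * Context-first stub. Unlike epoch_call(9) it does not defer anything:
 * it runs the callback right away.
 */
static void
stub_epoch_call(void *epoch, struct epoch_context *ctx, epoch_cb_t cb)
{
	assert(epoch != NULL);
	cb(ctx);
}

/* Function first, then its argument, as NET_EPOCH_CALL() is described. */
#define	NET_EPOCH_CALL(f, c)	stub_epoch_call(&net_epoch_stub, (c), (f))

struct obj {
	int			value;
	struct epoch_context	ctx;	/* embedded context for the callback */
};

static void
obj_free_cb(struct epoch_context *ctx)
{
	/* Recover the enclosing object from its embedded context. */
	struct obj *o = (struct obj *)((char *)ctx - offsetof(struct obj, ctx));

	free(o);
	freed_count++;
}
```

Usage: allocate a `struct obj` and call `NET_EPOCH_CALL(obj_free_cb, &o->ctx)`. With this non-deferring stub, `freed_count` becomes 1 as soon as the macro returns.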
1c6615a32e65c143b0e2f07e85e1154433a01175 07-Jan-2020 melifaro <melifaro@FreeBSD.org> Fix rtsock route message generation for interface addresses.

Reviewed by: olivier
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D22974
fb644c472d058bbcea6b741e0f6768e873de3b55 19-Nov-2019 bz <bz@FreeBSD.org> in6: move include

Move the include for sysctl.h out of the middle of the file to the
includes at the beginning. This will make it easier to add new
sysctls.

No functional changes.

MFC after: 3 weeks
Sponsored by: Netflix
d6d2b9fe519ce0e79e1d4d0c1b54acacef709949 14-Oct-2019 glebius <glebius@FreeBSD.org> in6ifa_llaonifp() is never called from the fast path, so do not require
the epoch to be entered.
39cdd43bf3ef0d0b889a41294d83d96af631156b 13-Oct-2019 glebius <glebius@FreeBSD.org> Don't cover in6_ifattach() with the network epoch, as it may call into
network drivers' ioctls, which may sleep.

PR: 241223
337378e04f710de54c27deb773c544bf0eeda73a 07-Oct-2019 glebius <glebius@FreeBSD.org> Widen NET_EPOCH coverage.

When epoch(9) was introduced to the network stack, it was basically
dropped in place of the existing locking, which was mutexes and
rwlocks. For the sake of performance, mutex-covered areas were kept
as small as possible, and so were the epoch-covered areas that replaced them.

However, epoch doesn't introduce any contention; it just delays
memory reclamation. So there is no point in minimising epoch-covered
areas for the sake of performance. Meanwhile, entering/exiting the epoch
also has a non-zero CPU cost, so doing it less often is a win.

Last but not least, there is code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside the network epoch. This makes coding both the input and output
paths much easier.

On the output path we already enter the epoch quite early: in
ip_output() and in ip6_output().

This patch does the same for the input path. All ISR processing,
network-related callouts and other ways of injecting packets into the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.

The tricky part is configuration code paths (ioctls, sysctls). They
also call into leaf functions, so some of them need to be changed.

This patch introduces more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note that, unlike lock recursion,
epoch recursion is safe and just wastes a bit of resources.

Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
f0215c10689bd124d0d38c717c898918a377c3fe 30-Mar-2019 markj <markj@FreeBSD.org> Do not perform DAD on stf(4) interfaces.

stf(4) interfaces are not multicast-capable so they can't perform DAD.
They also did not set IFF_DRV_RUNNING when an address was assigned, so
the logic in nd6_timer() would periodically flag such an address as
tentative, resulting in interface flapping.

Fix the problem by setting IFF_DRV_RUNNING when an address is assigned,
and do some related cleanup:
- In in6if_do_dad(), remove a redundant check for !UP || !RUNNING.
There is only one caller in the tree, and it only looks at whether
the return value is non-zero.
- Have in6if_do_dad() return false if the interface is not
multicast-capable.
- Set ND6_IFF_NO_DAD when an address is assigned to an stf(4) interface
and the interface goes UP as a result. Note that this is not
sufficient to fix the problem because the new address is marked as
tentative and DAD is started before in6_ifattach() is called.
However, setting no_dad is formally correct.
- Change nd6_timer() to not flag addresses as tentative if no_dad is
set.

This is based on a patch from Viktor Dukhovni.

Reported by: Viktor Dukhovni <ietf-dane@dukhovni.org>
Reviewed by: ae
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19751
acda75d4430c1e117f320ec1c3fe5802f763c1ea 23-Jan-2019 markj <markj@FreeBSD.org> Style.

Reviewed by: bz
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
af69719726578b14e12f34bc4e2636254f038c25 23-Jan-2019 markj <markj@FreeBSD.org> Fix an LLE lookup race.

After the afdata read lock was converted to epoch(9), readers could
observe a linked LLE and block on the LLE while a thread was
unlinking the LLE. The writer would then release the lock and schedule
the LLE for deferred free, allowing readers to continue and potentially
schedule the LLE timer. By the time the timer fires, the structure is
freed, typically resulting in a crash in the callout subsystem.

Fix the problem by modifying the lookup path to check for the LLE_LINKED
flag upon acquiring the LLE lock. If it's not set, the lookup fails.
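A userland sketch of the fixed lookup step, with a pthread mutex standing in for the LLE lock. The names (`struct entry`, `ENTRY_LINKED`, the two functions) are hypothetical; the sketch shows only the re-check after locking, not the kernel's lltable code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified entry; ENTRY_LINKED plays the role of LLE_LINKED. */
#define	ENTRY_LINKED	0x1u

struct entry {
	pthread_mutex_t	lock;
	unsigned	flags;
};

/*
 * Called on an entry found by a lockless (epoch-protected) table walk.
 * A writer may have unlinked it meanwhile, so re-check ENTRY_LINKED once
 * the entry lock is held. Returns the entry locked, or NULL on a miss.
 */
static struct entry *
entry_lock_if_linked(struct entry *e)
{
	if (e == NULL)
		return (NULL);
	pthread_mutex_lock(&e->lock);
	if ((e->flags & ENTRY_LINKED) == 0) {
		pthread_mutex_unlock(&e->lock);
		return (NULL);
	}
	return (e);
}

/* Writer side: clear the flag under the entry lock before unlinking. */
static void
entry_mark_unlinked(struct entry *e)
{
	assert(e != NULL);
	pthread_mutex_lock(&e->lock);
	e->flags &= ~ENTRY_LINKED;
	pthread_mutex_unlock(&e->lock);
	/* The caller would now remove e from the table and defer its free. */
}
```

A reader that blocked on the lock while the writer was unlinking now sees the flag cleared and treats the lookup as a miss, so it never arms a timer on an entry that is about to be freed.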

PR: 234296
Reviewed by: bz
Tested by: sbruno, Victor <chernov_victor@list.ru>,
Mike Andrews <mandrews@bit0.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18906
6d8cc191f953b3680c5e5911afc66b7c1f8e6c4b 09-Jan-2019 glebius <glebius@FreeBSD.org> Mechanical cleanup of epoch(9) usage in network stack.

- Remove macros that covertly create an epoch_tracker on the thread stack. Such
macros are quite unsafe, e.g. they produce buggy code if the same macro is
used in nested scopes. Always declare the epoch_tracker explicitly.

- Unmask the interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF-specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are: the net_epoch.
Keeping them as is is very misleading. They are all named FOO_RLOCK(),
while they no longer have lock semantics. They now allow recursion and,
more importantly, they no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has the same problems, but is not touched by this commit.

This is a non-functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
the epoch recursively.

Discussed with: jtl, gallatin
8d3e25d418ef68d6a15c5e94e25d40be89dadbbe 21-Oct-2018 ae <ae@FreeBSD.org> Add ifaddr_event_ext event. It is similar to ifaddr_event, but the
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and a pointer to the ifaddr. ifaddr_event is now also implemented using
the ifaddr_event_ext handler.

MFC after: 3 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17100
d94c744a4088246e9021ab4e03a8be63ceb0a050 17-Jul-2018 ae <ae@FreeBSD.org> Move invoking of callout_stop(&lle->lle_timer) into llentry_free().

This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.

PR: 209682, 225927
Submitted by: hselasky (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4605
c937b516d8b592949105032489fe94e3b7589f23 24-May-2018 mmacy <mmacy@FreeBSD.org> CK: update consumers to use CK macros across the board

r334189 changed the fields to have names distinct from those in queue.h
in order to expose the oversights as compile time errors
ecd6e9d3074a783a743434c51dbfd16571c55fa2 23-May-2018 mmacy <mmacy@FreeBSD.org> UDP: further performance improvements on tx

Cumulative throughput while running 64
netperf -H $DUT -t UDP_STREAM -- -m 1
on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps

Single stream throughput increases from 910kpps to 1.18Mpps

Baseline:
https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg

- Protect read access to global ifnet list with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg

- Protect short lived ifaddr references with epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg

- Convert if_afdata read lock path to epoch
https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg

A fix for the inpcbhash contention is pending sufficient time
on a canary at LLNW.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15409
7aeac9ef1893e0b29408213e3a320d9d1ef28357 18-May-2018 mmacy <mmacy@FreeBSD.org> ifnet: Replace if_addr_lock rwlock with epoch + mutex

Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

With the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see the following before the patch, using nstat -I mce0 1:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  4.98  0.00  4.42 0.00 4235592      33 83.80  4720653 2149771 1235 247.32
  4.73  0.00  4.20 0.00 4025260      33 82.99  4724900 2139833 1204 247.32
  4.72  0.00  4.20 0.00 4035252      33 82.14  4719162 2132023 1264 247.32
  4.71  0.00  4.21 0.00 4073206      33 83.68  4744973 2123317 1347 247.32
  4.72  0.00  4.21 0.00 4061118      33 80.82  4713615 2188091 1490 247.32
  4.72  0.00  4.21 0.00 4051675      33 85.29  4727399 2109011 1205 247.32
  4.73  0.00  4.21 0.00 4039056      33 84.65  4724735 2102603 1053 247.32

After the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  5.43  0.00  4.20 0.00 3313143      33 84.96  5434214 1900162 2656 245.51
  5.43  0.00  4.20 0.00 3308527      33 85.24  5439695 1809382 2521 245.51
  5.42  0.00  4.19 0.00 3316778      33 87.54  5416028 1805835 2256 245.51
  5.42  0.00  4.19 0.00 3317673      33 90.44  5426044 1763056 2332 245.51
  5.42  0.00  4.19 0.00 3314839      33 88.11  5435732 1792218 2499 245.52
  5.44  0.00  4.19 0.00 3293228      33 91.84  5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch and 270Mb/s after it.

Reviewed by: gallatin
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15366
f7adfa51510975714c0f79de81d4f7a4e596a908 15-May-2018 brooks <brooks@FreeBSD.org> Unwrap some not-so-long lines now that extra tabs been removed.
f5d8b3c62e7892fddb7c372282c702cbe7198b30 15-May-2018 brooks <brooks@FreeBSD.org> Remove stray tabs in in6_lltable_dump_entry(). NFC.
95d56654953a39ca201cb89a3026fc6f54e420f3 08-May-2018 hselasky <hselasky@FreeBSD.org> Fix for missing network interface address event when adding the default IPv6
based link-local address.

The default link local address for IPv6 is added as part of bringing the
network interface up. Move the call to "EVENTHANDLER_INVOKE(ifaddr_event,)"
from the SIOCAIFADDR_IN6 ioctl(2) handler to in6_notify_ifa() which should
catch all the cases of adding IPv6 based addresses to a network interface.
Add a witness warning in case the event handler is not allowed to sleep.

Reviewed by: network (ae), kib
Differential Revision: https://reviews.freebsd.org/D13407
MFC after: 1 week
Sponsored by: Mellanox Technologies
7d4b8facc71c1322df5656f7d007a39939d5d013 02-May-2018 shurd <shurd@FreeBSD.org> Separate list manipulation locking from state change in multicast

Multicast incorrectly calls into drivers with a mutex held, forcing drivers
to go through all manner of contortions to use a non-sleepable lock.
Serialize multicast updates instead.

Submitted by: mmacy <mmacy@mattmacy.io>
Reviewed by: shurd, sbruno
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14969
26c165ead9a64098315408d16c1186fe1f9cfb3c 13-Apr-2018 brooks <brooks@FreeBSD.org> Remove support for the Arcnet protocol.

While Arcnet has some continued deployment in industrial controls, the
lack of drivers for any of the PCI, USB, or PCIe NICs on the market
suggests such users aren't running FreeBSD.

Evidence in the PR database suggests that the cm(4) driver (our sole
Arcnet NIC) was broken in 5.0 and has not worked since.

PR: 182297
Reviewed by: jhibbits, vangyzen
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15057
6dcf9514b31c54ae191ebc60f3dbeb090475a1f3 11-Apr-2018 brooks <brooks@FreeBSD.org> Remove support for FDDI networks.

Defines in net/if_media.h remain in case code copied from ifconfig is in
use elsewhere (supporting a non-existent media type is harmless).

Reviewed by: kib, jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15017
9d79658aab1a30f34fee169ce74bdff4ca405c18 06-Apr-2018 brooks <brooks@FreeBSD.org> Move most of the contents of opt_compat.h to opt_global.h.

opt_compat.h is mentioned in nearly 180 files. In-progress network
driver compatibility improvements may add over 100 more, so this is
closer to "just about everywhere" than "only some files" per the
guidance in sys/conf/options.

Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of
sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensures opt_compat.h
is created on all architectures.

Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the
set of compiled files.

Reviewed by: kib, cem, jhb, jtl
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14941
2b96daf50f721a8f7241ae38a58b062c5b23f316 30-Mar-2018 brooks <brooks@FreeBSD.org> Document and enforce assumptions about struct (in6_)ifreq.

- The two types must be type-punnable for shared members of ifr_ifru.
This allows compatibility accessors to be shared.

- There must be no padding gap between ifr_name and ifr_ifru. This is
assumed in tcpdump's use of SIOCGIFFLAGS output which attempts to be
broadly portable. This is true for all current architectures, but very
large (256-bit) fat-pointers could violate this invariant.
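Both invariants can be stated as compile-time checks with C11 `static_assert` and `offsetof`. This sketch uses hypothetical mirror structs with trimmed member sets (the real `struct ifreq` and `struct in6_ifreq` are declared in `<net/if.h>` and `<netinet6/in6_var.h>`); it is not the check the commit added. With 4- or 8-byte pointer alignment the union starts right after the 16-byte name. A member needing 32-byte alignment, such as a 256-bit fat pointer, would push it to offset 32 and trip the first assertion.

```c
#include <assert.h>
#include <stddef.h>

#define	SKETCH_IFNAMSIZ	16	/* IFNAMSIZ is 16 on FreeBSD */

/* Hypothetical mirrors; member sets trimmed for the sketch. */
struct sketch_ifreq {
	char	ifr_name[SKETCH_IFNAMSIZ];
	union {
		short	ifru_flags[2];
		int	ifru_mtu;
		void	*ifru_data;
	} ifr_ifru;
};

struct sketch_in6_ifreq {
	char	ifr_name[SKETCH_IFNAMSIZ];
	union {
		short		ifru_flags[2];
		int		ifru_mtu;
		void		*ifru_data;
		unsigned char	ifru_addr[28];	/* size of a sockaddr_in6 */
	} ifr_ifru;
};

/* No padding gap between ifr_name and ifr_ifru. */
static_assert(offsetof(struct sketch_ifreq, ifr_ifru) == SKETCH_IFNAMSIZ,
    "padding between ifr_name and ifr_ifru");

/* Shared members start at the same offset, so the types can be punned. */
static_assert(offsetof(struct sketch_ifreq, ifr_ifru) ==
    offsetof(struct sketch_in6_ifreq, ifr_ifru),
    "ifr_ifru offsets differ between the two types");
```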

Reviewed by: kib
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14910
349ad8a8dedf4739bb074416c3b00f338ca572d4 30-Mar-2018 brooks <brooks@FreeBSD.org> Remove a comment that suggests checking that a non-pointer is non-NULL.

Reviewed by: melifaro, markj, hrs, ume
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14904
a45d44647fa7c75f378716e0db499f05f0800598 28-Mar-2018 brooks <brooks@FreeBSD.org> Remove infrastructure for token-ring networks.

Reviewed by: cem, imp, jhb, jmallett
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14875
7bb5ee0db4f98451224121024ee1fce806266db5 17-Mar-2018 melifaro <melifaro@FreeBSD.org> Fix outgoing TCP/UDP packet drop on arp/ndp entry expiration.

Current arp/nd code relies on the feedback from the datapath indicating
that the entry is still used. This mechanism is incorporated into the
arpresolve()/nd6_resolve() routines. After the inpcb route cache
introduction, the packet path for the locally-originated packets changed,
passing the cached lle pointer to ether_output() directly. As a result,
arp/ndp entries expired exactly after the configured max_age interval
each time. During the small window between the ARP/NDP request and reply
from the router, most of the packets got lost.

Fix this behaviour by plugging datapath notification code into the packet
path used by the route cache. Unify the notification code by using a single
inlined function with per-AF callbacks.

Reported by: sthaug at nethelp.no
Reviewed by: ae
MFC after: 2 weeks
c9a63ac91076d46683777a2be1291add666712d0 23-Jan-2018 asomers <asomers@FreeBSD.org> sys/netinet6: fix typos in comments. No functional change.

MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
4736ccfd9c3411d50371d7f21f9450a47c19047e 20-Nov-2017 pfg <pfg@FreeBSD.org> sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
d1b74add9a9ee8e208448aa1958ca9d1ca709aa0 17-Mar-2017 asomers <asomers@FreeBSD.org> Constrain IPv6 routes to single FIBs when net.add_addr_allfibs=0

sys/netinet6/icmp6.c
Use the interface's FIB for source address selection in ICMPv6 error
responses.

sys/netinet6/in6.c
In in6_newaddrmsg, announce arrival of local addresses on the
interface's FIB only. In in6_lltable_rtcheck, use a per-fib ND6
cache instead of a single cache.

sys/netinet6/in6_src.c
In in6_selectsrc, use the caller's fib instead of the default fib.
In in6_selectsrc_socket, remove a superfluous check.

sys/netinet6/nd6.c
In nd6_lle_event, use the interface's fib for routing socket
messages. In nd6_is_new_addr_neighbor, check all FIBs when trying
to determine whether an address is a neighbor. Also, simplify the
code for point-to-point interfaces.

sys/netinet6/nd6.h
sys/netinet6/nd6.c
sys/netinet6/nd6_rtr.c
Make defrouter_select fib-aware, and make all of its callers pass in
the interface fib.

sys/netinet6/nd6_nbr.c
When inputting a Neighbor Solicitation packet, consider the
interface fib instead of the default fib for DAD. Output NS and
Neighbor Advertisement packets on the correct fib.

sys/netinet6/nd6_rtr.c
Allow installing the same host route on different interfaces in
different FIBs. If rt_add_addr_allfibs=0, only install or delete
the prefix route on the interface fib.

tests/sys/netinet/fibs_test.sh
Clear some expected failures, but add a skip for the newly revealed
BUG217871.

PR: 196361
Submitted by: Erick Turnquist <jhujhiti@adjectivism.org>
Reported by: Jason Healy <jhealy@logn.net>
Reviewed by: asomers
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D9451
7e6cabd06e6caa6a02eeb86308dc0cb3f27e10da 28-Feb-2017 imp <imp@FreeBSD.org> Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
91fc509b91f007c2ab6173bd2c3c306c774467fb 28-Jan-2017 avos <avos@FreeBSD.org> Garbage collect IFT_IEEE80211 (but leave the define for possible reuse)

This interface type ("a parent interface of wlanX") is not used since
r287197

Reviewed by: adrian, glebius
Differential Revision: https://reviews.freebsd.org/D9308
5bd3158a3b7184c65d3e1b6d96faf0dd720eb6ac 25-Jan-2017 loos <loos@FreeBSD.org> After the in_control() changes in r257692, an existing address is
(intentionally) deleted first and then completely added again (so all the
events, announces and hooks are given a chance to run).

This caused an issue with CARP, where the existing CARP data structure is
removed together with the last address for a given VHID, which causes
a subsequent failure when the address is later re-added.

This change fixes this issue by adding a new flag to keep the CARP data
structure when an address is not being removed.

There was an additional issue with IPv6 CARP addresses, where the CARP data
structure would never be removed after a change and lead to VHIDs which
cannot be destroyed.

Reviewed by: glebius
Obtained from: pfSense
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC (Netgate)
0e9bbe2171861777034762b3a900c9e3222374a7 07-Oct-2016 markj <markj@FreeBSD.org> Lock the ND prefix list and add refcounting for prefixes.

This change extends the nd6 lock to protect the ND prefix list as well
as the list of advertising routers associated with each prefix. To handle
cases where the nd6 lock must be dropped while iterating over either the
prefix or default router lists, a generation counter is used to track
modifications to the lists. Additionally, a new mutex is used to serialize
prefix on-link/off-link transitions. This mutex must be acquired before
the nd6 lock and is held while updating the routing table in
nd6_prefix_onlink() and nd6_prefix_offlink().
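A minimal sketch of the generation-counter technique, assuming a fixed-size array and a pthread mutex in place of the nd6 lock; all names are hypothetical. Restarting the walk is one possible reaction to a detected change, not necessarily what nd6 does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define	GLIST_MAX	8

/* Hypothetical list protected by a mutex and a generation counter. */
struct glist {
	pthread_mutex_t	lock;
	unsigned long	gen;	/* bumped on every modification */
	int		items[GLIST_MAX];
	int		count;
};

static void
glist_insert(struct glist *l, int v)
{
	pthread_mutex_lock(&l->lock);
	assert(l->count < GLIST_MAX);
	l->items[l->count++] = v;
	l->gen++;
	pthread_mutex_unlock(&l->lock);
}

/*
 * Walk the list, dropping the lock around 'work', which may sleep or
 * modify the list. If the generation changed while the lock was dropped,
 * the saved position may be stale, so restart from the head.
 * Returns the number of restarts.
 */
static int
glist_walk(struct glist *l, void (*work)(struct glist *, int))
{
	unsigned long gen;
	int i, v, restarts = 0;

	pthread_mutex_lock(&l->lock);
restart:
	for (i = 0; i < l->count; i++) {
		v = l->items[i];
		gen = l->gen;
		pthread_mutex_unlock(&l->lock);
		work(l, v);
		pthread_mutex_lock(&l->lock);
		if (l->gen != gen) {
			restarts++;
			goto restart;
		}
	}
	pthread_mutex_unlock(&l->lock);
	return (restarts);
}

/* Demo callback: inserts one extra item the first time it is called. */
static void
demo_insert_once(struct glist *l, int v)
{
	static bool done;

	(void)v;
	if (!done) {
		done = true;
		glist_insert(l, 99);
	}
}
```

A restart revisits items already processed, so `work` must tolerate repeats. The counter costs one comparison per step and avoids holding the lock across calls that may sleep.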

Reviewed by: ae, tuexen (SCTP bits)
Tested by: Jason Wolfe <jason@llnw.com>,
Larry Rosenman <ler@lerctr.org>
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D8125
e89e3efaa67abb63ea49fc0b491b90dc1a4da099 24-Sep-2016 markj <markj@FreeBSD.org> Rename ndpr_refcnt to ndpr_addrcnt.

This field counts derived addresses and is not a true refcount for prefix
objects, so the previous name was misleading.

MFC after: 1 week
562868062bbafb1b50af65d79c593d301b5f3953 28-Jul-2016 sbruno <sbruno@FreeBSD.org> MFC r296063 r297397 r299213

296063:
Lock the NDP default router list and count defrouter references.

This addresses a number of race conditions that can cause crashes as a
result of unsynchronized access to the list.

297397:
Modify nd6_llinfo_timer() to acquire the nd6 lock before the LLE lock.

When expiring a neighbour cache entry we may need to look up the associated
default router, which requires the nd6 read lock. To avoid an LOR, the nd6
lock should be acquired first.

299213:
Clean up callers of nd6_prelist_add().

nd6_prelist_add() sets *newp if and only if it is successful, so there's no
need for code that handles the case where the return value is 0 and
*newp == NULL. Fix some style bugs in nd6_prelist_add() while here.

Submitted by: Jason Wolfe <j@nitrology.com>
7cdbaef028ebaa6b6a8adce78fc14023033eb1f8 22-Jun-2016 ae <ae@FreeBSD.org> Fix the NULL pointer dereference for unresolved link layer entries in
the netinet6 code. Copy link layer address only when corresponding entry
has LLE_VALID flag.

PR: 210379
Approved by: re (kib)
7a1c0b1ad10703084b50a5b307bbd60603471e1c 21-Jun-2016 bz <bz@FreeBSD.org> Get closer to a VIMAGE network stack teardown from top to bottom rather
than removing the network interfaces first. This change is rather large
and convoluted as the ordering requirements cannot be separated.

Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and
related modules to their own SI_SUB_PROTO_FIREWALL.
Move initialization of "physical" interfaces to SI_SUB_DRIVERS,
move virtual (cloned) interfaces to SI_SUB_PSEUDO.
Move Multicast to SI_SUB_PROTO_MC.

Re-work parts of multicast initialisation and teardown, not taking the
huge amount of memory into account if used as a module yet.

For interface teardown we try to do as many of them as we can on
SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling
over a higher layer protocol such as IP. In that case the interface
has to go along with (or before) the higher layer protocol when it is shut down.

Kernel hhooks need to go last on teardown as they may be used at various
higher layers and we cannot remove them before we cleaned up the higher
layers.

For interface teardown there are multiple paths:
(a) a cloned interface is destroyed (inside a VIMAGE or in the base system),
(b) any interface is moved from a virtual network stack to a different
network stack ("vmove"), or (c) a virtual network stack is being shut down.
All code paths go through if_detach_internal() where we, depending on the
vmove flag or the vnet state, make a decision on how much to shut down;
in case we are destroying a VNET the individual protocol layers will
cleanup their own parts thus we cannot do so again for each interface as
we end up with, e.g., double-frees, destroying locks twice or acquiring
already destroyed locks.
When calling into protocol cleanups we equally have to tell them
whether they need to detach upper layer protocols ("ulp") or not
(e.g., in6_ifdetach()).

Provide or enhance helper functions to do proper cleanup at a protocol
rather than at an interface level.

Approved by: re (hrs)
Obtained from: projects/vnet
Reviewed by: gnn, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D6747
0f12e1993e6b783d07578b3c2f8d53e6fb212728 13-Jun-2016 pfg <pfg@FreeBSD.org> Remove the SIOCSIFALIFETIME_IN6 ioctl.

The SIOCSIFALIFETIME_IN6 provided by the kame project is unused,
it can't really be used safely and has been completely removed from
NetBSD and OpenBSD.

Obtained from: NetBSD (kern/35897)
PR: 210148 (exp-run)
Reviewed by: ae, hrs
Relnotes: yes
Approved by: re (glebius)
Differential Revision: https://reviews.freebsd.org/D5491
00d578928eca75be320b36d37543a7e2a4f9fbdb 27-May-2016 grehan <grehan@FreeBSD.org> Create branch for bhyve graphics import.
c5c6630f07c77ca823837b255d06f6b70df2125e 15-May-2016 markj <markj@FreeBSD.org> Remove an always-false error check in the AIFADDR_IN6 handler.

CID: 1250792
MFC after: 1 week
94a1c2572532b60f69f86b39608742b65fdd2558 07-May-2016 markj <markj@FreeBSD.org> Clean up callers of nd6_prelist_add().

nd6_prelist_add() sets *newp if and only if it is successful, so there's no
need for code that handles the case where the return value is 0 and
*newp == NULL. Fix some style bugs in nd6_prelist_add() while here.
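The contract above (set `*newp` if and only if the call succeeds) is what lets callers drop the extra NULL check. A sketch with a hypothetical function and caller, not the nd6 code itself:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct sketch_prefix {
	int	plen;
};

/*
 * Hypothetical function with the same contract: on success it returns 0
 * and sets *newp; on failure it returns an errno value and leaves *newp
 * untouched.
 */
static int
sketch_prelist_add(int plen, struct sketch_prefix **newp)
{
	struct sketch_prefix *p;

	assert(newp != NULL);
	if (plen < 0 || plen > 128)
		return (EINVAL);
	p = malloc(sizeof(*p));
	if (p == NULL)
		return (ENOMEM);
	p->plen = plen;
	*newp = p;
	return (0);
}

/* Caller: checking the return value is enough; no separate NULL check. */
static int
sketch_caller(int plen)
{
	struct sketch_prefix *pr;
	int error, len;

	error = sketch_prelist_add(plen, &pr);
	if (error != 0)
		return (-error);
	len = pr->plen;		/* pr is guaranteed valid here */
	free(pr);
	return (len);
}
```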

MFC after: 1 week
d9c9113377a2f19d01848ae8dcc470e9306ce932 03-May-2016 pfg <pfg@FreeBSD.org> sys/net*: minor spelling fixes.

No functional change.
23a478288f9297c0a9ae267472cca955556eae96 26-Apr-2016 cem <cem@FreeBSD.org> in_lltable_alloc and in6 copy: Don't leak LLE in error path

Fix a memory leak in error conditions introduced in r292978.

Reported by: Coverity
CIDs: 1347009, 1347010
Sponsored by: EMC / Isilon Storage Division
12232f84636cebecfa250541cfbf09b07fe2f520 15-Apr-2016 pfg <pfg@FreeBSD.org> sys/net* : for pointers replace 0 with NULL.

Mostly cosmetic, no functional change.

Found with devel/coccinelle.
4081472216bb2faccbd9f119e36c664b62a660c4 30-Mar-2016 markj <markj@FreeBSD.org> Fix the lladdr copy in in6_lltable_dump_entry() after r292978.

This bug caused "ndp -a" to show the wrong link layer address for neighbour
cache entries.

PR: 208067
c905b9e75ce7c386db0df386df6927e2afd8508f 17-Feb-2016 glebius <glebius@FreeBSD.org> The ternary operator has lower precedence than OR.
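The pitfall in isolation: in C, `?:` binds more loosely than both `|` and `||`, so `a | b ? x : y` groups as `(a | b) ? x : y`. The helpers and flag values below are hypothetical, not the code this commit changed; the message does not say which OR was involved, so bitwise `|` is used for illustration.

```c
#include <assert.h>

/* Compile-time demonstration of the grouping. */
static_assert((0x1 | 0 ? 0x10 : 0x20) == 0x10, "groups as (a | b) ? x : y");
static_assert((0x1 | (0 ? 0x10 : 0x20)) == 0x21, "explicit grouping");

/* Hypothetical helpers: base flags plus one of two optional bits. */
static int
flags_buggy(int base, int cond)
{
	/* Result is only ever 0x10 or 0x20; base never reaches it. */
	return (base | cond ? 0x10 : 0x20);
}

static int
flags_fixed(int base, int cond)
{
	return (base | (cond ? 0x10 : 0x20));
}
```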

Found by: PVS-Studio
e8ebbaf6da046225b7745bbe670a73e8d0531958 21-Jan-2016 bz <bz@FreeBSD.org> MFC 292953:

This code is not in modules that need KPI stability so no need to use
the wrapper functions as used in r252511 (head). We can directly use
the locking macros.
b52158b80d26a6cab7e43346abee96ede1121e31 07-Jan-2016 wollman <wollman@FreeBSD.org> MFH r292836:

in6_if2idlen: treat bridge(4) interfaces like other Ethernet interfaces

bridge(4) interfaces have an if_type of IFT_BRIDGE, rather than
IFT_ETHER, even though they only support Ethernet-style links. This
caused in6_if2idlen to emit an "unknown link type (209)" warning to
the console every time it was called. Add IFT_BRIDGE to the case
statement in the appropriate place, indicating that it uses the same
IPv6 address format as other Ethernet-like interfaces.
93152c67c93acd0eca913cc1939a3393129c2c4d 31-Dec-2015 melifaro <melifaro@FreeBSD.org> Implement interface link header precomputation API.

Add an if_requestencap() interface method, which is capable of calculating
various link headers for given interface. Right now there is support
for INET/INET6/ARP llheader calculation (IFENCAP_LL type request).
Other types are planned to support more complex calculation
(L2 multipath lagg nexthops, tunnel encap nexthops, etc..).

Reshape 'struct route' to be able to pass additional data (with its length)
to prepend to the mbuf.

These two changes permit routing code to pass pre-calculated nexthop data
(like L2 header for route w/gateway) down to the stack eliminating the
need for other lookups. It also brings us closer to more complex scenarios
like transparently handling MPLS nexthops and tunnel interfaces.
Last, but not least, it removes layering violation introduced by flowtable
code (ro_lle) and simplifies handling of existing if_output consumers.

ARP/ND changes:
Make the arp/ndp stack pre-calculate the link header upon installing/updating an lle
record. Interface link address changes are handled by re-calculating
headers for all lles based on the if_lladdr event. After these changes,
arpresolve()/nd6_resolve() return the full pre-calculated header for
supported interfaces, thus simplifying if_output().
Move these lookups to a separate ether_resolve_addr() function, which either
returns an error or a fully-prepared link header. Add <arp|nd6_>resolve_addr()
compat versions to return link addresses instead of pre-calculated data.

BPF changes:
Raw bpf writes occupied _two_ cases: AF_UNSPEC and pseudo_AF_HDRCMPLT.
Despite the naming, both of them have their header "complete". The only
difference is that the interface source MAC has to be filled in by the OS for
AF_UNSPEC (controlled via BIOCGHDRCMPLT). This logic has to stay inside
BPF and not pollute if_output() routines. Convert BPF to pass prepend data
via new 'struct route' mechanism. Note that it does not change
non-optimized if_output(): ro_prepend handling is purely optional.
Side note: hackish pseudo_AF_HDRCMPLT is supported for ethernet and FDDI.
It is not needed for ethernet anymore. The only remaining FDDI user is
dev/pdq mostly untouched since 2007. FDDI support was eliminated from
OpenBSD in 2013 (sys/net/if_fddisubr.c rev 1.65).

Flowtable changes:
Flowtable violates layering by saving (and not correctly managing)
rtes/lles. Instead of passing lle pointer, pass pointer to pre-calculated
header data from that lle.

Differential Revision: https://reviews.freebsd.org/D4102
be33cc799d8409186373ac352c5ce5146e2feede 30-Dec-2015 bz <bz@FreeBSD.org> This code is not in modules that need KPI stability so no need to use
the wrapper functions as used in r252511. We can directly use the
locking macros.

Reviewed by: jtl, rwatson
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4731
fc57e5b82d72057b5ac303a1428c254ce29034f1 28-Dec-2015 wollman <wollman@FreeBSD.org> in6_if2idlen: treat bridge(4) interfaces like other Ethernet interfaces

bridge(4) interfaces have an if_type of IFT_BRIDGE, rather than
IFT_ETHER, even though they only support Ethernet-style links. This
caused in6_if2idlen to emit an "unknown link type (209)" warning to
the console every time it was called. Add IFT_BRIDGE to the case
statement in the appropriate place, indicating that it uses the same
IPv6 address format as other Ethernet-like interfaces.
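A pared-down sketch of the kind of switch this change extends. The type values match `<net/if_types.h>` (IFT_BRIDGE is 0xd1, the 209 in the warning), but the function, its warning counter, and the assumption that the default branch also returns 64 are illustrative, not the kernel's in6_if2idlen().

```c
#include <assert.h>
#include <stdio.h>

/* Local copies of interface type values from <net/if_types.h>. */
enum {
	SKETCH_IFT_ETHER	= 0x06,
	SKETCH_IFT_L2VLAN	= 0x87,
	SKETCH_IFT_BRIDGE	= 0xd1,
};
static_assert(SKETCH_IFT_BRIDGE == 209, "the type reported in the warning");

static int unknown_warnings;	/* counts "unknown link type" messages */

/* Interface identifier length in bits; 64 in every branch of this sketch. */
static int
sketch_if2idlen(int if_type)
{
	switch (if_type) {
	case SKETCH_IFT_ETHER:
	case SKETCH_IFT_L2VLAN:
	case SKETCH_IFT_BRIDGE:	/* Ethernet-like: same IPv6 address format */
		return (64);
	default:
		unknown_warnings++;
		printf("unknown link type (%d)\n", if_type);
		return (64);
	}
}
```

Adding the case label changes no returned value; it only moves bridge(4) out of the branch that prints the console warning.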

MFC after: 1 week
45d5617154226a3aec179038a1be774c890a4f26 17-Dec-2015 smh <smh@FreeBSD.org> Revert r292275 & r292379

glebius has concerns about these changes; revert them so that those concerns
can be discussed and addressed.

Sponsored by: Multiplay
fb12a509fe8b781ba3e865fcc78af667ed78c66c 16-Dec-2015 melifaro <melifaro@FreeBSD.org> Provide additional lle data in IPv6 lltable dump used by ndp(8).

Before the change, things like lle state were queried via
SIOCGNBRINFO_IN6 by ndp(8) for _each_ lle entry in dump.
This ioctl was added in 1999, probably to avoid touching rtsock code.

This change maps SIOCGNBRINFO_IN6 data to the standard rtsock dump as
follows:
expire (already) maps to rtm_rmx.rmx_expire
isrouter -> rtm_flags & RTF_GATEWAY
asked -> rtm_rmx.rmx_pksent
state -> rtm_rmx.rmx_state (maps to rmx_weight via define)

Reviewed by: ae
864cf1812819836284d12030ce553ee743ca10f0 15-Dec-2015 smh <smh@FreeBSD.org> Fix lagg failover due to missing notifications

When using lagg failover mode, neither Gratuitous ARP (IPv4) nor Unsolicited
Neighbour Advertisements (IPv6) are sent to notify other nodes that the
address may have moved.

This results in slow failover, dropped packets and network outages for the
lagg interface when the primary link goes down.

We now use the new if_link_state_change_cond with the force param set to
allow lagg to force through link state changes and hence fire an
ifnet_link_event, which is now monitored by rip and nd6.

Upon receiving these events, each protocol triggers the relevant
notifications:
* inet4 => Gratuitous ARP
* inet6 => Unsolicited Neighbour Announce

This also fixes the carp IPv6 NA's that stopped working after r251584 which
added the ipv6_route__llma route.

The new behaviour can be controlled using the sysctls:
* net.link.ether.inet.arp_on_link
* net.inet6.icmp6.nd6_on_link

Also removed unused param from lagg_port_state and added descriptions for the
sysctls while here.

PR: 156226
MFC after: 1 month
Sponsored by: Multiplay
Differential Revision: https://reviews.freebsd.org/D4111
30ed5ce1ff6645dcdcc03a75644cb64a37e521f3 14-Dec-2015 kp <kp@FreeBSD.org> inet6: Do not assume every interface has ip6 enabled.

Certain interfaces (e.g. pfsync0) do not have ip6 addresses (in other words,
ifp->if_afdata[AF_INET6] is NULL). Ensure we don't panic when the MTU is
updated.

pfsync interfaces will never have ip6 support, because it's explicitly disabled
in in6_domifattach().

PR: 205194
Reviewed by: melifaro, hrs
Differential Revision: https://reviews.freebsd.org/D4522
5acc305ae04293e9f5d59d5b6782bdf9c72ba05b 13-Dec-2015 melifaro <melifaro@FreeBSD.org> Remove LLE read lock from IPv6 fast path.

LLE structure is mostly unchanged during its lifecycle: there are only 2
things relevant for fast path lookup code:
1) link-level address change. Since r286722, these updates are performed
under AFDATA WLOCK.
2) Some sort of feedback indicating that this particular entry is used so
we send NS to perform reachability verification instead of expiring entry.
The only signal that is needed from fast path is something like binary
yes/no.
The latter is solved by the following changes:

Special r_skip_req (introduced in D3688) value is used for fast path feedback.
It is read lockless by fast path, but updated under req_mutex mutex. If this
field is non-zero, then fast path will acquire lock and set it back to 0.

After transitioning to STALE state, the callout timer is armed to run every
V_nd6_delay seconds to make sure that if a packet was transmitted at the start
of given interval, we would be able to switch to PROBE state in V_nd6_delay
seconds as user expects.
(in STALE state) timer is rescheduled until original V_nd6_gctimer expires
keeping lle in STALE state (remaining timer value stored in lle_remtime).
(in STALE state) timer is rescheduled if a packet was transmitted less than
V_nd6_delay seconds ago to make sure we transition to PROBE state exactly
after V_nd6_delay seconds.

As a result, all packets towards lle in REACHABLE/STALE/PROBE states are handled
by fast path without acquiring lle read lock.

Differential Revision: https://reviews.freebsd.org/D3780
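The r_skip_req feedback scheme above (lockless read on the fast path, writes under a mutex) can be sketched in userland C. This is a minimal model under stated assumptions: struct lle_sketch and all function names are hypothetical, pthreads stands in for the kernel req_mutex, and a C11 atomic is used so the unlocked read is well defined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, cut-down neighbour entry; not the real struct llentry. */
struct lle_sketch {
	atomic_int r_skip_req;		/* non-zero: timer wants usage feedback */
	pthread_mutex_t req_mtx;	/* serializes writers of r_skip_req */
};

/* Timer side: ask the fast path to report the next transmission. */
static void
lle_request_feedback(struct lle_sketch *lle)
{
	assert(lle != NULL);
	pthread_mutex_lock(&lle->req_mtx);
	atomic_store_explicit(&lle->r_skip_req, 1, memory_order_relaxed);
	pthread_mutex_unlock(&lle->req_mtx);
}

/* Fast path: unlocked read; take the mutex only when feedback is requested. */
static void
lle_fastpath_tx(struct lle_sketch *lle)
{
	if (atomic_load_explicit(&lle->r_skip_req, memory_order_relaxed) == 0)
		return;			/* common case: no lock taken */
	pthread_mutex_lock(&lle->req_mtx);
	atomic_store_explicit(&lle->r_skip_req, 0, memory_order_relaxed);
	pthread_mutex_unlock(&lle->req_mtx);
}

/* Timer side: zero means a packet was sent since the last request. */
static bool
lle_used_since_request(struct lle_sketch *lle)
{
	bool used;

	pthread_mutex_lock(&lle->req_mtx);
	used = atomic_load_explicit(&lle->r_skip_req, memory_order_relaxed) == 0;
	pthread_mutex_unlock(&lle->req_mtx);
	return used;
}
```

Only the first transmission after a request pays for the mutex; every later one returns after a single relaxed load, which is the "binary yes/no" signal the message describes.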
c85e616e29482200cd0a47fa6c85165a98a4caaf 10-Dec-2015 ngie <ngie@FreeBSD.org> MFhead @ r292053
2bb0e924cc5d81b56b5eba9cae7011c55a7c6215 09-Dec-2015 melifaro <melifaro@FreeBSD.org> Make in_arpinput(), inp_lookup_mcast_ifp(), icmp_reflect(),
ip_dooptions(), icmp6_redirect_input(), in6_lltable_rtcheck(),
in6p_lookup_mcast_ifp() and in6_selecthlim() use new routing api.

Eliminate now-unused ip_rtaddr().
Fix lookup key in fib6_lookup_nh_basic(), which was lost during merge.
Make fib6_lookup_nh_basic() and fib6_lookup_nh_extended() always
return IPv6 destination address with embedded scope. Currently
rw_gateway has its scope embedded; do the same for non-gatewayed
destinations.

Sponsored by: Yandex LLC
dc494194a2039a69fc3073f4ef4d628b955a0b25 13-Nov-2015 rrs <rrs@FreeBSD.org> This fixes several places where callout_stops return is examined. The
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped. Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.

MFC after: 1 Month with associated callout change.
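The pitfall described above can be illustrated with two tiny predicates. The enum names and helper functions are hypothetical; they model only the values quoted in this commit plus a positive value for a successfully stopped callout (assumed to be 1 here), not the real callout(9) API.

```c
#include <assert.h>
#include <stdbool.h>

/* Return values as described in the commit message (sketch only). */
enum {
	SK_STOP_NOT_RUNNING = -1,	/* already completed or was not running */
	SK_STOP_FAILED = 0,		/* could not be stopped */
	SK_STOP_CANCELLED = 1		/* pending callout was stopped (assumed) */
};

/* Buggy pattern: -1 is non-zero, so it reads as "we stopped it". */
static bool
drop_timer_ref_buggy(int rc)
{
	return rc != 0;
}

/* Fixed pattern: only a positive value means the callout was cancelled. */
static bool
drop_timer_ref_fixed(int rc)
{
	assert(rc >= -1);
	return rc > 0;
}
```

With the buggy check, a callout that was not running would still cause a reference to be dropped, which is exactly the kind of extra release that leads to use-after-free.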
595bcb4ce105d94c91a9045beea8a59ae9a64f39 07-Nov-2015 melifaro <melifaro@FreeBSD.org> Unify setting lladdr for AF_INET[6].
4fed811000ba5b64a4b7fba73ed6c0590038ab48 27-Sep-2015 melifaro <melifaro@FreeBSD.org> rtsock requests for deleting interface address lles started to return EPERM
instead of old "ignore-and-return 0" in r287789. This broke arp -da /
ndp -cn behavior (they exit on rtsock command failure). Fix this by
translating LLE_IFADDR to RTM_PINNED flag, passing it to userland and
making arp/ndp ignore these entries in batched delete.

MFC after: 2 weeks
7b9023e28a0afc962de748df57af9704b4bfd73f 22-Sep-2015 garga <garga@FreeBSD.org> Remove extra space introduced in r287734. This is a stable/10 only fix
since original commit (r287094) is correct.

Approved by: loos
Sponsored by: Rubicon Communications (Netgate)
5ad1f2444d736f306932d8a69b8357c63297cc83 14-Sep-2015 melifaro <melifaro@FreeBSD.org> * Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3573
bc4a1ace904281f5732175770fe2858d1f9a26da 13-Sep-2015 hrs <hrs@FreeBSD.org> MFC 287094:

- Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback
interface but in6if_do_dad() already had a check for IFF_LOOPBACK.

- Remove in6if_do_dad() check in in6_broadcast_ifa(). An address
which needs DAD always has IN6_IFF_TENTATIVE there.

- in6if_do_dad() now returns EAGAIN when the interface is not ready
since DAD callout handler ignores such an interface.

- In DAD callout handler, mark an address as IN6_IFF_TENTATIVE
when the interface has ND6_IFF_IFDISABLED, and do the IFF_UP and
IFF_DRV_RUNNING checks consistently when DAD is required.

- draft-ietf-6man-enhanced-dad is now published as RFC 7527.

- Fix some typos.
8de92fe7791878815b9af01d60cb7264e217b976 13-Sep-2015 hrs <hrs@FreeBSD.org> MFC 287095, 287610, 287611, 287617:

Remove obsolete API (SIOCGDRLST_IN6 and SIOCGPRLST_IN6) support.
2f4beaf9e87209b013fcbb56bdbcb4a47c545e23 13-Sep-2015 hrs <hrs@FreeBSD.org> MFC 287609:

Do not add IN6_IFF_TENTATIVE when ND6_IFF_NO_DAD.
0529f0c80f86c03666348c4d7ae7f0796b9134ac 10-Sep-2015 hrs <hrs@FreeBSD.org> Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6 forgotten in the previous commit.

MFC after: 3 days
e161347f7d0d8fd05296c74f12f4442caf20f348 10-Sep-2015 hrs <hrs@FreeBSD.org> Do not add IN6_IFF_TENTATIVE when ND6_IFF_NO_DAD.

MFC after: 3 days
d013cff635dca5c5d55dea80a0e1d87ec9f8e5cd 05-Sep-2015 melifaro <melifaro@FreeBSD.org> Do not skip entries without LLE_VALID flag.
This one fixes showing incomplete entries in ndp -an.

MFC after: 2 weeks
3e1524c83e71a2bca2e8caa4e42e8054d09cd16d 05-Sep-2015 melifaro <melifaro@FreeBSD.org> Make in6ifa_ifpwithaddr() take const param.
Remove unneeded DECONST from in6_lltable_rtcheck().
3f9699bcfe4422e0ba027d297154a947ac656338 31-Aug-2015 melifaro <melifaro@FreeBSD.org> Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in
alloc handler, based on flags.
d27954934dbe90eaadc064d7cbd06ab4114b5ea5 24-Aug-2015 hrs <hrs@FreeBSD.org> - Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback
interface but in6if_do_dad() already had a check for IFF_LOOPBACK.

- Remove in6if_do_dad() check in in6_broadcast_ifa(). An address
which needs DAD always has IN6_IFF_TENTATIVE there.

- in6if_do_dad() now returns EAGAIN when the interface is not ready
since DAD callout handler ignores such an interface.

- In DAD callout handler, mark an address as IN6_IFF_TENTATIVE
when the interface has ND6_IFF_IFDISABLED, and do the IFF_UP and
IFF_DRV_RUNNING checks consistently when DAD is required.

- draft-ietf-6man-enhanced-dad is now published as RFC 7527.

- Fix some typos.
54b3b78856caa3ef7df00c807fd13701f84a49cc 20-Aug-2015 melifaro <melifaro@FreeBSD.org> * Split allocation and table linking for lle's.
Before that, the logic behind lle_create() was the following:
return existing if found, create if not. This behaviour was error-prone
since we had to deal with 'sudden' static<>dynamic lle changes.
This commit fixes a bunch of different issues like:
- refcount leak when lle is converted to static.
Simple check case:
console 1:
while true;
do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
done;
done
console 2:
ping -f any-dead-host-in-L2
console 3:
# watch for memory consumption:
vmstat -m | awk '$1~/lltable/{print$2}'
- possible problems in arptimer() / nd6_timer() when dropping/reacquiring
lock.
New logic explicitly handles use-or-create cases in every lla_create
user. Basically, most of the changes are purely mechanical. However,
we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on every real table lle change.
* Create lltable_free_entry() calling existing per-lltable
lle_free_t callback for entry deletion
0c24547a6601393b02ca3eda7ccc3fd0c26ff55c 11-Aug-2015 melifaro <melifaro@FreeBSD.org> Use single 'lle_timer' callout in lltable instead of
two different names of the same timer.
d8f92ce2cfe8b9c73a429016a0fa82c19a19230a 11-Aug-2015 melifaro <melifaro@FreeBSD.org> Store addresses instead of sockaddrs inside llentry.
This permits us to have all (not yet fully true) the info
needed in the lookup process in the first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, the address part of struct llentry has only 16 bytes for the key.
However, lltable does not prevent custom lltable consumers with longer
keys from using the previous approach (storing the key at (lle+1)).

Sponsored by: Yandex LLC
8e6b3a8d59587481c485806c69429bede2a88772 11-Aug-2015 melifaro <melifaro@FreeBSD.org> MFP r276712.

* Split lltable_init() into lltable_allocate_htbl() (alloc
hash table with default callbacks) and lltable_link() (
links any lltable to the list).
* Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field.
* Move lltable setup to separate functions in in[6]_domifattach.
4f240a9c31022feb60343e2e1108338b44edb083 10-Aug-2015 melifaro <melifaro@FreeBSD.org> Partially merge r274887,r275334,r275577,r275578,r275586 to minimize
differences between projects/routing and HEAD.

This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.

* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
format (code is not indented properly to minimize diff). Will be fixed
in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.

* Prepare for fine-grained locking by separating lle unlink and deletion in
lltable_free() and lltable_prefix_free().

* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
access by external callers.

* Remove @llt argument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
Update descriptions of these functions.
* Use unified lltable_free_entry() function instead of per-af one.

Reviewed by: ae
fbf037b7860fea316178507725020f03c5d77dbb 08-Aug-2015 marius <marius@FreeBSD.org> Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT.
20bb5966e2075bde042b8b62c236e29d6e8934da 08-Aug-2015 melifaro <melifaro@FreeBSD.org> MFP r274553:
* Move lle creation/deletion from lla_lookup to separate functions:
lla_lookup(LLE_CREATE) -> lla_create
lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.

Reviewed by: ae
2cbf021408d8aca2fc55ef11a68faac08842f8fd 05-Aug-2015 ae <ae@FreeBSD.org> MFC r285710:
Invoke LLE event handler when entry is deleted.
75425458ac884224851416fa8b7b06d187702f59 29-Jul-2015 ae <ae@FreeBSD.org> Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.

Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by: gnn (previous version)
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D3149
1bf10917ef2861d659852b889d19c43ecbb905c9 23-Jul-2015 hrs <hrs@FreeBSD.org> MFC r273992:

Fix a bug which prevented ND6_IFF_IFDISABLED flag from clearing when
the newly-added IPv6 address was /128.

Approved by: re (gjb)
291abe56fe8089e767eab1773d8220d221f824a8 20-Jul-2015 ae <ae@FreeBSD.org> Invoke LLE event handler when entry is deleted.

MFC after: 2 weeks
Sponsored by: Yandex LLC
200ce7d83609576bee49855a6c2967c380f9052b 05-Jun-2015 ae <ae@FreeBSD.org> Rework r281868 to not skip RTM announces for tunneling interfaces.
This is direct commit to stable/10.

Tested by: tuexen@
a1e13e9e590c569e31b8bf4ccd7679b392a2d815 29-May-2015 ae <ae@FreeBSD.org> Move RTM announces into generic code to be independent from Layer2 code.
This fixes a bug introduced in 274988, where announcements about new addresses
were not sent for tunneling interfaces.

Reported by: tuexen@
MFC after: 1 week
e545513c3ef78baddef748f9ea257633dad59fc9 08-May-2015 hiren <hiren@FreeBSD.org> MFC r261708, r261847, r268525, r274316, r274347, r275593,
r276844, r276847, r279531, r279559, r279564, r279676

A bunch of IPv6 fixes by melifaro, hrs and ae

Major changes:
Simplify nd6_output_lle()
Add refcounting to DAD and fix races and other errors
Implement Enhanced DAD algorithm for IPv6

Suggested by: ae
Tested by: Jason Wolfe <j at nitrology.com>
Sponsored by: Limelight Networks
1ca5d173abc8391e7e3f7e9fc63ec652fed67879 02-May-2015 glebius <glebius@FreeBSD.org> Remove #ifdef IFT_FOO.

Submitted by: Guy Yur <guyyur gmail.com>
a09a1acc0194e0c31432edea821d303e914c34e4 22-Apr-2015 ae <ae@FreeBSD.org> MFC r274988 (with modification):
Skip L2 addresses lookups for tunneling interfaces.

PR: 197286
37c56fd2f815d05ff28e35043429824db263e1a0 17-Apr-2015 glebius <glebius@FreeBSD.org> Fix r281649: don't call in6_clearscope() twice.

Submitted by: ae
14b7122d6dee034c5e3a8364b50fd099c0fed264 17-Apr-2015 glebius <glebius@FreeBSD.org> Provide functions to determine presence of a given address
configured on a given interface.

Discussed with: np
Sponsored by: Nginx, Inc.
218de7f1d6c414ce3c5e25f027e34b3c576cea9e 05-Mar-2015 hrs <hrs@FreeBSD.org> - Implement loopback probing state in enhanced DAD algorithm.

- Add no_dad and ignoreloop per-IF knob. no_dad disables DAD completely,
and ignoreloop is to prevent infinite loop in loopback probing state when
loopback is permanently expected.
3a3039379c4ae88bfbe7b15ad2c3f92f3e976f57 15-Feb-2015 rrs <rrs@FreeBSD.org> MFC of r278472
This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Sponsored by: Netflix Inc.
2d05aee53ac436da02d8368efae120be00fb2709 12-Feb-2015 ae <ae@FreeBSD.org> MFC r278268:
Print IPv6 address in log message instead of address of pointer.
e83a0077986ede1b5481738b285a4612ca8636e1 09-Feb-2015 rrs <rrs@FreeBSD.org> This fixes a bug in the way that the LLE timers for nd6
and arp were being used. They basically would pass in the
mutex to the callout_init. Because they used this method
to the callout system, it was possible to "stop" the callout.
When flushing the table and you stopped the running callout, the
callout_stop code would return 1 indicating that it was going
to stop the callout (that was about to run on the callout_wheel blocked
by the function calling the stop). Now when 1 was returned, it would
lower the reference count one extra time for the stopped timer, then
a few lines later delete the memory. Of course the callout_wheel was
stuck in the lock code and would then crash since it was accessing
freed memory. By using callout_init(c, 1) we always get a 0 back
and the reference counting bug does not rear its head. We do have
to make a few adjustments to the callouts themselves though to make
sure it does the proper thing if rescheduled as well as gets the lock.

Commented upon by hiren and sbruno
See Phabricator D1777 for more details.

Reviewed by: adrian, jhb and bz
Sponsored by: Netflix Inc.
e079051f88d08cef161aea5a920b868143bd4485 05-Feb-2015 ae <ae@FreeBSD.org> Print IPv6 address in log message instead of address of pointer.

MFC after: 1 week
e9254e774f30fa398804dda71999cb295fa1e3e8 10-Nov-2014 ae <ae@FreeBSD.org> Remove link-local multicast routes remnants from in6_purgeaddr.
Also merge in6_purgeaddr_mc with in6_purgeaddr.

Sponsored by: Yandex LLC
f28e25322562395b296ee23a4a14deea4c73e630 10-Nov-2014 glebius <glebius@FreeBSD.org> Consistently use if_link.

Reviewed by: ae, melifaro
b5d711d3a6940afdd3615f7ffc2dcfa3faacd446 09-Nov-2014 melifaro <melifaro@FreeBSD.org> Remove faith(4) and faithd(8) from base. It looks like the industry
has chosen a different (and more traditional) stateless/stateful
NAT64 as its translation mechanism. Last non-trivial commits to both
faith(4) and faithd(8) happened more than 12 years ago, so I assume
it is time to drop RFC3142 in FreeBSD.

No objections from: net@
11af63037f17d7b85036d03dc07687f77171b4b2 06-Nov-2014 melifaro <melifaro@FreeBSD.org> Make checks for rt_mtu generic:

Some virtual if drivers have (ab)used the ifa_rtrequest hook to enforce
a route MTU no bigger than the interface MTU. While ifa_rtrequest hooking
might be an option in some situation, it is not feasible to do MTU checks
there: generic (or per-domain) routing code is perfectly capable of doing
this.

We currently have 3 places where MTU is altered:

1) route addition.
In this case domain overrides radix _addroute callback (in[6]_addroute)
and all necessary checks/fixes are/can be done there.

2) route change (especially, GW change).
In this case, there are no explicit per-domain calls, but one can
override rte by setting ifa_rtrequest hook to domain handler
(inet6 does this).

3) ifconfig ifaceX mtu YYYY
In this case, we have no callbacks, but ip[6]_output performs runtime
checks and decreases rt_mtu if necessary.

Generally, the goals are to be able to handle all MTU changes in
control plane, not in runtime part, and properly deal with increased
interface MTU.

This commit changes the following:
* removes hooks setting MTU from drivers side
* adds proper per-domain MTU checks for case 1)
* adds generic MTU check for case 2)

* The latter is done by using new dom_ifmtu callback since
if_mtu denotes L3 interface MTU, i.e. maximum transmitted _packet_ size.
However, IPv6 mtu might be different from if_mtu one (e.g. default 1280)
for some cases, so we need an abstract way to know maximum MTU size
for given interface and domain.
* moves rt_setmetrics() before MTU/ifa_rtrequest hooks since it copies
user-supplied data which must be checked.
* removes RT_LOCK_ASSERT() from other ifa_rtrequest hooks to be able to
use these functions on new non-inserted rte.

More changes will follow soon.

MFC after: 1 month
Sponsored by: Yandex LLC
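The control-plane MTU check described above amounts to clamping a route MTU to the per-domain interface MTU when the route is added or changed. A minimal sketch, assuming a route MTU of 0 means "unset, inherit from the interface"; sketch_dom_ifmtu() is a hypothetical stand-in for the new dom_ifmtu callback, whose real signature may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical per-domain hook: the largest packet this domain may send
 * on the interface. Here it simply returns the link MTU.
 */
static uint32_t
sketch_dom_ifmtu(uint32_t if_mtu)
{
	assert(if_mtu > 0);
	return if_mtu;
}

/*
 * Clamp a route MTU at route add/change time (control plane) rather
 * than at output time. 0 means "unset": inherit the interface value.
 */
static uint32_t
sketch_route_mtu(uint32_t rt_mtu, uint32_t if_mtu)
{
	uint32_t max = sketch_dom_ifmtu(if_mtu);

	if (rt_mtu == 0 || rt_mtu > max)
		return max;
	return rt_mtu;
}
```

For example, a user-supplied route MTU of 9000 on a 1500-byte link is stored as 1500, while 1400 is kept as-is.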
948508e0967ce692816e77054ef9ab5302b9313f 02-Nov-2014 hrs <hrs@FreeBSD.org> Fix a bug which prevented ND6_IFF_IFDISABLED flag from clearing when
the newly-added IPv6 address was /128.

PR: 188032
b27ddf1ff3b61ee99e933d7e2b15b23754120fb9 27-Oct-2014 ae <ae@FreeBSD.org> Do not automatically install routes to link-local and interface-local multicast
addresses.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
8d3b28e72b441800be2d280baced412ea6a78805 27-Oct-2014 ae <ae@FreeBSD.org> Remove unused function.

Sponsored by: Yandex LLC
e12a77b0cbb6b4c18eaeeedf28d9b22db5e5f23e 11-Sep-2014 ae <ae@FreeBSD.org> Add const qualifier to in6_addrhash() function.
Add in6ifa_ifwithaddr() function. It is similar to ifa_ifwithaddr,
but does fast lookup in the hash of inet6 addresses.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
5a094736fbe3fd6bd8c010c175a09079474f21b1 06-Sep-2014 markj <markj@FreeBSD.org> MFC r270348:
Add some missing checks for unsupported interfaces (e.g. pflog(4)) when
handling ioctls. While here, remove duplicated checks for a NULL ifp in
in6_control(): this check is already done near the beginning of the
function.

MFC r270349:
Suppress warnings when retrieving protocol stats from interfaces that
don't support IPv6 (e.g. pflog(4)).

PR: 189117
Approved by: re (gjb)
72d1944ad20d0f4c6f44cd9ec8adbc294db8a5e7 22-Aug-2014 markj <markj@FreeBSD.org> Add some missing checks for unsupported interfaces (e.g. pflog(4)) when
handling ioctls. While here, remove duplicated checks for a NULL ifp in
in6_control(): this check is already done near the beginning of the
function.

PR: 189117
Reviewed by: hrs
MFC after: 2 weeks
d32e428cc37439544fc5159604ba10ed560a88c1 29-Jul-2014 glebius <glebius@FreeBSD.org> Garbage collect couple of unused fields from struct ifaddr:
- ifa_claim_addr() unused since removal of NetAtalk
- ifa_metric seems to be never utilized, always a copy of if_metric
eb1a5f8de9f7ea602c373a710f531abbf81141c4 21-Feb-2014 gjb <gjb@FreeBSD.org> Move ^/user/gjb/hacking/release-embedded up one directory, and remove
^/user/gjb/hacking since this is likely to be merged to head/ soon.

Sponsored by: The FreeBSD Foundation
6ee59753ec1885c64b1d0c025a243ba2500d6777 19-Jan-2014 melifaro <melifaro@FreeBSD.org> Further rework netinet6 address handling code:
* Set ia address/mask values BEFORE attaching to address lists.
Inet6 address assignment is not atomic, so the simplest way to
do this atomically is to fill in ia before attach.
* Validate irfa->ia_addr field before use (old code permitted ANY sockaddr).
* Do some renamings:
in6_ifinit -> in6_notify_ifa (interaction with other subsystems is here)
in6_setup_ifa -> in6_broadcast_ifa (LLE/Multicast/DaD code)
in6_ifaddloop -> nd6_add_ifa_lle
in6_ifremloop -> nd6_rem_ifa_lle
* Split working with LLE and route announce code for last two.
Add temporary in6_newaddrmsg() function to mimic current rtsock behaviour.
* Call device SIOCSIFADDR handler IFF we're adding first address.
In IPv4 we have to call it on every address change since ARP record
is installed by arp_ifinit() which is called by given handler.
The IPv6 stack, on the contrary, is responsible for calling nd6_add_ifa_lle(),
so there is no reason to call SIOCSIFADDR often.
4e822960631d384bad3e81ffafdc5b5e9760cece 18-Jan-2014 melifaro <melifaro@FreeBSD.org> Add in6_prepare_ifra() function to ease preparing in-kernel IPv6
address requests.

MFC after: 2 weeks
9f1142ff9549d6595d71a9c64b7f2cbc660753fc 18-Jan-2014 melifaro <melifaro@FreeBSD.org> Do some style(9) not done in r260851 to improve readability.

MFC after: 2 weeks
9b02dc0fae15492bce7bff1056fa7e040edc0575 18-Jan-2014 melifaro <melifaro@FreeBSD.org> Split in6_update_ifa() into smaller pieces leaving functionality intact.

Discussed with: ae
MFC after: 2 weeks
65169ca8a03275870336017f91b02d8d16abdd24 10-Jan-2014 ae <ae@FreeBSD.org> MFC r260151 (by adrian):
Use an RLOCK here instead of an RWLOCK - matching all the other calls
to lla_lookup().

This drastically reduces the very high lock contention when doing parallel
TCP throughput tests (> 1024 sockets) with IPv6.

MFC r260187:
lla_lookup() does modification only when LLE_CREATE is specified.
Thus we can use IF_AFDATA_RLOCK() instead of IF_AFDATA_LOCK() when doing
lla_lookup() without LLE_CREATE flag.

MFC r260217:
Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.
db2be6a7935bbbcc9f166d7b77bae083a04ecf1c 08-Jan-2014 melifaro <melifaro@FreeBSD.org> Introduce IN6_MASK_ADDR() macro to unify various hand-rolled code
to do IPv6 addr & mask in different places.

MFC after: 2 weeks
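The "addr & mask" operation that IN6_MASK_ADDR() unifies can be sketched byte by byte (the kernel macro may work on 32-bit words instead). The helper names below are hypothetical, and the prefix-length-to-mask helper is added only to make the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Build a mask with plen leading one bits (0 <= plen <= 128). */
static void
in6_plen2mask_sketch(struct in6_addr *m, int plen)
{
	assert(plen >= 0 && plen <= 128);
	memset(m, 0, sizeof(*m));
	for (int i = 0; i < plen / 8; i++)
		m->s6_addr[i] = 0xff;
	if (plen % 8 != 0)
		m->s6_addr[plen / 8] = (uint8_t)(0xff << (8 - plen % 8));
}

/* a &= m: keep only the prefix bits of the address. */
static void
in6_mask_addr_sketch(struct in6_addr *a, const struct in6_addr *m)
{
	for (int i = 0; i < 16; i++)
		a->s6_addr[i] &= m->s6_addr[i];
}

static bool
in6_addr_eq_sketch(const struct in6_addr *a, const struct in6_addr *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Masking 2001:db8:1:2:a:b:c:d with a /64 mask yields 2001:db8:1:2::.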
941bb837f91faf433dbd176456cf1cd0f16e010d 03-Jan-2014 ae <ae@FreeBSD.org> Add IF_AFDATA_WLOCK_ASSERT() in case lla_lookup() is called with
LLE_CREATE flag.

MFC after: 1 week
6b01bbf146ab195243a8e7d43bb11f8835c76af8 27-Dec-2013 gjb <gjb@FreeBSD.org> Copy head@r259933 -> user/gjb/hacking/release-embedded for initial
inclusion of (at least) arm builds with the release.

Sponsored by: The FreeBSD Foundation
3c1f482e0e0f1e3715112a75435f2e38eeec0519 11-Nov-2013 glebius <glebius@FreeBSD.org> Remove never used ioctls that originate from KAME. The proof
of their zero usage was exp-run from misc/183538.
f469ae1d459eb17461e1fdfa9af613fb107e7be2 28-Oct-2013 glebius <glebius@FreeBSD.org> Include necessary headers that now are available due to pollution
via if_var.h.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
564d02b3040edd78c448583a6b0821a509487497 15-Oct-2013 glebius <glebius@FreeBSD.org> Remove ifa_init() and provide ifa_alloc() that will allocate and setup
struct ifaddr internally.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
21e6fd796b765b83afb4869cb49eb219ade8c54d 10-Sep-2013 des <des@FreeBSD.org> Fix the length calculation for the final block of a sendfile(2)
transmission which could be tricked into rounding up to the nearest
page size, leaking up to a page of kernel memory. [13:11]

In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR
and SIOCSIFNETMASK at the socket layer rather than pass them on to the
link layer without validation or credential checks. [SA-13:12]

Prevent cross-mount hardlinks between different nullfs mounts of the
same underlying filesystem. [SA-13:13]

Security: CVE-2013-5666
Security: FreeBSD-SA-13:11.sendfile
Security: CVE-2013-5691
Security: FreeBSD-SA-13:12.ifioctl
Security: CVE-2013-5710
Security: FreeBSD-SA-13:13.nullfs
Approved by: re
13c1bcf2c1d5fbdca99cdddec726f822b68dddbc 05-Aug-2013 hrs <hrs@FreeBSD.org> - Use time_uptime instead of time_second in data structures for
PF_INET6 in kernel. This fixes various malfunctions when the wall time
clock is changed. Bump __FreeBSD_version to 1000041.

- Use clock_gettime(CLOCK_MONOTONIC_FAST) in userland utilities.

MFC after: 1 month
64e5ea06531a17c184bfb8319a36165135f613c5 31-Jul-2013 hrs <hrs@FreeBSD.org> Allocate in6_ifextra (ifp->if_afdata[AF_INET6]) only for IPv6-capable
interfaces. This eliminates unnecessary IPv6 processing for non-IPv6
interfaces.

MFC after: 3 days
9bfe2ac5dd08af286e1fbba35062cd9f06b0e111 09-Jul-2013 ae <ae@FreeBSD.org> Correct the size of allocated memory to store array of counters.
08c6719ac4955adc91cf30ee1de8d52a06baf495 09-Jul-2013 ae <ae@FreeBSD.org> Migrate structs in6_ifstat and icmp6_ifstat to PCPU counters.
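The idea behind PCPU counters can be shown with a simplified model: each CPU increments only its own slot, so the hot path needs no lock, and a reader sums all slots. This is not the counter(9) implementation; SKETCH_NCPU, the struct, and the explicit cpu argument are all hypothetical simplifications.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NCPU 4	/* hypothetical CPU count */

/* One slot per CPU: writers touch only their own slot. */
struct pcpu_counter_sketch {
	uint64_t slot[SKETCH_NCPU];
};

static void
counter_add_sketch(struct pcpu_counter_sketch *c, int cpu, uint64_t n)
{
	assert(cpu >= 0 && cpu < SKETCH_NCPU);
	c->slot[cpu] += n;
}

/* Readers (e.g. a statistics ioctl) sum all slots for the total. */
static uint64_t
counter_fetch_sketch(const struct pcpu_counter_sketch *c)
{
	uint64_t sum = 0;

	for (int i = 0; i < SKETCH_NCPU; i++)
		sum += c->slot[i];
	return sum;
}
```

The trade-off is that reads become O(ncpu) and slightly stale, which is acceptable for interface statistics that are updated far more often than they are read.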
50e0add9e4de4d5547753f15e5df2059b5ee1f11 02-Jul-2013 hrs <hrs@FreeBSD.org> - Allow ND6_IFF_AUTO_LINKLOCAL for IFT_BRIDGE. An interface with IFT_BRIDGE
is initialized with !ND6_IFF_AUTO_LINKLOCAL && !ND6_IFF_ACCEPT_RTADV
regardless of net.inet6.ip6.accept_rtadv and net.inet6.ip6.auto_linklocal.
To configure an autoconfigured link-local address (RFC 4862), the
following rc.conf(5) configuration can be used:

ifconfig_bridge0_ipv6="inet6 auto_linklocal"

- if_bridge(4) now removes IPv6 addresses on a member interface to be
added when the parent interface or one of the existing member
interfaces has an IPv6 address. if_bridge(4) merges each link-local
scope zone which the member interfaces form respectively, so it causes
an address scope violation. Removing the IPv6 addresses prevents this.

- if_lagg(4) now removes IPv6 addresses on member interfaces
unconditionally.

- Set reasonable flags to non-IPv6-capable interfaces. [*]

Submitted by: rpaulo [*]
MFC after: 1 week
40bb6f2505f87a45c239a11997e89d5b593ac37d 19-May-2013 melifaro <melifaro@FreeBSD.org> Really fix netmask address family this time.

MFC with: r250813
9f42266f8dbf5c8d3237c5ada516071f0efb1645 19-May-2013 melifaro <melifaro@FreeBSD.org> Finish r85740: make IPv6 netmask have its address family set.
This pleases routing daemons like bird.

MFC after: 2 weeks
d9d71436d975a7ec598297c564219c530aa41138 04-May-2013 hrs <hrs@FreeBSD.org> Use FF02:0:0:0:0:2:FF00::/104 prefix for IPv6 Node Information Group
Address. Although KAME implementation used FF02:0:0:0:0:2::/96 based on
older versions of draft-ietf-ipngwg-icmp-name-lookup, it has been changed
in RFC 4620.

The kernel always joins the /104-prefixed address, and additionally does
/96-prefixed one only when net.inet6.icmp6.nodeinfo_oldmcprefix=1.
The default value of the sysctl is 1.

ping6(8) -N flag now uses /104-prefixed one. When this flag is specified
twice, it uses /96-prefixed one instead.

Reviewed by: ume
Based on work by: Thomas Scheffler
PR: conf/174957
MFC after: 2 weeks
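The two address layouts above differ only in the prefix and in how many hash bits follow it. RFC 4620 derives those bits from an MD5 hash of the node's name; to keep the example self-contained, the hashing step is omitted and the digest bytes are taken as input (an assumption). The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/*
 * Build a Node Information Group Address from the first digest bytes.
 * old_prefix == 0: FF02:0:0:0:0:2:FF00::/104 + first 24 bits of digest.
 * old_prefix != 0: legacy FF02:0:0:0:0:2::/96 + first 32 bits.
 */
static void
ni_group_addr_sketch(struct in6_addr *a, const uint8_t digest[4],
    int old_prefix)
{
	assert(a != NULL && digest != NULL);
	memset(a, 0, sizeof(*a));
	a->s6_addr[0] = 0xff;		/* ff02: link-local multicast */
	a->s6_addr[1] = 0x02;
	a->s6_addr[11] = 0x02;		/* the ":2:" group */
	if (old_prefix) {
		memcpy(&a->s6_addr[12], digest, 4);
	} else {
		a->s6_addr[12] = 0xff;	/* FF00 part of the /104 prefix */
		memcpy(&a->s6_addr[13], digest, 3);
	}
}
```

With digest bytes 12 34 56 78 this gives ff02::2:ff12:3456 for the /104 form and ff02::2:1234:5678 for the legacy /96 form.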
9917da6df02790df9aa28e23a92f5ba84e711d30 21-Apr-2013 oleg <oleg@FreeBSD.org> Plug static llentry leak (ipv4 & ipv6 were affected).

PR: kern/172985
MFC after: 1 month
3f8d5a8f513efcfb7addc023709a44add9ec2a60 03-Jan-2013 peter <peter@FreeBSD.org> Temporarily revert rev 244678. This is causing loopback problems with
the lo (loopback) interfaces.
9f622a1b388500854580966c32199975628916ba 25-Dec-2012 glebius <glebius@FreeBSD.org> The SIOCSIFFLAGS ioctl handler runs if_up()/if_down() that notify
all interested parties in case the interface flag IFF_UP has changed.

However, not only SIOCSIFFLAGS can raise the flag, but SIOCAIFADDR
and SIOCAIFADDR_IN6 can, too. The actual |= is done not in the protocol
code, but in code of interface drivers. To fix this historical layering
violation, we will check whether ifp->if_ioctl(SIOCSIFADDR) raised the
IFF_UP flag, and if it did, run the if_up() handler.

This fixes configuring an address under CARP control on an interface
that was initially !IFF_UP.

P.S. I intentionally omitted handling the IFF_SMART flag. This flag was
never ever used in any driver since it was introduced, and since it
means another layering violation, it should be garbage collected instead
of pretended to be supported.
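The check described above (remember IFF_UP before the driver's SIOCSIFADDR handler runs, and call if_up() if the handler raised it) can be modeled with a cut-down interface. struct ifnet_sketch, the flag value, and all functions are hypothetical stand-ins, not the kernel types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_IFF_UP 0x1	/* stand-in for IFF_UP */

/* Hypothetical, cut-down interface: flags plus an address ioctl hook. */
struct ifnet_sketch {
	int if_flags;
	void (*if_ioctl_setaddr)(struct ifnet_sketch *);	/* SIOCSIFADDR */
	int up_events;						/* if_up() calls */
};

static void
if_up_sketch(struct ifnet_sketch *ifp)
{
	ifp->up_events++;	/* real code notifies interested parties */
}

/* Run the driver's address ioctl; if it raised IFF_UP, announce it. */
static void
add_address_sketch(struct ifnet_sketch *ifp)
{
	bool was_up;

	assert(ifp->if_ioctl_setaddr != NULL);
	was_up = (ifp->if_flags & SKETCH_IFF_UP) != 0;
	ifp->if_ioctl_setaddr(ifp);
	if (!was_up && (ifp->if_flags & SKETCH_IFF_UP) != 0)
		if_up_sketch(ifp);
}

/* Example driver: marks the interface up as a side effect. */
static void
driver_setaddr_sketch(struct ifnet_sketch *ifp)
{
	ifp->if_flags |= SKETCH_IFF_UP;
}
```

Adding a second address to an already-up interface does not generate another if_up() notification.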
6ae87790328612368a591aafd4b04c0db7aefec6 15-Dec-2012 ae <ae@FreeBSD.org> In addition to the tailq of IPv6 addresses, add a hash table.
For now use 256 buckets and fnv_hash function. Use xor'ed 32-bit
s6_addr32 parts of in6_addr structure as a hash key. Update
in6_localip and in6_is_addr_deprecated to use the hash table for faster
lookup.

Sponsored by: Yandex LLC
Discussed with: dwmalone, glebius, bz
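The bucket selection above (xor the four 32-bit words of the address, hash the result with FNV, keep the low bits for 256 buckets) can be sketched as follows. This uses textbook FNV-1 constants; the kernel's fnv_32_buf() and its initial value may differ, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

#define SKETCH_NHASH 256			/* buckets, as in the commit */
#define SKETCH_FNV32_PRIME 0x01000193u
#define SKETCH_FNV32_BASIS 0x811c9dc5u	/* textbook FNV-1 offset basis */

/* FNV-1: for each byte, multiply by the prime, then xor the byte in. */
static uint32_t
fnv1_32_sketch(const void *buf, size_t len, uint32_t h)
{
	const uint8_t *p = buf;

	while (len-- != 0) {
		h *= SKETCH_FNV32_PRIME;
		h ^= *p++;
	}
	return h;
}

/* Fold the four 32-bit words with xor, hash the key, pick a bucket. */
static unsigned
in6_addr_bucket_sketch(const struct in6_addr *a)
{
	uint32_t w[4], key;

	memcpy(w, a->s6_addr, sizeof(w));	/* portable s6_addr32 access */
	key = w[0] ^ w[1] ^ w[2] ^ w[3];
	return fnv1_32_sketch(&key, sizeof(key), SKETCH_FNV32_BASIS) &
	    (SKETCH_NHASH - 1);
}
```

Because of the xor folding, addresses whose 32-bit words are merely permuted land in the same bucket; the FNV step then spreads the folded key across buckets.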
377b89c55f75aba70c2c66bfc96fb4c4af7311ae 05-Dec-2012 hrs <hrs@FreeBSD.org> - Move definition of V_deembed_scopeid to scope6_var.h.
- Deembed scope id in L3 address in in6_lltable_dump().
- Simplify scope id recovery in rtsock routines.
- Remove embedded scope id handling in ndp(8) and route(8) completely.
3948ce713ca0f2b610938ec42f8bd0df007f7e29 22-Oct-2012 delphij <delphij@FreeBSD.org> Remove __P.

Submitted by: kevlo
Reviewed by: md5(1)
MFC after: 2 months
34a9a386cb4df8844bca8e43dae20e4a15710fcc 18-Oct-2012 andre <andre@FreeBSD.org> Mechanically remove the last stray remains of spl* calls from net*/*.
They have been Noop's for a long time now.
abf245020a075c487a1ac4e60c7069e2d8c9c7c3 02-Aug-2012 glebius <glebius@FreeBSD.org> Fix races between in_lltable_prefix_free(), lla_lookup(),
llentry_free() and arptimer():

o Use callout_init_rw() for lle timeouts; this allows us to safely
disestablish them.
- This allows us to simplify the arptimer() and make it
race safe.
o Consistently use ifp->if_afdata_lock to lock access to
linked lists in the lle hashes.
o Introduce new lle flag LLE_LINKED, which marks an entry that
is attached to the hash.
- Use LLE_LINKED to avoid double unlinking via consequent
calls to llentry_free().
- Mark lle with LLE_DELETED via |= operation istead of =,
so that other flags won't be lost.
o Make LLE_ADDREF(), LLE_REMREF() and LLE_FREE_LOCKED() more
consistent and provide more informative KASSERTs.

The patch is a collaborative work of all submitters and myself.

PR: kern/165863
Submitted by: Andrey Zonov <andrey zonov.org>
Submitted by: Ryan Stone <rysto32 gmail.com>
Submitted by: Eric van Gyzen <eric_van_gyzen dell.com>
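The two flag rules above (unlink only while a "linked" bit is set, and set "deleted" with |= so other bits survive) can be sketched in C. This is a minimal illustration under stated assumptions: ENT_LINKED, ENT_DELETED, struct ent and the functions are hypothetical stand-ins for LLE_LINKED, LLE_DELETED and the lle code, and their bit values are invented.

```c
#include <stdbool.h>

#define ENT_LINKED	0x01	/* hypothetical: entry is on the hash chain */
#define ENT_DELETED	0x02	/* hypothetical: entry marked for deletion */

struct ent {
	unsigned flags;
};

/* Unlink at most once: a second call sees ENT_LINKED clear and does nothing. */
static bool
ent_unlink(struct ent *e)
{
	if ((e->flags & ENT_LINKED) == 0)
		return (false);
	/* Removal from the hash chain would happen here. */
	e->flags &= ~ENT_LINKED;
	return (true);
}

/* Mark deleted with |= so ENT_LINKED and any other bits are preserved. */
static void
ent_mark_deleted(struct ent *e)
{
	e->flags |= ENT_DELETED;
}
```

With a plain "e->flags = ENT_DELETED" the linked bit would be lost, and a later unlink would wrongly skip an entry that is still on the chain.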
588de42f27bcd781118d9f4e5a9c51816a13585f 01-Aug-2012 glebius <glebius@FreeBSD.org> Some more whitespace cleanup.
53cb168f80fa70688bd35678b681684102548ae5 31-Jul-2012 glebius <glebius@FreeBSD.org> Some style(9) and whitespace changes.

Together with: Andrey Zonov <andrey zonov.org>
9cc8fb25a6c747a4ce4eff0392c16c81d2135af8 08-Jul-2012 attilio <attilio@FreeBSD.org> MFC
23dfdbf152d6bb60514603ffb020298ff3d484a0 08-Jul-2012 bz <bz@FreeBSD.org> As mentioned in the commit message of r237571 (copied from a prototype
patch of mine) also check if the 2nd in6_setscope() failed and return
the error in that case.

MFC after: 5 days
dbcf8e863a043122939d99e24a39b91210fcc2db 25-Jun-2012 delphij <delphij@FreeBSD.org> Fix a LOR acquiring the if_afdata lock while holding an rtentry lock.
Possibly do some extra work in case we would not get into the
ifa0 != NULL paths later as we already do for the mltaddr before.

XXX We should possibly error in case in6_setscope fails.

Reference: http://lists.freebsd.org/pipermail/freebsd-net/2011-September/029829.html

Submitted by: bz
MFC after: 1 week
2d7f5ff3a844e1f1c676401221ad845fec62c38b 05-Jun-2012 bz <bz@FreeBSD.org> Plug two interface address refcount leaks in early error return cases
in the ioctl path.

Reported by: rpaulo
Reviewed by: emax
MFC after: 3 days
b57a3bc250a353972d4c15a9a165efc033ec3a73 30-May-2012 emax <emax@FreeBSD.org> When we return deprecated addresses, we need to reference them.

Reviewed by: bz, scottl
MFC after: 3 days
a99e9d281db6624cf9b88f07759a77f7b2d96a33 23-Feb-2012 kmacy <kmacy@FreeBSD.org> When using flowtable, llentries can outlive the interface with which they're associated,
at which point the lle_tbl pointer points to freed memory and the llt_free pointer is no longer
valid.

Move the free pointer into the llentry itself and update the initialization sites.

MFC after: 2 weeks
dcdb23291fec1365e927195511d5dfb273901a5d 17-Feb-2012 bz <bz@FreeBSD.org> Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days
e59c01b14f52764214c109603e8dd533dacef5da 24-Jan-2012 bz <bz@FreeBSD.org> Plug a possible ifa_ref leak in case of premature return from in6_purgeaddr().

Reviewed by: rwatson
MFC after: 3 days
728e6ff16e41c425792b3b9c07acdff4489b5f59 24-Jan-2012 pluknet <pluknet@FreeBSD.org> Remove the stale XXX rt_newaddrmsg comment.
A routing socket message is generated since r192282.

Reviewed by: bz
MFC after: 3 days
e8bf1256400ec5a8e0b447cb7a3dfca6f8e57089 24-Jan-2012 bz <bz@FreeBSD.org> Remove unnecessary line break.

MFC after: 3 days
4ef366671a65893ccd19a3fc76c415ab0d9a25f2 05-Jan-2012 jhb <jhb@FreeBSD.org> Convert all users of IF_ADDR_LOCK to use new locking macros that specify
either a read lock or write lock.

Reviewed by: bz
MFC after: 2 weeks
2886f1415cda856a3bebfd157865bae53bcd0acb 04-Jan-2012 glebius <glebius@FreeBSD.org> Use correct locking when traversing interface address list.

Reviewed by: bz
badf97ee75299cfb893ff28bb70d0a5fc731398e 03-Jan-2012 jhb <jhb@FreeBSD.org> Grab a reference on the matching interface address (ifa) in the handling
of the SIOC[DG]LIFADDR ioctls before dropping the IF_ADDR_LOCK() and
release the reference after using it. This prevents the address from
being potentially freed out from under the ioctl handler.

Reviewed by: bz
MFC after: 1 week
dd61fe0873effbc4d17cf7dabc455d8155066adb 03-Jan-2012 jhb <jhb@FreeBSD.org> Use TAILQ_FOREACH() instead of TAILQ_FOREACH_SAFE() for some loops that
do not modify the queues they iterate over.

Submitted by: glebius
7a0151720c2fe467008a6d652b662b6bf34f8f4b 29-Dec-2011 jhb <jhb@FreeBSD.org> Use queue(3) macros instead of home-rolled versions in several places in
the INET6 code. This includes retiring the 'ndpr_next' and 'pfr_next'
macros.

Submitted by: pluknet (earlier version)
Reviewed by: pluknet
653f8c5e7181f0fd06ea5451ebb67351c2dd5626 21-Dec-2011 glebius <glebius@FreeBSD.org> Provide ABI compatibility shim to enable configuring of addresses
with ifconfig(8) prior to r228571.

Requested by: brooks
27a36f6ac8242750daa092abd7180b10d16f4508 16-Dec-2011 glebius <glebius@FreeBSD.org> A major overhaul of the CARP implementation. The ip_carp.c was started
from scratch, copying needed functionality from the old implementation
on demand, with a thorough review of all code. The main change is that
the interface layer has been removed from CARP. Now redundant addresses
are configured directly on the interfaces they run on.

The CARP configuration itself is, as before, configured and read via
SIOCSVH/SIOCGVH ioctls. A new prefix created with SIOCAIFADDR or
SIOCAIFADDR_IN6 may now be configured to a particular virtual host id,
which makes the prefix redundant.

ifconfig(8) semantics have changed too: one no longer needs to clone
a carpXX interface, but should configure a vhid directly on an
Ethernet interface.

To supply vhid data from the kernel to an application, the getifaddrs(3)
function has been changed to pass ifam_data with each address. [1]

The new implementation definitely closes all PRs related to carp(4)
being an interface, and may close several others. It also allows
running a single redundant IP per interface.

Big thanks to Bjoern Zeeb for his help with the inet6 part of the patch,
for the idea of using ifam_data, and for several rounds of reviewing!

PR: kern/117000, kern/126945, kern/126714, kern/120130, kern/117448
Reviewed by: bz
Submitted by: bz [1]
f44dbaa16a33a54f5c6e66fe4ede974434600683 12-Nov-2011 attilio <attilio@FreeBSD.org> MFC
3b996bbc1149552d433105f70b0d45df8873ed70 11-Nov-2011 qingli <qingli@FreeBSD.org> A default route learned from the RAs could be deleted manually
after its installation. This removal may be accidental and can
prevent the default route from being installed in the future if
the associated default router has the best preference. The cause
is the lack of status update in the default router on the state
of its route installation in the kernel FIB. This patch fixes
the described problem.

Reviewed by: hrs, discussed with hrs
MFC after: 5 days
4fd26a87dc702b4af14187dd0bea5282657c4b6d 16-Oct-2011 qingli <qingli@FreeBSD.org> The code change made in r226040 was incomplete and resulted in
routes such as fe80::1%lo0 not being installed. This patch completes
the original intended fix.

Reviewed by: hrs, bz
MFC after: 3 days
623fcd8af9263433e98341da5cb2c2b551228301 13-Oct-2011 glebius <glebius@FreeBSD.org> Restore functions in6_ifaddloop() and in6_ifremloop() that were
inlined by Qing Li in his big new-ARP commit. I am going to utilize
them in my newcarp work; also, these functions remained declared
in in6_var.h the whole time they were absent.

Reviewed by: bz
d9f932fe21c5bbfac0e824f26de1265d4f60f020 05-Oct-2011 qingli <qingli@FreeBSD.org> The IFA_RTSELF instead of the IFA_ROUTE flag should be checked to
determine if a loopback route should be installed for an interface
IPv6 address. Another condition is that the address must not belong to a
loopback interface.

Reviewed by: hrs
MFC after: 3 days
fa01a4aee0c4b294791a3b7515a6776750af5f04 20-Aug-2011 bz <bz@FreeBSD.org> Add an in6_localip() helper function as in6_localaddr() is not doing what
people think: returning true for an address in any connected subnet, not
necessarily on the local machine.

Sponsored by: Sandvine Incorporated
MFC after: 2 weeks
Approved by: re (kib)
99a0b299b3e1499b4637e127d2dd98bbf780464e 08-Jul-2011 zec <zec@FreeBSD.org> Permit ARP to proceed for IPv4 host routes for which the gateway is the
same as the host address. This already works fine for INET6 and ND6.

While here, remove two function pointers from struct lltable which are
only initialized but never used.

MFC after: 3 days
fc3c21e7daa042ecd95dabfaba0f5e46f1f54e4a 06-Jun-2011 hrs <hrs@FreeBSD.org> Merge from HEAD@222728,222730.
acbda2ccc11fcdfe5fa5175c97ac29b4bb729bb5 06-Jun-2011 hrs <hrs@FreeBSD.org> - Make the code more proactively clear an ND6_IFF_IFDISABLED flag when
an explicit action for INET6 configuration happens. The changes are:

1. When an ND6 flag is changed via SIOCSIFINFO_FLAGS ioctl,
setting ND6_IFF_ACCEPT_RTADV and/or ND6_IFF_AUTO_LINKLOCAL now triggers
an attempt to clear the ND6_IFF_IFDISABLED flag.

2. When an AF_INET6 address is added successfully to an interface and
it is marked as ND6_IFF_IFDISABLED, an attempt to clear the
ND6_IFF_IFDISABLED happens.

This simplifies ND6_IFF_IFDISABLED flag manipulation by users via ifconfig(8);
in most cases manual configuration is no longer needed.

- When ND6_IFF_AUTO_LINKLOCAL is set and no link-local address is assigned to
an interface, SIOCSIFINFO_FLAGS ioctl now calls in6_ifattach() to configure
a link-local address.

This change ensures link-local address configuration when "ifconfig IF inet6"
command is invoked. For example, "ifconfig IF inet6 auto_linklocal" now
always tries to configure an LL addr even if ND6_IFF_AUTO_LINKLOCAL is already
set to 1 (i.e. down/up cycle is no longer needed).

Reviewed by: bz
40c523401dc269a1442426928fd7bc0c3bf4f559 03-Jun-2011 hrs <hrs@FreeBSD.org> - style(9) fixes.
- Comment rewording.

Submitted by: bz
d729d55b0d1c91bed335593c6d8e8ea4bde37ef8 01-Jun-2011 hrs <hrs@FreeBSD.org> - Make the code more proactively clear an ND6_IFF_IFDISABLED flag when
an explicit action for INET6 configuration happens. The changes are:

1. When an ND6 flag is changed via SIOCSIFINFO_FLAGS ioctl,
setting ND6_IFF_ACCEPT_RTADV and/or ND6_IFF_AUTO_LINKLOCAL now triggers
an attempt to clear the ND6_IFF_IFDISABLED flag.

2. When an AF_INET6 address is added successfully to an interface and
it is marked as ND6_IFF_IFDISABLED, an attempt to clear the
ND6_IFF_IFDISABLED happens.

This simplifies ND6_IFF_IFDISABLED flag manipulation by users via ifconfig(8);
in most cases the manual configuration is no longer needed.

- When ND6_IFF_AUTO_LINKLOCAL is set and no link-local address is assigned to
an interface, SIOCSIFINFO_FLAGS ioctl now calls in6_ifattach() to configure
a link-local address.

This change ensures link-local address configuration when "ifconfig IF inet6"
command is invoked. For example, "ifconfig IF inet6 auto_linklocal" now
always tries to configure an LL addr even if ND6_IFF_AUTO_LINKLOCAL is already
set to 1 (i.e. down/up cycle is no longer needed).
a1bf1a258207345435ea10acd5842a3edd836a66 20-May-2011 qingli <qingli@FreeBSD.org> The statically configured (permanent) ARP entries are removed when an
interface is brought down, even though the interface address is still
valid. This patch maintains the permanent ARP entries as long as the
interface address (having the same prefix as that of the ARP entries)
is valid.

Reviewed by: delphij
MFC after: 5 days
2d7d8c05e7404fbebf1f0fe24c13bc5bb58d2338 21-Mar-2011 jeff <jeff@FreeBSD.org> - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.
a8c33e5555d9b1e9ce5d952b613cd8f8cc493b07 29-Nov-2010 bz <bz@FreeBSD.org> Plug well-observed races on la_hold entries with the callout handler.

Call the handler function with the lock held, return unlocked as we
might free the entry. Rework functions later in the call graph to be
either called with the lock held or, only if needed, unlocked.

Place asserts to document and tighten assumptions on various lle locking,
which were not always true before.

We call nd6_ns_output() unlocked and the assignment of ip6->ip6_src was
decentralized to minimize possible complexity introduced with the formerly
missing locking there. This also resulted in a push down of local
variable scopes into smaller blocks.

Reported by: many
PR: kern/148857
Submitted by: Dmitrij Tejblum (tejblum yandex-team.ru) (original version)
MFC After: 4 days
09f9c897d33c41618ada06fbbcf1a9b3812dee53 19-Oct-2010 jamie <jamie@FreeBSD.org> A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.
1e2de7fa4db8695b6f004bb6a47dfe80e6332068 19-May-2010 alfred <alfred@FreeBSD.org> Fix our version of IPv6 address representation.

We do not respect rules 3 and 4 in the required list:

1. omit leading zeros

2. "::" used to their maximum extent whenever possible

3. "::" used where it shortens the address the most

4. "::" used in the former part as the tie breaker

5. do not shorten one 16 bit 0 field

6. use lower case

http://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-04.html

Submitted by: Kalluru Abhiram @ Juniper Networks
Obtained from: Juniper Networks
Reviewed by: hrs, dougb
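Rules 1 to 6 above (the draft was later published as RFC 5952) can be sketched in C: find the longest run of zero groups, keep the first run on a tie, never compress a single zero group, and print groups in lowercase hex without leading zeros. This is a minimal sketch for illustration, not the in6.c implementation; longest_zero_run() and format_in6() are hypothetical names, and the input is assumed to be eight host-order 16-bit groups.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Longest run of zero groups (rule 3), first one on a tie (rule 4),
 * runs of length 1 ignored (rule 5). *start is -1 if nothing qualifies. */
static void
longest_zero_run(const uint16_t w[8], int *start, int *len)
{
	int best_start = -1, best_len = 0, i = 0;

	while (i < 8) {
		if (w[i] != 0) {
			i++;
			continue;
		}
		int j = i;
		while (j < 8 && w[j] == 0)
			j++;
		if (j - i > best_len) {	/* strict '>' keeps the earlier run */
			best_start = i;
			best_len = j - i;
		}
		i = j;
	}
	if (best_len < 2) {		/* never shorten a single zero group */
		best_start = -1;
		best_len = 0;
	}
	*start = best_start;
	*len = best_len;
}

/* Format eight groups into buf; 40 bytes always suffice. */
static void
format_in6(const uint16_t w[8], char *buf, size_t size)
{
	int zs, zl, i;
	size_t off = 0;

	longest_zero_run(w, &zs, &zl);
	buf[0] = '\0';
	for (i = 0; i < 8; i++) {
		if (i == zs) {
			off += snprintf(buf + off, size - off, "::");
			i += zl - 1;	/* skip the rest of the run */
			continue;
		}
		/* No ':' at the very start or right after "::". */
		if (i > 0 && !(zs >= 0 && i == zs + zl))
			off += snprintf(buf + off, size - off, ":");
		/* %x gives lowercase (rule 6) without leading zeros (rule 1). */
		off += snprintf(buf + off, size - off, "%x", (unsigned)w[i]);
	}
}
```

For example, the groups 2001, db8, 0, 0, 1, 0, 0, 1 contain two zero runs of equal length, so only the first is compressed: "2001:db8::1:0:0:1".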
ec9ede8f010d7bccf9cbdcea28346761bc7b2f93 11-May-2010 kib <kib@FreeBSD.org> MFC r207268:
Provide 32bit compat for SIOCGDEFIFACE_IN6.
85688162953c4cf38c355bcc60c0ab181a3bfa2e 27-Apr-2010 kib <kib@FreeBSD.org> Provide 32bit compat for SIOCGDEFIFACE_IN6.

Based on submission by: pluknet gmail com
Reviewed by: emaste
MFC after: 2 weeks
e36601cbc07715392430d5c4e0028505d13e6466 21-Apr-2010 bz <bz@FreeBSD.org> MFC r206481:

Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c, when freeing the entire tables, make sure that if we
cancel a pending callout we remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE
d7a91dc6bf166a266421facb5e7cc8067695b03b 11-Apr-2010 bz <bz@FreeBSD.org> Plug reference leaks in the link-layer code ("new-arp") that previously
prevented the link-layer entry from being freed.

In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.

In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.

In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().

In if_llatbl.c, when freeing entire tables, make sure that if we cancel
a pending callout we remove the reference as well.

Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE
f1216d1f0ade038907195fc114b7e630623b402c 19-Mar-2010 delphij <delphij@FreeBSD.org> Create a custom branch where I will be able to do the merge.
ea5192e625ebf087e54f747a66b38f4034431708 05-Jan-2010 qingli <qingli@FreeBSD.org> MFC r201282, r201543

r201282
-------
The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was that the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

r201543
-------
The IFA_RTSELF address flag marks that a loopback route has been installed
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion caused the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.

PR: ports/141342, kern/141134
ed965a92bc17f25c5049fbd529d10a9e94f8a3a7 30-Dec-2009 qingli <qingli@FreeBSD.org> The proxy arp entries could not be added into the system over the
IFF_POINTOPOINT link types. The reason was that the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of loopback route to local end would
be either incremented or decremented. The first instantiation would
create the entry and the last removal would delete the route entry.

MFC after: 5 days
b8b6fd92da99c878d706d74fe9d66594ed099d6b 28-Oct-2009 qingli <qingli@FreeBSD.org> MFC r198418

Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by: ru
c96d27ad8035ca68a2d9bc04be787ef6416d47f9 23-Oct-2009 qingli <qingli@FreeBSD.org> Use the correct option name in the preprocessor command to enable
or disable diagnostic messages.

Reviewed by: ru
MFC after: 3 days
ceec1be0ff52ee7829036be5125f8e0795e26acd 15-Sep-2009 qingli <qingli@FreeBSD.org> MFC r197227

Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
Approved by: re
3a82e44273f4a5c05d848c2959b2a9d8188b1ba0 15-Sep-2009 qingli <qingli@FreeBSD.org> Self pointing routes are installed for configured interface addresses
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.

Reviewed by: bz
MFC after: immediately
2eb62239d7432351eb544690c4ea2fc648ae2abc 12-Sep-2009 hrs <hrs@FreeBSD.org> Improve flexibility of receiving Router Advertisement and
automatic link-local address configuration:

- Convert a sysctl net.inet6.ip6.accept_rtadv to one for the
default value of a per-IF flag ND6_IFF_ACCEPT_RTADV, not a
global knob. The default value of the sysctl is 0.

- Add a new per-IF flag ND6_IFF_AUTO_LINKLOCAL and convert a
sysctl net.inet6.ip6.auto_linklocal to one for its default
value. The default value of the sysctl is 1.

- Make ND6_IFF_IFDISABLED more robust. It can be used to disable
IPv6 functionality of an interface now.

- Receiving RA is allowed if ip6_forwarding==0 *and*
ND6_IFF_ACCEPT_RTADV is set on that interface. The former
condition will be revisited later to support a "host + router" box
like IPv6 CPE router. The current behavior is compatible with
the older releases of FreeBSD.

- The ifconfig(8) now supports these ND6 flags as well as "nud",
"prefer_source", and "disabled" in ndp(8). The ndp(8) now
supports "auto_linklocal".

Discussed with: bz and jinmei
Reviewed by: bz
MFC after: 3 days
bf55d011e10802798b8073a157412cab5a590399 05-Sep-2009 qingli <qingli@FreeBSD.org> MFC r196871

The addresses that are assigned to the loopback interface
should be part of the kernel routing table.

Reviewed by: bz
Approved by: re
5b9cf14b541867aeeeabf34edade1a0f5c528580 05-Sep-2009 qingli <qingli@FreeBSD.org> The addresses that are assigned to the loopback interface
should be part of the kernel routing table.

Reviewed by: bz
MFC after: immediately
854d705b6c5997a91e7f08ef9f3d078180076d1c 05-Sep-2009 qingli <qingli@FreeBSD.org> MFC r196864

This patch fixes the following issues:
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reasons. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible; however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since the fixes for the above issues touch common files, these
files are committed together.

Reviewed by: bz
Approved by: re
0cca60c70d1c3115577b3e8fa7f015d4d9eb8904 05-Sep-2009 qingli <qingli@FreeBSD.org> This patch fixes the following issues:
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reasons. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible; however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since the fixes for the above issues touch common files, these
files are committed together.

Reviewed by: bz
MFC after: immediately
2fa517c9663f5b2833301bfd5c3a0325796c3989 28-Aug-2009 rwatson <rwatson@FreeBSD.org> Merge r196535 from head to stable/8:

Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by: bz, kmacy, qingli

Approved by: re (kib)
464ba339f01ef15c3bef8d990c268431fe769b42 28-Aug-2009 rwatson <rwatson@FreeBSD.org> Merge r196481 from head to stable/8:

Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stabilize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian

Approved by: re (kib)
544dfa0789561f2dbb98a7740453fce7aeea4f5b 25-Aug-2009 rwatson <rwatson@FreeBSD.org> Use locks specific to the lltable code, rather than borrow the ifnet
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.

Reviewed by: bz, kmacy
MFC after: 3 days
ef8d755d4df716bf13f8a1833f7dd1db0b78c569 23-Aug-2009 rwatson <rwatson@FreeBSD.org> Rework global locks for interface list and index management, correcting
several critical bugs, including race conditions and lock order issues:

Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stabilize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.

Reviewed by: bz, julian
MFC after: 3 days
4407106c855741c1e6b14f51396ea1d16e04386a 12-Aug-2009 qingli <qingli@FreeBSD.org> MFC r196152

A piece of code was added to install a host route when an IPv6 interface
address is configured with a /128 prefix. This is no longer necessary due
to r192011. In fact that code conflicts with r192011. This patch removes
the host route installation when detecting the /128 prefix, and instead
lets the code added by r192011 install the loopback route for that IPv6
interface address.

Approved by: re
d7ddc7ccbe531019ee08753204ac3d9210ea215f 12-Aug-2009 qingli <qingli@FreeBSD.org> A piece of code was added to install a host route when an IPv6 interface
address is configured with a /128 prefix. This is no longer necessary due
to r192011. In fact that code conflicts with r192011. This patch removes
the host route installation when detecting the /128 prefix, and instead
lets the code added by r192011 install the loopback route for that IPv6
interface address.

Reviewed by: bz
Approved by: re
fb9ffed6504601ed9da2c6b9a620b133c838964c 01-Aug-2009 rwatson <rwatson@FreeBSD.org> Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h; we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by: bz
Approved by: re (vimage blanket)
8c1899d9347988f10f8675cc62ab08f26bb9f2d7 27-Jul-2009 qingli <qingli@FreeBSD.org> This patch does the following:

- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface address when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.

Reviewed by: bz
Approved by: re
57ca4583e728cab422fba8f15de10bd0b637b3dd 14-Jul-2009 rwatson <rwatson@FreeBSD.org> Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
1d57d752af3da7674b59d9eb4f4e39ba06c1aea2 12-Jul-2009 qingli <qingli@FreeBSD.org> This patch adds a host route to an interface address (that is assigned
to a non loopback/ppp link type) through the loopback interface. Prior
to the new L2/L3 rewrite, this host route was explicitly created when
processing the IPv6 address assignment. This loopback host route is
deleted when that IPv6 address is removed from the interface.

Reviewed by: bz, gnn
Approved by: re
3b6551a921beb7f1408f05c3730aa5802bd6e79c 27-Jun-2009 rwatson <rwatson@FreeBSD.org> In in6_update_ifa(), jump to 'cleanup' rather than returning directly
in one additional case, avoiding an ifaddr reference leak.

Defer releasing the in6_ifaddr's in6_ifaddrhead reference until the
end of in6_unlink_ifa(), as callers are inconsistent regarding whether
or not they hold a reference across the call. This avoids using the
ifaddr after it may have been freed.

Reported by: tegge
Reviewed by: tegge
Approved by: re (blanket)
MFC after: 6 weeks
bd6eb7be79d81290efa6dcaa9f492a05b1966344 25-Jun-2009 rwatson <rwatson@FreeBSD.org> Add address list locking for in6_ifaddrhead/ia_link: as with locking
for in_ifaddrhead, we stick with an rwlock for the time being, which
we will revisit in the future with a possible move to rmlocks.

Some pieces of code require significant further reworking to be
safe from all classes of writer-writer races.

Reviewed by: bz
MFC after: 6 weeks
4cf6a458ca11df1df230a7c76e0ea742011bd2b0 25-Jun-2009 rwatson <rwatson@FreeBSD.org> Clean up reference management in in6_update_ifa and in6_unlink_ifa, and
in particular, add a reference for in6_ifaddrhead since we do remove a
reference for it when an IPv6 address is removed. This fixes ifconfig
delete of an IPv6 alias.

Reported by: tegge
MFC after: 6 weeks
9c4380a8eea873952968c44b6e2567cd55ba5011 24-Jun-2009 rwatson <rwatson@FreeBSD.org> Convert netinet6 to using queue(9) rather than hand-crafted linked lists
for the global IPv6 address list (in6_ifaddr -> in6_ifaddrhead). Adopt
the code styles and conventions present in netinet where possible.

Reviewed by: gnn, bz
MFC after: 6 weeks (possibly not MFCable?)
c9ef486fe1d7da6a2212a337eacc5ed5b40f85d9 23-Jun-2009 rwatson <rwatson@FreeBSD.org> Modify most routines returning 'struct ifaddr *' to return references
rather than pointers, requiring callers to properly dispose of those
references. The following routines now return references:

ifaddr_byindex
ifa_ifwithaddr
ifa_ifwithbroadaddr
ifa_ifwithdstaddr
ifa_ifwithnet
ifaof_ifpforaddr
ifa_ifwithroute
ifa_ifwithroute_fib
rt_getifa
rt_getifa_fib
IFP_TO_IA
ip_rtaddr
in6_ifawithifp
in6ifa_ifpforlinklocal
in6ifa_ifpwithaddr
in6_ifadd
carp_iamatch6
ip6_getdstifaddr

Remove unused macro which didn't have required referencing:

IFP_TO_IA6

This closes many small races in which changes to interface
or address lists while an ifaddr was in use could lead to use of freed
memory (etc). In a few cases, add missing if_addr_list locking
required to safely acquire references.

Because of a lack of deep copying support, we accept a race in which
an in6_ifaddr pointed to by mbuf tags and extracted with
ip6_getdstifaddr() doesn't hold a reference while in transmit. Once
we have mbuf tag deep copy support, this can be fixed.

Reviewed by: bz
Obtained from: Apple, Inc. (portions)
MFC after: 6 weeks (portions)
1f7e54e8c51edb13935d195e0c1f2ec68c672794 21-Jun-2009 rwatson <rwatson@FreeBSD.org> Clean up common ifaddr management:

- Unify reference count and lock initialization in a single function,
ifa_init().
- Move tear-down from a macro (IFAFREE) to a function ifa_free().
- Move reference count bump from a macro (IFAREF) to a function ifa_ref().
- Switch from a u_int protected by a mutex to refcount(9) for
reference count management.

The ifa_mtx is now used for exactly one ioctl, and possibly should be
removed.

MFC after: 3 weeks
632fa4557466f1f20190899b29b3863089eb768f 10-Jun-2009 cperciva <cperciva@FreeBSD.org> Prevent integer overflow in direct pipe write code from circumventing
virtual-to-physical page lookups. [09:09]

Add missing permissions check for SIOCSIFINFO_IN6 ioctl. [09:10]

Fix buffer overflow in "autokey" negotiation in ntpd(8). [09:11]

Approved by: so (cperciva)
Approved by: re (not really, but SVN wants this...)
Security: FreeBSD-SA-09:09.pipe
Security: FreeBSD-SA-09:10.ipv6
Security: FreeBSD-SA-09:11.ntpd
b7ff2bdc204ec5e815f8123552bb0bee31638f8e 08-Jun-2009 bz <bz@FreeBSD.org> After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
a013e0afcbb44052a86a7977277d669d8883b7e7 27-May-2009 jamie <jamie@FreeBSD.org> Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)
e6b86b7c8fc96c72c1f5df5a94c60e96783ecaac 20-May-2009 qingli <qingli@FreeBSD.org> When an interface address is removed and the last prefix
route is also being deleted, the link-layer address table
(arp or nd6) will flush those L2 llinfo entries that match
the removed prefix.

Reviewed by: kmacy
a501d9a071f1dfe8471bed22e728be03188465b2 18-May-2009 qingli <qingli@FreeBSD.org> This patch resolves the following issues:

-- A routing socket message is not generated when an IPv6 address is
either inserted into or deleted from an interface. The missing routing
message problem was discovered by Randall Stewart and Michael Tuxen
during SCTP testing.

-- Previously when an IPv6 address is configured on an interface, if the
prefix length is /128, then a host route is installed in the kernel
for this address. But this host route is not deleted when that IPv6
address is removed from the interface.

-- Routes to the link-local all-nodes multicast address and the
interface-local all-nodes multicast address are not removed when
the last IPv6 address is removed from an interface.

Reviewed by: bz, gnn
32a71137f08bc028578417de36a241d7e6011f58 29-Apr-2009 bms <bms@FreeBSD.org> Bite the bullet, and make the IPv6 SSM and MLDv2 mega-commit:
import from p4 bms_netdev. Summary of changes:

* Connect netinet6/in6_mcast.c to build.
The legacy KAME KPIs are mostly preserved.
* Eliminate now dead code from ip6_output.c.
Don't do mbuf bingo, we are not going to do RFC 2292 style
CMSG tricks for multicast options as they are not required
by any current IPv6 normative reference.
* Refactor transports (UDP, raw_ip6) to do own mcast filtering.
SCTP, TCP unaffected by this change.
* Add ip6_msource, in6_msource structs to in6_var.h.
* Hook up mld_ifinfo state to in6_ifextra, allocate from
domifattach path.
* Eliminate IN6_LOOKUP_MULTI(), it is no longer referenced.
Kernel consumers which need this should use in6m_lookup().
* Refactor IPv6 socket group memberships to use a vector (like IPv4).
* Update ifmcstat(8) for IPv6 SSM.
* Add witness lock order for IN6_MULTI_LOCK.
* Move IN6_MULTI_LOCK out of lower ip6_output()/ip6_input() paths.
* Introduce IP6STAT_ADD/SUB/INC/DEC as per rwatson's IPv4 cleanup.
* Update carp(4) for new IPv6 SSM KPIs.
* Virtualize ip6_mrouter socket.
Changes mostly localized to IPv6 MROUTING.
* Don't do a local group lookup in MROUTING.
* Kill unused KAME prototypes in6_purgemkludge(), in6_restoremkludge().
* Preserve KAME DAD timer jitter behaviour in MLDv1 compatibility mode.
* Bump __FreeBSD_version to 800084.
* Update UPDATING.

NOTE WELL:
* This code hasn't been tested against real MLDv2 queriers
(yet), although the on-wire protocol has been verified in Wireshark.
* There are a few unresolved issues in the socket layer APIs to
do with scope ID propagation.
* There is a LOR present in ip6_output()'s use of
in6_setscope() which needs to be resolved. See comments in mld6.c.
This is believed to be benign and can't be avoided for the moment
without re-introducing an indirect netisr.

This work was mostly derived from the IGMPv3 implementation, and
has been sponsored by a third party.
22bdc8dd64339b3690de0df36d39d43406b6c318 20-Apr-2009 rwatson <rwatson@FreeBSD.org> Prefer structure fields (ifa_link) to macro aliases for them
(ifa_list).

MFC after: 2 weeks
6fc60785e72960c0c2c1b7cda150d79811e9dab9 20-Apr-2009 rwatson <rwatson@FreeBSD.org> Acquire interface address list lock around access to if_addrhead,
closing several writer-writer races, and some read-write races.

MFC after: 2 weeks
084ce14c28bf83c651c7963eacdea97f7f32d914 20-Apr-2009 rwatson <rwatson@FreeBSD.org> Use TAILQ_FOREACH() and TAILQ_FOREACH_SAFE() rather than manually
accessing queue(9) structure fields for if_addrhead.

Prefer FreeBSD field name if_addrhead to compatibility macro
if_addrlist.

MFC after: 2 weeks
eb422ada74096ff9364e6e87dfe2b2a7390a437e 20-Apr-2009 rwatson <rwatson@FreeBSD.org> Close some but not all writer-writer races when maintaining IPv6
interface address lists by locking the interface address list lock.

MFC after: 2 weeks
70b6a8119c02ed07bc12918814c950d358cb1885 15-Mar-2009 rwatson <rwatson@FreeBSD.org> Remove IFF_NEEDSGIANT, a compatibility infrastructure introduced
in FreeBSD 5.x to allow network device drivers to run with Giant
despite the network stack being Giant-free. This significantly
simplifies calls into ioctl() on network interfaces, especially
in the multicast code, as well as eliminates deferred invocation
of interface if_start routines.

Disable the build on device drivers still depending on
IFF_NEEDSGIANT as they no longer compile. They will be removed
in a few weeks if they haven't been made MPSAFE in that time.
Disabled drivers:

if_ar
if_axe
if_aue
if_cdce
if_cue
if_kue
if_ray
if_rue
if_rum
if_sr
if_udav
if_ural
if_zyd

Drivers that were already disabled because of tty changes:

if_ppp
if_sl

Discussed on: arch@
df2be82cecfdcfe4fe66cafe9b35f2eb7121b532 27-Feb-2009 bz <bz@FreeBSD.org> For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
12bbe1869f5926ca7e3457f5424afdca31a1189b 05-Feb-2009 jamie <jamie@FreeBSD.org> Standardize the various prison_foo_ip[46] functions and prison_if to
return zero on success and an error code otherwise. The possible errors
are EADDRNOTAVAIL if an address being checked for doesn't match the
prison, and EAFNOSUPPORT if the prison doesn't have any addresses in
that address family. For most callers of these functions, use the
returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or
EINVAL.

Always include a jailed() check in these functions, where a non-jailed
cred always returns success (and makes no changes). Remove the explicit
jailed() checks that preceded many of the function calls.

Approved by: bz (mentor)
226b2a700eecfdf598bf77f229d3a378d11844b4 31-Jan-2009 bz <bz@FreeBSD.org> Like with r185713 make sure to not leak a lock as rtalloc1(9) returns
a locked route. Thus we have to use RTFREE_LOCKED(9) to get it unlocked
and rtfree(9)d rather than just rtfree(9)d.

Since the PR was filed, new places with the same problem were added
with new code. Also check that the rt is valid before freeing it
either way there.

PR: kern/129793
Submitted by: Dheeraj Reddy <dheeraj@ece.gatech.edu>
MFC after: 2 weeks
Committed from: Bugathon #6
751dff36104851edee18dc55bd333b085a912d68 12-Jan-2009 qingli <qingli@FreeBSD.org> Revive the RTF_LLINFO flag in route.h. The kernel code is guarded
by the new kernel option COMPAT_ROUTE_FLAGS for binary backward
compatibility. The RTF_LLDATA flag maps to the same value as RTF_LLINFO.
RTF_LLDATA is used by the arp and ndp utilities. The RTF_LLDATA flag is
always returned to userland regardless of whether the COMPAT_ROUTE_FLAGS
is defined.
ffd24214075016efd0b3aac50a2a5127600c3a77 09-Jan-2009 bz <bz@FreeBSD.org> Restrict arp, ndp and theoretically the FIB listing (if not
read with libkvm) to the addresses of a prison, when inside a
jail. [1]
As the patch from the PR was pre-'new-arp', add checks to the
llt_dump handlers as well.

While touching RTM_GET in route_output(), consistently use
curthread credentials rather than the creds from the socket
there. [2]

PR: kern/68189
Submitted by: Mark Delany <sxcg2-fuwxj@qmda.emu.st> [1]
Discussed with: rwatson [2]
Reviewed by: rwatson
MFC after: 4 weeks
60c950d4ff05ccd0e88e7333e5b3d64868165843 09-Jan-2009 bz <bz@FreeBSD.org> Make SIOCGIFADDR and related, as well as SIOCGIFADDR_IN6 and related
jail-aware. Up to now we returned the first address of the interface
for SIOCGIFADDR w/o an ifr_addr in the query. This caused problems for
programs querying for an address but running inside a jail, as the
address returned usually did not belong to the jail.
Like for v6, if there was an ifr_addr given on v4, you could probe
for more addresses on the interfaces that you were not allowed to see
from inside a jail. Return an error (EADDRNOTAVAIL) in that case
now unless the address is on the given interface and valid for the
jail.

PR: kern/114325
Reviewed by: rwatson
MFC after: 4 weeks
efe3f87721e5c915985776d2d88cb173737e8258 03-Jan-2009 qingli <qingli@FreeBSD.org> Some modules, such as SCTP, supply a valid route entry as an input argument
to ip_output(). The destination is represented in a sockaddr{} object
that may contain other pieces of information, e.g., a port number. This
same destination sockaddr{} object may be passed into L2 code, where it
could be used to create an L2 entry. Since there exists an L2 table per
address family, the L2 lookup function can make address family specific
comparison instead of the generic bcmp() operation over the entire
sockaddr{} structure.

Note in the IPv6 case the sin6_scope_id is not compared because the
address is currently stored in the embedded form inside the kernel.
The in6_lltable_lookup() has to account for the scope-id if this
storage format were to change in the future.
1d851edfc076189dc55b168c116ce058e169b3b5 26-Dec-2008 qingli <qingli@FreeBSD.org> This checkin addresses a couple of issues:
1. The "route" command allows route insertion through the interface-direct
option "-iface". During if_attach(), a sockaddr_dl{} entry is created
for the interface and is part of the interface address list. This
sockaddr_dl{} entry describes the interface in detail. The "route"
command selects this entry as the "gateway" object when the "-iface"
option is present. The "arp" and "ndp" commands also interact with the
kernel through the routing socket when adding and removing static L2
entries. The static L2 information is also provided through the
"gateway" object with an AF_LINK family type, similar to what is
provided by the "route" command. In order to differentiate between
these two types of operations, a RTF_LLDATA flag is introduced. This
flag is set by the "arp" and "ndp" commands when issuing the add and
delete commands. This flag is also set in each L2 entry returned by the
kernel. The "arp" and "ndp" commands follow a convention where an RTM_GET
is issued first followed by a RTM_ADD/DELETE. This RTM_GET request fills
in the fields for a "rtm" object, which is reinjected into the kernel by
a subsequent RTM_ADD/DELETE command. The entry returned from RTM_GET
is a prefix route, so the RTF_LLDATA flag must be specified when issuing
the RTM_ADD/DELETE messages.

2. Enforce the convention that NET_RT_FLAGS with a 0 w_arg is the
specification for retrieving L2 information. Also optimized the
code logic.

Reviewed by: julian
8f88fc89cbc9435c5588b14d879e730df10dc156 22-Dec-2008 qingli <qingli@FreeBSD.org> Similar to the INET case, do not destroy the nd6 entries for
interface addresses until those addresses are removed. I already
made the patch in INET but forgot to bring the code over for
INET6.
3bfc2293f206bd8d4ae763d541e3355bd3abf983 17-Dec-2008 qingli <qingli@FreeBSD.org> A couple of files were not meant to be committed.
c6a0a000ca142a4c7062f6f2fc0c31b888309b18 17-Dec-2008 qingli <qingli@FreeBSD.org> in6_clsroute() was applied to prefix routes causing some
of them to expire. in6_clsroute() was only applied to
cloned routes that are no longer applicable after the
arp-v2 commit.
54c2e2ce52698c56848b58421ca70373e949d04f 16-Dec-2008 kmacy <kmacy@FreeBSD.org> check return from lla_lookup against NULL not zero
c9eebde165e93fa0717dca30fdfe0c687483b38a 16-Dec-2008 kmacy <kmacy@FreeBSD.org> unlock and destroy an llentry's lock before freeing

Found by: sam
ec826ad5c7f97de814529d3b3bae7950f91d9a5d 15-Dec-2008 qingli <qingli@FreeBSD.org> The main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as many locking dependencies among these layers as
possible to allow for some parallelism in the search operations
3. simplifying the logic in the routing code.

The most notable end result is the obsolescence of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
the last piece of the puzzle, Kip has also been conducting
active functional testing
- Sam Leffler has helped me improve/refactor the code, and
provided valuable reviews
- Julian Elischer set up the perforce tree for me and has helped
me maintain that branch before the svn conversion
604d89458ab94ec81eaefa2d55ef219cba461e31 02-Dec-2008 bz <bz@FreeBSD.org> Rather than using hidden includes (with circular dependencies),
directly include only the header files needed. This reduces the
unneeded spamming of various headers into lots of files.

For now, this leaves us with very few modules including vnet.h
and thus needing to depend on opt_route.h.

Reviewed by: brooks, gnn, des, zec, imp
Sponsored by: The FreeBSD Foundation
19b6af98ec71398e77874582eb84ec5310c7156f 22-Nov-2008 dfr <dfr@FreeBSD.org> Clone Kip's Xen on stable/6 tree so that I can work on improving FreeBSD/amd64
performance in Xen's HVM mode.
66f807ed8b3634dc73d9f7526c484e43f094c0ee 23-Oct-2008 des <des@FreeBSD.org> Retire the MALLOC and FREE macros. They are an abomination unto style(9).

MFC after: 3 months
cf5320822f93810742e3d4a1ac8202db8482e633 19-Oct-2008 lulf <lulf@FreeBSD.org> - Import the HEAD csup code which is the basis for the cvsmode work.
8797d4caecd5881e312923ee1d07be3de68755dc 02-Oct-2008 zec <zec@FreeBSD.org> Step 1.5 of importing the network stack virtualization infrastructure
from the vimage project, as per plan established at devsummit 08/08:
http://wiki.freebsd.org/Image/Notes200808DevSummit

Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator
macros, and CURVNET_SET() context setting macros, all currently
resolving to NOPs.

Prepare for virtualization of selected SYSCTL objects by introducing a
family of SYSCTL_V_*() macros, currently resolving to their global
counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().

Move selected #defines from sys/sys/vimage.h to newly introduced header
files specific to virtualized subsystems (sys/net/vnet.h,
sys/netinet/vinet.h etc.).

All the changes are verified to have zero functional impact at this
point in time by doing MD5 comparison between pre- and post-change
object files(*).

(*) netipsec/keysock.c did not validate depending on compile time options.

Implemented by: julian, bz, brooks, zec
Reviewed by: julian, bz, brooks, kris, rwatson, ...
Approved by: julian (mentor)
Obtained from: //depot/projects/vimage-commit2/...
X-MFC after: never
Sponsored by: NLnet Foundation, The FreeBSD Foundation
e50cf84d907eb995bd9c9178b99174a82bca91d5 01-Sep-2008 obrien <obrien@FreeBSD.org> MFC: r175162 & r174510: un-__P() & clean up VCS Ids.
514ed29cc16f2eec87fbe25719e68d40730c429d 01-Sep-2008 obrien <obrien@FreeBSD.org> MFC: r174510: Clean up VCS Ids.
9d4691b50a43690c13afaa81e34d87ccdbf8a80b 01-Sep-2008 obrien <obrien@FreeBSD.org> MFC: r175162: un-__P()
03a5241ea0a9d2648fc3b74a2062c7c06a155381 20-Aug-2008 julian <julian@FreeBSD.org> Fix some of the formatting fixes. It's amazing how some things stand out
in a commit message.
0592958505e144fa8a1cdff63ecc2e605ac5e407 20-Aug-2008 julian <julian@FreeBSD.org> A bunch of formatting fixes brought to light by, or created by the Vimage commit
a few days ago.
1021d43b569bfc8d2c5544bde2f540fa432b011f 17-Aug-2008 bz <bz@FreeBSD.org> Commit step 1 of the vimage project, (network stack)
virtualization work done by Marko Zec (zec@).

This is the first in a series of commits over the course
of the next few weeks.

Mark all uses of global variables to be virtualized
with a V_ prefix.
Use macros to map them back to their global names for
now, so this is a NOP change only.

We hope to have caught at least 85-90% of what is needed
so we do not invalidate a lot of outstanding patches again.

Obtained from: //depot/projects/vimage-commit2/...
Reviewed by: brooks, des, ed, mav, julian,
jamie, kris, rwatson, zec, ...
(various people I forgot, different versions)
md5 (with a bit of help)
Sponsored by: NLnet Foundation, The FreeBSD Foundation
X-MFC after: never
V_Commit_Message_Reviewed_By: more people than the patch
dc8d54c205784683ec1aae7ecf1f24fe1f6cb2c0 24-Jul-2008 julian <julian@FreeBSD.org> MFC an ABI compatible implementation of Multiple routing tables.
See the commit message for
http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/net/route.c
version 1.129 (svn change # 178888) for more info.

Obtained from: Ironport (Cisco Systems)
051819b84758e212ecd632e9bd6f47e70f37aa3a 05-Jul-2008 rwatson <rwatson@FreeBSD.org> Introduce a new lock, hostname_mtx, and use it to synchronize access
to global hostname and domainname variables. Where necessary, copy
to or from a stack-local buffer before performing copyin() or
copyout(). A few uses, such as in cd9660 and daemon_saver, remain
under-synchronized and will require further updates.

Correct a bug in which a failed copyin() of domainname would leave
domainname potentially corrupted.

MFC after: 3 weeks
1dfc5c98a4f7c32163dfdc61e390ccf805385108 09-May-2008 julian <julian@FreeBSD.org> Add code to allow the system to handle multiple routing tables.
This particular implementation is designed to be fully backwards compatible
and to be MFC-able to 7.x (and 6.x)

Currently the only protocol that can make use of the multiple tables is IPv4.
Similar functionality exists in OpenBSD and Linux.

From my notes:

-----

One thing where FreeBSD has been falling behind, and which by chance I
have some time to work on, is "policy based routing", which allows
different packet streams to be routed by more than just the destination address.

Constraints:
------------

I want to make some form of this available in the 6.x tree
(and by extension 7.x), but FreeBSD in general needs it so I might as
well do it in -current and back port the portions I need.

One of the ways that this can be done is to have the ability to
instantiate multiple kernel routing tables (which I will now
refer to as "Forwarding Information Bases" or "FIBs" for political
correctness reasons). Which FIB a particular packet uses to make
the next hop decision can be decided by a number of mechanisms.
The policies these mechanisms implement are the "Policies" referred
to in "Policy based routing".

One of the constraints I have if I try to back port this work to
6.x is that it must be implemented as an EXTENSION to the existing
ABIs in 6.x so that third party applications do not need to be
recompiled within the timespan of the branch.

This first version will not have some of the bells and whistles that
will come with later versions. It will, for example, be limited to 16
tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a
multiple routing table solution in Perforce, and back-ported it
to 6.x. (also in Perforce though not always caught up with what I
have done in -current/P4). The subset allows a number of FIBs
to be defined at compile time (8 is sufficient for my purposes in 6.x)
and implements the changes needed to allow IPV4 to use them. I have not
done the changes for ipv6 simply because I do not need it, and I do not
have enough knowledge of ipv6 (e.g. neighbor discovery) needed to do it.

Other protocol families are left untouched and should there be
users with proprietary protocol families, they should continue to work
and be oblivious to the existence of the extra FIBs.

To understand how this is done, one must know that the current FIB
code starts everything off with a single dimensional array of
pointers to FIB head structures (One per protocol family), each of
which in turn points to the trie of routes available to that family.

The basic change in the ABI compatible version of the change is to
extend that array to be a 2 dimensional array, so that
instead of protocol family X looking at rt_tables[X] for the
table it needs, it looks at rt_tables[Y][X], where Y is always 0
for all protocol families except ipv4.
Code that is unaware of the change always just sees the first row
of the table, which of course looks just like the one dimensional
array that existed before.

The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign()
are all maintained, but refer only to the first row of the array,
so that existing callers in proprietary protocols can continue to
do the "right thing".
Some new entry points are added, for the exclusive use of ipv4 code
called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(),
which have an extra argument which refers the code to the correct row.

In addition, there are some new entry points (currently called
rtalloc_fib() and friends) that check the Address family being
looked up and call either rtalloc() (and friends) if the protocol
is not IPv4 forcing the action to row 0 or to the appropriate row
if it IS IPv4 (and that info is available). These are for calling
from code that is not specific to any particular protocol. The way
these are implemented would change in the non ABI preserving code
to be added later.

One feature of the first version of the code is that for ipv4,
the interface routes show up automatically on all the FIBs, so
that no matter what FIB you select you always have the basic
direct attached hosts available to you. (rtinit() does this
automatically).

You CAN delete an interface route from one FIB should you want
to but by default it's there. ARP information is also available
in each FIB. It's assumed that the same machine would have the
same MAC address, regardless of which FIB you are using to get
to it.

This brings us to how the correct FIB is selected for an outgoing
IPV4 packet.

Firstly, all packets have a FIB associated with them. If nothing
has been done to change it, it will be FIB 0. The FIB is changed
in the following ways.

Packets fall into one of a number of classes.

1/ locally generated packets, coming from a socket/PCB.
Such packets select a FIB from a number associated with the
socket/PCB. This in turn is inherited from the process,
but can be changed by a socket option. The process in turn
inherits it on fork. I have written a utility called setfib
that acts a bit like nice.

setfib -3 ping target.example.com # will use fib 3 for ping.

It is an obvious extension to make it a property of a jail
but I have not done so. It can be achieved by combining the setfib and
jail commands.

2/ packets received on an interface for forwarding.
By default these packets would use table 0
(or possibly a number settable in a sysctl (not yet)),
but prior to routing the firewall can inspect them (see below).
(Possibly in the future you may be able to associate a FIB
with packets received on an interface via an ifconfig arg, but not yet.)

3/ packets inspected by a packet classifier, which can arbitrarily
associate a fib with it on a packet by packet basis.
A fib assigned to a packet by a packet classifier
(such as ipfw) would over-ride a fib associated by
a more default source. (such as cases 1 or 2).

4/ a tcp listen socket associated with a fib will generate
accept sockets that are associated with that same fib.

5/ Packets generated in response to some other packet (e.g. reset
or icmp packets). These should use the FIB associated with the
packet being responded to.

6/ Packets generated during encapsulation.
gif, tun and other tunnel interfaces will encapsulate using the FIB
that was in effect with the process that set up the tunnel.
Thus setfib 1 ifconfig gif0 [tunnel instructions]
will set the fib for the tunnel to use to be fib 1.

Routing messages would be associated with their
process, and thus select one FIB or another.
Messages from the kernel would be associated with the fib they
refer to and would only be received by a routing socket associated
with that fib. (not yet implemented)

In addition Netstat has been edited to be able to cope with the
fact that the array is now 2 dimensional. (It looks in system
memory using libkvm (!)). Old versions of netstat see only the first FIB.

In addition two sysctls are added to give:
a) the number of FIBs compiled in (active)
b) the default FIB of the calling process.

Early testing experience:
-------------------------

Basically our (IronPort's) appliance does this functionality already
using ipfw fwd but that method has some drawbacks.

For example,
It can't fully simulate a routing table because it can't influence the
socket's choice of local address when a connect() is done.

Testing during the generating of these changes has been
remarkably smooth so far. Multiple tables have co-existed
with no notable side effects, and packets have been routed
accordingly.

ipfw has grown 2 new keywords:

setfib N ip from any to any
count ip from any to any fib N

In pf there seems to be a requirement to be able to give symbolic names to the
fibs but I do not have that capacity. I am not sure if it is required.

SCTP, interestingly enough, has built-in support for this, called VRFs
in Cisco parlance. It will be interesting to see how that handles it
when it suddenly actually does something.

Where to next:
--------------------

After committing the ABI compatible version and MFCing it, I'd
like to proceed in a forward direction in -current. This will
result in some roto-tilling in the routing code.

Firstly: the current code's idea of having a separate tree per
protocol family, all of the same format, and pointed to by the
1 dimensional array is a bit silly. Especially when one considers that
there is code that makes assumptions about every protocol having the
same internal structures there. Some protocols don't WANT that
sort of structure. (for example the whole idea of a netmask is foreign
to appletalk). This needs to be made opaque to the external code.

My suggested first change is to add routing method pointers to the
'domain' structure, along with information pointing to the data.
Instead of having an array of pointers to uniform structures,
there would be an array pointing to the 'domain' structures
for each protocol address domain (protocol family),
and the methods this reached would be called. The methods would have
an argument that gives FIB number, but the protocol would be free
to ignore it.

When the ABI can be changed, it raises the possibility of the
addition of a fib entry into the "struct route". Currently,
the structure contains the sockaddr of the destination, and the resulting
fib entry. To make this work fully, one could add a fib number
so that given an address and a fib, one can find the third element, the
fib entry.

Interaction with the ARP layer/ LL layer would need to be
revisited as well. Qing Li has been working on this already.

This work was sponsored by Ironport Systems/Cisco

Reviewed by: several including rwatson, bz and mlair (parts each)
Obtained from: Ironport systems/Cisco
0a9ea7791b443048bf662a7b6037576d8caf720b 09-Mar-2008 bz <bz@FreeBSD.org> MFC:
1.75 sys/kern/kern_jail.c
1.8 sys/netinet/ip_options.c
1.78 sys/netinet6/in6.c
1.113 sys/netinet6/ip6_output.c
1.41 sys/netinet6/ip6_var.h
1.76 sys/netinet6/raw_ip6.c
1.85 sys/netinet6/udp6_usrreq.c
[ previously MFCed by rwatson 1.18 sys/sys/priv.h belonging to this change ]

Replace the last susers calls in netinet6/ with privilege checks.

Introduce a new privilege allowing to set certain IP header options
(hop-by-hop, routing headers).

Leave a few comments that we would like to address later.

Discussed with: rwatson
1c376286e046dbe30549b705bd310d6218ffc824 24-Jan-2008 bz <bz@FreeBSD.org> Replace the last susers calls in netinet6/ with privilege checks.

Introduce a new privilege allowing to set certain IP header options
(hop-by-hop, routing headers).

Leave a few comments to be addressed later.

Reviewed by: rwatson (older version, before addressing his comments)
7eb385c2d8022032727eafe8b8ff36a386939680 08-Jan-2008 obrien <obrien@FreeBSD.org> un-__P()
0d684d927bf42ec9c53b0f921e6aaa42e7236cd7 10-Dec-2007 obrien <obrien@FreeBSD.org> Clean up VCS Ids.
e38fed7fb732760cb72fc9df6cfc8fd3400a1a8e 06-Dec-2007 julian <julian@FreeBSD.org> Remove more dup'd code
MFC After: 1 week
87a49d3e6e7d8ca032cadea086d3c228ee85b345 06-Dec-2007 julian <julian@FreeBSD.org> remove duped code

Reviewed By: gnn
MFC after: 1 week
42fe5e7f836765d5e16a16ae1d90eb6a6e034549 05-Jul-2007 delphij <delphij@FreeBSD.org> Space cleanup

Approved by: re (rwatson)
e6f8b0995d59e493018009921005c7f50759dc53 05-Jul-2007 delphij <delphij@FreeBSD.org> ANSIfy[1] plus some style cleanup nearby.

Discussed with: gnn, rwatson
Submitted by: Karl Sjödahl - dunceor <dunceor gmail com> [1]
Approved by: re (rwatson)
f3d705311485b0fe3eee55c99457c8d45cf70dc0 10-Jun-2007 jinmei <jinmei@FreeBSD.org> MFC:
fixed memory leak for IPv6 multicast membership information associated
with interface addresses.

Approved by: gnn (mentor)

src/sys/netinet6/in6.c: 1.71
src/sys/netinet6/in6_ifattach.c: 1.36
src/sys/netinet6/in6_var.h: 1.31
6d89652bc0f7d0aaa4494c739108f5adc374a1bc 02-Jun-2007 jinmei <jinmei@FreeBSD.org> fixed memory leak for IPv6 multicast membership information associated
with interface addresses.

Approved by: gnn (mentor)
MFC after: 1 week
ac701ac964be6305982bbc81a67581b4fc708ec2 02-Jun-2007 jinmei <jinmei@FreeBSD.org> simplified the fix in rev. 1.69 by replacing RT_REMREF+RT_UNLOCK with
RTFREE_LOCKED.

Approved by: gnn (mentor)
b780a97c819314fb7693d73ab6abeaace981fc35 25-May-2007 jinmei <jinmei@FreeBSD.org> do not directly call rtfree() to meet an assumption in the callee.
(this fix suppresses a warning message appearing in the boot time on
IPv6-enabled systems)

Approved by: gnn (mentor)
3e83ac665326a6a015b2179fc28157d9fb7433e2 24-Feb-2007 bms <bms@FreeBSD.org> Make IPv6 multicast forwarding dynamically loadable from a GENERIC kernel.
It is built in the same module as IPv4 multicast forwarding, i.e. ip_mroute.ko,
if and only if IPv6 support is enabled for loadable modules.
Export IPv6 forwarding structs to userland netstat(1) via sysctl(9).
2ea10fad249fb6f33048faaa674b5782fd5feaae 16-Dec-2006 bz <bz@FreeBSD.org> In ip6_sprintf print the addresses in a more common/readable
format eliminating leading zeros like in :0001 -> :1.

Reviewed by: mlaier
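The leading-zero rule above can be illustrated with a small user-space sketch. It is not the kernel's ip6_sprintf(): the function name in6_fmt_nolead() and the IN6_PLAIN_STRLEN constant are hypothetical, the sketch does not collapse runs of zero groups into "::", and it simply prints each 16-bit group with "%x", which drops leading zeros. It also takes a caller-supplied buffer rather than returning a static one.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Enough for 8 groups of up to 4 hex digits, 7 colons and a NUL. */
#define IN6_PLAIN_STRLEN 40

/*
 * Hypothetical helper: format a 16-byte IPv6 address into a
 * caller-supplied buffer, printing each 16-bit group without
 * leading zeros (":0001" becomes ":1"). No "::" compression.
 * Returns buf, or NULL if the buffer is too small.
 */
static char *
in6_fmt_nolead(char *buf, size_t buflen, const uint8_t addr[16])
{
	size_t off = 0;
	int i, n;

	if (buflen < IN6_PLAIN_STRLEN)
		return (NULL);
	for (i = 0; i < 8; i++) {
		/* Assemble the group in network (big-endian) byte order. */
		unsigned int group = ((unsigned int)addr[2 * i] << 8) |
		    addr[2 * i + 1];

		/* "%x" drops leading zeros; a zero group prints as "0". */
		n = snprintf(buf + off, buflen - off, "%s%x",
		    i == 0 ? "" : ":", group);
		if (n < 0 || (size_t)n >= buflen - off)
			return (NULL);
		off += (size_t)n;
	}
	return (buf);
}
```

With the address 2001:0db8:0000:0000:0000:0000:0000:0001 this prints "2001:db8:0:0:0:0:0:1"; a production formatter would normally also apply "::" compression.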
297206ec2ac5b34686aaf531476b1b737df9bbd7 12-Dec-2006 bz <bz@FreeBSD.org> MFp4: 92972, 98913 + one more change

In ip6_sprintf no longer use and return one of eight static buffers
for printing/logging ipv6 addresses.
The caller now has to hand in a sufficiently large buffer as first
argument.
10d0d9cf473dc5f0ce1bf263ead445ffe7819154 06-Nov-2006 rwatson <rwatson@FreeBSD.org> Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
7db27341e03c3b01c3eeb8999e45320bea24386a 29-Sep-2006 suz <suz@FreeBSD.org> MFC Rev. 1.64
(fixed a bug that IPv6 packets arriving to stf are not accepted)

Approved by: re (kensmith)
b119ccaff8e08d3a4f04f62511870cea84f6d484 22-Sep-2006 suz <suz@FreeBSD.org> fixed a bug that IPv6 packets arriving to stf are not accepted.
(a regression introduced in in6.c Rev 1.61)

PR: kern/103415
Submitted by: JINMEI Tatuya
MFC after: 1 week
bc6ab54808cf20a40cd7ba44043d40db1ec2e78e 04-Aug-2006 brooks <brooks@FreeBSD.org> With exception of the if_name() macro, all definitions in net_osdep.h
were unused or already in if_var.h so add if_name() to if_var.h and
remove net_osdep.h along with all references to it.

Longer term we may want to kill off if_name() entirely since all modern
BSDs have if_xname variables rendering it unnecessary.
ba19b1ecd4d6f2950416c0596e5007ae8ffb5360 29-Jun-2006 yar <yar@FreeBSD.org> There is a consensus that ifaddr.ifa_addr should never be NULL,
except in places dealing with ifaddr creation or destruction; and
in such special places incomplete ifaddrs should never be linked
to system-wide data structures. Therefore we can eliminate all the
superfluous checks for "ifa->ifa_addr != NULL" and get ready
for the system to crash honestly instead of masking possible bugs.

Suggested by: glebius, jhb, ru
2443e1854153b6f6f489dd3b346f2cb75e191403 17-Jun-2006 gnn <gnn@FreeBSD.org> MFC 1.61 in6.c
1.65 nd6.c

Fix spurious warnings from neighbor discovery when working with IPv6 over
point to point tunnels (gif).

PR: 93220
Submitted by: Jinmei Tatuya
3b364427e5617217905e74b60e4576f9a1af18a4 08-Jun-2006 gnn <gnn@FreeBSD.org> Fix spurious warnings from neighbor discovery when working with IPv6 over
point to point tunnels (gif).

PR: 93220
Submitted by: Jinmei Tatuya
MFC after: 1 week
05eae11405006420437ca6529029b77c086e14d1 09-Mar-2006 gnn <gnn@FreeBSD.org> Merge in6.c:1.60 from HEAD to RELENG_6

Fix for an inappropriate bzero of the ICMPv6 stats. The code was zeroing the wrong structure member but setting the correct one.

Submitted by: James dot Juran at baesystems dot com
Approved by: re (scottl)
4d8070a25b5c7585b416e8a3db16fc15f2e3cd13 08-Feb-2006 gnn <gnn@FreeBSD.org> Fix for an inappropriate bzero of the ICMPv6 stats. The code was zeroing the wrong structure member but setting the correct one.

Submitted by: James dot Juran at baesystems dot com
Reviewed by: gnn
MFC after: 1 week
675445a4d64dfe09f45b571ba229d2e921a48134 25-Dec-2005 suz <suz@FreeBSD.org> MFC: sync with KAME regarding NDP

- introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners
- supports Router-Preference <draft-ietf-ipv6-router-selection-07.txt>
- better prefix lifetime management
- more spec-conformant DAD advertisement
- updated RFC/internet-draft revisions
- renamed a macro IPV6_DADOUTPUT to IPV6_UNSPECSRC
- plugged a possible memory leak

share/doc/IMPLEMENTATIONS Rev.1.9
sys/netinet/icmp6.h Rev.1.20
sys/netinet6/icmp6.c Rev.1.69
sys/netinet6/in6.c Rev.1.57,58
sys/netinet6/in6.h Rev.1.41,42
sys/netinet6/in6_ifattach.c Rev.1.31
sys/netinet6/in6_ifattach.h Rev.1.7
sys/netinet6/in6_src.c Rev.1.36
sys/netinet6/in6_var.h Rev.1.27
sys/netinet6/ip6_var.h Rev.1.36
sys/netinet6/ip6_output.c Rev.1.99,100
sys/netinet6/mld6.c Rev.1.24,25
sys/netinet6/mld6_var.h Rev.1.7
sys/netinet6/nd6.c Rev.1.59,61,62
sys/netinet6/nd6.h Rev.1.21
sys/netinet6/nd6_nbr.c Rev.1.34,37,38,39
sys/netinet6/nd6_rtr.c Rev.1.30,31
645f921931c659873126b39b82f83c5035437018 25-Dec-2005 suz <suz@FreeBSD.org> MFC: changes malloc type (M_IPMADDR->M_IP6MADDR, M_IPMOPTS->M_IP6MOPTS, M_MRTABLE->M_MRTABLE6)

Rev.1.54 in6.c
Rev.1.64 in6_pcb.c
Rev.1.25 in6_var.h
Rev.1.33 ip6_mroute.c
Rev.1.98 ip6_output.c
Rev.1.23 mld6.c
6812ce3e6ea99cfd6401b17894ca7d04fd809f29 25-Dec-2005 suz <suz@FreeBSD.org> MFC the following KAME sync work.
- fixed typos
- improved some comment descriptions
- use NULL, instead of 0, to denote a NULL pointer
- avoid embedding a magic number in the code
- use nd6log() instead of log() to record NDP-specific logs
- nuked an unnecessary white space

Revision Path
1.67 src/sys/netinet6/icmp6.c
1.55 src/sys/netinet6/in6.c
1.29 src/sys/netinet6/in6_ifattach.c
1.56 src/sys/netinet6/nd6.c
1.35 src/sys/netinet6/nd6_nbr.c
1.29 src/sys/netinet6/nd6_rtr.c
038918b0d7c1edab7f866039faef6042c337edf9 05-Nov-2005 suz <suz@FreeBSD.org> MFC: added an ioctl option in kernel so that ndp/rtadvd can change some NDP-related kernel variables based on their configurations (RFC2461 p.43 6.2.1 mandates this for IPv6 routers)

Revision Changes Path
1.56 +1 -0 src/sys/netinet6/in6.c
1.26 +1 -0 src/sys/netinet6/in6_var.h
1.57 +28 -0 src/sys/netinet6/nd6.c
1.17 +21 -8 src/usr.sbin/ndp/ndp.8
1.17 +31 -2 src/usr.sbin/ndp/ndp.c
1.25 +30 -0 src/usr.sbin/rtadvd/config.c
d87e40fcf561f0cc54c76cc51d30c32297841e21 04-Nov-2005 ume <ume@FreeBSD.org> MFC: scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.

sys/net/if_gif.c: 1.53
sys/net/if_spppsubr.c: 1.120
sys/netinet/icmp6.h: 1.19
sys/netinet/ip_carp.c: 1.28,1.29
sys/netinet/ip_fw2.c: 1.107
sys/netinet/tcp_subr.c: 1.230,1.231,1.235
sys/netinet/tcp_usrreq.c: 1.125
sys/netinet6/ah_core.c: 1.26
sys/netinet6/icmp6.c: 1.63,1.64
sys/netinet6/in6.c: 1.52
sys/netinet6/in6.h: 1.38
sys/netinet6/in6_cksum.c: 1.11
sys/netinet6/in6_ifattach.c: 1.27
sys/netinet6/in6_pcb.c: 1.63
sys/netinet6/in6_proto.c: 1.33
sys/netinet6/in6_src.c: 1.31,1.32
sys/netinet6/in6_var.h: 1.22
sys/netinet6/ip6_forward.c: 1.29
sys/netinet6/ip6_input.c: 1.83
sys/netinet6/ip6_mroute.c: 1.30
sys/netinet6/ip6_output.c: 1.95
sys/netinet6/ip6_var.h: 1.33
sys/netinet6/ipsec.c: 1.43
sys/netinet6/mld6.c: 1.21
sys/netinet6/nd6.c: 1.50
sys/netinet6/nd6_nbr.c: 1.30
sys/netinet6/nd6_rtr.c: 1.27
sys/netinet6/raw_ip6.c: 1.54
sys/netinet6/route6.c: 1.12
sys/netinet6/scope6.c: 1.13,1.14,1.15
sys/netinet6/scope6_var.h: 1.5
sys/netinet6/udp6_output.c: 1.23
sys/netinet6/udp6_usrreq.c: 1.55
sys/netkey/key.c: 1.72,1.73
e085cc468b752cc10f8ebb25f04110326688cdf8 01-Nov-2005 suz <suz@FreeBSD.org> MFC 1.59
statically configured IPv6 address is properly added/deleted now

Approved by: re(scottl)
419a678b5ed2726669802447de5ae70e551bdc4c 31-Oct-2005 suz <suz@FreeBSD.org> statically configured IPv6 address is properly added/deleted now

Obtained from: KAME
Reported in: freebsd-net@freebsd
MFC after: 1 day
55b3e47503e40acc7443656e35f538b6f48eb899 22-Oct-2005 suz <suz@FreeBSD.org> fixed a compilation failure on amd64/sparc64/ia64

Submitted by: max
MFC after: 2 months
c2b19f24a4ba01108e047a35a4a060cbfdf28a17 21-Oct-2005 suz <suz@FreeBSD.org> sync with KAME regarding NDP

- introduced fine-grain-timer to manage ND-caches and IPv6 Multicast-Listeners
- supports Router-Preference <draft-ietf-ipv6-router-selection-07.txt>
- better prefix lifetime management
- more spec-conformant DAD advertisement
- updated RFC/internet-draft revisions

Obtained from: KAME
Reviewed by: ume, gnn
MFC after: 2 months
7cb7aed97b0e1c3253828f94db5e8b65429fded3 19-Oct-2005 suz <suz@FreeBSD.org> added an ioctl option in kernel so that ndp/rtadvd can change some NDP-related kernel variables based on their configurations (RFC2461 p.43 6.2.1 mandates this for IPv6 routers)

Obtained from: KAME
Reviewed by: ume, gnn
MFC after: 2 weeks
21f42e535fa6a4e118a72f5e8130dfff7456036a 19-Oct-2005 suz <suz@FreeBSD.org> sync with KAME in the following points:
- fixed typos
- improved some comment descriptions
- use NULL, instead of 0, to denote a NULL pointer
- avoid embedding a magic number in the code
- use nd6log() instead of log() to record NDP-specific logs
- nuked an unnecessary white space

Obtained from: KAME
MFC after: 1 day
c532dfe7456f7cfd69cc1875646ebbafd781ae5c 07-Sep-2005 obrien <obrien@FreeBSD.org> IPv6 was improperly defining its malloc type the same as IPv4 (M_IPMADDR,
M_IPMOPTS, M_MRTABLE). Thus we had conflicting instantiations.
Create an IPv6-specific type to overcome this.
e13b2df85475abdf709032bd3aeae292e5e8c579 25-Aug-2005 rwatson <rwatson@FreeBSD.org> Merge linux_ioctl.c:1.128 svr4_sockio.c:1.17 altq_cbq.c:1.3 if_oltr.c:1.38
if_pflog.c:1.14 if_pfsync.c:1.21 if_an.c:1.70 if_ar.c:1.72 if_arl.c:1.11
amrr.c:1.10 onoe.c:1.10 if_ath.c:1.101 awi.c:1.41 if_bfe.c:1.27
if_bge.c:1.93 if_cm_isa.c:1.7 smc90cx6.c:1.16 if_cnw.c:1.20 if_cp.c:1.25
if_cs.c:1.42 if_ct.c:1.26 if_cx.c:1.46 if_ed.c:1.256 if_em.c:1.68
if_en_pci.c:1.37 midway.c:1.66 if_ep.c:1.143 if_ex.c:1.58 if_fatm.c:1.20
if_fe.c:1.93 if_fwe.c:1.38 if_fwip.c:1.8 if_fxp.c:1.244 if_gem.c:1.33
if_hatm.c:1.25 if_hatm_intr.c:1.20 if_hatm_ioctl.c:1.13 if_hatm_rx.c:1.10
if_hatm_tx.c:1.14 if_hme.c:1.39 if_ie.c:1.104 if_ndis.c:1.101
if_ic.c:1.24 if_ipw.c:1.10 if_iwi.c:1.10 if_ixgb.c:1.13 if_lge.c:1.41
if_lnc.c:1.113 if_my.c:1.31 if_nge.c:1.77 if_nve.c:1.10 if_owi.c:1.12
if_patm.c:1.9 if_patm_intr.c:1.6 if_patm_ioctl.c:1.10 if_patm_tx.c:1.10
pdq_ifsubr.c:1.28 if_plip.c:1.38 if_ral.c:1.12 if_ral_pci.c:1.2
if_ray.c:1.81 if_rayvar.h:1.22 if_re.c:1.49 if_sbni.c:1.21 if_sbsh.c:1.14
if_sn.c:1.48 dp83932.c:1.21 if_snc_pccard.c:1.9 if_sr.c:1.70 if_tx.c:1.91
if_txp.c:1.33 if_aue.c:1.92 if_axe.c:1.32 if_cdce.c:1.8 if_cue.c:1.59
if_kue.c:1.66 if_rue.c:1.23 if_udav.c:1.16 if_ural.c:1.12 if_vge.c:1.16
if_vx.c:1.58 if_wi.c:1.185 if_wi_pci.c:1.26 if_wl.c:1.68 if_xe.c:1.60
if_xe_pccard.c:1.30 if_el.c:1.68 i4b_ipr.c:1.35 i4b_isppp.c:1.31
kern_poll.c:1.20 bridge.c:1.94 bridgestp.c:1.4 if_arcsubr.c:1.27
if_atm.h:1.24 if_atmsubr.c:1.40 if_bridge.c:1.16 if_ef.c:1.35
if_ethersubr.c:1.196 if_faith.c:1.37 if_fddisubr.c:1.100 if_fwsubr.c:1.14
if_gif.c:1.54 if_gre.c:1.34 if_iso88025subr.c:1.70 if_loop.c:1.107
if_ppp.c:1.106 if_spppsubr.c:1.121 if_tap.c:1.57 if_tun.c:1.154
if_vlan.c:1.80 ppp_tty.c:1.67 ieee80211_ioctl.c:1.32 atm_if.c:1.31
ng_eiface.c:1.33 ng_ether.c:1.50 ng_fec.c:1.19 ng_iface.c:1.44
ng_sppp.c:1.9 ip_carp.c:1.30 ip_fastfwd.c:1.30 in6.c:1.53 nd6_nbr.c:1.31
natm.c:1.40 if_dc.c:1.162 if_de.c:1.168 if_pcn.c:1.72 if_rl.c:1.154
if_sf.c:1.84 if_sis.c:1.135 if_sk.c:1.108 if_ste.c:1.86 if_ti.c:1.109
if_tl.c:1.101 if_vr.c:1.106 if_wb.c:1.81 if_xl.c:1.194 from HEAD to
RELENG_6:

Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by: pjd, bz

Approved by: re (scottl)
5d770a09e85126b8b3e9fe302c36350a90210cbe 09-Aug-2005 rwatson <rwatson@FreeBSD.org> Propagate rename of IFF_OACTIVE and IFF_RUNNING to IFF_DRV_OACTIVE and
IFF_DRV_RUNNING, as well as the move from ifnet.if_flags to
ifnet.if_drv_flags. Device drivers are now responsible for
synchronizing access to these flags, as they are in if_drv_flags. This
helps prevent races between the network stack and device driver in
maintaining the interface flags field.

Many __FreeBSD__ and __FreeBSD_version checks maintained and continued;
some less so.

Reviewed by: pjd, bz
MFC after: 7 days
da2cf62b280b8450d5f8e0d810e810cdcc59a8c0 25-Jul-2005 ume <ume@FreeBSD.org> scope cleanup. with this change
- most of the kernel code will not care about the actual encoding of
scope zone IDs and won't touch "s6_addr16[1]" directly.
- similarly, most of the kernel code will not care about link-local
scoped addresses as a special case.
- scope boundary check will be stricter. For example, the current
*BSD code allows a packet with src=::1 and dst=(some global IPv6
address) to be sent outside of the node, if the application does:
s = socket(AF_INET6);
bind(s, "::1");
sendto(s, some_global_IPv6_addr);
This is clearly wrong, since ::1 is only meaningful within a single
node, but the current implementation of the *BSD kernel cannot
reject this attempt.

Submitted by: JINMEI Tatuya <jinmei__at__isl.rdc.toshiba.co.jp>
Obtained from: KAME
7de9a3957f49a3d2defeb896784a44358d279dd0 02-Jun-2005 iedowse <iedowse@FreeBSD.org> Use IFF_LOCKGIANT/IFF_UNLOCKGIANT around calls to the interface
if_ioctl routine. This should fix a number of code paths through
soo_ioctl() that could call into Giant-locked network drivers without
first acquiring Giant.
e1d22638d0a8257ed01b7f95d1b6d5cef74ebd07 22-Feb-2005 glebius <glebius@FreeBSD.org> Add CARP (Common Address Redundancy Protocol), which allows multiple
hosts to share an IP address, providing high availability and load
balancing.

Original work on CARP done by Michael Shalayeff, with many
additions by Marco Pfatschbacher and Ryan McBride.

FreeBSD port done solely by Max Laier.

Patch by: mlaier
Obtained from: OpenBSD (mickey, mcbride)
2b54eeafaedc0507e064739f8fb8e239948c373c 07-Jan-2005 imp <imp@FreeBSD.org> /* -> /*- for license, minor formatting changes, separate for KAME
b1d9338b73aee1198b2b79536f98fad04e7b33c7 23-Aug-2004 rwatson <rwatson@FreeBSD.org> Remove in6_prefix.[ch] and the contained router renumbering capability.
The prefix management code currently resides in nd6, leaving only the
unused router renumbering capability in the in6_prefix files. Removing
it will make it easier for us to provide locking for the remainder of
IPv6 by reducing the number of objects requiring synchronized access.

This functionality has also been removed from NetBSD and OpenBSD.

Submitted by: George Neville-Neil <gnn at neville-neil.com>
Discussed with/approved by: suz, keiichi at kame.net, core at kame.net
b49b7fe7994689a25dfc2162fe02f1d030360089 07-Apr-2004 imp <imp@FreeBSD.org> Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson
996df05b7818a9d71c34cee0968322c76d8a0ff2 04-Mar-2004 ume <ume@FreeBSD.org> move in6_addmulti()/in6_delmulti() into mld6.c

Obtained from: KAME
450e7b33b66881879fbf58001e0459335a249899 04-Mar-2004 ume <ume@FreeBSD.org> missing splx().

Obtained from: KAME
MFC after: 3 days
b71d3614683c573bf04af9d83bfb05b15183fceb 03-Mar-2004 ume <ume@FreeBSD.org> - style and comments
- variable name change (scopeid -> zoneid)
- u_short -> u_int16_t, u_char -> u_int8_t

Obtained from: KAME
d937176b3481cd970c24fba3eddbe098a1fe564f 26-Feb-2004 mlaier <mlaier@FreeBSD.org> Bring eventhandler callbacks for pf.
This enables pf to track dynamic address changes on interfaces (dial-up) with
the "on (<ifname>)"-syntax. This also brings hooks in anticipation of
tracking cloned interfaces, which will be in future versions of pf.

Approved by: bms(mentor)
9678edca4faf1b61c1895429426ffb0eb7dcf14a 24-Feb-2004 cperciva <cperciva@FreeBSD.org> Fix array overflow: If len=128, don't access [16] of a 16-byte IPv6
address, even if we subsequently ignore its value by applying a >>8
to it.

Reported by: "Ted Unangst" <tedu@coverity.com>
Approved by: rwatson (mentor), {ume, suz} (KAME)
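A minimal sketch of the bounds issue described above, assuming a helper that compares the first len bits (0 to 128) of two addresses. The function name in6_prefix_equal_sketch() is hypothetical and the commit message does not say which routine was changed; the point is that with len = 128 the byte index len / 8 equals 16, one past the end of the array, so the partial-byte comparison must be skipped whenever len % 8 is zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: compare the first `len` bits (0..128) of two
 * 16-byte IPv6 addresses.
 */
static bool
in6_prefix_equal_sketch(const uint8_t a[16], const uint8_t b[16], int len)
{
	int bytelen, bitlen;

	if (len < 0 || len > 128)
		return (false);
	bytelen = len / 8;	/* whole bytes covered by the prefix */
	bitlen = len % 8;	/* leftover bits in the following byte */

	if (memcmp(a, b, (size_t)bytelen) != 0)
		return (false);
	/*
	 * Only touch a[bytelen] when there are leftover bits.  For
	 * len == 128, bytelen == 16 and bitlen == 0, so this guard keeps
	 * us from reading one byte past the end of the address, even
	 * though a shift by 8 would discard that value anyway.
	 */
	if (bitlen != 0 &&
	    (a[bytelen] >> (8 - bitlen)) != (b[bytelen] >> (8 - bitlen)))
		return (false);
	return (true);
}
```

Testing bitlen first relies on C's short-circuit evaluation of &&, so the out-of-range byte is never loaded rather than loaded and then shifted away.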
c757933596d28b29d15291cb92520d1100cec3af 10-Jan-2004 ume <ume@FreeBSD.org> try rtinit() only when the route is not installed.
this allows, e.g., duplicated attempts of 'ifconfig lo0 ::1'
like for IPv4.

Obtained from: KAME
MFC after: 1 week
c997776d7c832608d60560c380ff43549d2dbe3a 08-Nov-2003 sam <sam@FreeBSD.org> replace explicit changes to rt_refcnt by RT_ADDREF and RT_REMREF
macros that expand to include assertions when the system is built
with INVARIANTS

Supported by: FreeBSD Foundation
d2486e878b8753b281c2dbc6a91c6652e643d840 05-Nov-2003 ume <ume@FreeBSD.org> byebye in6_ifawithscope(). it was a function for old source
address selection.

Obtained from: KAME
6c2d58b0b2656d0b4dcdaeddcff7c47cec968774 04-Nov-2003 ume <ume@FreeBSD.org> use nd6log().

Obtained from: KAME
f06677c31d30ff047ed2e60d7ac2736e110b6a6c 30-Oct-2003 ume <ume@FreeBSD.org> add management part of address selection policy described in
RFC3484.

Obtained from: KAME
f188fceee73166a2c7dc46b964dffa3eb6f8de97 29-Oct-2003 sam <sam@FreeBSD.org> correct LOR by using a local variable to hold result
instead of holding a lock while calling out of view

Supported by: FreeBSD Foundation
5199c863f8f9e7f134710595dee582ad037bffb6 21-Oct-2003 ume <ume@FreeBSD.org> - change scope to zone.
- change node-local to interface-local.
- better error handling of address-to-scope mapping.
- use in6_clearscope().

Obtained from: KAME
1bfb4986099befab26dc0c1e40e47e89f92f62fb 20-Oct-2003 ume <ume@FreeBSD.org> correct linkmtu handling.

Obtained from: KAME
31759c05252abfcf6f62fa446d979c6c1d38485b 17-Oct-2003 ume <ume@FreeBSD.org> nuke duplicate function and unused function.

Obtained from: KAME
89eb79f30badbc3ba88323c9af00988329c4e051 17-Oct-2003 ume <ume@FreeBSD.org> revert wrongly dropped null check by previous commit.
babf2c3ec01f429fc11fe95261ac8db6488c3788 17-Oct-2003 ume <ume@FreeBSD.org> - add dom_if{attach,detach} framework.
- transition to use ifp->if_afdata.

Obtained from: KAME
a72f1bdb767fa08d3ce42494c037364f31421fb8 10-Oct-2003 ume <ume@FreeBSD.org> nuke SCOPEDROUTING. Though it was there for a long time,
it was never enabled.
399a4e7221768809ef6b40116b578c0cced268a9 07-Oct-2003 ume <ume@FreeBSD.org> - fix typo in comment.
- style.

Obtained from: KAME
6c1377b9efb980f7722b089efd455c1362419b76 06-Oct-2003 ume <ume@FreeBSD.org> return(code) -> return (code)
(reduce diffs against KAME)
9d93fce265aeeeb266999d5092d6d4224cc16829 04-Oct-2003 sam <sam@FreeBSD.org> Locking for updates to routing table entries. Each rtentry gets a mutex
that covers updates to the contents. Note this is separate from holding
a reference and/or locking the routing table itself.

Other/related changes:

o rtredirect loses the final parameter by which an rtentry reference
may be returned; this was never used and added unwarranted complexity
for locking.
o minor style cleanups to routing code (e.g. ansi-fy function decls)
o remove the logic to bump the refcnt on the parent of cloned routes,
we assume the parent will remain as long as the clone; doing this avoids
a circularity in locking during delete
o convert some timeouts to MPSAFE callouts

Notes:

1. rt_mtx in struct rtentry is guarded by #ifdef _KERNEL as user-level
applications cannot/do not know about mutexes. Doing this requires
that the mutex be the last element in the structure. A better solution
is to introduce an externalized version of struct rtentry but this is
a major task because of the intertwining of rtentry and other data
structures that are visible to user applications.
2. There are known LORs that are expected to go away with forthcoming
work to eliminate many held references. If not, these will be resolved
prior to release.
3. ATM changes are untested.

Sponsored by: FreeBSD Foundation
Obtained from: BSD/OS (partly)
cf874b345d0f766fb64cf4737e1c85ccc78d2bee 19-Feb-2003 imp <imp@FreeBSD.org> Back out M_* changes, per decision of the TRB.

Approved by: trb
bf8e8a6e8f0bd9165109f0a258730dd242299815 21-Jan-2003 alfred <alfred@FreeBSD.org> Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
2719b5ea8cf3644f875ec6613f32325ff39ad4fe 25-Dec-2002 ru <ru@FreeBSD.org> If the caller of rtrequest*(RTM_DELETE, ...) asked for a copy of
the entry being removed (ret_nrt != NULL), increment the entry's
rt_refcnt like we do it for RTM_ADD and RTM_RESOLVE, rather than
messing around with 1->0 transitions for rtfree() all over.
82e1e3bab0d3abe1018a0b56559c154485f2f676 22-Dec-2002 hsu <hsu@FreeBSD.org> SMP locking for ifnet list.
c3153934cb24d911042c92eedf9e5dd6d7be07e1 18-Dec-2002 hsu <hsu@FreeBSD.org> Lock up ifaddr reference counts.
553226e8e16639b00d61d81e0125330dbfb7eed8 19-Apr-2002 suz <suz@FreeBSD.org> just merged cosmetic changes from KAME to ease sync between KAME and FreeBSD.
(based on freebsd4-snap-20020128)

Reviewed by: ume
MFC after: 1 week
dc2e474f79c1287592679cd5e0c4c2307feccd60 01-Apr-2002 jhb <jhb@FreeBSD.org> Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@
9da46874816c62804f43960181ead4438296d499 27-Feb-2002 alfred <alfred@FreeBSD.org> Fix warnings caused by discarding const.

Hairy Eyeball At: peter
74063dd723dfad807cddf4ebc4e8bec0a0400b08 25-Sep-2001 brooks <brooks@FreeBSD.org> Make faith loadable, unloadable, and clonable.
5596676e6c6c1e81e899cd0531f9b1c28a292669 12-Sep-2001 julian <julian@FreeBSD.org> KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha
f729fe0a4a07f77cf2a60a88614a01b6bd649256 06-Sep-2001 jlemon <jlemon@FreeBSD.org> Wrap array accesses in macros, which also happen to be lvalues:

ifnet_addrs[i - 1] -> ifaddr_byindex(i)
ifindex2ifnet[i] -> ifnet_byindex(i)

This is intended to ease the conversion to SMPng.
f62663bb069962df0f18083699b88c314bc77bf4 03-Aug-2001 ume <ume@FreeBSD.org> When global anycast address was assigned to lo0, wrong source
address was selected.

Reported by: Shingo WATANABE <nabe@nabechan.org>
Submitted by: JINMEI Tatuya <jinmei@isl.rdc.toshiba.co.jp>
MFC after: 3 days
e5aac591c67fbbf63002ece46525a1d7c5b1c90f 15-Jul-2001 ume <ume@FreeBSD.org> do not M_WAITOK in in6_update_ifa(), since this function can be called
under splnet(). (some comment was added by KAME)

PR: 28927
MFC after: 1 week
e7b9bc714f516674681231706c48e6aaa41c1d52 02-Jul-2001 brooks <brooks@FreeBSD.org> gif(4) and stf(4) modernization:

- Remove gif dependencies from stf.
- Make gif and stf into modules
- Make gif cloneable.

PR: kern/27983
Reviewed by: ru, ume
Obtained from: NetBSD
MFC after: 1 week
832f8d224926758a9ae0b23a6b45353e44fbc87a 11-Jun-2001 ume <ume@FreeBSD.org> Sync with recent KAME.
This work was based on kame-20010528-freebsd43-snap.tgz and some
critical problems found after the snap was released were fixed.
There are many many changes since last KAME merge.

TODO:
- The definitions of SADB_* in sys/net/pfkeyv2.h are still different
from RFC2407/IANA assignment because of binary compatibility
issue. It should be fixed under 5-CURRENT.
- ip6po_m member of struct ip6_pktopts is no longer used. But, it
is still there because of binary compatibility issue. It should
be removed under 5-CURRENT.

Reviewed by: itojun
Obtained from: KAME
MFC after: 3 weeks
32557560ce757249be86798636bfb4e91b3b05c4 18-Jan-2001 itojun <itojun@FreeBSD.org> workaround; be sure to initialize nd6 interface information when IPv6
interface address gets added. this will avoid presenting EMSGSIZE when
outgoing interface is down (and never brought up).

sync with kame.
3b0629faf732df6cf8dd2a9f9f7e5125f988e278 12-Jul-2000 itojun <itojun@FreeBSD.org> correct rtentry reference count in in6_ifloop_request().
if you reconfigure inet6 too much, the reference count can go
into negative by mistake. KAME in6.c 1.98 -> 1.99.
6b46aa142473cd502bcfb82a2a66baf44fcf4d31 07-Jul-2000 grog <grog@FreeBSD.org> Suppress a warning message about trigraphs.

Approved-by: itojun
5f4e854de19331a53788d6100bbcd42845056bc1 04-Jul-2000 itojun <itojun@FreeBSD.org> sync with kame tree as of july00. tons of bug fixes/improvements.

API changes:
- additional IPv6 ioctls
- IPsec PF_KEY API was changed, it is mandatory to upgrade setkey(8).
(also syntax change)
b42951578188c5aab5c9f8cbcde4a743f8092cdc 02-Apr-2000 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'ALSA'.
f70608d097453462b734b50effa7e049dd9c2bae 07-Feb-2000 shin <shin@FreeBSD.org> Permit site local addr in IPv6 source address selection rule.

KAME source addr selection rule had a problem handling IPv6 site
local addr.
The rule is completely rewritten recently and the above problem
is also fixed, but rewriting same code part in freebsd4.0 is too
dangerous in this stage, so just add workaround to avoid
the problem. Just add code for IPv6 site local addresses into IPv6
source addr selection algorithm part.
2ef83ec04a5ee12ee2b1960b1bee4d9ee87223f1 27-Jan-2000 shin <shin@FreeBSD.org> Added ip6_forwarding check when prefix related ioctl is called.
(prefix related ioctl should only be called on router,
because host use dynamic address and prefix configuration mechanism,
and those prefixes are managed separately from ones which are assigned
manually.)
8813e718dc87a6dcf42bd2743686c7a74df222ca 13-Jan-2000 shin <shin@FreeBSD.org> Change struct sockaddr_storage member name, because following change
is very likely to become consensus as recent ietf/ipng mailing list
discussion. Also recent KAME repository and other KAME patched BSDs
also applied it.

s/__ss_family/ss_family/
s/__ss_len/ss_len/

Makeworld is confirmed, and no application should be affected by this change
yet.
96ab44233fdd2ba4bab636d3901703530a2085ed 03-Jan-2000 shin <shin@FreeBSD.org> prevent kernel panic at suspend/resume.

confirmed by: sanpei, joe

PR: kern/15742
70f0bdf6818a73c858bc47a23afc1e9d7c56d716 07-Dec-1999 shin <shin@FreeBSD.org> udp IPv6 support, IPv6/IPv4 tunneling support in kernel,
packet divert at kernel for IPv6/IPv4 translator daemon

This includes queue related patch submitted by jburkhol@home.com.

Submitted by: queue related patch from jburkhol@home.com
Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project
9298d6c8c5d97e7f35ba25b4eb81fb2231002e86 30-Nov-1999 shin <shin@FreeBSD.org> Just to avoid warning message about trigraph.

Commented by: green
cad2014b2749528351ec5180e88a5929efebbfc4 22-Nov-1999 shin <shin@FreeBSD.org> KAME netinet6 basic part(no IPsec,no V6 Multicast Forwarding, no UDP/TCP
for IPv6 yet)

With this patch, you can assign IPv6 addr automatically, and can reply to
IPv6 ping.

Reviewed by: freebsd-arch, cvs-committers
Obtained from: KAME project