CDDL HEADER START

The contents of this file are subject to the terms of the
Common Development and Distribution License (the "License").
You may not use this file except in compliance with the License.

You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
or http://www.opensolaris.org/os/licensing.
See the License for the specific language governing permissions
and limitations under the License.

When distributing Covered Code, include this CDDL HEADER in each
file and include the License file at usr/src/OPENSOLARIS.LICENSE.
If applicable, add the following below this CDDL HEADER, with the
fields enclosed by brackets "[]" replaced with your own identifying
information: Portions Copyright [yyyy] [name of copyright owner]

CDDL HEADER END

Copyright 2007 Sun Microsystems, Inc. All rights reserved.
Use is subject to license terms.


** PLEASE NOTE:
**
** This document discusses aspects of the DHCPv4 client design that have
** since changed (e.g., DLPI is no longer used). However, since those
** aspects affected the DHCPv6 design, the discussion has been left for
** historical record.


DHCPv6 Client Low-Level Design

Introduction

  This project adds DHCPv6 client-side (not server) support to
  Solaris.
  Future projects may add server-side support as well as
  enhance the basic capabilities added here. These future projects
  are not discussed in detail in this document.

  This document assumes that the reader is familiar with the following
  other documents:

    - RFC 3315: the primary description of DHCPv6
    - RFCs 2131 and 2132: IPv4 DHCP
    - RFCs 2461 and 2462: IPv6 NDP and stateless autoconfiguration
    - RFC 3484: IPv6 default address selection
    - ifconfig(8): Solaris IP interface configuration
    - in.ndpd(8): Solaris IPv6 Neighbor and Router Discovery daemon
    - dhcpagent(8): Solaris DHCP client
    - dhcpinfo(1): Solaris DHCP parameter utility
    - ndpd.conf(5): in.ndpd configuration file
    - netstat(8): Solaris network status utility
    - snoop(8): Solaris network packet capture and inspection
    - "DHCPv6 Client High-Level Design"

  Several terms from those documents (such as the DHCPv6 IA_NA and
  IAADDR options) are used without further explanation in this
  document; see the reference documents above for details.

  The overall plan is to enhance the existing Solaris dhcpagent so
  that it is able to process DHCPv6. It would also have been possible
  to create a new, separate daemon process for this, or to integrate
  the feature into in.ndpd. These alternatives, and the reason for
  the chosen design, are discussed in Appendix A.

  This document discusses the internal design issues involved in the
  protocol implementation, and with the associated components (such as
  in.ndpd, snoop, and the kernel's source address selection
  algorithm). It does not discuss the details of the protocol itself,
  which are more than adequately described in the RFC, nor the
  individual lines of code, which will be in the code review.

  As a cross-reference, Appendix B has a summary of the components
  involved and the changes to each.


Background

  In order to discuss the design changes for DHCPv6, it's necessary
  first to talk about the current IPv4-only design, and the
  assumptions built into that design.

  The main data structure used in dhcpagent is the 'struct ifslist'.
  Each instance of this structure represents a Solaris logical IP
  interface under DHCP's control. It also represents the shared state
  with the DHCP server that granted the address, the address itself,
  and copies of the negotiated options.

  There is one list in dhcpagent containing all of the IP interfaces
  that are under DHCP control. IP interfaces not under DHCP control
  (for example, those that are statically addressed) are not included
  in this list, even when plumbed on the system.
  These ifslist entries are chained like this:

      ifsheadp -> ifslist -> ifslist -> ifslist -> NULL
                   net0      net0:1      net1

  Each ifslist entry contains the address, mask, lease information,
  interface name, hardware information, packets, protocol state, and
  timers. The name of the logical IP interface under DHCP's control
  is also the name used in the administrative interfaces (dhcpinfo,
  ifconfig) and when logging events.

  Each entry holds open a DLPI stream and two sockets. The DLPI
  stream is nulled-out with a filter when not in use, but still
  consumes system resources. (Most significantly, it causes data
  copies in the driver layer that end up sapping performance.)

  The entry storage is managed by an insert/hold/release/remove model
  and reference counts. In this model, insert_ifs() allocates a new
  ifslist entry and inserts it into the global list, with the global
  list holding a reference. remove_ifs() removes it from the global
  list and drops that reference. hold_ifs() and release_ifs() are
  used by data structures that refer to ifslist entries, such as timer
  entries, to make sure that the ifslist entry isn't freed until the
  timer has been dispatched or deleted.

  The design is single-threaded, so code that walks the global list
  needn't bother taking holds on the ifslist structure.
  Only references that may be used at a different time (i.e., pointers
  stored in other data structures) need to be recorded.

  Packets are handled using PKT (struct dhcp; <netinet/dhcp.h>),
  PKT_LIST (struct dhcp_list; <dhcp_impl.h>), and dhcp_pkt_t (struct
  dhcp_pkt; "packet.h"). PKT is just the RFC 2131 DHCP packet
  structure, and has no additional information, such as packet length.
  PKT_LIST contains a PKT pointer, length, decoded option arrays, and
  linkage for putting the packet in a list. Finally, dhcp_pkt_t has a
  PKT pointer and length values suitable for modifying the packet.

  Essentially, PKT_LIST is a wrapper for received packets, and
  dhcp_pkt_t is a wrapper for packets to be sent.

  The basic PKT structure is used in dhcpagent, inetboot, in.dhcpd,
  libdhcpagent, libdhcputil, and others. PKT_LIST is used
  in a similar set of places, including the kernel NFS modules.
  dhcp_pkt_t is (as the header file implies) limited to dhcpagent.

  In addition to these structures, dhcpagent maintains a set of
  internal supporting abstractions. Two key ones involved in this
  project are the "async operation" and the "IPC action." An async
  operation encapsulates the actions needed for a given operation, so
  that if cancellation is needed, there's a single point where the
  associated resources can be freed. An IPC action represents the
  user state related to the private interface used by ifconfig.
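
  As an illustration of the insert/hold/release/remove model described
  earlier in this section, here is a minimal sketch. The entry_t type
  and the *_entry() function names are stand-ins invented for this
  example; they are not the dhcpagent definitions, and a real ifslist
  entry carries far more state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ifslist-style entry; not the real layout. */
typedef struct entry {
    struct entry *next;     /* global list linkage */
    unsigned int refcnt;    /* number of holds on this entry */
    int on_list;            /* nonzero while linked on the global list */
} entry_t;

static entry_t *list_head;          /* global list of entries */
static unsigned int entries_freed;  /* counts frees, for demonstration only */

/* Take a hold for a stored pointer (for example, from a timer entry). */
static void
hold_entry(entry_t *ep)
{
    ep->refcnt++;
}

/* Drop a hold; the entry is freed when the last hold goes away. */
static void
release_entry(entry_t *ep)
{
    assert(ep->refcnt > 0);
    if (--ep->refcnt == 0) {
        entries_freed++;
        free(ep);
    }
}

/* Allocate an entry and link it in; the list itself holds a reference. */
static entry_t *
insert_entry(void)
{
    entry_t *ep = calloc(1, sizeof (*ep));

    if (ep == NULL)
        return (NULL);
    ep->next = list_head;
    list_head = ep;
    ep->on_list = 1;
    hold_entry(ep);
    return (ep);
}

/* Unlink the entry and drop the reference held by the list. */
static void
remove_entry(entry_t *ep)
{
    entry_t **epp;

    if (!ep->on_list)
        return;
    for (epp = &list_head; *epp != NULL; epp = &(*epp)->next) {
        if (*epp == ep) {
            *epp = ep->next;
            break;
        }
    }
    ep->on_list = 0;
    release_entry(ep);
}
```

  With this shape, a timer that stores a pointer calls hold_entry()
  when it is scheduled and release_entry() when it fires or is
  cancelled, so a remove_entry() in between unlinks the entry without
  freeing memory that the timer still refers to.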


DHCPv6 Inherent Differences

  DHCPv6 naturally has some commonality with IPv4 DHCP, but also has
  some significant differences.

  Unlike IPv4 DHCP, DHCPv6 relies on link-local IP addresses to do its
  work. This means that, on Solaris, the client doesn't need DLPI to
  perform any of the I/O; regular IP sockets will do the job. It also
  means that, unlike IPv4 DHCP, DHCPv6 does not need to obtain a lease
  for the address used in its messages to the server. The system
  provides the address automatically.

  IPv4 DHCP expects some messages from the server to be broadcast.
  DHCPv6 has no such mechanism; all messages from the server to the
  client are unicast. In the case where the client and server aren't
  on the same subnet, a relay agent is used to get the unicast replies
  back to the client's link-local address.

  With IPv4 DHCP, a single address plus configuration options is
  leased with a given client ID and a single state machine instance,
  and the implementation binds that to a single IP logical interface
  specified by the user. The lease has a "Lease Time," a required
  option, as well as two timers, called T1 (renew) and T2 (rebind),
  which are controlled by regular options.

  DHCPv6 uses a single client/server session to control the
  acquisition of configuration options and "identity associations"
  (IAs).
  The identity associations, in turn, contain lists of
  addresses for the client to use and the T1/T2 timer values. Each
  individual address has its own preferred and valid lifetime, with
  the address being marked "deprecated" at the end of the preferred
  interval, and removed at the end of the valid interval.

  IPv4 DHCP leaves many of the retransmit decisions up to the client,
  and some things (such as RELEASE and DECLINE) are sent just once.
  Others (such as the REQUEST message used for renew and rebind) are
  dealt with by heuristics. DHCPv6 treats each message to the server
  as a separate transaction, and resends each message using a common
  retransmission mechanism. DHCPv6 also has separate messages for
  Renew, Rebind, and Confirm rather than reusing the Request
  mechanism.

  The set of options (which are used to convey configuration
  information) for each protocol are distinct. Notably, two of the
  mistakes from IPv4 DHCP have been fixed: DHCPv6 doesn't carry a
  client name, and doesn't attempt to impersonate a routing protocol
  by setting a "default route."

  Another welcome change is the lack of a netmask/prefix length with
  DHCPv6. Instead, the client uses the Router Advertisement prefixes
  to set the correct interface netmask. This reduces the number of
  databases that need to be kept in sync.
  (The equivalent mechanism
  in IPv4 would have been the use of ICMP Address Mask Request /
  Reply, but the BOOTP designers chose to embed it in the address
  assignment protocol itself.)

  Otherwise, DHCPv6 is similar to IPv4 DHCP. The same overall
  renew/rebind and lease expiry strategy is used, although the state
  machine events must now take into account multiple IAs and the fact
  that each can cause RENEWING or REBINDING state independently.


DHCPv6 And Solaris

  The protocol distinctions above have several important implications.
  For the logical interfaces:

    - Because Solaris uses IP logical interfaces to configure
      addresses, we must have multiple IP logical interfaces per IA
      with IPv6.

    - Because we need to support multiple addresses (and thus multiple
      IP logical interfaces) per IA and multiple IAs per client/server
      session, the IP logical interface name isn't a unique name for
      the lease.

  As a result, IP logical interfaces will come and go with DHCPv6,
  just as happens with the existing stateless address
  autoconfiguration support in in.ndpd. The logical interface names
  (visible in ifconfig) have no administrative significance.

  Fortunately, DHCPv6 does end up with one fixed name that can be used
  to identify a session.
  Because DHCPv6 uses link local addresses for
  communication with the server, the name of the IP logical interface
  that has this link local address (normally the same as the IP
  physical interface) can be used as an identifier for dhcpinfo and
  logging purposes.


Dhcpagent Redesign Overview

  The redesign starts by refactoring the IP interface representation.
  Because we need to have multiple IP logical interfaces (LIFs) for a
  single identity association (IA), we should not store all of the
  DHCP state information along with the LIF information.

  For DHCPv6, we will need to keep LIFs on a single IP physical
  interface (PIF) together, so this is probably also a good time to
  reconsider the way dhcpagent represents physical interfaces. The
  current design simply replicates the state (notably the DLPI stream,
  but also the hardware address and other bits) among all of the
  ifslist entries on the same physical interface.

  The new design creates two lists of dhcp_pif_t entries, one list for
  IPv4 and the other for IPv6. Each dhcp_pif_t represents a PIF, with
  a list of dhcp_lif_t entries attached, each of which represents a
  LIF used by dhcpagent. This structure mirrors the kernel's ill_t
  and ipif_t interface representations.

  Next, the lease-tracking needs to be refactored.
  DHCPv6 is the
  functional superset in this case, as it has two lifetimes per
  address (LIF) and IA groupings with shared T1/T2 timers. To
  represent these groupings, we will use a new dhcp_lease_t structure.
  IPv4 DHCP will have one such structure per state machine, while
  DHCPv6 will have a list. (Note: the initial implementation will
  have only one lease per DHCPv6 state machine, because each state
  machine uses a single link-local address, a single DUID+IAID pair,
  and supports only Non-temporary Addresses [IA_NA option]. Future
  enhancements may use multiple leases per DHCPv6 state machine or
  support other IA types.)

  For all of these new structures, we will use the same insert/hold/
  release/remove model as with the original ifslist.

  Finally, the remaining items (and the bulk of the original ifslist
  members) are kept on a per-state-machine basis. As this is no
  longer just an "interface," a new dhcp_smach_t structure will hold
  these, and the ifslist structure is gone.


Lease Representation

  For DHCPv6, we need to track multiple LIFs per lease (IA), but we
  also need multiple LIFs per PIF.
  Rather than having two sets of
  list linkage for each LIF, we can observe that a LIF is on exactly
  one PIF and is a member of at most one lease, and then simplify: the
  lease structure will use a base pointer for the first LIF in the
  lease, and a count for the number of consecutive LIFs in the PIF's
  list of LIFs that belong to the lease.

  When removing a LIF from the system, we need to decrement the count
  of LIFs in the lease, and advance the base pointer if the LIF being
  removed is the first one. Inserting a LIF means just moving it into
  this list and bumping the counter.

  When removing a lease from a state machine, we need to dispose of
  the LIFs referenced. If the LIF being disposed is the main LIF for
  a state machine, then all that we can do is canonize the LIF
  (returning it to a default state); this represents the normal IPv4
  DHCP operation on lease expiry. Otherwise, the lease is the owner
  of that LIF (it was created because of a DHCPv6 IA), and disposal
  means unplumbing the LIF from the actual system and removing the LIF
  entry from the PIF.


Main Structure Linkage

  For IPv4 DHCP, the new linkage is straightforward.
  Using the same
  system configuration example as in the initial design discussion:

          +- lease  +- lease        +- lease
          |  ^      |  ^            |  ^
          |  |      |  |            |  |
          \  smach  \  smach        \  smach
           \ ^|      \ ^|            \ ^|
            v|v       v|v             v|v
            lif ----> lif -> NULL     lif -> NULL
            net0      net0:1          net1
            ^                         ^
            |                         |
  v4root -> pif --------------------> pif -> NULL
            net0                      net1

  This diagram shows three separate state machines running (with
  backpointers omitted for clarity). Each state machine has a single
  "main" LIF with which it's associated (and named). Each also has a
  single lease structure that points back to the same LIF (count of
  1), because IPv4 DHCP controls a single address allocation per state
  machine.

  DHCPv6 is a bit more complex. This shows DHCPv6 running on two
  interfaces (more or fewer interfaces are of course possible) and
  with multiple leases on the first interface, and each lease with
  multiple addresses (one with two addresses, the second with one).

            lease ----------------> lease -> NULL   lease -> NULL
            ^   \(2)                |(1)            ^   \ (1)
            |    \                  |               |    \
            smach \                 |               smach \
            ^ |    \                |               ^ |    \
            | v     v               v               | v     v
            lif --> lif --> lif --> lif --> NULL    lif --> lif -> NULL
            net0    net0:1  net0:4  net0:2          net1    net1:5
            ^                                       ^
            |                                       |
  v6root -> pif ----------------------------------> pif -> NULL
            net0                                    net1

  Note that there's intentionally no ordering based on name in the
  list of LIFs. Instead, the contiguous LIF structures in that list
  represent the addresses in each lease. The logical interfaces
  themselves are allocated and numbered by the system kernel, so they
  may not be sequential, and there may be gaps in the list if other
  entities (such as in.ndpd) are also configuring interfaces.

  Note also that with IPv4 DHCP, the lease points to the LIF that's
  also the main LIF for the state machine, because that's the IP
  interface that dhcpagent controls. With DHCPv6, the lease (one per
  IA structure) points to a separate set of LIFs that are created just
  for the leased addresses (one per IA address in an IAADDR option).
  The state machine alone points to the main LIF.


Packet Structure Extensions

  Obviously, we need some DHCPv6 packet data structures and
  definitions. A new <netinet/dhcp6.h> file will be introduced with
  the necessary #defines and structures.
  The key structure there will
  be:

      struct dhcpv6_message {
              uint8_t         d6m_msg_type;
              uint8_t         d6m_transid_ho;
              uint16_t        d6m_transid_lo;
      };
      typedef struct dhcpv6_message   dhcpv6_message_t;

  This defines the usual (non-relay) DHCPv6 packet header, and is
  roughly equivalent to PKT for IPv4.

  Extending dhcp_pkt_t for DHCPv6 is straightforward, as it's used
  only within dhcpagent. This structure will be amended to use a
  union for v4/v6 and include a boolean to flag which version is in
  use.

  For the PKT_LIST structure, things are more complex. This defines
  both a queuing mechanism for received packets (typically OFFERs) and
  a set of packet decoding structures. The decoding structures are
  highly specific to IPv4 DHCP -- they have no means to handle nested
  or repeated options (as used heavily in DHCPv6) and make use of the
  DHCP_OPT structure which is specific to IPv4 DHCP -- and are
  somewhat expensive in storage, due to the use of arrays indexed by
  option code number.

  Worse, this structure is used throughout the system, so changes to
  it need to be made carefully. (For example, the existing 'pkt'
  member can't just be turned into a union.)
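
  Returning briefly to the dhcpv6_message header shown above: the
  DHCPv6 transaction ID is 24 bits wide, which is why the structure
  splits it into an 8-bit high-order part and a 16-bit low-order part.
  The sketch below shows one way to pack and unpack it. The two helper
  function names are invented for this example, and the code assumes
  that d6m_transid_lo is stored in network byte order, matching the
  wire format.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Header layout as given in this document. */
struct dhcpv6_message {
    uint8_t  d6m_msg_type;
    uint8_t  d6m_transid_ho;
    uint16_t d6m_transid_lo;
};
typedef struct dhcpv6_message dhcpv6_message_t;

/* Hypothetical helper: store a 24-bit transaction ID in the header. */
static void
d6m_set_transid(dhcpv6_message_t *d6m, uint32_t xid)
{
    d6m->d6m_transid_ho = (uint8_t)(xid >> 16);
    /* Assumption: the low 16 bits are kept in network byte order. */
    d6m->d6m_transid_lo = htons((uint16_t)(xid & 0xffff));
}

/* Hypothetical helper: fetch the 24-bit transaction ID from the header. */
static uint32_t
d6m_get_transid(const dhcpv6_message_t *d6m)
{
    return (((uint32_t)d6m->d6m_transid_ho << 16) |
        ntohs(d6m->d6m_transid_lo));
}
```

  Keeping the conversion in one pair of helpers means the rest of the
  code can treat the transaction ID as an ordinary host-order integer.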

  For an initial prototype, since discarded, I created a new
  dhcp_plist_t structure to represent packet lists as used inside
  dhcpagent and made dhcp_pkt_t valid for use on input and output.
  The result is unsatisfying, though, as it results in code that
  manipulates far too many data structures in common cases; it's a sea
  of pointers to pointers.

  The better answer is to use PKT_LIST for both IPv4 and IPv6, adding
  the few new bits of metadata required to the end (receiving ifIndex,
  packet source/destination addresses), and staying within the overall
  existing design.

  For option parsing, dhcpv6_find_option() and dhcpv6_pkt_option()
  functions will be added to libdhcputil. The former function will
  walk a DHCPv6 option list, and provide safe (bounds-checked) access
  to the options inside. The function can be called recursively, so
  that option nesting can be handled fairly simply by nested loops,
  and can be called repeatedly to return each instance of a given
  option code number. The latter function is just a convenience
  wrapper on dhcpv6_find_option() that starts with a PKT_LIST pointer
  and iterates over the top-level options with a given code number.

  There are two special considerations for the use of these library
  interfaces: there's no "pad" option for DHCPv6 or alignment
  requirements on option headers or contents, and nested options
  always follow a structure that has type-dependent length.
  This
  means that code that handles options must all be written to deal
  with unaligned data, and suboption code must index the pointer past
  the type-dependent part.


Packet Construction

  Unlike DHCPv4, DHCPv6 places the transaction timer value in an
  option. The existing code sets the current time value in
  send_pkt_internal(), which allows it to be updated in a
  straightforward way when doing retransmits.

  To make this work in a simple manner for DHCPv6, I added a
  remove_pkt_opt() function. The update logic just does a remove and
  re-adds the option. We could also just assume the presence of the
  option, find it, and modify in place, but the remove feature seems
  more general.

  DHCPv6 uses nesting options. To make this work, two new utility
  functions are needed. First, an add_pkt_subopt() function will take
  a pointer to an existing option and add an embedded option within
  it. The packet length and existing option length are updated. If
  that existing option isn't a top-level option, though, this means
  that the caller must update the lengths of all of the enclosing
  options up to the top level. To do this, update_v6opt_len() will be
  added. This is used in the special case of adding a Status Code
  option to an IAADDR option within an IA_NA top-level option.
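
  The nesting and length-update steps above can be sketched as
  follows. This is not the dhcpagent code: pktbuf_t, append_opt(),
  append_subopt(), and grow_opt_len() are names invented for the
  example, options are identified by buffer offset rather than by
  pointer, and the sketch assumes that the parent option is the last
  thing in the packet so that a suboption can simply be appended. The
  byte-wise get16()/put16() accessors reflect the lack of alignment
  guarantees in DHCPv6 options.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define OPT_HDR_LEN 4   /* 2-byte option code + 2-byte option length */

/* Hypothetical packet buffer; the real dhcp_pkt_t differs. */
typedef struct {
    uint8_t data[256];
    size_t  len;
} pktbuf_t;

/* Byte-wise accessors: DHCPv6 options carry no alignment guarantee. */
static uint16_t
get16(const uint8_t *p)
{
    return ((uint16_t)((p[0] << 8) | p[1]));
}

static void
put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

/* Append an option at the end of the packet; return its offset, or -1. */
static int
append_opt(pktbuf_t *pb, uint16_t code, const void *body, uint16_t blen)
{
    size_t off = pb->len;

    if (off + OPT_HDR_LEN + blen > sizeof (pb->data))
        return (-1);
    put16(pb->data + off, code);
    put16(pb->data + off + 2, blen);
    if (blen != 0)
        memcpy(pb->data + off + OPT_HDR_LEN, body, blen);
    pb->len = off + OPT_HDR_LEN + blen;
    return ((int)off);
}

/* Grow the length field of the option at 'off' by 'delta' bytes. */
static void
grow_opt_len(pktbuf_t *pb, int off, uint16_t delta)
{
    put16(pb->data + off + 2,
        (uint16_t)(get16(pb->data + off + 2) + delta));
}

/*
 * Append a suboption inside the option at 'parent', which must currently
 * end at the end of the packet.  Only the immediate parent's length is
 * updated; the caller fixes up any outer enclosing options.
 */
static int
append_subopt(pktbuf_t *pb, int parent, uint16_t code, const void *body,
    uint16_t blen)
{
    int off;

    assert((size_t)parent + OPT_HDR_LEN +
        get16(pb->data + parent + 2) == pb->len);
    if ((off = append_opt(pb, code, body, blen)) != -1)
        grow_opt_len(pb, parent, (uint16_t)(OPT_HDR_LEN + blen));
    return (off);
}
```

  For the Status Code case, a caller would append the IA_NA option
  (code 3), append the IAADDR option (code 5) inside it, append the
  Status Code option (code 13) inside the IAADDR, and then call
  grow_opt_len() on the IA_NA offset so that the outermost length also
  covers the newly added bytes.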


Sockets and I/O Handling

  DHCPv6 doesn't need or use either a DLPI or a broadcast IP socket.
  Instead, a single unicast-bound IP socket on a link-local address
  would be the most that is needed. This is roughly equivalent to
  if_sock_ip_fd in the existing design, but that existing socket is
  bound only after DHCP reaches BOUND state -- that is, when it
  switches away from DLPI. We need something different.

  This, along with the excess of open file descriptors in an otherwise
  idle daemon and the potentially serious performance problems in
  leaving DLPI open at all times, argues for a larger redesign of the
  I/O logic in dhcpagent.

  The first thing that we can do is eliminate the need for the
  per-ifslist if_sock_fd. This is used primarily for issuing ioctls
  to configure interfaces -- a task that would work as well with any
  open socket -- and is also registered to receive any ACK/NAK packets
  that may arrive via broadcast. Both of these can be eliminated by
  creating a pair of global sockets (IPv4 and IPv6), bound and
  configured for ACK/NAK reception. The only functional difference is
  that the list of running state machines must be scanned on reception
  to find the correct transaction ID, but the existing design
  effectively already goes to this effort because the kernel
  replicates received datagrams among all matching sockets, and each
  ifslist entry has a socket open.
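
  The scan on reception amounts to a linear search of the running
  state machines. A minimal sketch, using a pared-down smach_t and a
  lookup function name that are both invented for this example (the
  real dhcp_smach_t holds much more):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down state machine record. */
typedef struct smach {
    struct smach *next;     /* list of running state machines */
    const char   *name;     /* for example, "net0" */
    uint32_t      xid;      /* transaction ID of the outstanding request */
} smach_t;

/*
 * Find the state machine that should receive a packet that arrived on
 * the shared socket.  Returns NULL if no transaction ID matches, in
 * which case the caller drops the packet.
 */
static smach_t *
lookup_smach_by_xid(smach_t *head, uint32_t xid)
{
    smach_t *sp;

    for (sp = head; sp != NULL; sp = sp->next) {
        if (sp->xid == xid)
            return (sp);
    }
    return (NULL);
}
```

  Because the daemon is single-threaded, the walk needs no locking and
  no holds on the list entries.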

  (The existing code for if_sock_fd makes oblique reference to unknown
  problems in the system that may prevent binding from working in some
  cases. The reference dates back some seven years to the original
  DHCP implementation. I've observed no such problems in extensive
  testing and if any do show up, they will be dealt with by fixing the
  underlying bugs.)

  This leads to an important simplification: it's no longer necessary
  to register, unregister, and re-register for packet reception while
  changing state -- register_acknak() and unregister_acknak() are
  gone. Instead, we always receive, and we dispatch the packets as
  they arrive. As a result, when receiving a DHCPv4 ACK or DHCPv6
  Reply when in BOUND state, we know it's a duplicate, and we can
  discard.

  The next part is in minimizing DLPI usage. A DLPI stream is needed
  at most for each IPv4 PIF, and it's not needed when all of the
  DHCP instances on that PIF are bound. In fact, the current
  implementation deals with this in configure_bound() by setting a
  "blackhole" packet filter. The stream is left open.

  To simplify this, we will open at most one DLPI stream on a PIF, and
  use reference counts from the state machines to determine when the
  stream must be open and when it can be closed. This mechanism will
  be centralized in a set_smach_state() function that changes the
  state and opens/closes the DLPI stream when needed.
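
  A minimal sketch of that reference-counting idea follows. All of the
  names here (pif_t, smach_t, set_state(), and so on) are invented for
  the example, the open and close of the stream are reduced to setting
  a flag, and the rule that every state other than BOUND needs DLPI is
  a simplifying assumption rather than the daemon's actual policy.

```c
#include <assert.h>

/* Simplified state set; the real protocol has more states. */
typedef enum { INIT, SELECTING, REQUESTING, BOUND } state_t;

/* Hypothetical per-PIF record: one stream shared by all state machines. */
typedef struct pif {
    int dlpi_holds;     /* state machines currently needing the stream */
    int dlpi_open;      /* nonzero while the stream is open */
} pif_t;

/* Hypothetical per-state-machine record. */
typedef struct smach {
    pif_t   *pif;
    state_t  state;
    int      using_dlpi;    /* this state machine holds the stream */
} smach_t;

static void
dlpi_hold(pif_t *pif)
{
    if (pif->dlpi_holds++ == 0)
        pif->dlpi_open = 1;     /* real code would open the stream here */
}

static void
dlpi_release(pif_t *pif)
{
    assert(pif->dlpi_holds > 0);
    if (--pif->dlpi_holds == 0)
        pif->dlpi_open = 0;     /* real code would close the stream here */
}

/* Assumption for this sketch: only BOUND can do without DLPI. */
static int
state_needs_dlpi(state_t s)
{
    return (s != BOUND);
}

/* Single place where state changes and stream holds are kept in step. */
static void
set_state(smach_t *sp, state_t newstate)
{
    int need = state_needs_dlpi(newstate);

    if (need && !sp->using_dlpi) {
        dlpi_hold(sp->pif);
        sp->using_dlpi = 1;
    } else if (!need && sp->using_dlpi) {
        dlpi_release(sp->pif);
        sp->using_dlpi = 0;
    }
    sp->state = newstate;
}
```

  Routing every transition through one function keeps the hold count
  accurate, and the per-state-machine flag gives the send path a
  single thing to test.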

  This leads to another simplification. The I/O logic in the existing
  dhcpagent makes use of the protocol state to select between DLPI and
  sockets. Now that we keep track of this in a simpler manner, we no
  longer need to switch on state when sending a packet; just
  test the dsm_using_dlpi flag instead.

  Still another simplification is in the handling of DHCPv4 INFORM.
  The current code has separate logic in it for getting the interface
  state and address information. This is no longer necessary, as the
  LIF mechanism keeps track of the interface state. And since we have
  separate lease structures, and INFORM doesn't acquire a lease, we no
  longer have to be careful about canonizing the interface on
  shutdown.

  Although the default is to send all client messages to a well-known
  multicast address for servers and relays, DHCPv6 also has a
  mechanism that allows the client to send unicast messages to the
  server. The operation of this mechanism is slightly complex.
  First, the server sends the client a unicast address via an option.
  We may use this address as the destination (rather than the
  well-known multicast address for local DHCPv6 servers and relays)
  only if we have a viable local source address. This means using
  SIOCGDSTINFO each time we try to send unicast. Next, the server may
  send back a special status code: UseMulticast.
  If this is received,
  and if we were actually using unicast in our messages to the server,
  then we need to forget the unicast address, switch back to
  multicast, and resend our last message.

  Note that it's important to avoid the temptation to resend the last
  message every time UseMulticast is seen, and do it only once on
  switching back to multicast: otherwise, a potential feedback loop is
  created.

  Because IP_PKTINFO (PSARC 2006/466) has integrated, we could go a
  step further by removing the need for any per-LIF sockets and just
  use the global sockets for all but DLPI. However, in order to
  facilitate a Solaris 10 backport, this will be done separately as CR
  6509317.

  In the case of DHCPv6, we already have IPV6_PKTINFO, so we will pave
  the way for IPv4 by beginning to use this now, and thus have just
  a single socket (bound to "::") for all of DHCPv6. Doing this
  requires switching from the old BSD4.2 -lsocket -lnsl to the
  standards-compliant -lxnet in order to use ancillary data.

  It may also be possible to remove the need for DLPI for IPv4, and
  incidentally simplify the code a fair amount, by adding a kernel
  option to allow transmission and reception of UDP packets over
  interfaces that are plumbed but not marked IFF_UP. This is left for
  future work.
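
  The UseMulticast handling described earlier in this section can be
  sketched as below. The structure, field, and function names are
  hypothetical; the only value taken from the protocol is the
  UseMulticast status code, which RFC 3315 defines as 5. The point of
  the sketch is that the resend happens only on the transition from
  unicast back to multicast, which is what prevents a feedback loop.

```c
#include <stdbool.h>

#define STATUS_USE_MULTICAST 5  /* RFC 3315 UseMulticast status code */

/* Hypothetical per-state-machine transmit state for this sketch. */
typedef struct smach {
    bool sm_using_unicast;  /* currently sending to a unicast address */
} smach_t;

/*
 * Returns true if the last message must be resent via multicast.
 * A UseMulticast status seen while already using multicast is
 * ignored, so the resend can happen at most once per switch.
 */
bool
must_resend_on_status(smach_t *sm, int status)
{
    if (status != STATUS_USE_MULTICAST || !sm->sm_using_unicast)
        return (false);
    sm->sm_using_unicast = false;   /* forget the unicast address */
    return (true);
}
```

  A caller would retransmit its saved message to the multicast address
  only when this returns true.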


The State Machine

  Several parts of the existing state machine need additions to handle
  DHCPv6, which is a superset of DHCPv4.

  First, there are the RENEWING and REBINDING states. For IPv4 DHCP,
  these states map one-to-one with a single address and single lease
  that's undergoing renewal. It's a simple progression (on timeout)
  from BOUND, to RENEWING, to REBINDING and finally back to SELECTING
  to start over. Each retransmit is done by simply rescheduling the
  T1 or T2 timer.

  For DHCPv6, things are somewhat more complex. At any one time,
  there may be multiple IAs (leases) that are effectively in renewing
  or rebinding state, based on the T1/T2 timers for each IA, and many
  addresses that have expired.

  However, because all of the leases are related to a single server,
  and that server either responds to our requests or doesn't, we can
  simplify the states to be nearly identical to IPv4 DHCP.

  The revised definition for use with DHCPv6 is:

    - Transition from BOUND to RENEWING state when the first T1 timer
      (of any lease on the state machine) expires. At this point, as
      an optimization, we should begin attempting to renew any IAs
      that are within REN_TIMEOUT (10 seconds) of reaching T1 as well.
      We may as well avoid sending an excess of packets.

    - When a T1 lease timer expires and we're in RENEWING or REBINDING
      state, just ignore it, because the transaction is already in
      progress.

    - At each retransmit timeout, we should check to see if there are
      more IAs that need to join in because they've passed point T1 as
      well, and, if so, add them. This check isn't necessary at this
      time, because only a single IA_NA is possible with the initial
      design.

    - When we reach T2 on any IA and we're in BOUND or RENEWING state,
      enter REBINDING state. At this point, we have a choice. For
      those other IAs that are past T1 but not yet at T2, we could
      ignore them (sending only those that have passed point T2),
      continue to send separate Renew messages for them, or just
      include them in the Rebind message. This isn't an issue that
      must be dealt with for this project, but the plan is to include
      them in the Rebind message.

    - When a T2 lease timer expires and we're in REBINDING state, just
      ignore it, as with the corresponding T1 timer.

    - As addresses reach the end of their preferred lifetimes, set the
      IFF_DEPRECATED flag. As they reach the end of the valid
      lifetime, remove them from the system. When an IA (lease)
      becomes empty, just remove it. When there are no more leases
      left, return to SELECTING state to start over.
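
  The T1/T2 transitions in the list above can be summarized as a small
  transition function. This is a sketch only: the enum and function
  names are hypothetical, address lifetime expiry is left out, and the
  REN_TIMEOUT value of 10 seconds is the one quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

#define REN_TIMEOUT 10  /* seconds, as quoted in the text above */

typedef enum { ST_BOUND, ST_RENEWING, ST_REBINDING } lease_state_t;
typedef enum { EV_T1_EXPIRED, EV_T2_EXPIRED } lease_event_t;

/* Single renew/rebind state for the whole state machine. */
lease_state_t
next_state(lease_state_t cur, lease_event_t ev)
{
    switch (ev) {
    case EV_T1_EXPIRED:
        /* Ignored once a Renew or Rebind is already in progress. */
        return (cur == ST_BOUND ? ST_RENEWING : cur);
    case EV_T2_EXPIRED:
        /* BOUND or RENEWING moves on; REBINDING ignores it. */
        return (ST_REBINDING);
    }
    return (cur);
}

/*
 * When the first T1 fires, also pick up any IA whose own T1 is due
 * within REN_TIMEOUT seconds, so that one Renew covers them all.
 */
bool
ia_joins_renew(int64_t now, int64_t t1_due)
{
    return (t1_due - now <= REN_TIMEOUT);
}
```

  Keeping one state per state machine, rather than one per IA, is what
  lets the same function serve both DHCPv4 and DHCPv6.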

  Note that the RFC treats the IAs as separate entities when
  discussing the renew/rebind T1/T2 timers, but treats them as a unit
  when doing the initial negotiation. This is, to say the least,
  confusing, especially so given that there's no reason to expect that
  after having failed to elicit any responses at all from the server
  on one IA, the server will suddenly start responding when we attempt
  to renew some other IA. We rationalize this behavior by using a
  single renew/rebind state for the entire state machine (and thus
  client/server pair).

  There's a subtle timing difference here between DHCPv4 and DHCPv6.
  For DHCPv4, the client just sends packets more and more frequently
  (shorter timeouts) as the next state gets nearer. DHCPv6 treats
  each as a transaction, using the same retransmit logic as for other
  messages. The DHCPv6 method is a cleaner design, so we will change
  the DHCPv4 implementation to do the same, and compute the new timer
  values as part of stop_extending().

  Note that it would be possible to start the SELECTING state earlier
  than waiting for the last lease to expire, and thus avoid a loss of
  connectivity. However, at this point, there are other servers on
  the network that have seen us attempting to Rebind for quite some
  time, and they have not responded.
  The likelihood that there's a
  server that will ignore Rebind but then suddenly spring into action
  on a Solicit message seems low enough that the optimization won't be
  done now. (Starting SELECTING state earlier may be done in the
  future, if it's found to be useful.)


Persistent State

  IPv4 DHCP has only minimal need for persistent state, beyond the
  configuration parameters. The state is stored when "ifconfig dhcp
  drop" is run or the daemon receives SIGTERM, which is typically done
  only well after the system is booted and running.

  The daemon stores this state in /etc/dhcp, because it needs to be
  available when only the root file system has been mounted.

  Moreover, dhcpagent starts very early in the boot process. It runs
  as part of svc:/network/physical:default, which runs well before
  root is mounted read/write:

      svc:/system/filesystem/root:default ->
      svc:/system/metainit:default ->
      svc:/system/identity:node ->
      svc:/network/physical:default
      svc:/network/iscsi_initiator:default ->
      svc:/network/physical:default

  and, of course, well before either /var or /usr is mounted. This
  means that any persistent state must be kept in the root file
  system, and that if we write before shutdown, we have to cope
  gracefully with the root file system returning EROFS on write
  attempts.

  For DHCPv6, we need to try to keep our DUID and IAID values
  stable across reboots to fulfill the demands of RFC 3315.

  The DUID is either configured or automatically generated. When
  configured, it comes from the /etc/default/dhcpagent file, and thus
  does not need to be saved by the daemon. If automatically
  generated, there's exactly one of these created, and it will
  eventually be needed before /usr is mounted, if /usr is mounted over
  IPv6. This means a new file in the root file system,
  /etc/dhcp/duid, will be used to hold the automatically generated
  DUID.

  The determination of whether to use a configured DUID or one saved
  in a file is made in get_smach_cid(). This function will
  encapsulate all of the DUID parsing and generation machinery for the
  rest of dhcpagent.

  If root is not writable at the point when dhcpagent starts, and our
  attempt fails with EROFS, we will set a timer for 60-second
  intervals to retry the operation periodically. In the unlikely case
  that it just never succeeds or that we're rebooted before root
  becomes writable, then the impact will be that the daemon will wake
  up once a minute and, ultimately, we'll choose a different DUID on
  next start-up, and we'll thus lose our leases across a reboot.

  The IAID similarly must be kept stable if at all possible, but
  cannot be configured by the user.
  To make these values stable,
  we will use two strategies. First, the IAID value for a given
  interface (if not known) will just default to the IP ifIndex value,
  provided that there's no known saved IAID using that value. Second,
  we will save off the IAID we choose in a single /etc/dhcp/iaid file,
  containing an array of entries indexed by logical interface name.
  Keeping it in a single file allows us to scan for used and unused
  IAID values when necessary.

  This mechanism depends on the interface name, and thus will need to
  be revisited when Clearview vanity naming and NWAM are available.

  Currently, the boot system (GRUB, OBP, the miniroot) does not
  support installing over IPv6. This could change in the future, so
  one of the goals of the above stability plan is to support that
  event.

  When running in the miniroot on an x86 system, /etc/dhcp (and the
  rest of the root) is mounted on a read-only ramdisk. In this case,
  writing to /etc/dhcp will just never work. A possible solution
  would be to add a new privileged command in ifconfig that forces
  dhcpagent to write to an alternate location. The initial install
  process could then do "ifconfig <x> dhcp write /a" to get the needed
  state written out to the newly-constructed system root.

  This part (the new write option) won't be implemented as part of
  this project, because it's not needed yet.
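
  The two IAID strategies described earlier in this section might be
  combined roughly as follows. The table layout and the names are
  hypothetical and do not describe the real /etc/dhcp/iaid format. The
  fallback used when the ifIndex value is already taken (counting
  upward to the next unused value) is an assumption made only for this
  sketch; the text above says only that the file can be scanned for
  used and unused values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one saved IAID entry. */
typedef struct iaid_ent {
    const char *ie_name;    /* logical interface name */
    uint32_t   ie_iaid;     /* IAID saved for that interface */
} iaid_ent_t;

static bool
iaid_in_use(const iaid_ent_t *tab, size_t n, uint32_t iaid)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tab[i].ie_iaid == iaid)
            return (true);
    }
    return (false);
}

/* Pick the IAID for an interface, preferring a previously saved one. */
uint32_t
choose_iaid(const iaid_ent_t *tab, size_t n, const char *name,
    uint32_t ifindex)
{
    uint32_t iaid;
    size_t i;

    /* A saved value for this interface always wins. */
    for (i = 0; i < n; i++) {
        if (strcmp(tab[i].ie_name, name) == 0)
            return (tab[i].ie_iaid);
    }
    /* Otherwise default to ifIndex if nobody else has claimed it. */
    if (!iaid_in_use(tab, n, ifindex))
        return (ifindex);
    /* Assumed fallback: the next unused value above ifIndex. */
    for (iaid = ifindex + 1; iaid_in_use(tab, n, iaid); iaid++)
        continue;
    return (iaid);
}
```

  Whatever value is chosen would then be written back to the saved
  table so that it stays stable on the next start-up.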


Router Advertisements

  IPv6 Router Advertisements perform two functions related to DHCPv6:

    - they specify whether and how to run DHCPv6 on a given interface.
    - they provide a list of the valid prefixes on an interface.

  For the first function, in.ndpd needs to use the same DHCP control
  interfaces that ifconfig uses, so that it can launch dhcpagent and
  trigger DHCPv6 when necessary. Note that it never needs to shut
  down DHCPv6, as router advertisements can't do that.

  However, launching dhcpagent presents new problems. As a part of
  the "Quagga SMF Modifications" project (PSARC 2006/552), in.ndpd in
  Nevada is now privilege-aware and runs with limited privileges,
  courtesy of SMF. Dhcpagent, on the other hand, must run with all
  privileges.

  A simple work-around for this issue is to rip out the "privileges="
  clause from the method_credential for in.ndpd. I've taken this
  direction initially, but the right longer-term answer seems to be
  converting dhcpagent into an SMF service. This is quite a bit more
  complex, as it means turning the /sbin/dhcpagent command line
  interface into a utility that manipulates the service and passes the
  command line options via IPC extensions.

  Such a design also raises the question of whether dhcpagent itself
  ought to run with reduced privileges.
  It could, but it still needs
  the ability to grant "all" (traditional UNIX root) privileges to the
  eventhook script, if present. There seem to be few ways to do this,
  though it's a good area for research.

  The second function, prefix handling, is also subtle. Unlike IPv4
  DHCP, DHCPv6 does not give the netmask or prefix length along with
  the leased address. The client is on its own to determine the right
  netmask to use. This is where the advertised prefixes come in:
  these must be used to finish the interface configuration.

  We will have the DHCPv6 client configure each interface with an
  all-ones (/128) netmask by default. In.ndpd will be modified so
  that when it detects a new IFF_DHCPRUNNING IP logical interface, it
  checks for a known matching prefix, and sets the netmask as
  necessary. If no matching prefix is known, it will send a new
  Router Solicitation message to try to find one.

  When in.ndpd learns of a new prefix from a Router Advertisement, it
  will scan all of the IFF_DHCPRUNNING IP logical interfaces on the
  same physical interface and set the netmasks when necessary.
  Dhcpagent, for its part, will ignore the netmask on IPv6 interfaces
  when checking for changes that would require it to "abandon" the
  interface.
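
  The prefix check that in.ndpd performs for a DHCPv6-configured
  address can be sketched as below. The prefix_t type and both
  function names are hypothetical, and choosing the longest prefix
  when more than one matches is an assumption of this sketch. A return
  value of 128 means that no advertised prefix matched, which is the
  case where a Router Solicitation would be sent.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record of one prefix learned from an advertisement. */
typedef struct prefix {
    uint8_t pr_addr[16];    /* prefix bits, network byte order */
    int     pr_len;         /* prefix length, 0..128 */
} prefix_t;

/* True if the first pr_len bits of addr equal those of the prefix. */
static bool
prefix_match(const uint8_t addr[16], const prefix_t *pr)
{
    int full = pr->pr_len / 8;  /* whole bytes to compare */
    int rem = pr->pr_len % 8;   /* leftover high-order bits */

    if (memcmp(addr, pr->pr_addr, (size_t)full) != 0)
        return (false);
    if (rem != 0) {
        uint8_t mask = (uint8_t)(0xff << (8 - rem));

        if ((addr[full] & mask) != (pr->pr_addr[full] & mask))
            return (false);
    }
    return (true);
}

/* Netmask length to apply to addr; 128 if no known prefix matches. */
int
netmask_len_for(const uint8_t addr[16], const prefix_t *tab, size_t n)
{
    int best = -1;
    size_t i;

    for (i = 0; i < n; i++) {
        if (prefix_match(addr, &tab[i]) && tab[i].pr_len > best)
            best = tab[i].pr_len;
    }
    return (best >= 0 ? best : 128);
}
```

  Leaving the address at /128 when nothing matches is consistent with
  the default described above.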

  Given the way that DHCPv6 and in.ndpd control both the horizontal
  and the vertical in plumbing and removing logical interfaces, and
  users do not, it might be worthwhile to consider roping off any
  direct user changes to IPv6 logical interfaces under control of
  in.ndpd or dhcpagent, and instead force users through a higher-level
  interface. This won't be done as part of this project, however.


ARP Hardware Types

  There are multiple places within the DHCPv6 client where the mapping
  of DLPI MAC type to ARP Hardware Type is required:

    - When we are constructing an automatic, stable DUID for our own
      identity, we prefer to use a DUID-LLT if possible. This is done
      by finding a link-layer interface, opening it, reading the MAC
      address and type, and translating in the make_stable_duid()
      function in libdhcpagent.

    - When we translate a user-configured DUID from
      /etc/default/dhcpagent into a binary representation, we may have
      to deal with a physical interface name. In this case, we must
      open that interface and read the MAC address and type.

    - As part of the PIF data structure initialization, we need to read
      out the MAC type so that it can be used in the BOOTP/DHCPv4
      'htype' field.

  Ideally, these would all be provided by a single libdlpi
  implementation.
  However, that project is on-going at this time and
  has not yet integrated. For the time being, a dlpi_to_arp()
  translation function (taking dl_mac_type and returning an ARP
  Hardware Type number) will be placed in libdhcputil.

  This temporary function should be removed and this section of the
  code updated when the new libdlpi from Clearview integrates.


Field Mappings

    Old (all in ifslist)    New
    next                    dhcp_smach_t.dsm_next
    prev                    dhcp_smach_t.dsm_prev
    if_hold_count           dhcp_smach_t.dsm_hold_count
    if_ia                   dhcp_smach_t.dsm_ia
    if_async                dhcp_smach_t.dsm_async
    if_state                dhcp_smach_t.dsm_state
    if_dflags               dhcp_smach_t.dsm_dflags
    if_name                 dhcp_smach_t.dsm_name (see text)
    if_index                dhcp_pif_t.pif_index
    if_max                  dhcp_lif_t.lif_max and dhcp_pif_t.pif_max
    if_min                  (was unused; removed)
    if_opt                  (was unused; removed)
    if_hwaddr               dhcp_pif_t.pif_hwaddr
    if_hwlen                dhcp_pif_t.pif_hwlen
    if_hwtype               dhcp_pif_t.pif_hwtype
    if_cid                  dhcp_smach_t.dsm_cid
    if_cidlen               dhcp_smach_t.dsm_cidlen
    if_prl                  dhcp_smach_t.dsm_prl
    if_prllen               dhcp_smach_t.dsm_prllen
    if_daddr                dhcp_pif_t.pif_daddr
    if_dlen                 dhcp_pif_t.pif_dlen
    if_saplen               dhcp_pif_t.pif_saplen
    if_sap_before           dhcp_pif_t.pif_sap_before
    if_dlpi_fd              dhcp_pif_t.pif_dlpi_fd
    if_sock_fd              v4_sock_fd and v6_sock_fd (globals)
    if_sock_ip_fd           dhcp_lif_t.lif_sock_ip_fd
    if_timer                (see text)
    if_t1                   dhcp_lease_t.dl_t1
    if_t2                   dhcp_lease_t.dl_t2
    if_lease                dhcp_lif_t.lif_expire
    if_nrouters             dhcp_smach_t.dsm_nrouters
    if_routers              dhcp_smach_t.dsm_routers
    if_server               dhcp_smach_t.dsm_server
    if_addr                 dhcp_lif_t.lif_v6addr
    if_netmask              dhcp_lif_t.lif_v6mask
    if_broadcast            dhcp_lif_t.lif_v6peer
    if_ack                  dhcp_smach_t.dsm_ack
    if_orig_ack             dhcp_smach_t.dsm_orig_ack
    if_offer_wait           dhcp_smach_t.dsm_offer_wait
    if_offer_timer          dhcp_smach_t.dsm_offer_timer
    if_offer_id             dhcp_pif_t.pif_dlpi_id
    if_acknak_id            dhcp_lif_t.lif_acknak_id
    if_acknak_bcast_id      v4_acknak_bcast_id (global)
    if_neg_monosec          dhcp_smach_t.dsm_neg_monosec
    if_newstart_monosec     dhcp_smach_t.dsm_newstart_monosec
    if_curstart_monosec     dhcp_smach_t.dsm_curstart_monosec
    if_disc_secs            dhcp_smach_t.dsm_disc_secs
    if_reqhost              dhcp_smach_t.dsm_reqhost
    if_recv_pkt_list        dhcp_smach_t.dsm_recv_pkt_list
    if_sent                 dhcp_smach_t.dsm_sent
    if_received             dhcp_smach_t.dsm_received
    if_bad_offers           dhcp_smach_t.dsm_bad_offers
    if_send_pkt             dhcp_smach_t.dsm_send_pkt
    if_send_timeout         dhcp_smach_t.dsm_send_timeout
    if_send_dest            dhcp_smach_t.dsm_send_dest
    if_send_stop_func       dhcp_smach_t.dsm_send_stop_func
    if_packet_sent          dhcp_smach_t.dsm_packet_sent
    if_retrans_timer        dhcp_smach_t.dsm_retrans_timer
    if_script_fd            dhcp_smach_t.dsm_script_fd
    if_script_pid           dhcp_smach_t.dsm_script_pid
    if_script_helper_pid    dhcp_smach_t.dsm_script_helper_pid
    if_script_event         dhcp_smach_t.dsm_script_event
    if_script_event_id      dhcp_smach_t.dsm_script_event_id
    if_callback_msg         dhcp_smach_t.dsm_callback_msg
    if_script_callback      dhcp_smach_t.dsm_script_callback

  Notes:

    - The dsm_name field currently just points to the lif_name on the
      controlling LIF. This may need to be named differently in the
      future; perhaps when Zones are supported.

    - The timer mechanism will be refactored. Rather than using the
      separate if_timer[] array to hold the timer IDs and
      if_{t1,t2,lease} to hold the relative timer values, we will
      gather this information into a dhcp_timer_t structure:

        dt_id       timer ID value
        dt_start    relative start time

  New fields not accounted for above:

    dhcp_pif_t.pif_next             linkage in global list of PIFs
    dhcp_pif_t.pif_prev             linkage in global list of PIFs
    dhcp_pif_t.pif_lifs             pointer to list of LIFs on this PIF
    dhcp_pif_t.pif_isv6             IPv6 flag
    dhcp_pif_t.pif_dlpi_count       number of state machines using DLPI
    dhcp_pif_t.pif_hold_count       reference count
    dhcp_pif_t.pif_name             name of physical interface
    dhcp_lif_t.lif_next             linkage in per-PIF list of LIFs
    dhcp_lif_t.lif_prev             linkage in per-PIF list of LIFs
    dhcp_lif_t.lif_pif              backpointer to parent PIF
    dhcp_lif_t.lif_smachs           pointer to list of state machines
    dhcp_lif_t.lif_lease            backpointer to lease holding LIF
    dhcp_lif_t.lif_flags            interface flags (IFF_*)
    dhcp_lif_t.lif_hold_count       reference count
    dhcp_lif_t.lif_dad_wait         waiting for DAD resolution flag
    dhcp_lif_t.lif_removed          removed from list flag
    dhcp_lif_t.lif_plumbed          plumbed by dhcpagent flag
    dhcp_lif_t.lif_expired          lease has expired flag
    dhcp_lif_t.lif_declined         reason to refuse this address (string)
    dhcp_lif_t.lif_iaid             unique and stable 32-bit identifier
    dhcp_lif_t.lif_iaid_id          timer for delayed /etc writes
    dhcp_lif_t.lif_preferred        preferred timer for v6; deprecate after
    dhcp_lif_t.lif_name             name of logical interface
    dhcp_smach_t.dsm_lif            controlling (main) LIF
    dhcp_smach_t.dsm_leases         pointer to list of leases
    dhcp_smach_t.dsm_lif_wait       number of LIFs waiting on DAD
    dhcp_smach_t.dsm_lif_down       number of LIFs that have failed
    dhcp_smach_t.dsm_using_dlpi     currently using DLPI flag
    dhcp_smach_t.dsm_send_tcenter   v4 central timer value; v6 MRT
    dhcp_lease_t.dl_next            linkage in per-state-machine list of leases
    dhcp_lease_t.dl_prev            linkage in per-state-machine list of leases
    dhcp_lease_t.dl_smach           back pointer to state machine
    dhcp_lease_t.dl_lifs            pointer to first LIF configured by lease
    dhcp_lease_t.dl_nlifs           number of configured consecutive LIFs
    dhcp_lease_t.dl_hold_count      reference counter
    dhcp_lease_t.dl_removed         removed from list flag
    dhcp_lease_t.dl_stale           lease was not updated by Renew/Rebind


Snoop

  The snoop changes are fairly straightforward. As snoop just decodes
  the messages, and the message format is quite different between
  DHCPv4 and DHCPv6, a new module will be created to handle DHCPv6
  decoding, and will export an interpret_dhcpv6() function.

  The one bit of commonality between the two protocols is the use of
  ARP Hardware Type numbers, which are found in the underlying BOOTP
  message format for DHCPv4 and in the DUID-LL and DUID-LLT
  construction for DHCPv6. To simplify this, the existing static
  show_htype() function in snoop_dhcp.c will be renamed to arp_htype()
  (to better reflect its functionality), updated with more modern
  hardware types, moved to snoop_arp.c (where it belongs), and made a
  public symbol within snoop.

  While I'm there, I'll update snoop_arp.c so that when it prints an
  ARP message in verbose mode, it uses arp_htype() to translate the
  ar_hrd value.

  The snoop updates also involve the addition of a new "dhcp6" keyword
  for filtering. As a part of this, CR 6487534 will be fixed.


IPv6 Source Address Selection

  One of the customer requests for DHCPv6 is to be able to predict the
  address selection behavior in the presence of both stateful and
  stateless addresses on the same network.

  Solaris implements RFC 3484 address selection behavior.
  In this
  scheme, the first seven rules implement some basic preferences for
  addresses, with Rule 8 being a deterministic tie breaker.

  Rule 8 relies on a special function, CommonPrefixLen, defined in the
  RFC, that compares leading bits of the address without regard to
  configured prefix length. As Rule 1 eliminates equal addresses,
  this always picks a single address.

  This rule, though, allows for additional checks:

      Rule 8 may be superseded if the implementation has other means of
      choosing among source addresses. For example, if the implementation
      somehow knows which source address will result in the "best"
      communications performance.

  We will thus split Rule 8 into three separate rules:

    - First, compare on configured prefix. The interface with the
      longest configured prefix length that also matches the candidate
      address will be preferred.

    - Next, check the type of address. Prefer statically configured
      addresses above all others. Next, those from DHCPv6. Next,
      stateless autoconfigured addresses. Finally, temporary addresses.
      (Note that Rule 7 will take care of temporary address preferences,
      so that this rule doesn't actually need to look at them.)

    - Finally, run the check-all-bits (CommonPrefixLen) tie breaker.
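
  The three-way split of Rule 8 can be sketched as a comparison
  function. This is an illustration, not the kernel implementation:
  the cand_t type and all function names are hypothetical, and the
  first step is interpreted here as "the destination falls within the
  source address's configured prefix, and longer configured prefixes
  win", which is one reading of the rule above. A negative result
  means the first candidate is preferred.

```c
#include <stdint.h>

/* Lower values are preferred, following the order given above. */
typedef enum {
    ADDR_STATIC = 0,
    ADDR_DHCPV6,
    ADDR_STATELESS,
    ADDR_TEMPORARY
} addr_type_t;

/* Hypothetical candidate source address. */
typedef struct cand {
    uint8_t     c_addr[16];
    int         c_plen;     /* configured prefix length */
    addr_type_t c_type;
} cand_t;

/* Number of leading bits that two addresses have in common. */
static int
common_prefix_len(const uint8_t a[16], const uint8_t b[16])
{
    int i, bits = 0;

    for (i = 0; i < 16; i++) {
        uint8_t diff = (uint8_t)(a[i] ^ b[i]);

        if (diff == 0) {
            bits += 8;
            continue;
        }
        while ((diff & 0x80) == 0) {
            bits++;
            diff = (uint8_t)(diff << 1);
        }
        break;
    }
    return (bits);
}

/* Configured prefix length if dst lies inside that prefix, else -1. */
static int
prefix_score(const cand_t *c, const uint8_t dst[16])
{
    return (common_prefix_len(c->c_addr, dst) >= c->c_plen ?
        c->c_plen : -1);
}

int
rule8_compare(const cand_t *a, const cand_t *b, const uint8_t dst[16])
{
    int sa = prefix_score(a, dst);
    int sb = prefix_score(b, dst);

    if (sa != sb)                   /* 1: configured prefix */
        return (sb - sa);
    if (a->c_type != b->c_type)     /* 2: address type */
        return ((int)a->c_type - (int)b->c_type);
    return (common_prefix_len(b->c_addr, dst) -    /* 3: all bits */
        common_prefix_len(a->c_addr, dst));
}
```

  Because the last step compares every bit and Rule 1 has already
  removed equal addresses, two distinct candidates are not expected to
  compare as equal.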

  The result of this is that if there's a local address in the same
  configured prefix, then we'll prefer that over other addresses. If
  there are multiple to choose from, then we'll pick static first,
  then DHCPv6, then dynamic. Finally, if there are still multiples,
  we'll use the "closest" address, bitwise.

  This basic implementation scheme also addresses CR 6485164, so a fix
  for that will be included with this project.


Minor Improvements

  Various small problems with the system encountered during
  development will be fixed along with this project. Some of these
  are:

    - List of ARPHRD_* types is a bit short; add some new ones.

    - List of IPPORT_* values is similarly sparse; add others in use by
      snoop.

    - dhcpmsg.h lacks PRINTFLIKE for dhcpmsg(); add it.

    - CR 6482163 causes excessive lint errors with libxnet; will fix.

    - libdhcpagent uses gettimeofday() for I/O timing, and this can
      drift on systems with NTP. It should use a stable time source
      (gethrtime()) instead, and should return better error values.

    - Controlling debug mode in the daemon shouldn't require changing
      the command line arguments or jumping through special hoops. I've
      added undocumented ".DEBUG_LEVEL=[0-3]" and ".VERBOSE=[01]"
      features to /etc/default/dhcpagent.

    - The various attributes of the IPC commands (requires privileges,
      creates a new session, valid with BOOTP, immediate reply) should
      be gathered together into one look-up table rather than scattered
      as hard-coded tests.

    - Remove the event unregistration from the command dispatch loop and
      get rid of the ipc_action_pending() botch. We'll get a
      zero-length read any time the client goes away, and that will be
      enough to trigger termination. This fix removes async_pending()
      and async_timeout() as well, and fixes CR 6487958 as a
      side-effect.

    - Throughout the dhcpagent code, there are private implementations
      of doubly-linked and singly-linked lists for each data type.
      These will all be removed and replaced with insque(3C) and
      remque(3C).


Testing

  The implementation was tested using the TAHI test suite for DHCPv6
  (www.tahi.org). There are some peculiar aspects to this test suite,
  and these issues directed some of the design. In particular:

    - If Renew/Rebind doesn't mention one of our leases, then we need to
      allow the message to be retransmitted. Real servers are unlikely
      to do this.

    - We must look for a status code within IAADDR and within IA_NA, and
      handle the paradoxical case of "NoAddrAvail." That doesn't make
      sense, as a server with no addresses wouldn't use those options.
      That option makes more sense at the top level of the message.

    - If we get "UseMulticast" when we were already using multicast,
      then ignore the error code. Sending another request would cause a
      loop.

    - TAHI uses "NoBinding" at the top level of the message. This
      status code only makes sense within an IA, as it refers to the
      DUID:IAID binding, which doesn't exist outside an IA. We must
      ignore such errors -- treat them as success.


Interactions With Other Projects

  Clearview UV (vanity naming) will cause link names, and thus IP
  interface names, to become changeable over time. This will break
  the IAID stability mechanism if UV is used for arbitrary renaming,
  rather than as just a DR enhancement.

  When this portion of Clearview integrates, this part of the DHCPv6
  design may need to be revisited. (The solution will likely be
  handled at some higher layer, such as within Network Automagic.)

  Clearview is also contributing a new libdlpi that will work for
  dhcpagent, and is thus removing the private dlpi_io.[ch] functions
  from this daemon. When that Clearview project integrates, the
  DHCPv6 project will need to adjust to the new interfaces, and remove
  or relocate the dlpi_to_arp() function.


Futures

  Zones currently cannot address any IP interfaces by way of DHCP.
  This project will not fix that problem, but the DUID/IAID could be
  used to help fix it in the future.

  In particular, the DUID allows the client to obtain separate sets of
  addresses and configuration parameters on a single interface, just
  like an IPv4 Client ID, but it includes a clean mechanism for vendor
  extensions. If we associate the DUID with the zone identifier or
  name through an extension, then we have a really simple way of
  allocating per-zone addresses.

  Moreover, RFC 4361 describes a handy way of using DHCPv6 DUID/IAID
  values with IPv4 DHCP, which would quickly solve the problem of
  using DHCP for IPv4 address assignment in non-global zones as well.

  (One potential risk with this plan is that there may be server
  implementations that either do not implement the RFC correctly or
  otherwise mishandle the DUID. This has apparently bitten some early
  adopters.)

  Implementing the FQDN option for DHCPv6 would, given the current
  libdhcputil design, require a new 'type' of entry for the inittab6
  file. This is because the design does not allow for any simple
  means to "compose" a sequence of basic types together. Thus,
  every type of option must either be a basic type, or an array of
  multiple instances of the same basic type.

  If we implement FQDN in the future, it may be useful to explore some
  means of allowing a given option instance to be a sequence of basic
  types.

  This project does not make the DNS resolver or any other subsystem
  use the data gathered by DHCPv6. It just makes the data available
  through dhcpinfo(1). Future projects should modify those services
  to use configuration data learned via DHCPv6. (One of the reasons
  this is not being done now is that Network Automagic [NWAM] will
  likely be changing this area substantially in the very near future,
  and thus the effort would be largely wasted.)


Appendix A - Choice of Venue

  There are three logical places to implement DHCPv6:

    - in dhcpagent
    - in in.ndpd
    - in a new daemon (say, 'dhcp6agent')

  We need to access parameters via dhcpinfo, and should provide the
  same set of status and control features via ifconfig as are present
  for IPv4. (For the latter, if we fail to do that, it will likely
  confuse users. The expense for doing it is comparatively small, and
  it will be useful for testing, even though it should not be needed
  in normal operation.)

  If we implement somewhere other than dhcpagent, then we need to give
  that new daemon (in.ndpd or dhcp6agent) the same basic IPC features
  as dhcpagent already has.
  This means either extracting those bits (async.c and ipc_action.c)
  into a shared library or just copying them. Obviously, the former
  would be preferred, but as those bits depend on the rest of the
  dhcpagent infrastructure for timers and state handling, this means
  that the new process would have to look a lot like dhcpagent.

  Implementing DHCPv6 as part of in.ndpd is attractive, as it
  eliminates the confusion that the router discovery process for
  determining interface netmasks can cause, along with the need to do
  any signaling at all to bring DHCPv6 up. However, the need to make
  in.ndpd more like dhcpagent is unattractive.

  Having a new dhcp6agent daemon seems to have little to recommend it,
  other than leaving the existing dhcpagent code untouched. If we do
  that, then we end up with two implementations that do many similar
  things, and must be maintained in parallel.

  Thus, although it leads to some complexity in reworking the data
  structures to fit both protocols, on balance the simplest solution
  is to extend dhcpagent.


Appendix B - Cross-Reference

  in.ndpd

    - Start dhcpagent and issue "dhcp start" command via libdhcpagent
    - Parse StatefulAddrConf interface option from ndpd.conf
    - Watch for M and O bits to trigger DHCPv6
    - Handle "no routers found" case and start DHCPv6
    - Track prefixes and set prefix length on IFF_DHCPRUNNING aliases
    - Send new Router Solicitation when prefix unknown
    - Change privileges so that dhcpagent can be launched successfully

  libdhcputil

    - Parse new /etc/dhcp/inittab6 file
    - Handle new UNUMBER24, SNUMBER64, IPV6, DUID and DOMAIN types
    - Add DHCPv6 option iterators (dhcpv6_find_option and
      dhcpv6_pkt_option)
    - Add dlpi_to_arp function (temporary)

  libdhcpagent

    - Add stable DUID and IAID creation and storage support
      functions and add new dhcp_stable.h include file
    - Support new DECLINING and RELEASING states introduced by DHCPv6.
    - Update implementation so that it doesn't rely on gettimeofday()
      for I/O timeouts
    - Extend the hostconf functions to support DHCPv6, using a new
      ".dh6" file

  snoop

    - Add support for DHCPv6 packet decoding (all types)
    - Add "dhcp6" filter keyword
    - Fix known bugs in DHCP filtering

  ifconfig

    - Remove inet-only restriction on "dhcp" keyword

  netstat

    - Remove strange "-I list" feature.
    - Add support for DHCPv6 and iterating over IPv6 interfaces.

  ip

    - Add extensions to IPv6 source address selection to prefer DHCPv6
      addresses when all else is equal
    - Fix known bugs in source address selection (remaining from TX
      integration)

  other

    - Add ifindex and source/destination address into PKT_LIST.
    - Add more ARPHRD_* and IPPORT_* values.
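
  As an illustration of the libdhcpagent item above about not relying
  on gettimeofday() for I/O timeouts, the following is a minimal
  sketch of computing a remaining timeout from a monotonic clock. It
  is not the libdhcpagent code. It uses POSIX clock_gettime() with
  CLOCK_MONOTONIC as a portable stand-in for gethrtime(), and the
  function names mono_now_ns() and remaining_ms() are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define	_POSIX_C_SOURCE 200112L	/* for clock_gettime() */
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Current monotonic time in nanoseconds, or -1 on failure.  Unlike
 * gettimeofday(), this clock is not stepped when the wall-clock time
 * is set.
 */
static int64_t
mono_now_ns(void)
{
	struct timespec ts;

	if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
		return (-1);
	return ((int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec);
}

/*
 * Milliseconds left out of timeout_ms, given the start time and the
 * current time (both in nanoseconds).  Clamped at zero so that the
 * result can be passed directly to poll(2) as a timeout.
 */
static int
remaining_ms(int64_t start_ns, int64_t now_ns, int timeout_ms)
{
	int64_t elapsed_ms;

	assert(timeout_ms >= 0);
	elapsed_ms = (now_ns - start_ns) / 1000000LL;
	if (elapsed_ms >= timeout_ms)
		return (0);
	return ((int)(timeout_ms - elapsed_ms));
}
```

  A caller would record mono_now_ns() once before the first poll(2)
  call, then recompute remaining_ms() after each wakeup, so that a
  wall-clock adjustment made while waiting cannot lengthen or shorten
  the overall timeout.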