CDDL HEADER START

The contents of this file are subject to the terms of the
Common Development and Distribution License (the "License").
You may not use this file except in compliance with the License.

You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
or http://www.opensolaris.org/os/licensing.
See the License for the specific language governing permissions
and limitations under the License.

When distributing Covered Code, include this CDDL HEADER in each
file and include the License file at usr/src/OPENSOLARIS.LICENSE.
If applicable, add the following below this CDDL HEADER, with the
fields enclosed by brackets "[]" replaced with your own identifying
information: Portions Copyright [yyyy] [name of copyright owner]

CDDL HEADER END

Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.


**  PLEASE NOTE:
**
**  This document discusses aspects of the DHCPv4 client design that have
**  since changed (e.g., DLPI is no longer used).  However, since those
**  aspects affected the DHCPv6 design, the discussion has been left for
**  historical record.


DHCPv6 Client Low-Level Design

Introduction

  This project adds DHCPv6 client-side (not server) support to
  Solaris.  Future projects may add server-side support as well as
  enhance the basic capabilities added here.  These future projects
  are not discussed in detail in this document.

  This document assumes that the reader is familiar with the following
  other documents:

  - RFC 3315: the primary description of DHCPv6
  - RFCs 2131 and 2132: IPv4 DHCP
  - RFCs 2461 and 2462: IPv6 NDP and stateless autoconfiguration
  - RFC 3484: IPv6 default address selection
  - ifconfig(8): Solaris IP interface configuration
  - in.ndpd(8): Solaris IPv6 Neighbor and Router Discovery daemon
  - dhcpagent(8): Solaris DHCP client
  - dhcpinfo(1): Solaris DHCP parameter utility
  - ndpd.conf(5): in.ndpd configuration file
  - netstat(8): Solaris network status utility
  - snoop(8): Solaris network packet capture and inspection
  - "DHCPv6 Client High-Level Design"

  Several terms from those documents (such as the DHCPv6 IA_NA and
  IAADDR options) are used without further explanation in this
  document; see the reference documents above for details.

  The overall plan is to enhance the existing Solaris dhcpagent so
  that it is able to process DHCPv6.  It would also have been possible
  to create a new, separate daemon process for this, or to integrate
  the feature into in.ndpd.  These alternatives, and the reason for
  the chosen design, are discussed in Appendix A.

  This document discusses the internal design issues involved in the
  protocol implementation and in the associated components (such as
  in.ndpd, snoop, and the kernel's source address selection
  algorithm).  It does not discuss the details of the protocol itself,
  which are more than adequately described in the RFC, nor the
  individual lines of code, which will be covered in the code review.

  As a cross-reference, Appendix B has a summary of the components
  involved and the changes to each.


Background

  In order to discuss the design changes for DHCPv6, it's necessary
  first to talk about the current IPv4-only design, and the
  assumptions built into that design.

  The main data structure used in dhcpagent is the 'struct ifslist'.
  Each instance of this structure represents a Solaris logical IP
  interface under DHCP's control.  It also represents the shared state
  with the DHCP server that granted the address, the address itself,
  and copies of the negotiated options.

  There is one list in dhcpagent containing all of the IP interfaces
  that are under DHCP control.  IP interfaces not under DHCP control
  (for example, those that are statically addressed) are not included
  in this list, even when plumbed on the system.  These ifslist
  entries are chained like this:

  ifsheadp -> ifslist -> ifslist -> ifslist -> NULL
                net0      net0:1     net1

  Each ifslist entry contains the address, mask, lease information,
  interface name, hardware information, packets, protocol state, and
  timers.  The name of the logical IP interface under DHCP's control
  is also the name used in the administrative interfaces (dhcpinfo,
  ifconfig) and when logging events.

  Each entry holds open a DLPI stream and two sockets.  The DLPI
  stream is nulled-out with a filter when not in use, but still
  consumes system resources.  (Most significantly, it causes data
  copies in the driver layer that end up sapping performance.)

  The entry storage is managed by an insert/hold/release/remove model
  and reference counts.  In this model, insert_ifs() allocates a new
  ifslist entry and inserts it into the global list, with the global
  list holding a reference.  remove_ifs() removes it from the global
  list and drops that reference.  hold_ifs() and release_ifs() are
  used by data structures that refer to ifslist entries, such as timer
  entries, to make sure that the ifslist entry isn't freed until the
  timer has been dispatched or deleted.

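  As a minimal sketch of this model (the entry_t type, its fields, and
  the *_entry() function names are stand-ins invented for
  illustration; they are not the real ifslist layout or the dhcpagent
  functions), the reference counting might look like this:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ifslist-style entry; not the real layout. */
typedef struct entry {
	struct entry	*e_next;	/* global list linkage */
	unsigned int	e_hold_count;	/* outstanding references */
	int		e_removed;	/* set once unlinked from the list */
} entry_t;

static entry_t *headp;		/* head of the global list */
static unsigned int nfreed;	/* counts frees; for illustration only */

/* Take a reference on behalf of another data structure (e.g. a timer). */
void
hold_entry(entry_t *ep)
{
	ep->e_hold_count++;
}

/* Drop a reference; the entry is freed when the last one goes away. */
void
release_entry(entry_t *ep)
{
	assert(ep->e_hold_count > 0);
	if (--ep->e_hold_count == 0) {
		free(ep);
		nfreed++;
	}
}

/* Allocate an entry and link it in; the list itself holds one reference. */
entry_t *
insert_entry(void)
{
	entry_t *ep = calloc(1, sizeof (*ep));

	if (ep == NULL)
		return (NULL);
	ep->e_next = headp;
	headp = ep;
	hold_entry(ep);
	return (ep);
}

/* Unlink the entry from the list and drop the list's reference. */
void
remove_entry(entry_t *ep)
{
	entry_t **epp;

	for (epp = &headp; *epp != NULL; epp = &(*epp)->e_next) {
		if (*epp == ep) {
			*epp = ep->e_next;
			ep->e_removed = 1;
			release_entry(ep);	/* may free ep */
			return;
		}
	}
}
```

  A timer that stores a pointer would call hold_entry() when it is
  scheduled and release_entry() when it fires or is cancelled, so a
  remove_entry() in between unlinks the entry without freeing it.
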
  The design is single-threaded, so code that walks the global list
  needn't bother taking holds on the ifslist structure.  Only
  references that may be used at a different time (i.e., pointers
  stored in other data structures) need to be recorded.

  Packets are handled using PKT (struct dhcp; <netinet/dhcp.h>),
  PKT_LIST (struct dhcp_list; <dhcp_impl.h>), and dhcp_pkt_t (struct
  dhcp_pkt; "packet.h").  PKT is just the RFC 2131 DHCP packet
  structure, and has no additional information, such as packet length.
  PKT_LIST contains a PKT pointer, length, decoded option arrays, and
  linkage for putting the packet in a list.  Finally, dhcp_pkt_t has a
  PKT pointer and length values suitable for modifying the packet.

  Essentially, PKT_LIST is a wrapper for received packets, and
  dhcp_pkt_t is a wrapper for packets to be sent.

  The basic PKT structure is used in dhcpagent, inetboot, in.dhcpd,
  libdhcpagent, libdhcputil, and others.  PKT_LIST is used in a
  similar set of places, including the kernel NFS modules.
  dhcp_pkt_t is (as the header file implies) limited to dhcpagent.

  In addition to these structures, dhcpagent maintains a set of
  internal supporting abstractions.  Two key ones involved in this
  project are the "async operation" and the "IPC action."  An async
  operation encapsulates the actions needed for a given operation, so
  that if cancellation is needed, there's a single point where the
  associated resources can be freed.  An IPC action represents the
  user state related to the private interface used by ifconfig.


DHCPv6 Inherent Differences

  DHCPv6 naturally has some commonality with IPv4 DHCP, but also has
  some significant differences.

  Unlike IPv4 DHCP, DHCPv6 relies on link-local IP addresses to do its
  work.  This means that, on Solaris, the client doesn't need DLPI to
  perform any of the I/O; regular IP sockets will do the job.  It also
  means that, unlike IPv4 DHCP, DHCPv6 does not need to obtain a lease
  for the address used in its messages to the server.  The system
  provides the address automatically.

  IPv4 DHCP expects some messages from the server to be broadcast.
  DHCPv6 has no such mechanism; all messages from the server to the
  client are unicast.  In the case where the client and server aren't
  on the same subnet, a relay agent is used to get the unicast replies
  back to the client's link-local address.

  With IPv4 DHCP, a single address plus configuration options is
  leased with a given client ID and a single state machine instance,
  and the implementation binds that to a single IP logical interface
  specified by the user.  The lease has a "Lease Time," a required
  option, as well as two timers, called T1 (renew) and T2 (rebind),
  which are controlled by regular options.

  DHCPv6 uses a single client/server session to control the
  acquisition of configuration options and "identity associations"
  (IAs).  The identity associations, in turn, contain lists of
  addresses for the client to use and the T1/T2 timer values.  Each
  individual address has its own preferred and valid lifetime, with
  the address being marked "deprecated" at the end of the preferred
  interval, and removed at the end of the valid interval.

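  The per-address lifetime rule can be sketched as a small
  classification function.  The names below are hypothetical and the
  elapsed time is assumed to be measured in seconds from when the
  address was granted; the only protocol fact relied on is that RFC
  3315 treats a lifetime of 0xffffffff as infinity.

```c
#include <stdint.h>

#define	EX_LIFETIME_INFINITY	0xffffffffU	/* RFC 3315: "infinity" */

typedef enum {
	EX_ADDR_PREFERRED,	/* usable for new connections */
	EX_ADDR_DEPRECATED,	/* past preferred lifetime, still valid */
	EX_ADDR_EXPIRED		/* past valid lifetime; must be removed */
} ex_addr_state_t;

/*
 * Classify an address given the seconds elapsed since it was granted
 * and its preferred and valid lifetimes (also in seconds).
 */
ex_addr_state_t
ex_addr_state(uint32_t elapsed, uint32_t preferred, uint32_t valid)
{
	if (valid != EX_LIFETIME_INFINITY && elapsed >= valid)
		return (EX_ADDR_EXPIRED);
	if (preferred != EX_LIFETIME_INFINITY && elapsed >= preferred)
		return (EX_ADDR_DEPRECATED);
	return (EX_ADDR_PREFERRED);
}
```
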
  IPv4 DHCP leaves many of the retransmit decisions up to the client,
  and some things (such as RELEASE and DECLINE) are sent just once.
  Others (such as the REQUEST message used for renew and rebind) are
  dealt with by heuristics.  DHCPv6 treats each message to the server
  as a separate transaction, and resends each message using a common
  retransmission mechanism.  DHCPv6 also has separate messages for
  Renew, Rebind, and Confirm rather than reusing the Request
  mechanism.

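  The common retransmission mechanism is specified in RFC 3315 section
  14.  As a rough sketch of its timeout computation (the function
  name, the millisecond units, and passing the random factor in as a
  parameter are choices made here for illustration, not anything in
  dhcpagent):

```c
#include <stdint.h>

/*
 * Compute the next retransmission timeout in milliseconds.
 *
 * rt_prev is the previous timeout (0 for the first transmission), irt
 * is the initial timeout, and mrt is the maximum timeout (0 means no
 * maximum).  rand_permille is the RFC's RAND factor scaled by 1000,
 * so it must lie in [-100, 100].
 */
uint32_t
ex_next_rt(uint32_t rt_prev, uint32_t irt, uint32_t mrt, int rand_permille)
{
	int64_t rt;

	if (rt_prev == 0) {
		/* First transmission: RT = IRT + RAND * IRT */
		rt = (int64_t)irt + (int64_t)irt * rand_permille / 1000;
	} else {
		/* Later ones: RT = 2 * RTprev + RAND * RTprev */
		rt = 2 * (int64_t)rt_prev +
		    (int64_t)rt_prev * rand_permille / 1000;
	}
	/* Cap at the maximum: RT = MRT + RAND * MRT */
	if (mrt != 0 && rt > (int64_t)mrt)
		rt = (int64_t)mrt + (int64_t)mrt * rand_permille / 1000;
	return ((uint32_t)rt);
}
```

  Taking the random factor as an argument keeps the sketch
  deterministic; a real caller would draw it afresh for each
  transmission.
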
  The sets of options (which are used to convey configuration
  information) for the two protocols are distinct.  Notably, two of
  the mistakes from IPv4 DHCP have been fixed: DHCPv6 doesn't carry a
  client name, and doesn't attempt to impersonate a routing protocol
  by setting a "default route."

  Another welcome change is the lack of a netmask/prefix length with
  DHCPv6.  Instead, the client uses the Router Advertisement prefixes
  to set the correct interface netmask.  This reduces the number of
  databases that need to be kept in sync.  (The equivalent mechanism
  in IPv4 would have been the use of ICMP Address Mask Request /
  Reply, but the BOOTP designers chose to embed it in the address
  assignment protocol itself.)

  Otherwise, DHCPv6 is similar to IPv4 DHCP.  The same overall
  renew/rebind and lease expiry strategy is used, although the state
  machine events must now take into account multiple IAs and the fact
  that each can cause RENEWING or REBINDING state independently.


DHCPv6 And Solaris

  The protocol distinctions above have several important implications.
  For the logical interfaces:

    - Because Solaris uses IP logical interfaces to configure
      addresses, we must have multiple IP logical interfaces per IA
      with IPv6.

    - Because we need to support multiple addresses (and thus multiple
      IP logical interfaces) per IA and multiple IAs per client/server
      session, the IP logical interface name isn't a unique name for
      the lease.

  As a result, IP logical interfaces will come and go with DHCPv6,
  just as happens with the existing stateless address
  autoconfiguration support in in.ndpd.  The logical interface names
  (visible in ifconfig) have no administrative significance.

  Fortunately, DHCPv6 does end up with one fixed name that can be used
  to identify a session.  Because DHCPv6 uses link local addresses for
  communication with the server, the name of the IP logical interface
  that has this link local address (normally the same as the IP
  physical interface) can be used as an identifier for dhcpinfo and
  logging purposes.


Dhcpagent Redesign Overview

  The redesign starts by refactoring the IP interface representation.
  Because we need to have multiple IP logical interfaces (LIFs) for a
  single identity association (IA), we should not store all of the
  DHCP state information along with the LIF information.

  For DHCPv6, we will need to keep LIFs on a single IP physical
  interface (PIF) together, so this is probably also a good time to
  reconsider the way dhcpagent represents physical interfaces.  The
  current design simply replicates the state (notably the DLPI stream,
  but also the hardware address and other bits) among all of the
  ifslist entries on the same physical interface.

  The new design creates two lists of dhcp_pif_t entries, one list for
  IPv4 and the other for IPv6.  Each dhcp_pif_t represents a PIF, with
  a list of dhcp_lif_t entries attached, each of which represents a
  LIF used by dhcpagent.  This structure mirrors the kernel's ill_t
  and ipif_t interface representations.

  Next, the lease-tracking needs to be refactored.  DHCPv6 is the
  functional superset in this case, as it has two lifetimes per
  address (LIF) and IA groupings with shared T1/T2 timers.  To
  represent these groupings, we will use a new dhcp_lease_t structure.
  IPv4 DHCP will have one such structure per state machine, while
  DHCPv6 will have a list.  (Note: the initial implementation will
  have only one lease per DHCPv6 state machine, because each state
  machine uses a single link-local address, a single DUID+IAID pair,
  and supports only Non-temporary Addresses [IA_NA option].  Future
  enhancements may use multiple leases per DHCPv6 state machine or
  support other IA types.)

  For all of these new structures, we will use the same insert/hold/
  release/remove model as with the original ifslist.

  Finally, the remaining items (and the bulk of the original ifslist
  members) are kept on a per-state-machine basis.  As this is no
  longer just an "interface," a new dhcp_smach_t structure will hold
  these, and the ifslist structure is gone.


Lease Representation

  For DHCPv6, we need to track multiple LIFs per lease (IA), but we
  also need multiple LIFs per PIF.  Rather than having two sets of
  list linkage for each LIF, we can observe that a LIF is on exactly
  one PIF and is a member of at most one lease, and then simplify: the
  lease structure will use a base pointer for the first LIF in the
  lease, and a count for the number of consecutive LIFs in the PIF's
  list of LIFs that belong to the lease.

  When removing a LIF from the system, we need to decrement the count
  of LIFs in the lease, and advance the base pointer if the LIF being
  removed is the first one.  Inserting a LIF means just moving it into
  this list and bumping the counter.

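  The base-pointer-plus-count scheme can be sketched as follows.  The
  ex_lif_t and ex_lease_t types and their field names are invented
  for this example and carry only the linkage; the real dhcp_lif_t and
  dhcp_lease_t structures hold much more.

```c
#include <stddef.h>

typedef struct ex_lif {
	struct ex_lif	*lif_next;	/* next LIF on the same PIF */
	const char	*lif_name;
} ex_lif_t;

typedef struct ex_lease {
	ex_lif_t	*dl_lifs;	/* first LIF belonging to the lease */
	unsigned int	dl_nlifs;	/* consecutive LIFs owned from there */
} ex_lease_t;

/* Return nonzero if lif is one of the lease's consecutive LIFs. */
int
ex_lease_has_lif(const ex_lease_t *dlp, const ex_lif_t *lif)
{
	const ex_lif_t *lp = dlp->dl_lifs;
	unsigned int n;

	for (n = 0; n < dlp->dl_nlifs && lp != NULL; n++, lp = lp->lif_next) {
		if (lp == lif)
			return (1);
	}
	return (0);
}

/*
 * Remove lif from the lease and unlink it from the PIF's list of
 * LIFs, whose head pointer is *headp.
 */
void
ex_lease_remove_lif(ex_lease_t *dlp, ex_lif_t **headp, ex_lif_t *lif)
{
	ex_lif_t **lpp;

	if (!ex_lease_has_lif(dlp, lif))
		return;
	if (dlp->dl_lifs == lif)
		dlp->dl_lifs = lif->lif_next;	/* advance the base pointer */
	if (--dlp->dl_nlifs == 0)
		dlp->dl_lifs = NULL;		/* lease no longer owns any */
	for (lpp = headp; *lpp != NULL; lpp = &(*lpp)->lif_next) {
		if (*lpp == lif) {
			*lpp = lif->lif_next;
			break;
		}
	}
	lif->lif_next = NULL;
}
```

  Because a lease's LIFs are contiguous in the PIF's list, a single
  next pointer per LIF serves both the PIF and the lease.
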
  When removing a lease from a state machine, we need to dispose of
  the LIFs referenced.  If the LIF being disposed of is the main LIF
  for a state machine, then all that we can do is canonize the LIF
  (returning it to a default state); this represents the normal IPv4
  DHCP operation on lease expiry.  Otherwise, the lease is the owner
  of that LIF (it was created because of a DHCPv6 IA), and disposal
  means unplumbing the LIF from the actual system and removing the LIF
  entry from the PIF.


Main Structure Linkage

  For IPv4 DHCP, the new linkage is straightforward.  Using the same
  system configuration example as in the initial design discussion:

          +- lease  +- lease       +- lease
          |  ^      |  ^           |  ^
          |  |      |  |           |  |
          \  smach  \  smach       \  smach
           \ ^|      \ ^|           \ ^|
            v|v       v|v            v|v
            lif ----> lif -> NULL     lif -> NULL
            net0      net0:1          net1
            ^                         ^
            |                         |
  v4root -> pif --------------------> pif -> NULL
            net0                      net1

  This diagram shows three separate state machines running (with
  backpointers omitted for clarity).  Each state machine has a single
  "main" LIF with which it's associated (and named).  Each also has a
  single lease structure that points back to the same LIF (count of
  1), because IPv4 DHCP controls a single address allocation per state
  machine.

  DHCPv6 is a bit more complex.  The following diagram shows DHCPv6
  running on two interfaces (more or fewer interfaces are of course
  possible), with two leases on the first interface (one with two
  addresses, the other with one) and a single lease with one address
  on the second.

            lease ----------------> lease -> NULL   lease -> NULL
            ^   \(2)                |(1)            ^   \ (1)
            |    \                  |               |    \
            smach \                 |               smach \
            ^ |    \                |               ^ |    \
            | v     v               v               | v     v
            lif --> lif --> lif --> lif --> NULL    lif --> lif -> NULL
            net0    net0:1  net0:4  net0:2          net1    net1:5
            ^                                       ^
            |                                       |
  v6root -> pif ----------------------------------> pif -> NULL
            net0                                    net1

  Note that there's intentionally no ordering based on name in the
  list of LIFs.  Instead, the contiguous LIF structures in that list
  represent the addresses in each lease.  The logical interfaces
  themselves are allocated and numbered by the system kernel, so they
  may not be sequential, and there may be gaps in the list if other
  entities (such as in.ndpd) are also configuring interfaces.

  Note also that with IPv4 DHCP, the lease points to the LIF that's
  also the main LIF for the state machine, because that's the IP
  interface that dhcpagent controls.  With DHCPv6, the lease (one per
  IA structure) points to a separate set of LIFs that are created just
  for the leased addresses (one per IA address in an IAADDR option).
  The state machine alone points to the main LIF.


Packet Structure Extensions

  Obviously, we need some DHCPv6 packet data structures and
  definitions.  A new <netinet/dhcp6.h> file will be introduced with
  the necessary #defines and structures.  The key structure there will
  be:

	struct dhcpv6_message {
		uint8_t		d6m_msg_type;
		uint8_t		d6m_transid_ho;
		uint16_t	d6m_transid_lo;
	};
	typedef	struct dhcpv6_message	dhcpv6_message_t;

  This defines the usual (non-relay) DHCPv6 packet header, and is
  roughly equivalent to PKT for IPv4.

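  The DHCPv6 transaction ID is 24 bits wide, which is why the header
  splits it into a high-order byte and a low-order 16-bit field.  A
  sketch of packing and unpacking it follows; the ex_*_transid()
  helpers are hypothetical names made up for this example (they are
  not part of <netinet/dhcp6.h>), and the low-order field is assumed
  to be stored in network byte order.

```c
#include <stdint.h>
#include <arpa/inet.h>

struct dhcpv6_message {
	uint8_t		d6m_msg_type;
	uint8_t		d6m_transid_ho;
	uint16_t	d6m_transid_lo;
};

/* Hypothetical helper: return the 24-bit transaction ID in host order. */
uint32_t
ex_get_transid(const struct dhcpv6_message *msg)
{
	return (((uint32_t)msg->d6m_transid_ho << 16) |
	    ntohs(msg->d6m_transid_lo));
}

/* Hypothetical helper: store the low 24 bits of xid in the header. */
void
ex_set_transid(struct dhcpv6_message *msg, uint32_t xid)
{
	msg->d6m_transid_ho = (uint8_t)(xid >> 16);
	msg->d6m_transid_lo = htons((uint16_t)(xid & 0xffff));
}
```
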
  Extending dhcp_pkt_t for DHCPv6 is straightforward, as it's used
  only within dhcpagent.  This structure will be amended to use a
  union for v4/v6 and include a boolean to flag which version is in
  use.

  For the PKT_LIST structure, things are more complex.  This defines
  both a queuing mechanism for received packets (typically OFFERs) and
  a set of packet decoding structures.  The decoding structures are
  highly specific to IPv4 DHCP -- they have no means to handle nested
  or repeated options (as used heavily in DHCPv6) and make use of the
  DHCP_OPT structure which is specific to IPv4 DHCP -- and are
  somewhat expensive in storage, due to the use of arrays indexed by
  option code number.

  Worse, this structure is used throughout the system, so changes to
  it need to be made carefully.  (For example, the existing 'pkt'
  member can't just be turned into a union.)

  For an initial prototype, since discarded, I created a new
  dhcp_plist_t structure to represent packet lists as used inside
  dhcpagent and made dhcp_pkt_t valid for use on input and output.
  The result was unsatisfying, though, as it led to code that
  manipulated far too many data structures in common cases; it was a
  sea of pointers to pointers.

  The better answer is to use PKT_LIST for both IPv4 and IPv6, adding
  the few new bits of metadata required to the end (receiving ifIndex,
  packet source/destination addresses), and staying within the overall
  existing design.

  For option parsing, dhcpv6_find_option() and dhcpv6_pkt_option()
  functions will be added to libdhcputil.  The former function will
  walk a DHCPv6 option list, and provide safe (bounds-checked) access
  to the options inside.  The function can be called recursively, so
  that option nesting can be handled fairly simply by nested loops,
  and can be called repeatedly to return each instance of a given
  option code number.  The latter function is just a convenience
  wrapper on dhcpv6_find_option() that starts with a PKT_LIST pointer
  and iterates over the top-level options with a given code number.

  There are two special considerations for the use of these library
  interfaces: there's no "pad" option for DHCPv6 or alignment
  requirements on option headers or contents, and nested options
  always follow a structure that has type-dependent length.  This
  means that code that handles options must all be written to deal
  with unaligned data, and suboption code must index the pointer past
  the type-dependent part.

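  To illustrate both considerations, here is a sketch of a
  bounds-checked walker.  The ex_find_opt() function and its signature
  are invented for this example and are not the libdhcputil interface;
  the only protocol facts assumed are the RFC 3315 option layout (a
  16-bit code and a 16-bit length, both big-endian, followed by the
  data).

```c
#include <stddef.h>
#include <stdint.h>

#define	EX_OPT_HDR_LEN	4	/* 16-bit code followed by 16-bit length */

/* Read a big-endian 16-bit value bytewise; no alignment is assumed. */
static uint16_t
ex_get16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

/*
 * Hypothetical walker.  Search buf[0..len) for the next option with
 * the given code, starting after 'prev' (a pointer previously
 * returned for the same buffer) or at the beginning if prev is NULL.
 * Returns a pointer to the option header and sets *datalen to its
 * data length, or returns NULL if there is no such option or an
 * option would overrun the buffer.
 */
const uint8_t *
ex_find_opt(const uint8_t *buf, size_t len, const uint8_t *prev,
    uint16_t code, size_t *datalen)
{
	const uint8_t *p = buf;
	const uint8_t *end = buf + len;

	if (prev != NULL)
		p = prev + EX_OPT_HDR_LEN + ex_get16(prev + 2);
	while (p <= end && (size_t)(end - p) >= EX_OPT_HDR_LEN) {
		size_t olen = ex_get16(p + 2);

		if (olen > (size_t)(end - p) - EX_OPT_HDR_LEN)
			return (NULL);	/* option overruns the buffer */
		if (ex_get16(p) == code) {
			*datalen = olen;
			return (p);
		}
		p += EX_OPT_HDR_LEN + olen;
	}
	return (NULL);
}
```

  To look inside an IA_NA option found this way, a caller would skip
  the 4-byte header plus the 12-byte type-dependent part (IAID, T1,
  T2) and call the walker again on what remains.
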

Packet Construction

  Unlike DHCPv4, DHCPv6 places the transaction timer value in an
  option.  The existing code sets the current time value in
  send_pkt_internal(), which allows it to be updated in a
  straightforward way when doing retransmits.

  To make this work in a simple manner for DHCPv6, I added a
  remove_pkt_opt() function.  The update logic just does a remove and
  re-adds the option.  We could also just assume the presence of the
  option, find it, and modify in place, but the remove feature seems
  more general.

  DHCPv6 uses nesting options.  To make this work, two new utility
  functions are needed.  First, an add_pkt_subopt() function will take
  a pointer to an existing option and add an embedded option within
  it.  The packet length and existing option length are updated.  If
  that existing option isn't a top-level option, though, this means
  that the caller must update the lengths of all of the enclosing
  options up to the top level.  To do this, update_v6opt_len() will be
  added.  This is used in the special case of adding a Status Code
  option to an IAADDR option within an IA_NA top-level option.

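  The length bookkeeping can be sketched as follows.  The ex_pktbuf_t
  type and the ex_append_opt() and ex_grow_opt_len() functions are
  hypothetical stand-ins written for this example; they are not the
  dhcpagent functions named above and do not share their signatures.
  The sketch assumes that the parent option is the last thing in the
  packet, so that the new option can simply be appended.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct ex_pktbuf {
	uint8_t	pb_data[256];	/* packet options under construction */
	size_t	pb_len;		/* bytes used so far */
} ex_pktbuf_t;

static uint16_t
ex_get16(const uint8_t *p)
{
	return ((uint16_t)((p[0] << 8) | p[1]));
}

static void
ex_put16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Append an option at the end of the packet.  If parent is non-NULL it
 * points at the header of the immediately enclosing option, whose
 * length is grown to cover the new option.  Returns the new option's
 * header, or NULL if it doesn't fit.
 */
uint8_t *
ex_append_opt(ex_pktbuf_t *pb, uint8_t *parent, uint16_t code,
    const void *data, uint16_t len)
{
	uint8_t *opt = pb->pb_data + pb->pb_len;

	if (sizeof (pb->pb_data) - pb->pb_len < (size_t)4 + len)
		return (NULL);
	ex_put16(opt, code);
	ex_put16(opt + 2, len);
	if (len != 0)
		(void) memcpy(opt + 4, data, len);
	pb->pb_len += (size_t)4 + len;
	if (parent != NULL)
		ex_put16(parent + 2,
		    (uint16_t)(ex_get16(parent + 2) + 4 + len));
	return (opt);
}

/* Grow an outer option's length after a more deeply nested addition. */
void
ex_grow_opt_len(uint8_t *opt, uint16_t delta)
{
	ex_put16(opt + 2, (uint16_t)(ex_get16(opt + 2) + delta));
}
```

  In the Status Code case, the caller would append the Status Code
  with the IAADDR as parent, and then grow the enclosing IA_NA by the
  same number of bytes.
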

Sockets and I/O Handling

  DHCPv6 doesn't need or use either a DLPI stream or a broadcast IP
  socket.  Instead, a single unicast-bound IP socket on a link-local
  address would be the most that is needed.  This is roughly
  equivalent to if_sock_ip_fd in the existing design, but that
  existing socket is bound only after DHCP reaches BOUND state -- that
  is, when it switches away from DLPI.  We need something different.

461d04ccbb3Scarlsonj  This, along with the excess of open file descriptors in an otherwise
462d04ccbb3Scarlsonj  idle daemon and the potentially serious performance problems in
463d04ccbb3Scarlsonj  leaving DLPI open at all times, argues for a larger redesign of the
464d04ccbb3Scarlsonj  I/O logic in dhcpagent.
465d04ccbb3Scarlsonj
466d04ccbb3Scarlsonj  The first thing that we can do is eliminate the need for the
467d04ccbb3Scarlsonj  per-ifslist if_sock_fd.  This is used primarily for issuing ioctls
468d04ccbb3Scarlsonj  to configure interfaces -- a task that would work as well with any
469d04ccbb3Scarlsonj  open socket -- and is also registered to receive any ACK/NAK packets
470d04ccbb3Scarlsonj  that may arrive via broadcast.  Both of these can be eliminated by
471d04ccbb3Scarlsonj  creating a pair of global sockets (IPv4 and IPv6), bound and
472d04ccbb3Scarlsonj  configured for ACK/NAK reception.  The only functional difference is
473d04ccbb3Scarlsonj  that the list of running state machines must be scanned on reception
474d04ccbb3Scarlsonj  to find the correct transaction ID, but the existing design
475d04ccbb3Scarlsonj  effectively already goes to this effort because the kernel
476d04ccbb3Scarlsonj  replicates received datagrams among all matching sockets, and each
477d04ccbb3Scarlsonj  ifslist entry has a socket open.
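
  The scan on reception can be sketched as a simple list walk.  This
  is a minimal illustration only: the pared-down smach_t type, its
  field names, and lookup_smach_by_xid() are hypothetical stand-ins,
  not the actual dhcpagent structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, pared-down state machine entry (not the real type). */
typedef struct smach {
	struct smach	*next;	/* linkage in the list of state machines */
	uint32_t	xid;	/* transaction ID of the outstanding request */
	int		isv6;	/* nonzero for a DHCPv6 state machine */
} smach_t;

/*
 * Walk the list and return the state machine that is waiting for the
 * given transaction ID on the given protocol, or NULL if none matches.
 */
smach_t *
lookup_smach_by_xid(smach_t *head, uint32_t xid, int isv6)
{
	smach_t *sm;

	for (sm = head; sm != NULL; sm = sm->next) {
		if (sm->isv6 == isv6 && sm->xid == xid)
			return (sm);
	}
	return (NULL);
}
```

  A packet whose transaction ID matches no running state machine
  yields NULL here and would simply be dropped by the caller.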
478d04ccbb3Scarlsonj
479d04ccbb3Scarlsonj  (The existing code for if_sock_fd makes oblique reference to unknown
480d04ccbb3Scarlsonj  problems in the system that may prevent binding from working in some
481d04ccbb3Scarlsonj  cases.  The reference dates back some seven years to the original
482d04ccbb3Scarlsonj  DHCP implementation.  I've observed no such problems in extensive
483d04ccbb3Scarlsonj  testing and if any do show up, they will be dealt with by fixing the
484d04ccbb3Scarlsonj  underlying bugs.)
485d04ccbb3Scarlsonj
486d04ccbb3Scarlsonj  This leads to an important simplification: it's no longer necessary
487d04ccbb3Scarlsonj  to register, unregister, and re-register for packet reception while
488d04ccbb3Scarlsonj  changing state -- register_acknak() and unregister_acknak() are
489d04ccbb3Scarlsonj  gone.  Instead, we always receive, and we dispatch the packets as
  they arrive.  As a result, if we receive a DHCPv4 ACK or DHCPv6
  Reply while in BOUND state, we know it's a duplicate, and we can
  discard it.
493d04ccbb3Scarlsonj
494d04ccbb3Scarlsonj  The next part is in minimizing DLPI usage.  A DLPI stream is needed
495d04ccbb3Scarlsonj  at most for each IPv4 PIF, and it's not needed when all of the
496d04ccbb3Scarlsonj  DHCP instances on that PIF are bound.  In fact, the current
497d04ccbb3Scarlsonj  implementation deals with this in configure_bound() by setting a
498d04ccbb3Scarlsonj  "blackhole" packet filter.  The stream is left open.
499d04ccbb3Scarlsonj
500d04ccbb3Scarlsonj  To simplify this, we will open at most one DLPI stream on a PIF, and
501d04ccbb3Scarlsonj  use reference counts from the state machines to determine when the
502d04ccbb3Scarlsonj  stream must be open and when it can be closed.  This mechanism will
503d04ccbb3Scarlsonj  be centralized in a set_smach_state() function that changes the
504d04ccbb3Scarlsonj  state and opens/closes the DLPI stream when needed.
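
  A rough sketch of the reference-counting idea follows.  The names
  set_smach_state(), pif_dlpi_count and dsm_using_dlpi come from this
  document; everything else (the reduced types, the pif_dlpi_open
  stand-in for a real stream, and the assumption that only the
  pre-BOUND states need DLPI) is hypothetical and for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
	INIT, SELECTING, REQUESTING, BOUND, RENEWING, REBINDING
} state_t;

typedef struct pif {
	int	pif_dlpi_count;	/* state machines currently using DLPI */
	bool	pif_dlpi_open;	/* stand-in for an open DLPI stream */
} pif_t;

typedef struct smach {
	pif_t	*pif;		/* hypothetical shortcut to the PIF */
	state_t	dsm_state;
	bool	dsm_using_dlpi;
} smach_t;

/* Assumed policy: DLPI is needed only before an address is bound. */
bool
state_needs_dlpi(state_t s)
{
	return (s == INIT || s == SELECTING || s == REQUESTING);
}

/* Change state; open or close the shared stream on 0<->1 transitions. */
void
set_smach_state(smach_t *dsmp, state_t newstate)
{
	bool need = state_needs_dlpi(newstate);
	pif_t *pif = dsmp->pif;

	if (need && !dsmp->dsm_using_dlpi) {
		if (pif->pif_dlpi_count++ == 0)
			pif->pif_dlpi_open = true;	/* would open here */
	} else if (!need && dsmp->dsm_using_dlpi) {
		assert(pif->pif_dlpi_count > 0);
		if (--pif->pif_dlpi_count == 0)
			pif->pif_dlpi_open = false;	/* would close here */
	}
	dsmp->dsm_using_dlpi = need;
	dsmp->dsm_state = newstate;
}
```

  With this shape, the send path never has to inspect the protocol
  state; it only consults dsm_using_dlpi.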
505d04ccbb3Scarlsonj
506d04ccbb3Scarlsonj  This leads to another simplification.  The I/O logic in the existing
507d04ccbb3Scarlsonj  dhcpagent makes use of the protocol state to select between DLPI and
  sockets.  Now that we keep track of this in a simpler manner, we no
  longer need to switch on state when sending a packet; we just
  test the dsm_using_dlpi flag instead.
511d04ccbb3Scarlsonj
512d04ccbb3Scarlsonj  Still another simplification is in the handling of DHCPv4 INFORM.
513d04ccbb3Scarlsonj  The current code has separate logic in it for getting the interface
514d04ccbb3Scarlsonj  state and address information.  This is no longer necessary, as the
515d04ccbb3Scarlsonj  LIF mechanism keeps track of the interface state.  And since we have
516d04ccbb3Scarlsonj  separate lease structures, and INFORM doesn't acquire a lease, we no
517d04ccbb3Scarlsonj  longer have to be careful about canonizing the interface on
518d04ccbb3Scarlsonj  shutdown.
519d04ccbb3Scarlsonj
520d04ccbb3Scarlsonj  Although the default is to send all client messages to a well-known
521d04ccbb3Scarlsonj  multicast address for servers and relays, DHCPv6 also has a
522d04ccbb3Scarlsonj  mechanism that allows the client to send unicast messages to the
523d04ccbb3Scarlsonj  server.  The operation of this mechanism is slightly complex.
524d04ccbb3Scarlsonj  First, the server sends the client a unicast address via an option.
525d04ccbb3Scarlsonj  We may use this address as the destination (rather than the
526d04ccbb3Scarlsonj  well-known multicast address for local DHCPv6 servers and relays)
527d04ccbb3Scarlsonj  only if we have a viable local source address.  This means using
528d04ccbb3Scarlsonj  SIOCGDSTINFO each time we try to send unicast.  Next, the server may
529d04ccbb3Scarlsonj  send back a special status code: UseMulticast.  If this is received,
530d04ccbb3Scarlsonj  and if we were actually using unicast in our messages to the server,
531d04ccbb3Scarlsonj  then we need to forget the unicast address, switch back to
532d04ccbb3Scarlsonj  multicast, and resend our last message.
533d04ccbb3Scarlsonj
534d04ccbb3Scarlsonj  Note that it's important to avoid the temptation to resend the last
535d04ccbb3Scarlsonj  message every time UseMulticast is seen, and do it only once on
536d04ccbb3Scarlsonj  switching back to multicast: otherwise, a potential feedback loop is
537d04ccbb3Scarlsonj  created.
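
  The "resend only once" rule can be sketched as below.  The
  server_dest_t type and handle_use_multicast() are hypothetical
  names chosen for this illustration; the point is only that the
  resend decision is tied to the unicast-to-multicast transition and
  not to the arrival of the status code itself.

```c
#include <stdbool.h>

/* Hypothetical per-server destination state. */
typedef struct server_dest {
	bool	have_unicast;	/* server supplied a unicast address */
	bool	using_unicast;	/* last message was actually sent unicast */
} server_dest_t;

/*
 * Called when a UseMulticast status code is received.  Returns true
 * if the caller should resend its last message (via multicast).
 */
bool
handle_use_multicast(server_dest_t *sd)
{
	if (!sd->using_unicast)
		return (false);		/* already multicast; don't resend */

	sd->have_unicast = false;	/* forget the unicast address */
	sd->using_unicast = false;	/* switch back to multicast */
	return (true);			/* resend exactly once */
}
```

  A second UseMulticast from the same server returns false, which
  breaks the potential feedback loop.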
538d04ccbb3Scarlsonj
539d04ccbb3Scarlsonj  Because IP_PKTINFO (PSARC 2006/466) has integrated, we could go a
540d04ccbb3Scarlsonj  step further by removing the need for any per-LIF sockets and just
541d04ccbb3Scarlsonj  use the global sockets for all but DLPI.  However, in order to
542d04ccbb3Scarlsonj  facilitate a Solaris 10 backport, this will be done separately as CR
543d04ccbb3Scarlsonj  6509317.
544d04ccbb3Scarlsonj
545d04ccbb3Scarlsonj  In the case of DHCPv6, we already have IPV6_PKTINFO, so we will pave
  the way for IPv4 by beginning to use this now, and thus have just
547d04ccbb3Scarlsonj  a single socket (bound to "::") for all of DHCPv6.  Doing this
548d04ccbb3Scarlsonj  requires switching from the old BSD4.2 -lsocket -lnsl to the
549d04ccbb3Scarlsonj  standards-compliant -lxnet in order to use ancillary data.
550d04ccbb3Scarlsonj
551d04ccbb3Scarlsonj  It may also be possible to remove the need for DLPI for IPv4, and
552d04ccbb3Scarlsonj  incidentally simplify the code a fair amount, by adding a kernel
553d04ccbb3Scarlsonj  option to allow transmission and reception of UDP packets over
554d04ccbb3Scarlsonj  interfaces that are plumbed but not marked IFF_UP.  This is left for
555d04ccbb3Scarlsonj  future work.
556d04ccbb3Scarlsonj
557d04ccbb3Scarlsonj
558d04ccbb3ScarlsonjThe State Machine
559d04ccbb3Scarlsonj
560d04ccbb3Scarlsonj  Several parts of the existing state machine need additions to handle
561d04ccbb3Scarlsonj  DHCPv6, which is a superset of DHCPv4.
562d04ccbb3Scarlsonj
563d04ccbb3Scarlsonj  First, there are the RENEWING and REBINDING states.  For IPv4 DHCP,
564d04ccbb3Scarlsonj  these states map one-to-one with a single address and single lease
565d04ccbb3Scarlsonj  that's undergoing renewal.  It's a simple progression (on timeout)
566d04ccbb3Scarlsonj  from BOUND, to RENEWING, to REBINDING and finally back to SELECTING
567d04ccbb3Scarlsonj  to start over.  Each retransmit is done by simply rescheduling the
568d04ccbb3Scarlsonj  T1 or T2 timer.
569d04ccbb3Scarlsonj
570d04ccbb3Scarlsonj  For DHCPv6, things are somewhat more complex.  At any one time,
571d04ccbb3Scarlsonj  there may be multiple IAs (leases) that are effectively in renewing
572d04ccbb3Scarlsonj  or rebinding state, based on the T1/T2 timers for each IA, and many
573d04ccbb3Scarlsonj  addresses that have expired.
574d04ccbb3Scarlsonj
575d04ccbb3Scarlsonj  However, because all of the leases are related to a single server,
576d04ccbb3Scarlsonj  and that server either responds to our requests or doesn't, we can
577d04ccbb3Scarlsonj  simplify the states to be nearly identical to IPv4 DHCP.
578d04ccbb3Scarlsonj
579d04ccbb3Scarlsonj  The revised definition for use with DHCPv6 is:
580d04ccbb3Scarlsonj
581d04ccbb3Scarlsonj    - Transition from BOUND to RENEWING state when the first T1 timer
582d04ccbb3Scarlsonj      (of any lease on the state machine) expires.  At this point, as
583d04ccbb3Scarlsonj      an optimization, we should begin attempting to renew any IAs
584d04ccbb3Scarlsonj      that are within REN_TIMEOUT (10 seconds) of reaching T1 as well.
585d04ccbb3Scarlsonj      We may as well avoid sending an excess of packets.
586d04ccbb3Scarlsonj
587d04ccbb3Scarlsonj    - When a T1 lease timer expires and we're in RENEWING or REBINDING
588d04ccbb3Scarlsonj      state, just ignore it, because the transaction is already in
589d04ccbb3Scarlsonj      progress.
590d04ccbb3Scarlsonj
591d04ccbb3Scarlsonj    - At each retransmit timeout, we should check to see if there are
592d04ccbb3Scarlsonj      more IAs that need to join in because they've passed point T1 as
593d04ccbb3Scarlsonj      well, and, if so, add them.  This check isn't necessary at this
594d04ccbb3Scarlsonj      time, because only a single IA_NA is possible with the initial
595d04ccbb3Scarlsonj      design.
596d04ccbb3Scarlsonj
597d04ccbb3Scarlsonj    - When we reach T2 on any IA and we're in BOUND or RENEWING state,
598d04ccbb3Scarlsonj      enter REBINDING state.  At this point, we have a choice.  For
599d04ccbb3Scarlsonj      those other IAs that are past T1 but not yet at T2, we could
600d04ccbb3Scarlsonj      ignore them (sending only those that have passed point T2),
601d04ccbb3Scarlsonj      continue to send separate Renew messages for them, or just
602d04ccbb3Scarlsonj      include them in the Rebind message.  This isn't an issue that
603d04ccbb3Scarlsonj      must be dealt with for this project, but the plan is to include
604d04ccbb3Scarlsonj      them in the Rebind message.
605d04ccbb3Scarlsonj
606d04ccbb3Scarlsonj    - When a T2 lease timer expires and we're in REBINDING state, just
607d04ccbb3Scarlsonj      ignore it, as with the corresponding T1 timer.
608d04ccbb3Scarlsonj
609d04ccbb3Scarlsonj    - As addresses reach the end of their preferred lifetimes, set the
610d04ccbb3Scarlsonj      IFF_DEPRECATED flag.  As they reach the end of the valid
611d04ccbb3Scarlsonj      lifetime, remove them from the system.  When an IA (lease)
612d04ccbb3Scarlsonj      becomes empty, just remove it.  When there are no more leases
613d04ccbb3Scarlsonj      left, return to SELECTING state to start over.
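
  The rules above can be summarized as a small transition function.
  This is a sketch only; the event names and next_state() are
  hypothetical, and the IA bookkeeping (which IAs join a Renew or
  Rebind, deprecating and removing addresses) is left out.

```c
typedef enum { BOUND, RENEWING, REBINDING, SELECTING } state_t;

typedef enum {
	EV_T1,			/* T1 expired on some lease */
	EV_T2,			/* T2 expired on some lease */
	EV_LAST_LEASE_GONE	/* no leases remain on the state machine */
} event_t;

/* Return the state that follows 'cur' when 'ev' occurs. */
state_t
next_state(state_t cur, event_t ev)
{
	switch (ev) {
	case EV_T1:
		/* Ignored in RENEWING/REBINDING: already in progress. */
		return (cur == BOUND ? RENEWING : cur);
	case EV_T2:
		/* Ignored in REBINDING, as with the T1 timer. */
		return ((cur == BOUND || cur == RENEWING) ? REBINDING : cur);
	case EV_LAST_LEASE_GONE:
		return (SELECTING);	/* start over */
	}
	return (cur);
}
```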
614d04ccbb3Scarlsonj
615d04ccbb3Scarlsonj  Note that the RFC treats the IAs as separate entities when
616d04ccbb3Scarlsonj  discussing the renew/rebind T1/T2 timers, but treats them as a unit
617d04ccbb3Scarlsonj  when doing the initial negotiation.  This is, to say the least,
618d04ccbb3Scarlsonj  confusing, especially so given that there's no reason to expect that
619d04ccbb3Scarlsonj  after having failed to elicit any responses at all from the server
620d04ccbb3Scarlsonj  on one IA, the server will suddenly start responding when we attempt
621d04ccbb3Scarlsonj  to renew some other IA.  We rationalize this behavior by using a
622d04ccbb3Scarlsonj  single renew/rebind state for the entire state machine (and thus
623d04ccbb3Scarlsonj  client/server pair).
624d04ccbb3Scarlsonj
625d04ccbb3Scarlsonj  There's a subtle timing difference here between DHCPv4 and DHCPv6.
626d04ccbb3Scarlsonj  For DHCPv4, the client just sends packets more and more frequently
627d04ccbb3Scarlsonj  (shorter timeouts) as the next state gets nearer.  DHCPv6 treats
628d04ccbb3Scarlsonj  each as a transaction, using the same retransmit logic as for other
629d04ccbb3Scarlsonj  messages.  The DHCPv6 method is a cleaner design, so we will change
630d04ccbb3Scarlsonj  the DHCPv4 implementation to do the same, and compute the new timer
631d04ccbb3Scarlsonj  values as part of stop_extending().
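
  For reference, the DHCPv6 retransmit computation (RFC 3315, section
  14) looks roughly like this.  The function name next_rt() is
  hypothetical, and the random factor is passed in by the caller
  (normally a value in [-0.1, +0.1]) so that the sketch stays
  deterministic; it is not a description of the dhcpagent code.

```c
#define	REN_TIMEOUT	10.0	/* initial Renew timeout, seconds */
#define	REN_MAX_RT	600.0	/* maximum Renew timeout, seconds */

/*
 * Compute the next retransmission timeout.
 *   prev_rt  previous timeout, or 0 for the first transmission
 *   irt      initial retransmission time
 *   mrt      maximum retransmission time (0 means no cap)
 *   rnd      randomization factor, expected in [-0.1, +0.1]
 */
double
next_rt(double prev_rt, double irt, double mrt, double rnd)
{
	double rt;

	if (prev_rt <= 0.0)
		rt = irt + rnd * irt;			/* first attempt */
	else
		rt = 2.0 * prev_rt + rnd * prev_rt;	/* back off */

	if (mrt > 0.0 && rt > mrt)
		rt = mrt + rnd * mrt;			/* cap at MRT */

	return (rt);
}
```

  With rnd set to zero, a Renew transaction would wait 10, 20, 40,
  ... seconds and then settle at 600 seconds between attempts.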
632d04ccbb3Scarlsonj
633d04ccbb3Scarlsonj  Note that it would be possible to start the SELECTING state earlier
634d04ccbb3Scarlsonj  than waiting for the last lease to expire, and thus avoid a loss of
  connectivity.  However, at this point, there are other servers on
636d04ccbb3Scarlsonj  the network that have seen us attempting to Rebind for quite some
637d04ccbb3Scarlsonj  time, and they have not responded.  The likelihood that there's a
638d04ccbb3Scarlsonj  server that will ignore Rebind but then suddenly spring into action
639d04ccbb3Scarlsonj  on a Solicit message seems low enough that the optimization won't be
640d04ccbb3Scarlsonj  done now.  (Starting SELECTING state earlier may be done in the
641d04ccbb3Scarlsonj  future, if it's found to be useful.)
642d04ccbb3Scarlsonj
643d04ccbb3Scarlsonj
644d04ccbb3ScarlsonjPersistent State
645d04ccbb3Scarlsonj
646d04ccbb3Scarlsonj  IPv4 DHCP has only minimal need for persistent state, beyond the
647d04ccbb3Scarlsonj  configuration parameters.  The state is stored when "ifconfig dhcp
648d04ccbb3Scarlsonj  drop" is run or the daemon receives SIGTERM, which is typically done
649d04ccbb3Scarlsonj  only well after the system is booted and running.
650d04ccbb3Scarlsonj
651d04ccbb3Scarlsonj  The daemon stores this state in /etc/dhcp, because it needs to be
652d04ccbb3Scarlsonj  available when only the root file system has been mounted.
653d04ccbb3Scarlsonj
654d04ccbb3Scarlsonj  Moreover, dhcpagent starts very early in the boot process.  It runs
655d04ccbb3Scarlsonj  as part of svc:/network/physical:default, which runs well before
656d04ccbb3Scarlsonj  root is mounted read/write:
657d04ccbb3Scarlsonj
658d04ccbb3Scarlsonj     svc:/system/filesystem/root:default ->
659d04ccbb3Scarlsonj        svc:/system/metainit:default ->
660d04ccbb3Scarlsonj           svc:/system/identity:node ->
661d04ccbb3Scarlsonj              svc:/network/physical:default
662d04ccbb3Scarlsonj           svc:/network/iscsi_initiator:default ->
663d04ccbb3Scarlsonj              svc:/network/physical:default
664d04ccbb3Scarlsonj
665d04ccbb3Scarlsonj  and, of course, well before either /var or /usr is mounted.  This
666d04ccbb3Scarlsonj  means that any persistent state must be kept in the root file
667d04ccbb3Scarlsonj  system, and that if we write before shutdown, we have to cope
668d04ccbb3Scarlsonj  gracefully with the root file system returning EROFS on write
669d04ccbb3Scarlsonj  attempts.
670d04ccbb3Scarlsonj
  For DHCPv6, we need to try to keep our DUID and IAID values
  stable across reboots to fulfill the demands of RFC 3315.
673d04ccbb3Scarlsonj
674d04ccbb3Scarlsonj  The DUID is either configured or automatically generated.  When
675d04ccbb3Scarlsonj  configured, it comes from the /etc/default/dhcpagent file, and thus
676d04ccbb3Scarlsonj  does not need to be saved by the daemon.  If automatically
677d04ccbb3Scarlsonj  generated, there's exactly one of these created, and it will
678d04ccbb3Scarlsonj  eventually be needed before /usr is mounted, if /usr is mounted over
679d04ccbb3Scarlsonj  IPv6.  This means a new file in the root file system,
680d04ccbb3Scarlsonj  /etc/dhcp/duid, will be used to hold the automatically generated
681d04ccbb3Scarlsonj  DUID.
682d04ccbb3Scarlsonj
683d04ccbb3Scarlsonj  The determination of whether to use a configured DUID or one saved
684d04ccbb3Scarlsonj  in a file is made in get_smach_cid().  This function will
685d04ccbb3Scarlsonj  encapsulate all of the DUID parsing and generation machinery for the
686d04ccbb3Scarlsonj  rest of dhcpagent.
687d04ccbb3Scarlsonj
  If root is not writable at the point when dhcpagent starts, and our
  write attempt fails with EROFS, we will set a timer to retry the
  operation at 60-second intervals.  In the unlikely case
691d04ccbb3Scarlsonj  that it just never succeeds or that we're rebooted before root
692d04ccbb3Scarlsonj  becomes writable, then the impact will be that the daemon will wake
693d04ccbb3Scarlsonj  up once a minute and, ultimately, we'll choose a different DUID on
694d04ccbb3Scarlsonj  next start-up, and we'll thus lose our leases across a reboot.
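
  The write-and-retry decision might be structured as follows.  This
  is a sketch under assumptions: save_blob(), classify_save_error()
  and the save_result_t type are hypothetical names, and the timer
  scheduling itself is left to the caller.

```c
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

#define	SAVE_RETRY_SECS	60	/* retry interval described in the text */

typedef enum { SAVE_OK, SAVE_RETRY, SAVE_FAILED } save_result_t;

/* Map an errno value to what the caller should do next. */
save_result_t
classify_save_error(int err)
{
	if (err == 0)
		return (SAVE_OK);
	/* A read-only root may become writable later; try again then. */
	return (err == EROFS ? SAVE_RETRY : SAVE_FAILED);
}

/* Write 'len' bytes to 'path', replacing any previous contents. */
save_result_t
save_blob(const char *path, const void *buf, size_t len)
{
	int fd, err = 0;
	ssize_t n;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (classify_save_error(errno));

	n = write(fd, buf, len);
	if (n == -1)
		err = errno;
	else if ((size_t)n != len)
		err = EIO;		/* treat a short write as failure */

	if (close(fd) == -1 && err == 0)
		err = errno;

	return (classify_save_error(err));
}
```

  On SAVE_RETRY the caller would schedule another attempt in
  SAVE_RETRY_SECS seconds; on SAVE_FAILED it would log and give up.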
695d04ccbb3Scarlsonj
696d04ccbb3Scarlsonj  The IAID similarly must be kept stable if at all possible, but
  cannot be configured by the user.  To make these values stable,
  we will use two strategies.  First, the IAID value for a given
699d04ccbb3Scarlsonj  interface (if not known) will just default to the IP ifIndex value,
700d04ccbb3Scarlsonj  provided that there's no known saved IAID using that value.  Second,
701d04ccbb3Scarlsonj  we will save off the IAID we choose in a single /etc/dhcp/iaid file,
702d04ccbb3Scarlsonj  containing an array of entries indexed by logical interface name.
703d04ccbb3Scarlsonj  Keeping it in a single file allows us to scan for used and unused
704d04ccbb3Scarlsonj  IAID values when necessary.
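
  The two strategies can be sketched against an in-memory copy of the
  saved table.  The iaid_entry_t layout and the function names are
  hypothetical, and the fallback used when the ifIndex value is
  already taken (the next unused value) is an assumption made only
  for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of one saved entry. */
typedef struct iaid_entry {
	char		name[32];	/* logical interface name */
	uint32_t	iaid;		/* IAID saved for that interface */
} iaid_entry_t;

/* True if some saved entry already uses this IAID value. */
bool
iaid_in_use(const iaid_entry_t *tab, size_t n, uint32_t iaid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].iaid == iaid)
			return (true);
	}
	return (false);
}

/* Pick the IAID for 'ifname', preferring a saved value. */
uint32_t
choose_iaid(const iaid_entry_t *tab, size_t n, const char *ifname,
    uint32_t ifindex)
{
	size_t i;
	uint32_t cand;

	/* A previously saved value always wins; that keeps it stable. */
	for (i = 0; i < n; i++) {
		if (strcmp(tab[i].name, ifname) == 0)
			return (tab[i].iaid);
	}

	/* Otherwise default to ifIndex if no saved entry uses it. */
	if (!iaid_in_use(tab, n, ifindex))
		return (ifindex);

	/* Assumed fallback: scan upward for the first unused value. */
	for (cand = ifindex + 1; iaid_in_use(tab, n, cand); cand++)
		;
	return (cand);
}
```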
705d04ccbb3Scarlsonj
706d04ccbb3Scarlsonj  This mechanism depends on the interface name, and thus will need to
707d04ccbb3Scarlsonj  be revisited when Clearview vanity naming and NWAM are available.
708d04ccbb3Scarlsonj
709d04ccbb3Scarlsonj  Currently, the boot system (GRUB, OBP, the miniroot) does not
710d04ccbb3Scarlsonj  support installing over IPv6.  This could change in the future, so
711d04ccbb3Scarlsonj  one of the goals of the above stability plan is to support that
712d04ccbb3Scarlsonj  event.
713d04ccbb3Scarlsonj
714d04ccbb3Scarlsonj  When running in the miniroot on an x86 system, /etc/dhcp (and the
715d04ccbb3Scarlsonj  rest of the root) is mounted on a read-only ramdisk.  In this case,
716d04ccbb3Scarlsonj  writing to /etc/dhcp will just never work.  A possible solution
717d04ccbb3Scarlsonj  would be to add a new privileged command in ifconfig that forces
718d04ccbb3Scarlsonj  dhcpagent to write to an alternate location.  The initial install
719d04ccbb3Scarlsonj  process could then do "ifconfig <x> dhcp write /a" to get the needed
720d04ccbb3Scarlsonj  state written out to the newly-constructed system root.
721d04ccbb3Scarlsonj
722d04ccbb3Scarlsonj  This part (the new write option) won't be implemented as part of
723d04ccbb3Scarlsonj  this project, because it's not needed yet.
724d04ccbb3Scarlsonj
725d04ccbb3Scarlsonj
726d04ccbb3ScarlsonjRouter Advertisements
727d04ccbb3Scarlsonj
728d04ccbb3Scarlsonj  IPv6 Router Advertisements perform two functions related to DHCPv6:
729d04ccbb3Scarlsonj
730d04ccbb3Scarlsonj    - they specify whether and how to run DHCPv6 on a given interface.
731d04ccbb3Scarlsonj    - they provide a list of the valid prefixes on an interface.
732d04ccbb3Scarlsonj
733d04ccbb3Scarlsonj  For the first function, in.ndpd needs to use the same DHCP control
734d04ccbb3Scarlsonj  interfaces that ifconfig uses, so that it can launch dhcpagent and
735d04ccbb3Scarlsonj  trigger DHCPv6 when necessary.  Note that it never needs to shut
736d04ccbb3Scarlsonj  down DHCPv6, as router advertisements can't do that.
737d04ccbb3Scarlsonj
738d04ccbb3Scarlsonj  However, launching dhcpagent presents new problems.  As a part of
739d04ccbb3Scarlsonj  the "Quagga SMF Modifications" project (PSARC 2006/552), in.ndpd in
740d04ccbb3Scarlsonj  Nevada is now privilege-aware and runs with limited privileges,
741d04ccbb3Scarlsonj  courtesy of SMF.  Dhcpagent, on the other hand, must run with all
742d04ccbb3Scarlsonj  privileges.
743d04ccbb3Scarlsonj
744d04ccbb3Scarlsonj  A simple work-around for this issue is to rip out the "privileges="
745d04ccbb3Scarlsonj  clause from the method_credential for in.ndpd.  I've taken this
746d04ccbb3Scarlsonj  direction initially, but the right longer-term answer seems to be
747d04ccbb3Scarlsonj  converting dhcpagent into an SMF service.  This is quite a bit more
748d04ccbb3Scarlsonj  complex, as it means turning the /sbin/dhcpagent command line
749d04ccbb3Scarlsonj  interface into a utility that manipulates the service and passes the
750d04ccbb3Scarlsonj  command line options via IPC extensions.
751d04ccbb3Scarlsonj
  Such a design also raises the question of whether dhcpagent itself
753d04ccbb3Scarlsonj  ought to run with reduced privileges.  It could, but it still needs
754d04ccbb3Scarlsonj  the ability to grant "all" (traditional UNIX root) privileges to the
755d04ccbb3Scarlsonj  eventhook script, if present.  There seem to be few ways to do this,
756d04ccbb3Scarlsonj  though it's a good area for research.
757d04ccbb3Scarlsonj
758d04ccbb3Scarlsonj  The second function, prefix handling, is also subtle.  Unlike IPv4
759d04ccbb3Scarlsonj  DHCP, DHCPv6 does not give the netmask or prefix length along with
760d04ccbb3Scarlsonj  the leased address.  The client is on its own to determine the right
761d04ccbb3Scarlsonj  netmask to use.  This is where the advertised prefixes come in:
762d04ccbb3Scarlsonj  these must be used to finish the interface configuration.
763d04ccbb3Scarlsonj
764d04ccbb3Scarlsonj  We will have the DHCPv6 client configure each interface with an
765d04ccbb3Scarlsonj  all-ones (/128) netmask by default.  In.ndpd will be modified so
766d04ccbb3Scarlsonj  that when it detects a new IFF_DHCPRUNNING IP logical interface, it
767d04ccbb3Scarlsonj  checks for a known matching prefix, and sets the netmask as
768d04ccbb3Scarlsonj  necessary.  If no matching prefix is known, it will send a new
769d04ccbb3Scarlsonj  Router Solicitation message to try to find one.
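
  The prefix check could look something like the sketch below.  The
  prefix_t type and both function names are hypothetical; addresses
  are kept as raw 16-byte arrays to keep the example self-contained.
  Choosing the longest matching prefix when several match is an
  assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one advertised prefix. */
typedef struct prefix {
	uint8_t	addr[16];	/* prefix bits, network byte order */
	int	len;		/* prefix length in bits, 0..128 */
} prefix_t;

/* True if the first p->len bits of 'addr' equal those of the prefix. */
bool
prefix_matches(const uint8_t addr[16], const prefix_t *p)
{
	int full = p->len / 8;	/* whole bytes covered by the prefix */
	int rem = p->len % 8;	/* leftover bits in the next byte */
	int i;

	for (i = 0; i < full; i++) {
		if (addr[i] != p->addr[i])
			return (false);
	}
	if (rem != 0) {
		uint8_t mask = (uint8_t)(0xff << (8 - rem));

		if ((addr[i] & mask) != (p->addr[i] & mask))
			return (false);
	}
	return (true);
}

/*
 * Return the prefix length to configure for 'addr': the longest
 * matching known prefix, or 128 (the all-ones default) if none.
 */
int
choose_prefix_len(const uint8_t addr[16], const prefix_t *tab, size_t n)
{
	int best = -1;
	size_t i;

	for (i = 0; i < n; i++) {
		if (prefix_matches(addr, &tab[i]) && tab[i].len > best)
			best = tab[i].len;
	}
	return (best < 0 ? 128 : best);
}
```

  A result of 128 corresponds to the case where in.ndpd would go on
  to send a Router Solicitation to look for a usable prefix.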
770d04ccbb3Scarlsonj
771d04ccbb3Scarlsonj  When in.ndpd learns of a new prefix from a Router Advertisement, it
772d04ccbb3Scarlsonj  will scan all of the IFF_DHCPRUNNING IP logical interfaces on the
773d04ccbb3Scarlsonj  same physical interface and set the netmasks when necessary.
774d04ccbb3Scarlsonj  Dhcpagent, for its part, will ignore the netmask on IPv6 interfaces
775d04ccbb3Scarlsonj  when checking for changes that would require it to "abandon" the
776d04ccbb3Scarlsonj  interface.
777d04ccbb3Scarlsonj
778d04ccbb3Scarlsonj  Given the way that DHCPv6 and in.ndpd control both the horizontal
779d04ccbb3Scarlsonj  and the vertical in plumbing and removing logical interfaces, and
780d04ccbb3Scarlsonj  users do not, it might be worthwhile to consider roping off any
781d04ccbb3Scarlsonj  direct user changes to IPv6 logical interfaces under control of
782d04ccbb3Scarlsonj  in.ndpd or dhcpagent, and instead force users through a higher-level
783d04ccbb3Scarlsonj  interface.  This won't be done as part of this project, however.
784d04ccbb3Scarlsonj
785d04ccbb3Scarlsonj
786d04ccbb3ScarlsonjARP Hardware Types
787d04ccbb3Scarlsonj
788d04ccbb3Scarlsonj  There are multiple places within the DHCPv6 client where the mapping
789d04ccbb3Scarlsonj  of DLPI MAC type to ARP Hardware Type is required:
790d04ccbb3Scarlsonj
791d04ccbb3Scarlsonj  - When we are constructing an automatic, stable DUID for our own
792d04ccbb3Scarlsonj    identity, we prefer to use a DUID-LLT if possible.  This is done
793d04ccbb3Scarlsonj    by finding a link-layer interface, opening it, reading the MAC
794d04ccbb3Scarlsonj    address and type, and translating in the make_stable_duid()
795d04ccbb3Scarlsonj    function in libdhcpagent.
796d04ccbb3Scarlsonj
797d04ccbb3Scarlsonj  - When we translate a user-configured DUID from
798d04ccbb3Scarlsonj    /etc/default/dhcpagent into a binary representation, we may have
799d04ccbb3Scarlsonj    to deal with a physical interface name.  In this case, we must
800d04ccbb3Scarlsonj    open that interface and read the MAC address and type.
801d04ccbb3Scarlsonj
802d04ccbb3Scarlsonj  - As part of the PIF data structure initialization, we need to read
803d04ccbb3Scarlsonj    out the MAC type so that it can be used in the BOOTP/DHCPv4
804d04ccbb3Scarlsonj    'htype' field.
805d04ccbb3Scarlsonj
806d04ccbb3Scarlsonj  Ideally, these would all be provided by a single libdlpi
807d04ccbb3Scarlsonj  implementation.  However, that project is on-going at this time and
808d04ccbb3Scarlsonj  has not yet integrated.  For the time being, a dlpi_to_arp()
809d04ccbb3Scarlsonj  translation function (taking dl_mac_type and returning an ARP
810d04ccbb3Scarlsonj  Hardware Type number) will be placed in libdhcputil.
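
  The shape of such a translation function is sketched below.  Only
  the dlpi_to_arp() name and its input/output come from this
  document.  The constants are defined locally so the example stands
  alone (real code would take them from <sys/dlpi.h> and
  <net/if_arp.h>), the set of MAC types covered is deliberately
  small, and returning 0 for an unknown type is an assumption.

```c
#include <stdint.h>

/* Local copies for illustration; values per the DLPI specification. */
#define	DL_CSMACD	0x0	/* IEEE 802.3 CSMA/CD */
#define	DL_TPB		0x1	/* IEEE 802.4 Token Passing Bus */
#define	DL_TPR		0x2	/* IEEE 802.5 Token Passing Ring */
#define	DL_ETHER	0x4	/* Ethernet */

/* Local copies for illustration; values per the IANA ARP registry. */
#define	ARPHRD_ETHER	1	/* Ethernet */
#define	ARPHRD_IEEE802	6	/* IEEE 802 networks */

/* Translate a DLPI MAC type to an ARP Hardware Type; 0 if unknown. */
uint16_t
dlpi_to_arp(uint32_t dl_mac_type)
{
	switch (dl_mac_type) {
	case DL_ETHER:
		return (ARPHRD_ETHER);
	case DL_CSMACD:
	case DL_TPB:
	case DL_TPR:
		return (ARPHRD_IEEE802);
	default:
		return (0);	/* assumed "no mapping" value */
	}
}
```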
811d04ccbb3Scarlsonj
812d04ccbb3Scarlsonj  This temporary function should be removed and this section of the
813d04ccbb3Scarlsonj  code updated when the new libdlpi from Clearview integrates.
814d04ccbb3Scarlsonj
815d04ccbb3Scarlsonj
816d04ccbb3ScarlsonjField Mappings
817d04ccbb3Scarlsonj
818d04ccbb3Scarlsonj  Old (all in ifslist)	New
819d04ccbb3Scarlsonj  next			dhcp_smach_t.dsm_next
820d04ccbb3Scarlsonj  prev			dhcp_smach_t.dsm_prev
821d04ccbb3Scarlsonj  if_hold_count		dhcp_smach_t.dsm_hold_count
822d04ccbb3Scarlsonj  if_ia			dhcp_smach_t.dsm_ia
823d04ccbb3Scarlsonj  if_async		dhcp_smach_t.dsm_async
824d04ccbb3Scarlsonj  if_state		dhcp_smach_t.dsm_state
825d04ccbb3Scarlsonj  if_dflags		dhcp_smach_t.dsm_dflags
826d04ccbb3Scarlsonj  if_name		dhcp_smach_t.dsm_name (see text)
827d04ccbb3Scarlsonj  if_index		dhcp_pif_t.pif_index
828d04ccbb3Scarlsonj  if_max		dhcp_lif_t.lif_max and dhcp_pif_t.pif_max
829d04ccbb3Scarlsonj  if_min		(was unused; removed)
830d04ccbb3Scarlsonj  if_opt		(was unused; removed)
831d04ccbb3Scarlsonj  if_hwaddr		dhcp_pif_t.pif_hwaddr
832d04ccbb3Scarlsonj  if_hwlen		dhcp_pif_t.pif_hwlen
833d04ccbb3Scarlsonj  if_hwtype		dhcp_pif_t.pif_hwtype
834d04ccbb3Scarlsonj  if_cid		dhcp_smach_t.dsm_cid
835d04ccbb3Scarlsonj  if_cidlen		dhcp_smach_t.dsm_cidlen
836d04ccbb3Scarlsonj  if_prl		dhcp_smach_t.dsm_prl
837d04ccbb3Scarlsonj  if_prllen		dhcp_smach_t.dsm_prllen
838d04ccbb3Scarlsonj  if_daddr		dhcp_pif_t.pif_daddr
839d04ccbb3Scarlsonj  if_dlen		dhcp_pif_t.pif_dlen
840d04ccbb3Scarlsonj  if_saplen		dhcp_pif_t.pif_saplen
841d04ccbb3Scarlsonj  if_sap_before		dhcp_pif_t.pif_sap_before
842d04ccbb3Scarlsonj  if_dlpi_fd		dhcp_pif_t.pif_dlpi_fd
843d04ccbb3Scarlsonj  if_sock_fd		v4_sock_fd and v6_sock_fd (globals)
844d04ccbb3Scarlsonj  if_sock_ip_fd		dhcp_lif_t.lif_sock_ip_fd
845d04ccbb3Scarlsonj  if_timer		(see text)
846d04ccbb3Scarlsonj  if_t1			dhcp_lease_t.dl_t1
847d04ccbb3Scarlsonj  if_t2			dhcp_lease_t.dl_t2
848d04ccbb3Scarlsonj  if_lease		dhcp_lif_t.lif_expire
849d04ccbb3Scarlsonj  if_nrouters		dhcp_smach_t.dsm_nrouters
850d04ccbb3Scarlsonj  if_routers		dhcp_smach_t.dsm_routers
851d04ccbb3Scarlsonj  if_server		dhcp_smach_t.dsm_server
852d04ccbb3Scarlsonj  if_addr		dhcp_lif_t.lif_v6addr
853d04ccbb3Scarlsonj  if_netmask		dhcp_lif_t.lif_v6mask
854d04ccbb3Scarlsonj  if_broadcast		dhcp_lif_t.lif_v6peer
855d04ccbb3Scarlsonj  if_ack		dhcp_smach_t.dsm_ack
856d04ccbb3Scarlsonj  if_orig_ack		dhcp_smach_t.dsm_orig_ack
857d04ccbb3Scarlsonj  if_offer_wait		dhcp_smach_t.dsm_offer_wait
858d04ccbb3Scarlsonj  if_offer_timer	dhcp_smach_t.dsm_offer_timer
859d04ccbb3Scarlsonj  if_offer_id		dhcp_pif_t.pif_dlpi_id
860d04ccbb3Scarlsonj  if_acknak_id		dhcp_lif_t.lif_acknak_id
861d04ccbb3Scarlsonj  if_acknak_bcast_id	v4_acknak_bcast_id (global)
862d04ccbb3Scarlsonj  if_neg_monosec	dhcp_smach_t.dsm_neg_monosec
863d04ccbb3Scarlsonj  if_newstart_monosec	dhcp_smach_t.dsm_newstart_monosec
864d04ccbb3Scarlsonj  if_curstart_monosec	dhcp_smach_t.dsm_curstart_monosec
865d04ccbb3Scarlsonj  if_disc_secs		dhcp_smach_t.dsm_disc_secs
866d04ccbb3Scarlsonj  if_reqhost		dhcp_smach_t.dsm_reqhost
867d04ccbb3Scarlsonj  if_recv_pkt_list	dhcp_smach_t.dsm_recv_pkt_list
868d04ccbb3Scarlsonj  if_sent		dhcp_smach_t.dsm_sent
869d04ccbb3Scarlsonj  if_received		dhcp_smach_t.dsm_received
870d04ccbb3Scarlsonj  if_bad_offers		dhcp_smach_t.dsm_bad_offers
871d04ccbb3Scarlsonj  if_send_pkt		dhcp_smach_t.dsm_send_pkt
872d04ccbb3Scarlsonj  if_send_timeout	dhcp_smach_t.dsm_send_timeout
873d04ccbb3Scarlsonj  if_send_dest		dhcp_smach_t.dsm_send_dest
874d04ccbb3Scarlsonj  if_send_stop_func	dhcp_smach_t.dsm_send_stop_func
875d04ccbb3Scarlsonj  if_packet_sent	dhcp_smach_t.dsm_packet_sent
876d04ccbb3Scarlsonj  if_retrans_timer	dhcp_smach_t.dsm_retrans_timer
877d04ccbb3Scarlsonj  if_script_fd		dhcp_smach_t.dsm_script_fd
878d04ccbb3Scarlsonj  if_script_pid		dhcp_smach_t.dsm_script_pid
879d04ccbb3Scarlsonj  if_script_helper_pid	dhcp_smach_t.dsm_script_helper_pid
880d04ccbb3Scarlsonj  if_script_event	dhcp_smach_t.dsm_script_event
881d04ccbb3Scarlsonj  if_script_event_id	dhcp_smach_t.dsm_script_event_id
882d04ccbb3Scarlsonj  if_callback_msg	dhcp_smach_t.dsm_callback_msg
883d04ccbb3Scarlsonj  if_script_callback	dhcp_smach_t.dsm_script_callback
884d04ccbb3Scarlsonj
885d04ccbb3Scarlsonj  Notes:
886d04ccbb3Scarlsonj
887d04ccbb3Scarlsonj    - The dsm_name field currently just points to the lif_name on the
888d04ccbb3Scarlsonj      controlling LIF.  This may need to be named differently in the
889d04ccbb3Scarlsonj      future; perhaps when Zones are supported.
890d04ccbb3Scarlsonj
891d04ccbb3Scarlsonj    - The timer mechanism will be refactored.  Rather than using the
892d04ccbb3Scarlsonj      separate if_timer[] array to hold the timer IDs and
893d04ccbb3Scarlsonj      if_{t1,t2,lease} to hold the relative timer values, we will
894d04ccbb3Scarlsonj      gather this information into a dhcp_timer_t structure:
895d04ccbb3Scarlsonj
896d04ccbb3Scarlsonj	dt_id		timer ID value
897d04ccbb3Scarlsonj	dt_start	relative start time
898d04ccbb3Scarlsonj
899d04ccbb3Scarlsonj  New fields not accounted for above:
900d04ccbb3Scarlsonj
901d04ccbb3Scarlsonj  dhcp_pif_t.pif_next		linkage in global list of PIFs
902d04ccbb3Scarlsonj  dhcp_pif_t.pif_prev		linkage in global list of PIFs
903d04ccbb3Scarlsonj  dhcp_pif_t.pif_lifs		pointer to list of LIFs on this PIF
904d04ccbb3Scarlsonj  dhcp_pif_t.pif_isv6		IPv6 flag
905d04ccbb3Scarlsonj  dhcp_pif_t.pif_dlpi_count	number of state machines using DLPI
906d04ccbb3Scarlsonj  dhcp_pif_t.pif_hold_count	reference count
907d04ccbb3Scarlsonj  dhcp_pif_t.pif_name		name of physical interface
908d04ccbb3Scarlsonj  dhcp_lif_t.lif_next		linkage in per-PIF list of LIFs
909d04ccbb3Scarlsonj  dhcp_lif_t.lif_prev		linkage in per-PIF list of LIFs
910d04ccbb3Scarlsonj  dhcp_lif_t.lif_pif		backpointer to parent PIF
911d04ccbb3Scarlsonj  dhcp_lif_t.lif_smachs		pointer to list of state machines
912d04ccbb3Scarlsonj  dhcp_lif_t.lif_lease		backpointer to lease holding LIF
913d04ccbb3Scarlsonj  dhcp_lif_t.lif_flags		interface flags (IFF_*)
914d04ccbb3Scarlsonj  dhcp_lif_t.lif_hold_count	reference count
915d04ccbb3Scarlsonj  dhcp_lif_t.lif_dad_wait	waiting for DAD resolution flag
916d04ccbb3Scarlsonj  dhcp_lif_t.lif_removed	removed from list flag
917d04ccbb3Scarlsonj  dhcp_lif_t.lif_plumbed	plumbed by dhcpagent flag
918d04ccbb3Scarlsonj  dhcp_lif_t.lif_expired	lease has expired flag
919d04ccbb3Scarlsonj  dhcp_lif_t.lif_declined	reason to refuse this address (string)
920d04ccbb3Scarlsonj  dhcp_lif_t.lif_iaid		unique and stable 32-bit identifier
921d04ccbb3Scarlsonj  dhcp_lif_t.lif_iaid_id	timer for delayed /etc writes
922d04ccbb3Scarlsonj  dhcp_lif_t.lif_preferred	preferred timer for v6; deprecate after
923d04ccbb3Scarlsonj  dhcp_lif_t.lif_name		name of logical interface
924d04ccbb3Scarlsonj  dhcp_smach_t.dsm_lif		controlling (main) LIF
925d04ccbb3Scarlsonj  dhcp_smach_t.dsm_leases	pointer to list of leases
926d04ccbb3Scarlsonj  dhcp_smach_t.dsm_lif_wait	number of LIFs waiting on DAD
927d04ccbb3Scarlsonj  dhcp_smach_t.dsm_lif_down	number of LIFs that have failed
928d04ccbb3Scarlsonj  dhcp_smach_t.dsm_using_dlpi	currently using DLPI flag
929d04ccbb3Scarlsonj  dhcp_smach_t.dsm_send_tcenter	v4 central timer value; v6 MRT
930d04ccbb3Scarlsonj  dhcp_lease_t.dl_next		linkage in per-state-machine list of leases
931d04ccbb3Scarlsonj  dhcp_lease_t.dl_prev		linkage in per-state-machine list of leases
932d04ccbb3Scarlsonj  dhcp_lease_t.dl_smach		back pointer to state machine
933d04ccbb3Scarlsonj  dhcp_lease_t.dl_lifs		pointer to first LIF configured by lease
934d04ccbb3Scarlsonj  dhcp_lease_t.dl_nlifs		number of configured consecutive LIFs
935d04ccbb3Scarlsonj  dhcp_lease_t.dl_hold_count	reference counter
936d04ccbb3Scarlsonj  dhcp_lease_t.dl_removed	removed from list flag
937d04ccbb3Scarlsonj  dhcp_lease_t.dl_stale		lease was not updated by Renew/Rebind
938d04ccbb3Scarlsonj
939d04ccbb3Scarlsonj
940d04ccbb3ScarlsonjSnoop
941d04ccbb3Scarlsonj
942d04ccbb3Scarlsonj  The snoop changes are fairly straightforward.  As snoop just decodes
943d04ccbb3Scarlsonj  the messages, and the message format is quite different between
944d04ccbb3Scarlsonj  DHCPv4 and DHCPv6, a new module will be created to handle DHCPv6
  decoding, and will export an interpret_dhcpv6() function.
946d04ccbb3Scarlsonj
947d04ccbb3Scarlsonj  The one bit of commonality between the two protocols is the use of
948d04ccbb3Scarlsonj  ARP Hardware Type numbers, which are found in the underlying BOOTP
949d04ccbb3Scarlsonj  message format for DHCPv4 and in the DUID-LL and DUID-LLT
950d04ccbb3Scarlsonj  construction for DHCPv6.  To simplify this, the existing static
951d04ccbb3Scarlsonj  show_htype() function in snoop_dhcp.c will be renamed to arp_htype()
952d04ccbb3Scarlsonj  (to better reflect its functionality), updated with more modern
953d04ccbb3Scarlsonj  hardware types, moved to snoop_arp.c (where it belongs), and made a
954d04ccbb3Scarlsonj  public symbol within snoop.
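
  A shared translation function of this kind could be sketched as a
  small table lookup.  The name arp_htype_sketch(), its signature, and
  the choice of table entries are assumptions for illustration; the
  numeric values are the IANA-assigned ARP hardware types.

```c
#include <stddef.h>

struct htype_name {
    unsigned int ht_value;  /* ARP hardware type number */
    const char *ht_name;    /* printable name */
};

/* A few IANA-assigned ARP hardware types; not an exhaustive list. */
static const struct htype_name htype_names[] = {
    { 1, "Ethernet (10Mb)" },
    { 6, "IEEE 802" },
    { 7, "ARCNET" },
    { 15, "Frame Relay" },
    { 16, "ATM" },
    { 17, "HDLC" },
    { 18, "Fibre Channel" },
    { 32, "InfiniBand" },
};

/* Translate a hardware type number into a printable name. */
const char *
arp_htype_sketch(unsigned int htype)
{
    size_t i;

    for (i = 0; i < sizeof (htype_names) / sizeof (htype_names[0]); i++) {
        if (htype_names[i].ht_value == htype)
            return (htype_names[i].ht_name);
    }
    return ("Unknown");
}
```

  The same lookup would then serve the BOOTP htype field, the ARP
  ar_hrd field, and the hardware type carried in DUID-LL and DUID-LLT.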
955d04ccbb3Scarlsonj
956d04ccbb3Scarlsonj  While I'm there, I'll update snoop_arp.c so that when it prints an
957d04ccbb3Scarlsonj  ARP message in verbose mode, it uses arp_htype() to translate the
958d04ccbb3Scarlsonj  ar_hrd value.
959d04ccbb3Scarlsonj
960d04ccbb3Scarlsonj  The snoop updates also involve the addition of a new "dhcp6" keyword
961d04ccbb3Scarlsonj  for filtering.  As a part of this, CR 6487534 will be fixed.
962d04ccbb3Scarlsonj
963d04ccbb3Scarlsonj
964d04ccbb3ScarlsonjIPv6 Source Address Selection
965d04ccbb3Scarlsonj
966d04ccbb3Scarlsonj  One of the customer requests for DHCPv6 is to be able to predict the
967d04ccbb3Scarlsonj  address selection behavior in the presence of both stateful and
968d04ccbb3Scarlsonj  stateless addresses on the same network.
969d04ccbb3Scarlsonj
970d04ccbb3Scarlsonj  Solaris implements RFC 3484 address selection behavior.  In this
971d04ccbb3Scarlsonj  scheme, the first seven rules implement some basic preferences for
972d04ccbb3Scarlsonj  addresses, with Rule 8 being a deterministic tie breaker.
973d04ccbb3Scarlsonj
974d04ccbb3Scarlsonj  Rule 8 relies on a special function, CommonPrefixLen, defined in the
975d04ccbb3Scarlsonj  RFC, that compares leading bits of the address without regard to
976d04ccbb3Scarlsonj  configured prefix length.  As Rule 1 eliminates equal addresses,
977d04ccbb3Scarlsonj  this always picks a single address.
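
  For reference, CommonPrefixLen can be sketched as a count of the
  identical leading bits of two 128-bit addresses.  The function name
  and the use of plain byte arrays (rather than in6_addr_t) are
  assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_ADDR_BYTES 16

/* Number of leading bits that two IPv6 addresses share (0..128). */
unsigned int
common_prefix_len(const uint8_t a[IPV6_ADDR_BYTES],
    const uint8_t b[IPV6_ADDR_BYTES])
{
    unsigned int bits = 0;
    int i;

    assert(a != NULL && b != NULL);
    for (i = 0; i < IPV6_ADDR_BYTES; i++) {
        uint8_t diff = a[i] ^ b[i];

        if (diff == 0) {
            bits += 8;      /* whole byte matches */
            continue;
        }
        /* Count matching high-order bits of the first differing byte. */
        while ((diff & 0x80) == 0) {
            bits++;
            diff <<= 1;
        }
        break;
    }
    return (bits);
}
```

  Two distinct addresses always differ in at least one bit, so the
  result for distinct candidates is at most 127.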
978d04ccbb3Scarlsonj
979d04ccbb3Scarlsonj  This rule, though, allows for additional checks:
980d04ccbb3Scarlsonj
981d04ccbb3Scarlsonj   Rule 8 may be superseded if the implementation has other means of
982d04ccbb3Scarlsonj   choosing among source addresses.  For example, if the implementation
983d04ccbb3Scarlsonj   somehow knows which source address will result in the "best"
984d04ccbb3Scarlsonj   communications performance.
985d04ccbb3Scarlsonj
986d04ccbb3Scarlsonj  We will thus split Rule 8 into three separate rules:
987d04ccbb3Scarlsonj
988d04ccbb3Scarlsonj  - First, compare on configured prefix.  The interface with the
989d04ccbb3Scarlsonj    longest configured prefix length that also matches the candidate
990d04ccbb3Scarlsonj    address will be preferred.
991d04ccbb3Scarlsonj
992d04ccbb3Scarlsonj  - Next, check the type of address.  Prefer statically configured
993d04ccbb3Scarlsonj    addresses above all others.  Next, those from DHCPv6.  Next,
994d04ccbb3Scarlsonj    stateless autoconfigured addresses.  Finally, temporary addresses.
995d04ccbb3Scarlsonj    (Note that Rule 7 will take care of temporary address preferences,
996d04ccbb3Scarlsonj    so that this rule doesn't actually need to look at them.)
997d04ccbb3Scarlsonj
998d04ccbb3Scarlsonj  - Finally, run the check-all-bits (CommonPrefixLen) tie breaker.
999d04ccbb3Scarlsonj
  The result of this is that if there's a local address in the same
  configured prefix, then we'll prefer that over other addresses.  If
  there are multiple to choose from, then we'll pick static first,
  then DHCPv6, then stateless autoconfigured.  Finally, if there are
  still multiples, we'll use the "closest" address, bitwise.
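
  The three-way split can be sketched as a pairwise comparison between
  two candidate source addresses.  This is one reading of the rules
  above, not the ip module's implementation: it assumes that "matches"
  means the destination lies within the candidate's configured prefix,
  and the names src_cand_t, addr_origin and prefer_source() are
  hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lower values are preferred; temporary addresses are left to Rule 7. */
enum addr_origin { ORIGIN_STATIC, ORIGIN_DHCPV6, ORIGIN_STATELESS };

typedef struct src_cand {
    uint8_t sc_addr[16];            /* candidate source address */
    unsigned int sc_plen;           /* configured prefix length, in bits */
    enum addr_origin sc_origin;     /* how the address was configured */
} src_cand_t;

/* CommonPrefixLen: count of identical leading bits (0..128). */
static unsigned int
common_bits(const uint8_t *a, const uint8_t *b)
{
    unsigned int bits = 0;
    int i;

    for (i = 0; i < 16; i++) {
        uint8_t diff = a[i] ^ b[i];

        if (diff == 0) {
            bits += 8;
            continue;
        }
        while ((diff & 0x80) == 0) {
            bits++;
            diff <<= 1;
        }
        break;
    }
    return (bits);
}

/* Configured prefix length if dst lies within that prefix, else 0. */
static unsigned int
matched_plen(const src_cand_t *sc, const uint8_t *dst)
{
    assert(sc->sc_plen <= 128);
    return (common_bits(sc->sc_addr, dst) >= sc->sc_plen ?
        sc->sc_plen : 0);
}

/* Return whichever of x and y is the preferred source for dst. */
const src_cand_t *
prefer_source(const src_cand_t *x, const src_cand_t *y, const uint8_t *dst)
{
    unsigned int xp = matched_plen(x, dst);
    unsigned int yp = matched_plen(y, dst);

    /* First: longest configured prefix that matches. */
    if (xp != yp)
        return (xp > yp ? x : y);
    /* Next: static, then DHCPv6, then stateless. */
    if (x->sc_origin != y->sc_origin)
        return (x->sc_origin < y->sc_origin ? x : y);
    /* Finally: the check-all-bits tie breaker. */
    return (common_bits(x->sc_addr, dst) >= common_bits(y->sc_addr, dst) ?
        x : y);
}
```

  With a destination of 2001:db8:0:1::99, for example, this sketch
  prefers a DHCPv6 address in 2001:db8:0:1::/64 over a static address
  in 2001:db8:0:2::/64, because the first step decides before the
  address type is consulted.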
1005d04ccbb3Scarlsonj
  This basic implementation scheme also addresses CR 6485164, so a
  fix for that will be included with this project.
1008d04ccbb3Scarlsonj
1009d04ccbb3Scarlsonj
1010d04ccbb3ScarlsonjMinor Improvements
1011d04ccbb3Scarlsonj
1012d04ccbb3Scarlsonj  Various small problems with the system encountered during
1013d04ccbb3Scarlsonj  development will be fixed along with this project.  Some of these
1014d04ccbb3Scarlsonj  are:
1015d04ccbb3Scarlsonj
1016d04ccbb3Scarlsonj  - List of ARPHRD_* types is a bit short; add some new ones.
1017d04ccbb3Scarlsonj
1018d04ccbb3Scarlsonj  - List of IPPORT_* values is similarly sparse; add others in use by
1019d04ccbb3Scarlsonj    snoop.
1020d04ccbb3Scarlsonj
1021d04ccbb3Scarlsonj  - dhcpmsg.h lacks PRINTFLIKE for dhcpmsg(); add it.
1022d04ccbb3Scarlsonj
1023d04ccbb3Scarlsonj  - CR 6482163 causes excessive lint errors with libxnet; will fix.
1024d04ccbb3Scarlsonj
  - libdhcpagent uses gettimeofday() for I/O timing, and wall-clock
    time can be stepped or slewed on systems running NTP.  It should
    use a monotonic time source (gethrtime()) instead, and should
    return better error values.
1028d04ccbb3Scarlsonj
1029d04ccbb3Scarlsonj  - Controlling debug mode in the daemon shouldn't require changing
1030d04ccbb3Scarlsonj    the command line arguments or jumping through special hoops.  I've
1031d04ccbb3Scarlsonj    added undocumented ".DEBUG_LEVEL=[0-3]" and ".VERBOSE=[01]"
1032d04ccbb3Scarlsonj    features to /etc/default/dhcpagent.
1033d04ccbb3Scarlsonj
1034d04ccbb3Scarlsonj  - The various attributes of the IPC commands (requires privileges,
1035d04ccbb3Scarlsonj    creates a new session, valid with BOOTP, immediate reply) should
1036d04ccbb3Scarlsonj    be gathered together into one look-up table rather than scattered
1037d04ccbb3Scarlsonj    as hard-coded tests.
1038d04ccbb3Scarlsonj
1039d04ccbb3Scarlsonj  - Remove the event unregistration from the command dispatch loop and
1040d04ccbb3Scarlsonj    get rid of the ipc_action_pending() botch.  We'll get a
1041d04ccbb3Scarlsonj    zero-length read any time the client goes away, and that will be
1042d04ccbb3Scarlsonj    enough to trigger termination.  This fix removes async_pending()
1043d04ccbb3Scarlsonj    and async_timeout() as well, and fixes CR 6487958 as a
1044d04ccbb3Scarlsonj    side-effect.
1045d04ccbb3Scarlsonj
1046d04ccbb3Scarlsonj  - Throughout the dhcpagent code, there are private implementations
1047d04ccbb3Scarlsonj    of doubly-linked and singly-linked lists for each data type.
1048d04ccbb3Scarlsonj    These will all be removed and replaced with insque(3C) and
1049d04ccbb3Scarlsonj    remque(3C).
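
  As an illustration of the last item, the following sketch keeps a
  list with insque() and remque().  The node type, the helper names,
  and the use of a sentinel head node are assumptions for this
  example; dhcpagent's own list heads may be arranged differently.
  The _XOPEN_SOURCE definition is there only so that <search.h>
  declares the two functions on systems that require it.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose insque()/remque() in <search.h> */
#endif

#include <assert.h>
#include <search.h>
#include <stddef.h>

/* insque()/remque() need the two links to be the first two members. */
typedef struct node_sketch {
    struct node_sketch *ns_next;    /* forward link; must be first */
    struct node_sketch *ns_prev;    /* backward link; must be second */
    int ns_id;                      /* payload */
} node_sketch_t;

/* Start an empty linear list; head is a sentinel, not a real entry. */
void
list_init(node_sketch_t *head)
{
    insque(head, NULL);
}

/* Insert np immediately after the sentinel. */
void
list_add(node_sketch_t *head, node_sketch_t *np)
{
    assert(head != NULL && np != NULL);
    insque(np, head);
}

/* Unlink np and clear its links so stale pointers are not followed. */
void
list_del(node_sketch_t *np)
{
    assert(np != NULL);
    remque(np);
    np->ns_next = np->ns_prev = NULL;
}

/* Count the real entries that follow the sentinel. */
size_t
list_count(const node_sketch_t *head)
{
    const node_sketch_t *np;
    size_t count = 0;

    for (np = head->ns_next; np != NULL; np = np->ns_next)
        count++;
    return (count);
}
```

  Any structure that begins with the two link pointers can share
  these helpers, which is what allows the per-type list code to be
  removed.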
1050d04ccbb3Scarlsonj
1051d04ccbb3Scarlsonj
1052d04ccbb3ScarlsonjTesting
1053d04ccbb3Scarlsonj
1054d04ccbb3Scarlsonj  The implementation was tested using the TAHI test suite for DHCPv6
1055d04ccbb3Scarlsonj  (www.tahi.org).  There are some peculiar aspects to this test suite,
1056d04ccbb3Scarlsonj  and these issues directed some of the design.  In particular:
1057d04ccbb3Scarlsonj
1058d04ccbb3Scarlsonj  - If Renew/Rebind doesn't mention one of our leases, then we need to
1059d04ccbb3Scarlsonj    allow the message to be retransmitted.  Real servers are unlikely
1060d04ccbb3Scarlsonj    to do this.
1061d04ccbb3Scarlsonj
  - We must look for a status code within IAADDR and within IA_NA, and
    handle the paradoxical case of "NoAddrsAvail" appearing there.
    That doesn't make sense, as a server with no addresses wouldn't
    use those options.  That status code makes more sense at the top
    level of the message.
1066d04ccbb3Scarlsonj
1067d04ccbb3Scarlsonj  - If we get "UseMulticast" when we were already using multicast,
1068d04ccbb3Scarlsonj    then ignore the error code.  Sending another request would cause a
1069d04ccbb3Scarlsonj    loop.
1070d04ccbb3Scarlsonj
  - TAHI uses "NoBinding" at the top level of the message.  This
    status code only makes sense within an IA, as it refers to the
    DUID:IAID binding, which doesn't exist outside an IA.  We must
    ignore such errors -- treat them as success.
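
  The status code cases above can be sketched as a small
  classification function.  The numeric values are the status codes
  defined by RFC 3315; the enum names, the scope and action types, and
  classify_status() itself are hypothetical and only illustrate the
  decisions described in this list.  The retransmission case from the
  first item is not covered.

```c
#include <assert.h>
#include <stdbool.h>

/* DHCPv6 status codes (RFC 3315); the names here are illustrative. */
enum {
    STAT_SUCCESS = 0,
    STAT_UNSPECFAIL = 1,
    STAT_NOADDRS = 2,       /* NoAddrsAvail */
    STAT_NOBINDING = 3,
    STAT_NOTONLINK = 4,
    STAT_USEMULTICAST = 5
};

/* Where in the reply the Status Code option was found. */
enum status_scope { SCOPE_MESSAGE, SCOPE_IA_NA, SCOPE_IAADDR };

/* What the caller should do about it. */
enum status_action { ACT_ACCEPT, ACT_FAIL, ACT_RESEND_MULTICAST };

enum status_action
classify_status(unsigned int code, enum status_scope scope,
    bool using_multicast)
{
    assert(scope == SCOPE_MESSAGE || scope == SCOPE_IA_NA ||
        scope == SCOPE_IAADDR);

    switch (code) {
    case STAT_SUCCESS:
        return (ACT_ACCEPT);
    case STAT_NOBINDING:
        /* Only meaningful inside an IA; ignore at the top level. */
        return (scope == SCOPE_MESSAGE ? ACT_ACCEPT : ACT_FAIL);
    case STAT_USEMULTICAST:
        /* Resending when already multicasting would just loop. */
        return (using_multicast ? ACT_ACCEPT : ACT_RESEND_MULTICAST);
    case STAT_NOADDRS:
        /* Honored at every level, including within IA_NA and IAADDR. */
        return (ACT_FAIL);
    default:
        return (ACT_FAIL);
    }
}
```

  Here ACT_FAIL only means that the enclosing message, IA or address
  should be treated as unsuccessful; what the client does next depends
  on the state it is in.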
1075d04ccbb3Scarlsonj
1076d04ccbb3Scarlsonj
1077d04ccbb3ScarlsonjInteractions With Other Projects
1078d04ccbb3Scarlsonj
1079d04ccbb3Scarlsonj  Clearview UV (vanity naming) will cause link names, and thus IP
1080d04ccbb3Scarlsonj  interface names, to become changeable over time.  This will break
1081d04ccbb3Scarlsonj  the IAID stability mechanism if UV is used for arbitrary renaming,
1082d04ccbb3Scarlsonj  rather than as just a DR enhancement.
1083d04ccbb3Scarlsonj
1084d04ccbb3Scarlsonj  When this portion of Clearview integrates, this part of the DHCPv6
1085d04ccbb3Scarlsonj  design may need to be revisited.  (The solution will likely be
1086d04ccbb3Scarlsonj  handled at some higher layer, such as within Network Automagic.)
1087d04ccbb3Scarlsonj
1088d04ccbb3Scarlsonj  Clearview is also contributing a new libdlpi that will work for
1089d04ccbb3Scarlsonj  dhcpagent, and is thus removing the private dlpi_io.[ch] functions
1090d04ccbb3Scarlsonj  from this daemon.  When that Clearview project integrates, the
1091d04ccbb3Scarlsonj  DHCPv6 project will need to adjust to the new interfaces, and remove
1092d04ccbb3Scarlsonj  or relocate the dlpi_to_arp() function.
1093d04ccbb3Scarlsonj
1094d04ccbb3Scarlsonj
1095d04ccbb3ScarlsonjFutures
1096d04ccbb3Scarlsonj
  Zones currently cannot obtain addresses for any IP interfaces by way
  of DHCP.  This project will not fix that problem, but the DUID/IAID
  could be used to help fix it in the future.
1100d04ccbb3Scarlsonj
1101d04ccbb3Scarlsonj  In particular, the DUID allows the client to obtain separate sets of
1102d04ccbb3Scarlsonj  addresses and configuration parameters on a single interface, just
1103d04ccbb3Scarlsonj  like an IPv4 Client ID, but it includes a clean mechanism for vendor
1104d04ccbb3Scarlsonj  extensions.  If we associate the DUID with the zone identifier or
1105d04ccbb3Scarlsonj  name through an extension, then we have a really simple way of
1106d04ccbb3Scarlsonj  allocating per-zone addresses.
1107d04ccbb3Scarlsonj
1108d04ccbb3Scarlsonj  Moreover, RFC 4361 describes a handy way of using DHCPv6 DUID/IAID
1109d04ccbb3Scarlsonj  values with IPv4 DHCP, which would quickly solve the problem of
1110d04ccbb3Scarlsonj  using DHCP for IPv4 address assignment in non-global zones as well.
1111d04ccbb3Scarlsonj
1112d04ccbb3Scarlsonj  (One potential risk with this plan is that there may be server
1113d04ccbb3Scarlsonj  implementations that either do not implement the RFC correctly or
1114d04ccbb3Scarlsonj  otherwise mishandle the DUID.  This has apparently bitten some early
1115d04ccbb3Scarlsonj  adopters.)
1116d04ccbb3Scarlsonj
1117d04ccbb3Scarlsonj  Implementing the FQDN option for DHCPv6 would, given the current
1118d04ccbb3Scarlsonj  libdhcputil design, require a new 'type' of entry for the inittab6
1119d04ccbb3Scarlsonj  file.  This is because the design does not allow for any simple
1120d04ccbb3Scarlsonj  means to ``compose'' a sequence of basic types together.  Thus,
1121d04ccbb3Scarlsonj  every type of option must either be a basic type, or an array of
1122d04ccbb3Scarlsonj  multiple instances of the same basic type.
1123d04ccbb3Scarlsonj
1124d04ccbb3Scarlsonj  If we implement FQDN in the future, it may be useful to explore some
1125d04ccbb3Scarlsonj  means of allowing a given option instance to be a sequence of basic
1126d04ccbb3Scarlsonj  types.
1127d04ccbb3Scarlsonj
1128d04ccbb3Scarlsonj  This project does not make the DNS resolver or any other subsystem
1129d04ccbb3Scarlsonj  use the data gathered by DHCPv6.  It just makes the data available
1130d04ccbb3Scarlsonj  through dhcpinfo(1).  Future projects should modify those services
1131d04ccbb3Scarlsonj  to use configuration data learned via DHCPv6.  (One of the reasons
1132d04ccbb3Scarlsonj  this is not being done now is that Network Automagic [NWAM] will
1133d04ccbb3Scarlsonj  likely be changing this area substantially in the very near future,
1134d04ccbb3Scarlsonj  and thus the effort would be largely wasted.)
1135d04ccbb3Scarlsonj
1136d04ccbb3Scarlsonj
1137d04ccbb3ScarlsonjAppendix A - Choice of Venue
1138d04ccbb3Scarlsonj
1139d04ccbb3Scarlsonj  There are three logical places to implement DHCPv6:
1140d04ccbb3Scarlsonj
1141d04ccbb3Scarlsonj    - in dhcpagent
1142d04ccbb3Scarlsonj    - in in.ndpd
1143d04ccbb3Scarlsonj    - in a new daemon (say, 'dhcp6agent')
1144d04ccbb3Scarlsonj
1145d04ccbb3Scarlsonj  We need to access parameters via dhcpinfo, and should provide the
1146d04ccbb3Scarlsonj  same set of status and control features via ifconfig as are present
1147d04ccbb3Scarlsonj  for IPv4.  (For the latter, if we fail to do that, it will likely
1148d04ccbb3Scarlsonj  confuse users.  The expense for doing it is comparatively small, and
1149d04ccbb3Scarlsonj  it will be useful for testing, even though it should not be needed
1150d04ccbb3Scarlsonj  in normal operation.)
1151d04ccbb3Scarlsonj
1152d04ccbb3Scarlsonj  If we implement somewhere other than dhcpagent, then we need to give
1153d04ccbb3Scarlsonj  that new daemon (in.ndpd or dhcp6agent) the same basic IPC features
1154d04ccbb3Scarlsonj  as dhcpagent already has.  This means either extracting those bits
1155d04ccbb3Scarlsonj  (async.c and ipc_action.c) into a shared library or just copying
1156d04ccbb3Scarlsonj  them.  Obviously, the former would be preferred, but as those bits
1157d04ccbb3Scarlsonj  depend on the rest of the dhcpagent infrastructure for timers and
1158d04ccbb3Scarlsonj  state handling, this means that the new process would have to look a
1159d04ccbb3Scarlsonj  lot like dhcpagent.
1160d04ccbb3Scarlsonj
1161d04ccbb3Scarlsonj  Implementing DHCPv6 as part of in.ndpd is attractive, as it
1162d04ccbb3Scarlsonj  eliminates the confusion that the router discovery process for
1163d04ccbb3Scarlsonj  determining interface netmasks can cause, along with the need to do
1164d04ccbb3Scarlsonj  any signaling at all to bring DHCPv6 up.  However, the need to make
1165d04ccbb3Scarlsonj  in.ndpd more like dhcpagent is unattractive.
1166d04ccbb3Scarlsonj
1167d04ccbb3Scarlsonj  Having a new dhcp6agent daemon seems to have little to recommend it,
1168d04ccbb3Scarlsonj  other than leaving the existing dhcpagent code untouched.  If we do
1169d04ccbb3Scarlsonj  that, then we end up with two implementations that do many similar
1170d04ccbb3Scarlsonj  things, and must be maintained in parallel.
1171d04ccbb3Scarlsonj
1172d04ccbb3Scarlsonj  Thus, although it leads to some complexity in reworking the data
1173d04ccbb3Scarlsonj  structures to fit both protocols, on balance the simplest solution
1174d04ccbb3Scarlsonj  is to extend dhcpagent.
1175d04ccbb3Scarlsonj
1176d04ccbb3Scarlsonj
1177d04ccbb3ScarlsonjAppendix B - Cross-Reference
1178d04ccbb3Scarlsonj
1179d04ccbb3Scarlsonj  in.ndpd
1180d04ccbb3Scarlsonj
1181d04ccbb3Scarlsonj    - Start dhcpagent and issue "dhcp start" command via libdhcpagent
1182d04ccbb3Scarlsonj    - Parse StatefulAddrConf interface option from ndpd.conf
1183d04ccbb3Scarlsonj    - Watch for M and O bits to trigger DHCPv6
1184d04ccbb3Scarlsonj    - Handle "no routers found" case and start DHCPv6
1185d04ccbb3Scarlsonj    - Track prefixes and set prefix length on IFF_DHCPRUNNING aliases
1186d04ccbb3Scarlsonj    - Send new Router Solicitation when prefix unknown
1187d04ccbb3Scarlsonj    - Change privileges so that dhcpagent can be launched successfully
1188d04ccbb3Scarlsonj
1189d04ccbb3Scarlsonj  libdhcputil
1190d04ccbb3Scarlsonj
1191d04ccbb3Scarlsonj    - Parse new /etc/dhcp/inittab6 file
1192d04ccbb3Scarlsonj    - Handle new UNUMBER24, SNUMBER64, IPV6, DUID and DOMAIN types
1193d04ccbb3Scarlsonj    - Add DHCPv6 option iterators (dhcpv6_find_option and
1194d04ccbb3Scarlsonj      dhcpv6_pkt_option)
1195d04ccbb3Scarlsonj    - Add dlpi_to_arp function (temporary)
1196d04ccbb3Scarlsonj
1197d04ccbb3Scarlsonj  libdhcpagent
1198d04ccbb3Scarlsonj
1199d04ccbb3Scarlsonj    - Add stable DUID and IAID creation and storage support
1200d04ccbb3Scarlsonj      functions and add new dhcp_stable.h include file
1201d04ccbb3Scarlsonj    - Support new DECLINING and RELEASING states introduced by DHCPv6.
1202d04ccbb3Scarlsonj    - Update implementation so that it doesn't rely on gettimeofday()
1203d04ccbb3Scarlsonj      for I/O timeouts
1204d04ccbb3Scarlsonj    - Extend the hostconf functions to support DHCPv6, using a new
1205d04ccbb3Scarlsonj      ".dh6" file
1206d04ccbb3Scarlsonj
1207d04ccbb3Scarlsonj  snoop
1208d04ccbb3Scarlsonj
1209d04ccbb3Scarlsonj    - Add support for DHCPv6 packet decoding (all types)
1210d04ccbb3Scarlsonj    - Add "dhcp6" filter keyword
1211d04ccbb3Scarlsonj    - Fix known bugs in DHCP filtering
1212d04ccbb3Scarlsonj
1213d04ccbb3Scarlsonj  ifconfig
1214d04ccbb3Scarlsonj
1215d04ccbb3Scarlsonj    - Remove inet-only restriction on "dhcp" keyword
1216d04ccbb3Scarlsonj
1217d04ccbb3Scarlsonj  netstat
1218d04ccbb3Scarlsonj
1219d04ccbb3Scarlsonj    - Remove strange "-I list" feature.
1220d04ccbb3Scarlsonj    - Add support for DHCPv6 and iterating over IPv6 interfaces.
1221d04ccbb3Scarlsonj
1222d04ccbb3Scarlsonj  ip
1223d04ccbb3Scarlsonj
1224d04ccbb3Scarlsonj    - Add extensions to IPv6 source address selection to prefer DHCPv6
1225d04ccbb3Scarlsonj      addresses when all else is equal
1226d04ccbb3Scarlsonj    - Fix known bugs in source address selection (remaining from TX
1227d04ccbb3Scarlsonj      integration)
1228d04ccbb3Scarlsonj
1229d04ccbb3Scarlsonj  other
1230d04ccbb3Scarlsonj
1231d04ccbb3Scarlsonj    - Add ifindex and source/destination address into PKT_LIST.
    - Add more ARPHRD_* and IPPORT_* values.