History log of /freebsd-head/sys/netpfil/ipfw/ip_fw_private.h
Revision Date Author Comments
fe1518789b8b05682cfb974f8e41ceace6ae9cbf 29-Jul-2019 ae <ae@FreeBSD.org> Add ipfw_get_action() function to get the pointer to action opcode.

ACTION_PTR() returns pointer to the start of rule action section,
but rule can keep several rule modifiers like O_LOG, O_TAG and O_ALTQ,
and only then real action opcode is stored.

ipfw_get_action() function inspects the rule action section, skips
all modifiers and returns action opcode.

Use this function in ipfw_reset_eaction() and flush_nat_ptrs().
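
The modifier-skipping walk described above can be sketched roughly as
follows. This is a simplified model, not the kernel code: the sk_insn
structure, the SK_O_* opcode values and the sk_get_action() name are
hypothetical stand-ins for ipfw_insn, the O_* opcodes and
ipfw_get_action().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode values; the real O_* opcodes are defined by ipfw. */
enum sk_opcode { SK_O_LOG = 1, SK_O_TAG, SK_O_ALTQ, SK_O_ACCEPT, SK_O_DENY };

/* Simplified instruction: an opcode plus its length in sk_insn slots. */
struct sk_insn {
	uint8_t opcode;
	uint8_t len;
};

/*
 * Walk the action section of a rule, skip modifier opcodes and return
 * the first instruction that is a real action, or NULL if none is found.
 * act_len is the size of the action section in sk_insn slots.
 */
const struct sk_insn *
sk_get_action(const struct sk_insn *act, int act_len)
{
	assert(act != NULL);
	while (act_len > 0) {
		switch (act->opcode) {
		case SK_O_LOG:
		case SK_O_TAG:
		case SK_O_ALTQ:
			break;		/* modifier, keep walking */
		default:
			return (act);	/* real action opcode */
		}
		if (act->len == 0)
			return (NULL);	/* malformed, avoid an endless loop */
		act_len -= act->len;
		act += act->len;
	}
	return (NULL);
}
```

A caller that only looked at the start of the action section would see
SK_O_LOG here, while the walk returns the instruction after the
modifiers.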

MFC after: 1 week
Sponsored by: Yandex LLC
41a70f9371c9d6ee8f1108936c3b8508a6aab41b 21-Mar-2019 glebius <glebius@FreeBSD.org> Always create ipfw(4) hooks as long as module is loaded.

Now enabling ipfw(4) with sysctls controls only linkage of hooks to default
heads. When module is loaded fetch sysctls as tunables, to make it possible
to boot with ipfw(4) in kernel, but not linked to any pfil(9) hooks.
4ac89986fbfe79dbcf5b6529246c21b87976a8c2 14-Mar-2019 glebius <glebius@FreeBSD.org> PFIL_MEMPTR for ipfw link level hook

With new pfil(9) KPI it is possible to pass a void pointer with length
instead of mbuf pointer to a packet filter. Until this commit no filters
supported that, so pfil run through a shim function pfil_fake_mbuf().

Now the ipfw(4) hook named "default-link", that is instantiated when
net.link.ether.ipfw sysctl is on, supports processing pointer/length
packets natively.

- ip_fw_args now has union for either mbuf or void *, and if flags have
non-zero length, then we use the void *.
- through ipfw_chk() we handle mem/mbuf cases differently.
- ether_header goes away from args. It is ipfw_chk() responsibility
to do parsing of Ethernet header.
- ipfw_log() now uses different bpf APIs to log packets.

Although ipfw_chk() is now capable to process pointer/length packets,
this commit adds support for the link level hook only, see
ipfw_check_frame(). Potentially the IP processing hook ipfw_check_packet()
can be improved too, but that requires more changes since the hook
supports more complex actions: NAT, divert, etc.

Reviewed by: ae
Differential Revision: https://reviews.freebsd.org/D19357
f2e5fcd17cf0e3a7002cab85bfabc4f71f96e78e 14-Mar-2019 glebius <glebius@FreeBSD.org> Remove 'dir' argument from dummynet_io(). This makes it possible to make
dn_dir flags private to dummynet. There is still some room for improvement.
d751747546b5584bb94a4e5ac14aaa6b2b95224e 14-Mar-2019 glebius <glebius@FreeBSD.org> - Add more flags to ip_fw_args. At this changeset only IPFW_ARGS_IN and
IPFW_ARGS_OUT are utilized. They are intended to substitute the "dir"
parameter that is often passed together with args.
- Rename ip_fw_args.oif to ifp and now it is set to either input or
output interface, depending on IPFW_ARGS_IN/OUT bit set.
f666b9e53585dea02ac8381f9c9d5b1a3bb08f73 11-Mar-2019 ae <ae@FreeBSD.org> Add IP_FW_NAT64 to codes that ipfw_chk() can return.

It will be used by upcoming NAT64 changes. We use separate code
to avoid propagating EACCES error code to user level applications
when NAT64 consumes a packet.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
003a6bc983059d7573a12e4720c69c610f12dd16 31-Jan-2019 glebius <glebius@FreeBSD.org> Revert r316461: Remove "IPFW static rules" rmlock, and use pfil's global lock.

The pfil(9) system is about to be converted to epoch(9) synchronization, so
we need to [temporarily] go back to ipfw internal locking.

Discussed with: ae
3ad6a5223c0c5c021292deb9a498a719a3bbdf08 10-Jan-2019 ae <ae@FreeBSD.org> Reduce the size of struct ip_fw_args from 240 to 128 bytes on amd64.
And refactor the code to avoid unneeded initialization to reduce overhead
of per-packet processing.

ipfw(4) can be invoked by pfil(9) framework for each packet several times.
Each call uses on-stack variable of type struct ip_fw_args to keep the
state of ipfw(4) processing. Currently this variable has 240 bytes size
on amd64. Each time ipfw(4) does bzero() on it, and then it initializes
some fields.

glebius@ has reported that they at Netflix discovered, that initialization
of this variable produces significant overhead on packet processing.
After patching I managed to increase performance of packet processing on
simple routing with ipfw(4) firewalling by about 11%, from 9.8Mpps up to
11Mpps (Xeon E5-2660 v4@ + Mellanox 100G card).

Introduced new field flags, it is used to keep track of which fields were
initialized. Some fields were moved into the anonymous union, to reduce
the size. They all are mutually exclusive. dummypar field was unused, and
therefore it is removed. The hopstore6 field type was changed from
sockaddr_in6 to a bit smaller struct ip_fw_nh6. And now the size of struct
ip_fw_args is 128 bytes.

ipfw_chk() was modified to properly handle ip_fw_args.flags instead of
relying on checking for NULL pointers.
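
The idea of tracking initialized fields with a flags word, instead of
zeroing the whole structure and testing pointers for NULL, can be
sketched like this. The sk_args layout, the SK_ARGS_* bits and the
helper names are assumptions made for illustration; they do not mirror
the real struct ip_fw_args.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ARGS_NH4	0x0001u	/* hypothetical: u.nh4 holds a valid value */
#define SK_ARGS_NH6	0x0002u	/* hypothetical: u.nh6 holds a valid value */

struct sk_args {
	uint32_t flags;		/* records which fields were initialized */
	union {			/* mutually exclusive, so they share storage */
		uint32_t nh4;
		uint8_t	 nh6[16];
	} u;
};

/* Per-packet setup clears only the flags word, not the whole struct. */
void
sk_args_init(struct sk_args *a)
{
	assert(a != NULL);
	a->flags = 0;
}

/* Store an IPv4 next hop and mark it as the valid union member. */
void
sk_args_set_nh4(struct sk_args *a, uint32_t addr)
{
	a->u.nh4 = addr;
	a->flags = (a->flags & ~SK_ARGS_NH6) | SK_ARGS_NH4;
}

/* Return 1 and copy the address out only if an IPv4 next hop was set. */
int
sk_args_get_nh4(const struct sk_args *a, uint32_t *addr)
{
	if ((a->flags & SK_ARGS_NH4) == 0)
		return (0);
	*addr = a->u.nh4;
	return (1);
}
```

Readers consult the flag before touching a field, so stale bytes left
in the union from a previous packet are never interpreted.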

Reviewed by: gallatin
Obtained from: Yandex LLC
MFC after: 1 month
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D18690
285cf942458c7c3460a705afabf76bf8efcc2467 04-Dec-2018 ae <ae@FreeBSD.org> Reimplement how net.inet.ip.fw.dyn_keep_states works.

Turning this feature on allows dynamic states to be kept when the parent
rule is deleted. But it works only when the default rule is
"allow from any to any".

Now when rule with dynamic opcode is going to be deleted, and
net.inet.ip.fw.dyn_keep_states is enabled, existing states will reference
named objects corresponding to this rule, and also reference the rule.
And when ipfw_dyn_lookup_state() will find state for deleted parent rule,
it will return the pointer to the deleted rule, that is still valid.
This implementation doesn't support O_LIMIT_PARENT rules.

The refcnt field was added to struct ip_fw to keep reference, also
next pointer added to be able to iterate rules and not damage the content
when deleted rules are chained.

Named objects are referenced only when states are going to be deleted to
be able to reuse kidx of named objects when new parent rules will be
installed.

ipfw_dyn_get_count() function was modified and now it also looks into
dynamic states and constructs maps of existing named objects. This is
needed to correctly export orphaned states into userland.

ipfw_free_rule() was changed to be global, since now dynamic state can
free rule, when it is expired and the reference counter becomes 1.

External actions subsystem also modified, since external actions can be
deregistered and instances can be destroyed. In these cases deleted rules,
that are referenced by orphaned states, must be modified to prevent access
to freed memory. ipfw_dyn_reset_eaction(), ipfw_reset_eaction_instance()
functions added for these purposes.

Obtained from: Yandex LLC
MFC after: 2 months
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17532
545ce709f1845e167445fdfae25d84f7c2d5b2e5 07-Feb-2018 ae <ae@FreeBSD.org> Rework ipfw dynamic states implementation to be lockless on fast path.

o added struct ipfw_dyn_info that keeps all needed for ipfw_chk and
for dynamic states implementation information;
o added DYN_LOOKUP_NEEDED() macro that can be used to determine the
need of new lookup of dynamic states;
o ipfw_dyn_rule now becomes obsolete. Currently it is used to pass
information from kernel to userland only.
o IPv4 and IPv6 states now described by different structures
dyn_ipv4_state and dyn_ipv6_state;
o IPv6 scope zones support is added;
o ipfw(4) now depends on Concurrency Kit;
o states are linked with "entry" field using CK_SLIST. This allows
lockless lookup and protected by mutex modifications.
o the "expired" SLIST field is used for states expiring.
o struct dyn_data is used to keep generic information for both IPv4
and IPv6;
o struct dyn_parent is used to keep O_LIMIT_PARENT information;
o IPv4 and IPv6 states are stored in different hash tables;
o O_LIMIT_PARENT states now are kept separately from O_LIMIT and
O_KEEP_STATE states;
o per-cpu dyn_hp pointers are used to implement hazard pointers and they
prevent freeing states that are locklessly used by lookup threads;
o mutexes to protect modification of lists in hash tables now kept in
separate arrays. 65535 limit to maximum number of hash buckets now
removed.
o Separate lookup and install functions added for IPv4 and IPv6 states
and for parent states.
o Jenkins hash function is now used by default.

Obtained from: Yandex LLC
MFC after: 42 days
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D12685
78a6b0861813af31e1354fa407c5701e8764b4d6 27-Nov-2017 pfg <pfg@FreeBSD.org> sys: general adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

No functional change intended.
7c8e43528f6a3fd412bb215c998de23390bb3540 23-Nov-2017 ae <ae@FreeBSD.org> Modify ipfw's dynamic states KPI.

Hide the locking logic used in the dynamic states implementation from
generic code. Rename ipfw_install_state() and ipfw_lookup_dyn_rule()
function to have similar names: ipfw_dyn_install_state() and
ipfw_dyn_lookup_state(). Move dynamic rule counters updating to the
ipfw_dyn_lookup_state() function. Now this function returns NULL when
there is no state and pointer to the parent rule when state is found.
Thus now there is no need to return pointer to dynamic rule, and no need
to hold bucket lock for this state. Remove ipfw_dyn_unlock() function.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D11657
4cd3b30b2f3a96ba7da21859ceee4c27e10239ea 22-Nov-2017 ae <ae@FreeBSD.org> Add ipfw_add_protected_rule() function that creates rule with 65535
number in the reserved set 31. Use this function to create default rule.

Obtained from: Yandex LLC
MFC after: 1 week
Sponsored by: Yandex LLC
818d823c95a89c36cf80ac0071df76b6e3b36d52 03-Apr-2017 ae <ae@FreeBSD.org> Remove "IPFW static rules" rmlock.

Make PFIL's lock global and use it for this purpose.
This reduces the number of locks needed to acquire for each packet.

Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
No objection from: #network
Differential Revision: https://reviews.freebsd.org/D10154
919c49487bce2bdb43ef579214f43d6923d369d5 05-Mar-2017 ae <ae@FreeBSD.org> Add IPv6 support to O_IP_DST_LOOKUP opcode.

o check the size of O_IP_SRC_LOOKUP opcode, it can not exceed the size of
ipfw_insn_u32;
o rename ipfw_lookup_table_extended() function into ipfw_lookup_table() and
remove old ipfw_lookup_table();
o use args->f_id.flow_id6 that is in host byte order to get DSCP value;
o add SCTP ports support to 'lookup src/dst-port' opcode;
o add IPv6 support to 'lookup src/dst-ip' opcode.

PR: 217292
Reviewed by: melifaro
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D9873
1808279ef345fea4be4eb24d83fac842ddd8f41a 17-Jan-2017 ae <ae@FreeBSD.org> Initialize IPFW static rules rmlock with RM_RECURSE flag.

This lock was replaced from rwlock in r272840. But unlike rwlock, rmlock
doesn't allow recursion on rm_rlock(), so at this time fix this with
RM_RECURSE flag. Later we need to change ipfw to avoid such recursions.

PR: 216171
Reported by: Eugene Grosbein
MFC after: 1 week
d9f2f3b3295caf4ecf6476beec016408a76dd78c 13-Aug-2016 ae <ae@FreeBSD.org> Add three helper functions to manage tables from external modules.

ipfw_objhash_lookup_table_kidx does lookup kernel index of table;
ipfw_ref_table/ipfw_unref_table takes and releases reference to table.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
c71d3d8eda0b64e39d6ab48ffb474edb5397f3b1 13-Aug-2016 ae <ae@FreeBSD.org> Move logging via BPF support into separate file.

* make interface cloner VNET-aware;
* simplify cloner code and use if_clone_simple();
* migrate LOGIF_LOCK() to rmlock;
* add ipfw_bpf_mtap2() function to pass mbuf to BPF;
* introduce new additional ipfwlog0 pseudo interface. It differs from
ipfw0 by DLT type used in bpfattach. This interface is intended to be
used by ipfw modules to dump packets with additional info attached.
Currently pflog format is used. ipfw_bpf_mtap2() function uses second
argument to determine which interface to use for dumping. If dlen is equal
to ETHER_HDR_LEN it uses old ipfw0 interface, if dlen is equal to
PFLOG_HDRLEN - ipfwlog0 will be used.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
e679279326d9c15acc3c4b11f8f58161869354c1 19-Jul-2016 ae <ae@FreeBSD.org> Add named dynamic states support to ipfw(4).

The keep-state, limit and check-state now will have additional argument
flowname. This flowname will be assigned to dynamic rule by keep-state
or limit opcode. And then can be matched by check-state opcode or
O_PROBE_STATE internal opcode. To reduce possible breakage and to maximize
compatibility with old rulesets default flowname introduced.
It will be assigned to the rules when user has omitted state name in
keep-state and check-state opcodes. Also if name is ambiguous (can be
evaluated as rule opcode) it will be replaced with the default.

Reviewed by: julian
Obtained from: Yandex LLC
MFC after: 1 month
Relnotes: yes
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D6674
00d578928eca75be320b36d37543a7e2a4f9fbdb 27-May-2016 grehan <grehan@FreeBSD.org> Create branch for bhyve graphics import.
f79f8e9de833c40831a97da242c164c934e5545f 17-May-2016 ae <ae@FreeBSD.org> Make named objects set-aware. Now it is possible to create named
objects with the same name in different sets.

Add optional manage_sets() callback to objects rewriting framework.
It is intended to implement handler for moving and swapping named
object's sets. Add ipfw_obj_manage_sets() function that implements
generic sets handler. Use new callback to implement sets support for
lookup tables.
External actions objects are global and they don't support sets.
Modify eaction_findbyname() to reflect this.
ipfw(8) now may fail to move rules or sets, because some named objects
in target set may have conflicting names.
Note that ipfw_obj_ntlv type was changed, but since lookup tables
actually didn't support sets, this change is harmless.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
eb4486dee856baad533f5cf9ea2e0ea1923187dc 06-May-2016 ae <ae@FreeBSD.org> Change the type of objhash_cb_t callback function to be able return an
error code. Use it to interrupt the loop in ipfw_objhash_foreach().
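
A callback that can return an error code to stop an iteration early
might look like the sketch below. The sk_cb_t type, sk_foreach() and
sk_count_cb() are hypothetical and walk a plain array; the real
objhash_cb_t and ipfw_objhash_foreach() operate on named objects and
have different signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: a nonzero return value stops the walk. */
typedef int (sk_cb_t)(int item, void *arg);

/* Call cb for each item; return the first nonzero result, else 0. */
int
sk_foreach(const int *items, size_t n, sk_cb_t *cb, void *arg)
{
	size_t i;
	int error;

	assert(cb != NULL);
	for (i = 0; i < n; i++) {
		error = cb(items[i], arg);
		if (error != 0)
			return (error);	/* interrupt the loop */
	}
	return (0);
}

/* Example callback: count items, fail on the first negative one. */
int
sk_count_cb(int item, void *arg)
{
	if (item < 0)
		return (22);	/* arbitrary error value for illustration */
	(*(int *)arg)++;
	return (0);
}
```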

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
59075b268839813ba48a47fcfb3c05955b0839cc 05-May-2016 ae <ae@FreeBSD.org> Rename find_name_tlv_type() to ipfw_find_name_tlv_type() and make it
global. Use it in ip_fw_table.c instead of find_name_tlv() to reduce
duplicated code.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
4d9b1f8309d402ff30a915a7e44f5a9a185b2ef2 14-Apr-2016 ae <ae@FreeBSD.org> Add External Actions KPI to ipfw(9).

It allows implementing loadable kernel modules with new actions and
without needing to modify kernel headers and ipfw(8). The module
registers its action handler and keyword string, that will be used
as action name. Using generic syntax user can add rules with this
action. Also ipfw(8) can be easily modified to extend basic syntax
for external actions, that become a part of the base system.
Sample modules will be coming soon.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
c1f7aad42e892ceef31024a3ad2f07b56cbb28b6 14-Apr-2016 ae <ae@FreeBSD.org> Change the type of 'etlv' field in struct named_object to uint16_t.
It should match with the type field in struct ipfw_obj_tlv.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
c259f15148fd9e7666d161b42e69949f8477257e 14-Apr-2016 ae <ae@FreeBSD.org> Move several functions related to opcode rewriting framework from
ip_fw_table.c into ip_fw_sockopt.c and make them static.

Obtained from: Yandex LLC
35a3cc037927f676a284514131055cd0ecb6bf89 23-Nov-2015 ae <ae@FreeBSD.org> Add destroy_object callback to object rewriting framework.
It is called when last reference to named object is going to be released
and allows to do additional cleanup for implementation of named objects.

Obtained from: Yandex LLC
Sponsored by: Yandex LLC
f4da06a164348aa3238344233e027dd635535865 03-Nov-2015 ae <ae@FreeBSD.org> Add ipfw_check_object_name_generic() function to do basic checks for an
object name correctness. Each type of object can do more strict checking
in own implementation. Do such checks for tables in check_table_name().

Reviewed by: melifaro
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
750b62ddbe83065a7addaeebf7b25c178265dc35 03-Nov-2015 ae <ae@FreeBSD.org> Implement `ipfw internal olist` command to list named objects.

Reviewed by: melifaro
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
27508342ba7eaf88079d21b9309746d340c8f5e2 27-Aug-2015 melifaro <melifaro@FreeBSD.org> Fix packets/bytes accounting on i386.

Spotted by: julian
9f9f412505d9bb82d294f7f8d4d6327285488be3 20-Jul-2015 ae <ae@FreeBSD.org> Add helper functions for IP checksum adjusting. Use these functions in
dummynet code and for setdscp. This fixes wrong checksums in some cases.
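
Adjusting an IP checksum after changing a single 16-bit word, rather
than recomputing it over the whole header, is commonly done with the
incremental update from RFC 1624. The sketch below illustrates that
technique under the assumption that this is the kind of helper meant
here; sk_cksum() and sk_cksum_adjust() are illustrative names, not the
functions added by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full one's-complement checksum over n 16-bit words, for reference. */
uint16_t
sk_cksum(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	assert(w != NULL);
	for (i = 0; i < n; i++)
		sum += w[i];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)~sum);
}

/*
 * Incremental update, RFC 1624 eqn. 3: HC' = ~(~HC + ~m + m'),
 * where m is the old 16-bit word and m' is the new one.
 */
uint16_t
sk_cksum_adjust(uint16_t oldsum, uint16_t old, uint16_t new)
{
	uint32_t sum;

	sum = (uint16_t)~oldsum;
	sum += (uint16_t)~old;
	sum += new;
	sum = (sum >> 16) + (sum & 0xffff);	/* fold carries */
	sum += (sum >> 16);
	return ((uint16_t)~sum);
}
```

For example, rewriting the DSCP bits changes only the first 16-bit word
of the IPv4 header, so one call to the adjust helper is enough.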

Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
9f3d7ccd0754f8211ecc3752432a18716c9d33d5 27-Apr-2015 melifaro <melifaro@FreeBSD.org> Make rule table kernel-index rewriting support any kind of objects.

Currently we have tables identified by their names in userland
with internal kernel-assigned indices. This works the following way:

When userland wishes to communicate with kernel to add or change rule(s),
it makes indexed sorted array of table names
(internally ipfw_obj_ntlv entries), and refer to indices in that
array in rule manipulation.
Prior to committing new rule to the ruleset kernel
a) finds all referenced tables, bump their refcounts and change
values inside the opcodes to be real kernel indices
b) auto-creates all referenced but not existing tables and then
do a) for them.

Kernel does almost the same when exporting rules to userland:
prepares array of used tables in all rules in range, and
prepends it before the actual ruleset retaining actual in-kernel
indexes for that.

There is also special translation layer for legacy clients which is
able to provide 'real' indices for table names (basically doing atoi()).

While it is arguable that every subsystem really needs names instead of
numbers, there are several things that should be noted:

1) every non-singleton subsystem needs to store its runtime state
somewhere inside ipfw chain (and be able to get it fast)
2) we can't assume object numbers provided by humans will be dense.

Existing nat implementation (O(n) access and LIST inside chain) is a
good example.

Hence the following:
* Convert table-centric rewrite code to be more generic, callback-based
* Move most of the code from ip_fw_table.c to ip_fw_sockopt.c
* Provide abstract API to permit subsystems convert their objects
between userland string identifier and in-kernel index.
(See struct opcode_obj_rewrite) for more details
* Create another per-chain index (in next commit) shared among all subsystems
* Convert current NAT44 implementation to use new API, O(1) lookups,
shared index and names instead of numbers (in next commit).

Sponsored by: Yandex LLC
8ee4f19c0595d4c5a1b5edfbe92740fb80a562e6 13-Mar-2015 ae <ae@FreeBSD.org> Fix `ipfw fwd tablearg'. Use dedicated field nh4 in struct table_value
to obtain IPv4 next hop address in tablearg case.

Add `fwd tablearg' support for IPv6. ipfw(8) uses INADDR_ANY as next hop
address in O_FORWARD_IP opcode for specifying tablearg case. For IPv6 we
still use this opcode, but when packet identified as IPv6 packet, we
obtain next hop address from dedicated field nh6 in struct table_value.

Replace hopstore field in struct ip_fw_args with anonymous union and add
hopstore6 field. Use this field to copy tablearg value for IPv6.

Replace spare1 field in struct table_value with zoneid. Use it to keep
scope zone id for link-local IPv6 addresses. Since spare1 was used
internally, replace spare0 array with two variables spare0 and spare1.

Use getaddrinfo(3)/getnameinfo(3) functions for parsing and formatting
IPv6 addresses in table_value. Use zoneid field in struct table_value
to store sin6_scope_id value.

Since the kernel still uses embedded scope zone id to represent
link-local addresses, convert next_hop6 address into this form before
return from pfil processing. This also fixes in6_localip() check
for link-local addresses.

Differential Revision: https://reviews.freebsd.org/D2015
Obtained from: Yandex LLC
Sponsored by: Yandex LLC
0f5a4f05172d1e32bc8aab85dde9cdbbd48a007f 05-Feb-2015 melifaro <melifaro@FreeBSD.org> * Make sure table algorithm destroy hook is always called without locks
* Explicitly lock freeing interface references in ta_destroy_ifidx
* Change ipfw_iface_unref() to require UH lock
* Add forgotten ipfw_iface_unref() to destroy_ifidx_locked()

PR: kern/197276
Submitted by: lev
Sponsored by: Yandex LLC
6b3c0c962ed8f245b99beb4431c1fb84f777a714 09-Nov-2014 melifaro <melifaro@FreeBSD.org> Remove unused 'struct route' fields.
f871c30ccebd558ce962e60dea4fb68733c1be97 22-Oct-2014 luigi <luigi@FreeBSD.org> remove/fix old code for building ipfw and dummynet in userspace
bd97071f9a64d269ccb4fe0e42a216d414338617 18-Oct-2014 melifaro <melifaro@FreeBSD.org> Use IPFW_RULE_CNTR_SIZE macro instead of non-relevant ip_fw_cntr structure.

Found by: luigi
7203f96dc1f03f04013935007a77a7b3833221c4 07-Oct-2014 melifaro <melifaro@FreeBSD.org> * Fix crash in interface tracker due to using old "linked" field.
* Ensure we're flushing entries without any locks held.
* Free memory in (rare) case when interface tracker fails to register ifp.
* Add KASSERT on table values refcounts.
e2a6d825458c7f83350c32babd3d6258aac80d98 04-Oct-2014 melifaro <melifaro@FreeBSD.org> Switch ipfw to use rmlock for runtime locking.
03b9e62107da29243ccd8f49a709c5821f6a73eb 05-Sep-2014 melifaro <melifaro@FreeBSD.org> * Use modular opcode handling inside ipfw_ctl3() instead of static switch.
* Provide hints for subsystem initializers if they are called for
the first/last time.
* Convert every IP_FW3 opcode user to use new sopt API.
d8fb572c3627643813e23dfc45401656a2f9be44 03-Sep-2014 melifaro <melifaro@FreeBSD.org> Be consistent and use same arguments for ctl3 opcodes.
Move legacy IP_FW_TABLE_XGETSIZE handling to separate function.
a1eca3cc0cdd195bc172867295820b7b183f96ba 31-Aug-2014 melifaro <melifaro@FreeBSD.org> Add support for multi-field values inside ipfw tables.
This is the last major change in given branch.

Kernel changes:
* Use 64-bytes structures to hold multi-value variables.
* Use shared array to hold values from all tables (assume
each table algo is capable of holding 32-byte variables).
* Add some placeholders to support per-table value arrays in future.
* Use simple eventhandler-style API to ease the process of adding new
table items. Currently table addition may require multiple UH drops/
acquires which is quite tricky due to atomic table modification/swap
support, shared array resize, etc. Deal with it by calling special
notifier capable of rolling back state before actually performing
swap/resize operations. Original operation then restarts itself after
acquiring UH lock.
* Bump all objhash users default values to at least 64
* Fix custom hashing inside objhash.

Userland changes:
* Add support for dumping shared value array via "vlist" internal cmd.
* Some small print/fill_flags fixes to support u32 values.
* valtype is now bitmask of
<skipto|pipe|fib|nat|dscp|tag|divert|netgraph|limit|ipv4|ipv6>.
New values can hold distinct values for each of these types.
* Provide special "legacy" type which assumes all values are the same.
* More helpers/docs following..

Some examples:

3:41 [1] zfscurr0# ipfw table mimimi create valtype skipto,limit,ipv4,ipv6
3:41 [1] zfscurr0# ipfw table mimimi info
+++ table(mimimi), set(0) +++
kindex: 2, type: addr
references: 0, valtype: skipto,limit,ipv4,ipv6
algorithm: addr:radix
items: 0, size: 296
3:42 [1] zfscurr0# ipfw table mimimi add 10.0.0.5 3000,10,10.0.0.1,2a02:978:2::1
added: 10.0.0.5/32 3000,10,10.0.0.1,2a02:978:2::1
3:42 [1] zfscurr0# ipfw table mimimi list
+++ table(mimimi), set(0) +++
10.0.0.5/32 3000,0,10.0.0.1,2a02:978:2::1
631be4d79a281442cdac1cabdf754fca53d15eda 30-Aug-2014 melifaro <melifaro@FreeBSD.org> * Make objhash api a bit more abstract by providing ability to specify
own hash/compare functions.
* Add requirement for table algorithms to copy "value" field in @add
callback instead of "prepare_add".
* Document existing requirement for table algorithms to store value
of deleted record to @tei.
a5e98ab07dc9b8eda689dbd99ddc1be8b569907f 14-Aug-2014 melifaro <melifaro@FreeBSD.org> Clean up kernel interaction in ip_fw_iface.c

Suggested by: ae
20eb17aed6d26d7d3c707c19a003ded76903f2dd 12-Aug-2014 melifaro <melifaro@FreeBSD.org> Change tablearg value to be 0 (try #2).
Most of the tablearg-supported opcodes do not accept 0 as valid value:
O_TAG, O_TAGGED, O_PIPE, O_QUEUE, O_DIVERT, O_TEE, O_SKIPTO, O_CALLRET,
O_NETGRAPH, O_NGTEE, O_NAT treat 0 as invalid input.

The rest are O_SETDSCP and O_SETFIB.
'Fix' them by adding high-order bit (0x8000) set for non-tablearg values.
Do translation in kernel for old clients (import_rule0 / export_rule0),
teach current ipfw(8) binary to add/remove given bit.

This change does not affect handling SETDSCP values, but limit
O_SETFIB values to 32767 instead of 65k. Since currently we have either
old (16) or new (2^32) max fibs, this should not be a big deal:
we're definitely OK for former and have to add another opcode to deal
with latter, regardless of tablearg value.
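
The encoding described above, where 0 means "tablearg" and literal
values carry the high-order bit, can be sketched as follows. The macro
and function names are hypothetical; only the 0x8000 bit, the tablearg
value of 0 and the resulting 32767 limit come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define SK_TABLEARG	0u	/* argument is taken from the table */
#define SK_ARG_LITERAL	0x8000u	/* high-order bit marks a literal value */
#define SK_ARG_MAX	0x7fffu	/* largest literal that still fits */

/* Encode a literal so that a literal 0 is distinct from tablearg. */
uint16_t
sk_encode_arg(uint16_t value)
{
	assert(value <= SK_ARG_MAX);
	return ((uint16_t)(value | SK_ARG_LITERAL));
}

/* Nonzero if the stored argument means "use the table value". */
int
sk_is_tablearg(uint16_t arg)
{
	return (arg == SK_TABLEARG);
}

/* Recover the literal value by stripping the marker bit. */
uint16_t
sk_decode_arg(uint16_t arg)
{
	return ((uint16_t)(arg & SK_ARG_MAX));
}
```

With this scheme a literal 0 is stored as 0x8000, which is why the
usable literal range shrinks to 15 bits.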
8c5ec3a86cdadfd07beda007922516cefc43f429 12-Aug-2014 melifaro <melifaro@FreeBSD.org> Simplify table auto-creation for old userland users.
377bb9d131699756f65c6ffe103071cfc9944d91 11-Aug-2014 melifaro <melifaro@FreeBSD.org> * Add support for batched add/delete for ipfw tables
* Add support for atomic batches add (all or none).
* Fix panic on deleting non-existing entry in radix algo.

Examples:

# si is empty
# ipfw table si add 1.1.1.1/32 1111 2.2.2.2/32 2222
added: 1.1.1.1/32 1111
added: 2.2.2.2/32 2222
# ipfw table si add 2.2.2.2/32 2200 4.4.4.4/32 4444
exists: 2.2.2.2/32 2200
added: 4.4.4.4/32 4444
ipfw: Adding record failed: record already exists
^^^^^ Returns error but keeps inserted items
# ipfw table si list
+++ table(si), set(0) +++
1.1.1.1/32 1111
2.2.2.2/32 2222
4.4.4.4/32 4444
# ipfw table si atomic add 3.3.3.3/32 3333 4.4.4.4/32 4400 5.5.5.5/32 5555
added(reverted): 3.3.3.3/32 3333
exists: 4.4.4.4/32 4400
ignored: 5.5.5.5/32 5555
ipfw: Adding record failed: record already exists
^^^^^ Returns error and reverts added records
# ipfw table si list
+++ table(si), set(0) +++
1.1.1.1/32 1111
2.2.2.2/32 2222
4.4.4.4/32 4444
5b47ece0e9bd2fe547ad679e1c249ece5427e9c2 09-Aug-2014 melifaro <melifaro@FreeBSD.org> * Use 2 32-bits field inside rule instead of 2 pointer to save skipto state.
* Introduce ipfw_reap_add() to unify unlinking rules/adding it to reap queue
* Unbreak FreeBSD7 export format.
deeb40d882b2cea0871cad31896ee9feda938ebb 08-Aug-2014 melifaro <melifaro@FreeBSD.org> Partially revert previous commit:
"0" value is perfectly valid for O_SETFIB and O_SETDSCP,
so tablearg remains to be 65535 for now.
bc102dcade457b5c55b2db567fb4f2aad6fe3f80 08-Aug-2014 melifaro <melifaro@FreeBSD.org> * Switch tablearg value from 65535 to 0.
* Use u16 table kidx instead of integer one for iface opcode.
* Provide compatibility layer for old clients.
c2c120701d7e4f33daa7f3ca7e852aed0b6a5f4c 07-Aug-2014 melifaro <melifaro@FreeBSD.org> Since all of base IP_FW opcodes have been converted to IP_FW3,
switch default sopt handler to ipfw_ctl3.
Add some comments for ipfw_get_sopt* api.
61bb76b81376b5a5ada7bef245e71f67bd406af7 07-Aug-2014 melifaro <melifaro@FreeBSD.org> Kernel changes:
* Implement proper checks for switching between global and set-aware tables
* Split IP_FW_DEL mess into the following opcodes:
* IP_FW_XDEL (del rules matching pattern)
* IP_FW_XMOVE (move rules matching pattern to another set)
* IP_FW_SET_SWAP (swap between 2 sets)
* IP_FW_SET_MOVE (move one set to another one)
* IP_FW_SET_ENABLE (enable/disable sets)
* Add IP_FW_XZERO / IP_FW_XRESETLOG to finish IP_FW3 migration.
* Use unified ipfw_range_tlv as range description for all of the above.
* Check dynamic states IFF there was non-zero number of deleted dyn rules,
* Del relevant dynamic states with single traversal instead of per-rule one.

Userland changes:
* Switch ipfw(8) to use new opcodes.
c7e5ac056769fc29e9dcdd64d0ae77cc056c3e8f 03-Aug-2014 melifaro <melifaro@FreeBSD.org> Implement O(1) skipto using indexed array.
This adds 512K (2 * sizeof(u32) * 65k) bytes to the memory footprint.
This feature is optional and may be turned on at any time
(however it starts immediately in this commit. This will be changed.)
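
One way to get O(1) skipto from an indexed array is to precompute, for
every possible rule number, the position of the first rule whose number
is not smaller. The sketch below assumes that approach and uses a
single array; the commit mentions two u32 arrays, whose exact roles are
not described here. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_NRULENUMS	65536	/* rule numbers 0..65535 */

/*
 * Fill cache[n] with the index of the first rule whose number is >= n.
 * rulenum[] must be sorted in ascending order; cache must have room
 * for SK_NRULENUMS entries. If no such rule exists, nrules is stored.
 */
void
sk_build_skipto(const uint16_t *rulenum, uint32_t nrules, uint32_t *cache)
{
	uint32_t i = 0, n;

	assert(cache != NULL);
	for (n = 0; n < SK_NRULENUMS; n++) {
		while (i < nrules && rulenum[i] < n)
			i++;
		cache[n] = i;
	}
}

/* A skipto then becomes a single array read instead of a search. */
uint32_t
sk_skipto(const uint32_t *cache, uint16_t target)
{
	return (cache[target]);
}
```

The table costs 65536 * sizeof(uint32_t) bytes and has to be rebuilt
whenever the ruleset changes.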
a1876c68a21aab52ea5138c28b5c0aa9a2ccd3d0 02-Aug-2014 melifaro <melifaro@FreeBSD.org> * Fix case when returning more than 4096 bytes of data
* Use different approach to ensure algo has enough space to store N elements:
- explicitly ask algo (under UH_WLOCK) before/after insertion. This (along
with existing reallocation callbacks) really guarantees us that it is safe
to insert N elements at once while holding UH_WLOCK+WLOCK.
- remove old aflags/flags approach
389a8543468f4ac5261d44a2e499ca29accb3994 29-Jul-2014 melifaro <melifaro@FreeBSD.org> * Introduce ipfw_ctl3() handler and move all IP_FW3 opcodes there.
The long-term goal is to switch remaining opcodes to IP_FW3 versions
and use ipfw_ctl3() as default handler simplifying ipfw(4) interaction
with external world.
fa3f38a6a0f5431577dbb7d336d2468cd60edab8 28-Jul-2014 melifaro <melifaro@FreeBSD.org> * Add generic ipfw interface tracking API
* Rewrite interface tables to use interface indexes

Kernel changes:
* Add generic interface tracking API:
- ipfw_iface_ref (must call unlocked, performs lazy init if needed, allocates
state & bumps ref)
- ipfw_iface_add_ntfy(UH_WLOCK+WLOCK, links consumer & runs its callback to
update ifindex)
- ipfw_iface_del_ntfy(UH_WLOCK+WLOCK, unlinks consumer)
- ipfw_iface_unref(unlocked, drops reference)
Additionally, consumer callbacks are called in interface withdrawal/departure.

* Rewrite interface tables to use iface tracking API. Currently tables are
implemented the following way:
runtime data is stored as sorted array of {ifidx, val} for existing interfaces
full data is stored inside namedobj instance (chained hashed table).

* Add IP_FW_XIFLIST opcode to dump status of tracked interfaces

* Pass @chain ptr to most non-locked algorithm callbacks:
(prepare_add, prepare_del, flush_entry ..). This may be needed for better
interaction of given algorithm and other ipfw subsystems

* Add optional "change_ti" algorithm handler to permit updating of
cached table_info pointer (happens in case of table_max resize)

* Fix small bug in ipfw_list_tables()
* Add badd (insert into sorted array) and bdel (remove from sorted array) funcs
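
Inserting into and removing from a sorted array, as badd and bdel are
described above, can be sketched with a binary search followed by a
memmove(). The functions below work on plain uint16_t keys and use
made-up names and signatures; the kernel helpers are generic and may
differ in both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Index of the first element >= key in sorted base[0..n). */
size_t
sk_lower_bound(const uint16_t *base, size_t n, uint16_t key)
{
	size_t lo = 0, hi = n, mid;

	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (base[mid] < key)
			lo = mid + 1;
		else
			hi = mid;
	}
	return (lo);
}

/* Insert key, keeping order. Returns 1 on success, 0 if full or present. */
int
sk_badd(uint16_t *base, size_t *n, size_t cap, uint16_t key)
{
	size_t pos;

	assert(*n <= cap);
	pos = sk_lower_bound(base, *n, key);
	if ((pos < *n && base[pos] == key) || *n == cap)
		return (0);
	memmove(&base[pos + 1], &base[pos], (*n - pos) * sizeof(base[0]));
	base[pos] = key;
	(*n)++;
	return (1);
}

/* Remove key, keeping order. Returns 1 on success, 0 if not found. */
int
sk_bdel(uint16_t *base, size_t *n, uint16_t key)
{
	size_t pos;

	pos = sk_lower_bound(base, *n, key);
	if (pos == *n || base[pos] != key)
		return (0);
	memmove(&base[pos], &base[pos + 1],
	    (*n - pos - 1) * sizeof(base[0]));
	(*n)--;
	return (1);
}
```

Keeping the array sorted makes lookups a binary search at the cost of
O(n) moves on each insert or delete.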

Userland changes:
* Add "iflist" cmd to print status of currently tracked interface
* Add stringnum_cmp for better interface/table names sorting
3f7d90b38540b269777223f0d936ca2415f262ac 08-Jul-2014 melifaro <melifaro@FreeBSD.org> * Use different rule structures in kernel/userland.
* Switch kernel to use per-cpu counters for rules.
* Keep ABI/API.

Kernel changes:
* Each rule is now exported as TLV with optional extensible
counter block (ip_fw_bcounter for base one) and
ip_fw_rule for rule&cmd data.
* Counters need to be explicitly requested by IPFW_CFG_GET_COUNTERS flag.
* Separate counters from rules in kernel and clean up ip_fw a bit.
* Pack each rule in IPFW_TLV_RULE_ENT tlv to ease parsing.
* Introduce versioning in container TLV (may be needed in future).
* Fix ipfw_cfg_lheader broken u64 alignment.

Userland changes:
* Use set_mask from cfg header when requesting config
* Fix incorrect read accounting in ipfw_show_config()
* Use IPFW_RULE_NOOPT flag instead of playing with _pad
* Fix "ipfw -d list": do not print counters for dynamic states
* Some small fixes
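
A rough userspace sketch of the per-cpu counter idea, not the kernel mechanism (which this log does not show). SKETCH_NCPU and all sketch_* names are hypothetical: each CPU updates only its own slot, and a reader sums the slots when counters are requested.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NCPU 4   /* hypothetical fixed CPU count */

/* One packet/byte counter pair, as kept per rule. */
struct sketch_cntr {
    uint64_t pcnt;
    uint64_t bcnt;
};

/* Per-rule counters: one private slot per CPU, so updates do not contend. */
struct sketch_pcpu_cntr {
    struct sketch_cntr slot[SKETCH_NCPU];
};

/* Fast path: account one packet on the CPU that handled it. */
static void
sketch_cntr_add(struct sketch_pcpu_cntr *c, unsigned cpu, uint32_t pktlen)
{
    assert(cpu < SKETCH_NCPU);
    c->slot[cpu].pcnt++;
    c->slot[cpu].bcnt += pktlen;
}

/* Slow path: sum all slots when userland asks for counters. */
static struct sketch_cntr
sketch_cntr_fetch(const struct sketch_pcpu_cntr *c)
{
    struct sketch_cntr sum = { 0, 0 };

    for (unsigned i = 0; i < SKETCH_NCPU; i++) {
        sum.pcnt += c->slot[i].pcnt;
        sum.bcnt += c->slot[i].bcnt;
    }
    return (sum);
}
```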
7189aec01e6afdff30c5da27f7d8465d6be09ce7 06-Jul-2014 melifaro <melifaro@FreeBSD.org> * Prepare to pass other dynamic states via ipfw_dump_config()

Kernel changes:
* Change dump format for dynamic states:
each state is now stored inside ipfw_obj_dyntlv
last dynamic state is indicated by IPFW_DF_LAST flag
* Do not perform sooptcopyout() for !SOPT_GET requests.

Userland changes:
* Introduce foreach_state() function handler to ease work
with different states passed by ipfw_dump_config().
99023231d3e6ab3f80cecab0626d12352069bedf 03-Jul-2014 melifaro <melifaro@FreeBSD.org> Fully switch to named tables:

Kernel changes:
* Introduce ipfw_obj_tentry table entry structure to force u64 alignment.
* Support "update-on-existing-key" "add" behavior (TEI_FLAGS_UPDATED).
* Use "subtype" field to distinguish between IPv4 and IPv6 table records
instead of previous hack.
* Add value type (vtype) field for kernel tables. Current types are
number,ip and dscp
* Fix sets mask retrieval for old binaries
* Fix crash while using interface tables

Userland changes:
* Switch ipfw_table_handler() to use named-only tables.
* Add "table NAME create [type {cidr|iface|u32} [valtype {number|ip|dscp}] ..."
* Switch ipfw_table_handler to match_token()-based parser.
* Switch ipfw_sets_handler to use new ipfw_get_config() for mask retrieval.
* Allow ipfw set X table ... syntax to permit using per-set table namespaces.
75913dd997a81341ee4e07a64ff5f6d7ccec1d2b 29-Jun-2014 melifaro <melifaro@FreeBSD.org> * Add new IP_FW_XADD opcode which permits to
a) specify table ids as names
b) add multiple rules at once.
Partially convert current code for atomic addition of multiple rules.
5d627fdb8b30e877afe6caaa8ca68a5e9e191bc4 28-Jun-2014 melifaro <melifaro@FreeBSD.org> Support showing named tables in ipfw(8) rule listing.

Kernel changes:
* change base TLV header to be u64 (so size can be u32).
* Introduce ipfw_obj_ctlv generic container TLV.
* Add IP_FW_XGET opcode which is now used for atomic configuration
retrieval. One can specify needed configuration pieces to retrieve
via flags field. Currently supported are
IPFW_CFG_GET_STATIC (static rules) and
IPFW_CFG_GET_STATES (dynamic states).
Other configuration pieces (tables, pipes, etc..) support is planned.
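
A minimal sketch of packing and walking TLVs with a u64-sized header, as the "base TLV header" item suggests. The field layout below (u16 type, u16 flags, u32 length covering header plus payload, padded to 8 bytes) is an assumption for illustration, as are all sketch_* names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed 8-byte header: a u32 length leaves room for large objects. */
struct sketch_tlv {
    uint16_t type;
    uint16_t flags;
    uint32_t length;    /* header + payload + padding, in bytes */
};

static size_t
sketch_roundup8(size_t n)
{
    return ((n + 7) & ~(size_t)7);
}

/* Append one TLV at offset off; returns the new offset, or 0 if no room. */
static size_t
sketch_tlv_put(uint8_t *buf, size_t cap, size_t off, uint16_t type,
    const void *data, uint32_t dlen)
{
    struct sketch_tlv h;
    size_t total = sketch_roundup8(sizeof(h) + dlen);

    if (off > cap || total > cap - off)
        return (0);
    memset(buf + off, 0, total);
    h.type = type;
    h.flags = 0;
    h.length = (uint32_t)total;
    memcpy(buf + off, &h, sizeof(h));
    if (dlen > 0)
        memcpy(buf + off + sizeof(h), data, dlen);
    return (off + total);
}

/* Walk the buffer and count TLVs of one type; -1 means malformed input. */
static int
sketch_tlv_count(const uint8_t *buf, size_t len, uint16_t type)
{
    size_t off = 0;
    int n = 0;

    while (off < len) {
        struct sketch_tlv h;

        if (len - off < sizeof(h))
            return (-1);
        memcpy(&h, buf + off, sizeof(h));
        if (h.length < sizeof(h) || h.length > len - off)
            return (-1);
        if (h.type == type)
            n++;
        off += h.length;    /* unknown types are skipped by length */
    }
    return (n);
}
```

Because every record carries its own length, a parser can skip types it does not understand, which is what makes the format extensible.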

Userland changes:
* Switch ipfw(8) to use new IP_FW_XGET for rule listing.
* Split rule listing code into get and show pieces.
* Make several steps forward towards libipfw:
permit printing states and rules (partially) to supplied buffer.
do not die on malloc/kernel failure inside given printing functions.
stop assuming cmdline_opts is global symbol.
9ff102accc22e9f90303ba23b0ca1e9acbb1e5e2 27-Jun-2014 melifaro <melifaro@FreeBSD.org> Use different approach for filling large datasets to userspace:

Instead of trying to allocate a big contiguous chunk of memory,
use an intermediate-sized (page size) buffer as a sliding window,
reducing the number of sooptcopyout() calls to perform.

This reduces dump functions complexity and provides additional
layer of abstraction.

User-visible api consists of 2 functions:
ipfw_get_sopt_space() - gets contiguous amount of storage (or NULL)
and
ipfw_get_sopt_header() - the same, but zeroes the rest of the buffer.
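
A userspace sketch of the sliding-window approach, under stated assumptions: a fixed 4096-byte window, a flush callback standing in for sooptcopyout(), and hypothetical sketch_* names. It does not reproduce the real ipfw_get_sopt_space() signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_WIN_SIZE 4096    /* assumed page-sized window */

/* Stand-in for the copyout step: returns 0 on success. */
typedef int (*sketch_flush_fn)(void *ctx, const void *data, size_t len);

struct sketch_window {
    uint8_t buf[SKETCH_WIN_SIZE];
    size_t used;            /* bytes filled but not yet flushed */
    sketch_flush_fn flush;
    void *ctx;
    unsigned nflush;        /* number of flush calls performed */
};

/* Push the filled part of the window to the consumer and reset it. */
static int
sketch_win_flush(struct sketch_window *w)
{
    if (w->used == 0)
        return (0);
    if (w->flush(w->ctx, w->buf, w->used) != 0)
        return (-1);
    w->used = 0;
    w->nflush++;
    return (0);
}

/* Get need contiguous bytes, flushing first if the window is too full. */
static void *
sketch_win_get_space(struct sketch_window *w, size_t need)
{
    void *p;

    if (need > SKETCH_WIN_SIZE)
        return (NULL);
    if (need > SKETCH_WIN_SIZE - w->used && sketch_win_flush(w) != 0)
        return (NULL);
    p = w->buf + w->used;
    w->used += need;
    return (p);
}

/* Test consumer: appends flushed data to a flat destination buffer. */
struct sketch_sink {
    uint8_t data[3 * SKETCH_WIN_SIZE];
    size_t len;
};

static int
sketch_sink_append(void *ctx, const void *data, size_t len)
{
    struct sketch_sink *s = ctx;

    if (len > sizeof(s->data) - s->len)
        return (-1);
    memcpy(s->data + s->len, data, len);
    s->len += len;
    return (0);
}
```

Dump code only ever asks for the next small contiguous piece, so it needs no knowledge of the total size and one flush covers many records.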
8bc233982f390be1e1a2881aaf1b552dd369f499 16-Jun-2014 melifaro <melifaro@FreeBSD.org> * Add IP_FW_TABLE_XCREATE / IP_FW_TABLE_XMODIFY opcodes.
* Add 'algoname' string to ipfw_xtable_info permitting to specify lookup
algorithm with parameters.
* Rework part of ipfw_rewrite_table_uidx()

Sponsored by: Yandex LLC
b06860b3e2dfa15bb7123a320ebcc231cbd33939 15-Jun-2014 melifaro <melifaro@FreeBSD.org> Simplify opcode handling.

* Use one u16 from op3 header to implement opcode versioning.
* IP_FW_TABLE_XLIST has now 2 handlers, for ver.0 (old) and ver.1 (current).
* Every getsockopt request is now handled in ip_fw_table.c
* Rename new opcodes:
IP_FW_OBJ_DEL -> IP_FW_TABLE_XDESTROY
IP_FW_OBJ_LISTSIZE -> IP_FW_TABLES_XGETSIZE
IP_FW_OBJ_LIST -> IP_FW_TABLES_XLIST
IP_FW_OBJ_INFO -> IP_FW_TABLE_XINFO
IP_FW_OBJ_INFO -> IP_FW_TABLE_XFLUSH

* Add some docs about using given opcodes.
* Group some legacy opcode/handlers.
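
A small sketch of opcode versioning as described above: a handler is selected by the pair (opcode, version), so one opcode can keep an old and a current handler side by side. The table layout, opcode value and sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*sketch_handler)(void);

/* One entry per supported (opcode, version) pair. */
struct sketch_sopt_handler {
    uint16_t opcode;
    uint16_t version;
    sketch_handler fn;
};

#define SKETCH_OP_XLIST 100     /* hypothetical opcode number */

/* Return values only identify which handler ran. */
static int sketch_xlist_v0(void) { return (0); }   /* old format */
static int sketch_xlist_v1(void) { return (1); }   /* current format */

static const struct sketch_sopt_handler sketch_handlers[] = {
    { SKETCH_OP_XLIST, 0, sketch_xlist_v0 },
    { SKETCH_OP_XLIST, 1, sketch_xlist_v1 },
};

/* Find the handler for a request, or NULL if that version is unknown. */
static sketch_handler
sketch_find_handler(uint16_t opcode, uint16_t version)
{
    size_t n = sizeof(sketch_handlers) / sizeof(sketch_handlers[0]);

    for (size_t i = 0; i < n; i++) {
        if (sketch_handlers[i].opcode == opcode &&
            sketch_handlers[i].version == version)
            return (sketch_handlers[i].fn);
    }
    return (NULL);
}
```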
0001953a3587f852c2a6b4d935071dc2523dff84 14-Jun-2014 melifaro <melifaro@FreeBSD.org> Move most of external table structures/functions to separate ip_fw_table.h
f9fb63fe8c86b065753da183636bf586e6e03258 14-Jun-2014 melifaro <melifaro@FreeBSD.org> Add API to ease adding new algorithms/new tabletypes to ipfw.

Kernel-side changelog:
* Split general tables code and algorithm-specific table data.
Current algorithms (IPv4/IPv6 radix and interface tables radix) moved to
new ip_fw_table_algo.c file.
Tables code now supports any algorithm implementing the following callbacks:
+struct table_algo {
+ char name[64];
+ int idx;
+ ta_init *init;
+ ta_destroy *destroy;
+ table_lookup_t *lookup;
+ ta_prepare_add *prepare_add;
+ ta_prepare_del *prepare_del;
+ ta_add *add;
+ ta_del *del;
+ ta_flush_entry *flush_entry;
+ ta_foreach *foreach;
+ ta_dump_entry *dump_entry;
+ ta_dump_xentry *dump_xentry;
+};

* Change ->state, ->xstate, ->tabletype fields of ip_fw_chain to
->tablestate pointer (array of 32-byte structures necessary for
runtime lookups; can probably be shrunk to 16 bytes later):

+struct table_info {
+ table_lookup_t *lookup; /* Lookup function */
+ void *state; /* Lookup radix/other structure */
+ void *xstate; /* eXtended state */
+ u_long data; /* Hints for given func */
+};

* Add count method for namedobj instance to ease size calculations
* Bump ip_fw3 buffer in ipfw_ctl 128->256 bytes.
* Improve bitmask resizing on tables_max change.
* Remove table numbers checking from most places.
* Fix wrong nesting in ipfw_rewrite_table_uidx().

* Add IP_FW_OBJ_LIST opcode (list all objects of given type, currently
implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_LISTSIZE (get buffer size to hold IP_FW_OBJ_LIST data,
currently implemented for IPFW_OBJTYPE_TABLE).
* Add IP_FW_OBJ_INFO (requests info for one object of given type).

Some name changes:
s/ipfw_xtable_tlv/ipfw_obj_tlv/ (no table specifics)
s/ipfw_xtable_ntlv/ipfw_obj_ntlv/ (no table specifics)

Userland changes:
* Add do_set3() cmd to ipfw2 to ease dealing with op3-embedded opcodes.
* Add/improve support for destroy/info cmds.
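
A simplified sketch of how a runtime lookup can dispatch through a per-table table_info-style record. The lookup signature, the linear-scan "algorithm" and all sketch_* names are assumptions for illustration; only the four field names mirror the structure quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_table_info;

/* Assumed lookup signature: returns 1 on match and stores the value. */
typedef int (sketch_lookup_t)(struct sketch_table_info *ti, const void *key,
    uint32_t keylen, uint32_t *val);

/* Mirrors the fields listed in the commit message. */
struct sketch_table_info {
    sketch_lookup_t *lookup;    /* Lookup function */
    void *state;                /* Algorithm-private structure */
    void *xstate;               /* eXtended state */
    unsigned long data;         /* Hints; here: number of entries */
};

/* Toy algorithm: flat array of u32 key/value pairs, scanned linearly. */
struct sketch_u32_ent {
    uint32_t key;
    uint32_t val;
};

static int
sketch_linear_lookup(struct sketch_table_info *ti, const void *key,
    uint32_t keylen, uint32_t *val)
{
    const struct sketch_u32_ent *ents = ti->state;
    uint32_t k;

    if (keylen != sizeof(k))
        return (0);
    memcpy(&k, key, sizeof(k));
    for (unsigned long i = 0; i < ti->data; i++) {
        if (ents[i].key == k) {
            *val = ents[i].val;
            return (1);
        }
    }
    return (0);
}

/* Generic table code: index the tablestate array and call the algorithm. */
static int
sketch_table_lookup(struct sketch_table_info *tablestate, uint16_t tbl,
    const void *key, uint32_t keylen, uint32_t *val)
{
    struct sketch_table_info *ti = &tablestate[tbl];

    return (ti->lookup(ti, key, keylen, val));
}
```

The generic code never inspects state or xstate; swapping in a different algorithm only means filling the record with a different function and private data.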
01ec53e019425da60623197883cf086045df2974 12-Jun-2014 melifaro <melifaro@FreeBSD.org> Make ipfw tables use names as user-level identifier internally:

* Add namedobject set-aware api capable of searching/allocating objects by their name/idx.
* Switch tables code to use string ids for configuration tasks.
* Change locking model: most configuration changes are protected with UH lock, runtime-visible are protected with both locks.
* Reduce number of arguments passed to ipfw_table_add/del by using separate structure.
* Add internal V_fw_tables_sets tunable (set to 0) to prepare for set-aware tables (requires opcodes/client support)
* Implement typed table referencing (and tables are implicitly allocated with all state like radix ptrs on reference)
* Add "destroy" ipfw(8) using new IP_FW_DELOBJ opcode

Namedobj more detailed:
* Blackbox api providing methods to add/del/search/enumerate objects
* Statically-sized hashes for names/indexes
* Per-set bitmask to indicate free indexes
* Separate methods for index alloc/delete/resize
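
A minimal sketch of index allocation from a free-index bitmask, in the spirit of the namedobj description above. The fixed size of 128 indexes and all sketch_* names are hypothetical; a set bit means the index is free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_IDX  128
#define SKETCH_WORDS    (SKETCH_MAX_IDX / 64)

/* Bit set = index is free. */
struct sketch_idx_mask {
    uint64_t free_bits[SKETCH_WORDS];
};

/* Mark every index as free. */
static void
sketch_idx_init(struct sketch_idx_mask *m)
{
    assert(m != NULL);
    for (int w = 0; w < SKETCH_WORDS; w++)
        m->free_bits[w] = ~(uint64_t)0;
}

/* Allocate the lowest free index, or return -1 if none is left. */
static int
sketch_idx_alloc(struct sketch_idx_mask *m)
{
    for (int w = 0; w < SKETCH_WORDS; w++) {
        if (m->free_bits[w] == 0)
            continue;   /* whole word in use, skip 64 indexes at once */
        for (int b = 0; b < 64; b++) {
            uint64_t bit = (uint64_t)1 << b;

            if (m->free_bits[w] & bit) {
                m->free_bits[w] &= ~bit;
                return (w * 64 + b);
            }
        }
    }
    return (-1);
}

/* Release an index; -1 if it is out of range or already free. */
static int
sketch_idx_free(struct sketch_idx_mask *m, int idx)
{
    uint64_t bit;

    if (idx < 0 || idx >= SKETCH_MAX_IDX)
        return (-1);
    bit = (uint64_t)1 << (idx % 64);
    if (m->free_bits[idx / 64] & bit)
        return (-1);
    m->free_bits[idx / 64] |= bit;
    return (0);
}
```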

Basically, there should not be any user-visible changes except the following:
* reducing table_max is not supported
* flush & add change table type won't work if table is referenced

Sponsored by: Yandex LLC
89bf7e80ea7ce88fd4bbd25c5f6576c40dea5acd 08-May-2014 melifaro <melifaro@FreeBSD.org> Merge r258708, r258711, r260247, r261117.

r258708:
Check ipfw table numbers in both user and kernel space before rule addition.
Found by: Saychik Pavel <umka@localka.net>

r258711:
Simplify O_NAT opcode handling.

r260247:
Use rnh_matchaddr instead of rnh_lookup for longest-prefix match.
rnh_lookup is effectively the same as rnh_matchaddr if called with
empty network mask.

r261117:
Reorder struct ip_fw_chain:
* move rarely-used fields down
* move uh_lock to different cacheline
* remove some unused fields
eb1a5f8de9f7ea602c373a710f531abbf81141c4 21-Feb-2014 gjb <gjb@FreeBSD.org> Move ^/user/gjb/hacking/release-embedded up one directory, and remove
^/user/gjb/hacking since this is likely to be merged to head/ soon.

Sponsored by: The FreeBSD Foundation
c32089edcacd841a4e3b0c0efb86678a460efe0e 24-Jan-2014 melifaro <melifaro@FreeBSD.org> Reorder struct ip_fw_chain:
* move rarely-used fields down
* move uh_lock to different cacheline
* remove some unused fields

Sponsored by: Yandex LLC
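
A sketch of the field-reordering idea, assuming a 64-byte cache line and C11 alignment support. The structure and its members are hypothetical and do not reproduce struct ip_fw_chain; only the uh_lock name comes from the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64    /* assumed cache line size */

/* Stand-in for a lock; the real lock type is not modelled here. */
struct sketch_lock {
    uint32_t word;
};

struct sketch_chain {
    /* Hot fields, read on every packet. */
    void **map;
    uint32_t id;
    int n_rules;

    /*
     * Configuration-path lock starts on its own cache line, so writers
     * taking it do not invalidate the line holding the hot fields.
     */
    _Alignas(SKETCH_CACHE_LINE) struct sketch_lock uh_lock;

    /* Rarely-used fields moved down. */
    int static_len;
    uint32_t gencnt;
};

_Static_assert(
    offsetof(struct sketch_chain, uh_lock) % SKETCH_CACHE_LINE == 0,
    "uh_lock must start a cache line");
```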
6b01bbf146ab195243a8e7d43bb11f8835c76af8 27-Dec-2013 gjb <gjb@FreeBSD.org> Copy head@r259933 -> user/gjb/hacking/release-embedded for initial
inclusion of (at least) arm builds with the release.

Sponsored by: The FreeBSD Foundation
583ac348099c237361e8e965845ae09e8bb4255f 24-Aug-2013 trociny <trociny@FreeBSD.org> Make ipfw nat init/uninit work correctly for VIMAGE:

* Do per vnet instance cleanup (previously it was only for vnet0 on
module unload, and led to libalias leaks and possible panics due to
stale pointer dereferences).

* Instead of protecting ipfw hooks registering/deregistering by only
vnet0 lock (which does not prevent pointer access from other
vnets), introduce per vnet ipfw_nat_loaded variable. The variable is
set after hooks are registered and unset before they are deregistered.

* Devirtualize ifaddr_event_tag as we run only one event handler for
all vnets.

* It is supposed that ifaddr_change event handler is called in the
interface vnet context, so add an assertion.

Reviewed by: zec
MFC after: 2 weeks
23037c29f14074030f1ac9e98ad9fcccf47ada78 19-Mar-2013 ae <ae@FreeBSD.org> Separate the locking macros that are used in the packet flow path
from others. This eases a later switch to the pfil(4) lock.
a7a75993c7299476ce9044647924e53d68258b1b 23-Dec-2012 melifaro <melifaro@FreeBSD.org> Add parentheses to IP_FW_ARG_TABLEARG() definition.

Suggested by: glebius
MFC with: r244633
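
A short illustration of why a macro of this kind needs parentheses. The macros below are hypothetical and are not the real IP_FW_ARG_TABLEARG() definition; only the value 65535 is taken from this log.

```c
#include <assert.h>

#define SKETCH_TABLEARG 65535   /* "use the table argument" marker */

/* Unparenthesized: operators around the expansion change its meaning. */
#define SKETCH_ARG_BAD(a, t)    a == SKETCH_TABLEARG ? t : a

/* Parenthesized: the expansion always behaves as a single value. */
#define SKETCH_ARG_GOOD(a, t)   (((a) == SKETCH_TABLEARG) ? (t) : (a))

/* Expands to: (1 + a) == 65535 ? t : a, which drops the intended "+ 1". */
static int
sketch_bad_plus_one(int a, int t)
{
    return (1 + SKETCH_ARG_BAD(a, t));
}

/* Expands to: 1 + ((a == 65535) ? t : a), as intended. */
static int
sketch_good_plus_one(int a, int t)
{
    return (1 + SKETCH_ARG_GOOD(a, t));
}
```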
911df5a3324405f742e5c9eebd47c0f0bd606d09 23-Dec-2012 melifaro <melifaro@FreeBSD.org> Use unified IP_FW_ARG_TABLEARG() macro for most tablearg checks.
Log real value instead of IP_FW_TABLEARG (65535) in ipfw_log().

Noticed by: Vitaliy Tokarenko <rphone@ukr.net>
MFC after: 2 weeks
6a45724ec724b0718130f941e8bd3ded5ef85a03 30-Nov-2012 melifaro <melifaro@FreeBSD.org> Use common macros for working with rule/dynamic counters.
This is done as preparation to introduce per-cpu ipfw counters.

MFC after: 3 weeks
c07e3ec124db785f32d1fdd54ed55642750e07a6 30-Nov-2012 melifaro <melifaro@FreeBSD.org> Make ipfw dynamic states operations SMP-ready.

* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.

Discussed with: ipfw
MFC after: 3 weeks
Sponsored by: Yandex LLC
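
A userspace sketch of per-bucket locking for a state hash, using POSIX mutexes in place of kernel ones. The bucket count, the hash (key modulo bucket count) and all sketch_* names are hypothetical. Two lookups contend only when they land in the same bucket.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NBUCKETS 8   /* hypothetical fixed table size */

struct sketch_state {
    struct sketch_state *next;
    uint32_t key;
    uint32_t hits;
};

/* Each bucket carries its own lock instead of one global lock. */
struct sketch_bucket {
    pthread_mutex_t mtx;
    struct sketch_state *head;
};

static struct sketch_bucket sketch_buckets[SKETCH_NBUCKETS];

static void
sketch_dyn_init(void)
{
    for (int i = 0; i < SKETCH_NBUCKETS; i++) {
        pthread_mutex_init(&sketch_buckets[i].mtx, NULL);
        sketch_buckets[i].head = NULL;
    }
}

/*
 * Find or create the state for key and bump its hit count, holding only
 * the lock of the bucket the key hashes to. Returns the new hit count,
 * or 0 if allocation failed.
 */
static uint32_t
sketch_dyn_touch(uint32_t key)
{
    struct sketch_bucket *b = &sketch_buckets[key % SKETCH_NBUCKETS];
    struct sketch_state *s;
    uint32_t hits = 0;

    pthread_mutex_lock(&b->mtx);
    for (s = b->head; s != NULL; s = s->next) {
        if (s->key == key)
            break;
    }
    if (s == NULL && (s = calloc(1, sizeof(*s))) != NULL) {
        s->key = key;
        s->next = b->head;
        b->head = s;
    }
    if (s != NULL)
        hits = ++s->hits;
    pthread_mutex_unlock(&b->mtx);
    return (hits);
}
```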
e570ee385419f97e7ebb5059236d745f587e3306 05-Nov-2012 melifaro <melifaro@FreeBSD.org> Add assertion to enforce 'nat global' locking requirements changed by r241908.

Suggested by: adrian, glebius
MFC after: 3 days
0ccf4838d7a8b4da2c3beaac7ea1fd977aa0ed11 14-Sep-2012 glebius <glebius@FreeBSD.org> o Create directory sys/netpfil, where all packet filters should
reside, and move there ipfw(4) and pf(4).

o Move most modified parts of pf out of contrib.

Actual movements:

sys/contrib/pf/net/*.c -> sys/netpfil/pf/
sys/contrib/pf/net/*.h -> sys/net/
contrib/pf/pfctl/*.c -> sbin/pfctl
contrib/pf/pfctl/*.h -> sbin/pfctl
contrib/pf/pfctl/pfctl.8 -> sbin/pfctl
contrib/pf/pfctl/*.4 -> share/man/man4
contrib/pf/pfctl/*.5 -> share/man/man5

sys/netinet/ipfw -> sys/netpfil/ipfw

The arguable movement is pf/net/*.h -> sys/net. There are
future plans to refactor pf includes, so I decided not to
break things twice.

Not modified bits of pf left in contrib: authpf, ftp-proxy,
tftp-proxy, pflogd.

The ipfw(4) movement is planned to be merged to stable/9,
to make head and stable match.

Discussed with: bz, luigi