History log of /freebsd-head/sys/sys/file.h
Revision Date Author Comments
73ba42a4914fc50a6a9e626e4263ef9c4cd4b275 14-Feb-2020 kevans <kevans@FreeBSD.org> u_char -> vm_prot_t in a couple of places, NFC

The latter is a typedef of the former; the typedef exists and these bits are
representing vmprot values, so use the correct type.

Submitted by: sigsys@gmail.com
MFC after: 3 days
89798a71d45681b347196b8e366959240dd087df 08-Jan-2020 kevans <kevans@FreeBSD.org> posix_fallocate: push vnop implementation into the fileop layer

This opens the door for other descriptor types to implement
posix_fallocate(2) as needed.

Reviewed by: kib, bcr (manpages)
Differential Revision: https://reviews.freebsd.org/D23042
9c60b86beb253f0df1c80dbd129b2303219d94e3 11-Dec-2019 mjg <mjg@FreeBSD.org> fd: static-ize and devolatile openfiles

Almost all access is using atomics. The only read is sysctl which should use
a whole-int-at-a-time friendly read internally.
28f9e44110009ae9f1f64b6389464ebd62f1790d 05-Oct-2019 mjg <mjg@FreeBSD.org> devfs: plug redundant bwillwrite avoidance

vn_write already checks for vnode type to see if bwillwrite should be called.

This effectively reverts r244643.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21905
13d4dfe478f82d11e0c7d467bf4cd4e03a0112b6 25-Sep-2019 kevans <kevans@FreeBSD.org> [1/3] Add mostly Linux-compatible file sealing support

File sealing applies protections against certain actions
(currently: write, growth, shrink) at the inode level. New fileops are added
to accommodate seals - EINVAL is returned by fcntl(2) if they are not
implemented.

Reviewed by: markj, kib
Differential Revision: https://reviews.freebsd.org/D21391
7d29da5483221ca7732e6aa620946fa805283dbb 21-Jul-2019 kib <kib@FreeBSD.org> Check and avoid overflow when incrementing fp->f_count in
fget_unlocked() and fhold().

On sufficiently large machine, f_count can be legitimately very large,
e.g. malicious code can dup same fd up to the per-process
filedescriptors limit, and then fork as much as it can.
On some smaller machine, I see
kern.maxfilesperproc: 939132
kern.maxprocperuid: 34203
which already overflows u_int. More, the malicious code can create
transient references by sending fds over unix sockets.

I realized that this check is missed after reading
https://secfault-security.com/blog/FreeBSD-SA-1902.fd.html

Reviewed by: markj (previous version), mjg
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D20947
927d2d494a9cb7f9497c82759439b82ac7e4516e 20-Jun-2019 asomers <asomers@FreeBSD.org> fcntl: fix overflow when setting F_READAHEAD

VOP_READ and VOP_WRITE take the seqcount in blocks in a 16-bit field.
However, fcntl allows you to set the seqcount in bytes to any nonnegative
31-bit value. The result can be a 16-bit overflow, which will be
sign-extended in functions like ffs_read. Fix this by sanitizing the
argument in kern_fcntl. As a matter of policy, limit to IO_SEQMAX rather
than INT16_MAX.

Also, fifos have overloaded the f_seqcount field for a completely different
purpose ever since r238936. Formalize that by using a union type.

Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20710
6615ed4c6149c9810ba766e318f591d44a0596df 05-Jul-2018 brooks <brooks@FreeBSD.org> Make struct xinpcb and friends word-size independent.

Replace size_t members with ksize_t (uint64_t) and pointer members
(never used as pointers in userspace, but instead as unique
idenitifiers) with kvaddr_t (uint64_t). This makes the structs
identical between 32-bit and 64-bit ABIs.

On 64-bit bit systems, the ABI is maintained. On 32-bit systems,
this is an ABI breaking change. The ABI of most of these structs
was previously broken in r315662. This also imposes a small API
change on userspace consumers who must handle kernel pointers
becoming virtual addresses.

PR: 228301 (exp-run by antoine)
Reviewed by: jtl, kib, rwatson (various versions)
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15386
4736ccfd9c3411d50371d7f21f9450a47c19047e 20-Nov-2017 pfg <pfg@FreeBSD.org> sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
7e6cabd06e6caa6a02eeb86308dc0cb3f27e10da 28-Feb-2017 imp <imp@FreeBSD.org> Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96
396e17522fb9168762113681cc46d25b7aa307e2 26-Feb-2017 dchagin <dchagin@FreeBSD.org> Implement timerfd family syscalls.

MFC after: 1 month
92b4fcfcce935d738065e8bbdd434a431c26722b 10-Feb-2017 kib <kib@FreeBSD.org> Fix r313495.

The file type DTYPE_VNODE can be assigned as a fallback if VOP_OPEN()
did not initialized file type. This is a typical code path used by
normal file systems.

Also, change error returned for inappropriate file type used for
O_EXLOCK to EOPNOTSUPP, as declared in the open(2) man page.

Reported by: cy, dhw, Iblis Lin <iblis@hs.ntnu.edu.tw>
Tested by: dhw
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
eaea1f53fc3306343a58a9a09856b6a85a9d3fda 13-Jan-2017 glebius <glebius@FreeBSD.org> Remove deprecated fgetsock() and fputsock().
00d578928eca75be320b36d37543a7e2a4f9fbdb 27-May-2016 grehan <grehan@FreeBSD.org> Create branch for bhyve graphics import.
e05176a63dbba4794d3d611cf9072885b3cf1eb3 29-Mar-2016 glebius <glebius@FreeBSD.org> The sendfile(2) allows to send extra data from userspace before the file
data (headers). Historically the size of the headers was not checked
against the socket buffer space. Application could easily overcommit the
socket buffer space.

With the new sendfile (r293439) the problem remained, but a KASSERT was
inserted that checked that amount of data written to the socket matches
its space. In case when size of headers is bigger that socket space,
KASSERT fires. Without INVARIANTS the new sendfile won't panic, but
would report incorrect amount of bytes sent.

o With this change, the headers copyin is moved down into the cycle, after
the sbspace() check. The uio size is trimmed by socket space there,
which fixes the overcommit problem and its consequences.
o The compatibility handling for FreeBSD 4 sendfile headers API is pushed
up the stack to syscall wrappers. This required a copy and paste of the
code, but in turn this allowed to remove extra stack carried parameter
from fo_sendfile_t, and embrace entire compat code into #ifdef. If in
future we got more fo_sendfile_t function, the copy and paste level would
even reduce.

Reviewed by: emax, gallatin, Maxim Dounin <mdounin mdounin.ru>
Tested by: Vitalij Satanivskij <satan ukr.net>
Sponsored by: Netflix
be47bc68fb065fc834ff51fea7df108abeae031c 01-Mar-2016 jhb <jhb@FreeBSD.org> Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subystem now exports library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order. Socket requests will not block indefinitely
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels. The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon. This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system. To protect against this, AIO
requests are only permitted for known "safe" files by default. AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_usafe sysctl to a non-zero value. The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default. aio_mlock() is also enabled by default.

Reviewed by: cem, jilles
Discussed with: kib (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5289
1eeab3feb9befa95bc1fff51cd9b0caa2f90b0d7 09-Jan-2016 dchagin <dchagin@FreeBSD.org> MFC r283444:

Implement eventfd system call.
6348241c12da94254c6b2797b8b3cedaf6642db1 30-Sep-2015 markj <markj@FreeBSD.org> As a step towards the elimination of PG_CACHED pages, rework the handling
of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to
the head of the inactive queue instead of being cached.

This affects the implementation of POSIX_FADV_NOREUSE as well, since it
works by applying POSIX_FADV_DONTNEED to file ranges after they have been
read or written. At that point the corresponding buffers may still be
dirty, so the previous implementation would coalesce successive ranges and
apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the
dirty buffers would eventually be cached. To preserve this behaviour in an
efficient manner, this change adds a new buf flag, B_NOREUSE, which causes
the pages backing a VMIO buf to be placed at the head of the inactive queue
when the buf is released. POSIX_FADV_NOREUSE then works by setting this
flag in bufs that underlie the specified range.

Reviewed by: alc, kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3726
bba1e1e047f57e141ca66bc44e520fdd2f3c86a2 04-Jun-2015 jhb <jhb@FreeBSD.org> Add a new file operations hook for mmap operations. File type-specific
logic is now placed in the mmap hook implementation rather than requiring
it to be placed in sys/vm/vm_mmap.c. This hook allows new file types to
support mmap() as well as potentially allowing mmap() for existing file
types that do not currently support any mapping.

The vm_mmap() function is now split up into two functions. A new
vm_mmap_object() function handles the "back half" of vm_mmap() and accepts
a referenced VM object to map rather than a (handle, handle_type) tuple.
vm_mmap() is now reduced to converting a (handle, handle_type) tuple to a
a VM object and then calling vm_mmap_object() to handle the actual mapping.
The vm_mmap() function remains for use by other parts of the kernel
(e.g. device drivers and exec) but now only supports mapping vnodes,
character devices, and anonymous memory.

The mmap() system call invokes vm_mmap_object() directly with a NULL object
for anonymous mappings. For mappings using a file descriptor, the
descriptors fo_mmap() hook is invoked instead. The fo_mmap() hook is
responsible for performing type-specific checks and adjustments to
arguments as well as possibly modifying mapping parameters such as flags
or the object offset. The fo_mmap() hook routines then call
vm_mmap_object() to handle the actual mapping.

The fo_mmap() hook is optional. If it is not set, then fo_mmap() will
fail with ENODEV. A fo_mmap() hook is implemented for regular files,
character devices, and shared memory objects (created via shm_open()).

While here, consistently use the VM_PROT_* constants for the vm_prot_t
type for the 'prot' variable passed to vm_mmap() and vm_mmap_object()
as well as the vm_mmap_vnode() and vm_mmap_cdev() helper routines.
Previously some places were using the mmap()-specific PROT_* constants
instead. While this happens to work because PROT_xx == VM_PROT_xx,
using VM_PROT_* is more correct.

Differential Revision: https://reviews.freebsd.org/D2658
Reviewed by: alc (glanced over), kib
MFC after: 1 month
Sponsored by: Chelsio
d7e47c502af91003d95e0c9cf9f0035d3fe14893 24-May-2015 dchagin <dchagin@FreeBSD.org> Implement eventfd system call.

Differential Revision: https://reviews.freebsd.org/D1094
In collaboration with: Jilles Tjoelker
0a219ba739034ca95b28aa25c7cba77221c8a5b0 17-Feb-2015 mjg <mjg@FreeBSD.org> filedesc: simplify fget_unlocked & friends

Introduce fget_fcntl which performs appropriate checks when needed.
This removes a branch from fget_unlocked.

Introduce fget_mmap dealing with cap_rights_to_vmprot conversion.
This removes a branch from _fget.

Modify fget_unlocked to pass sequence counter to interested callers so
that they can perform their own checks and make sure the result was
otained from stable & current state.

Reviewed by: silence on -hackers
53273c84d07812ce5a86db41d425a50e20397b64 11-Nov-2014 glebius <glebius@FreeBSD.org> Remove SF_KQUEUE code. This code was developed at Netflix, but was not
ever used. It didn't go into stable/10, neither was documented.
It might be useful, but we collectively decided to remove it, rather
leave it abandoned and unmaintained. It is removed in one single
commit, so restoring it should be easy, if anyone wants to reopen
this idea.

Sponsored by: Netflix
c60179fc3c7373df6621acbd14a4a663a7d913c6 26-Sep-2014 mjg <mjg@FreeBSD.org> MFC r270993:

Fix up proc_realparent to always return correct process.

Prior to the change it would always return initproc for non-traced processes.

This fixes a regression in inferior().

Approved by: re (marius)
8f082668d04c6a91668059ff9bdebcfa153839f0 22-Sep-2014 jhb <jhb@FreeBSD.org> Add a new fo_fill_kinfo fileops method to add type-specific information to
struct kinfo_file.
- Move the various fill_*_info() methods out of kern_descrip.c and into the
various file type implementations.
- Rework the support for kinfo_ofile to generate a suitable kinfo_file object
for each file and then convert that to a kinfo_ofile structure rather than
keeping a second, different set of code that directly manipulates
type-specific file information.
- Remove the shm_path() and ksem_info() layering violations.

Differential Revision: https://reviews.freebsd.org/D775
Reviewed by: kib, glebius (earlier version)
4cd91e9d81f8eee5a5ab7b6250d49c03383d1b96 12-Sep-2014 jhb <jhb@FreeBSD.org> Fix various issues with invalid file operations:
- Add invfo_rdwr() (for read and write), invfo_ioctl(), invfo_poll(),
and invfo_kqfilter() for use by file types that do not support the
respective operations. Home-grown versions of invfo_poll() were
universally broken (they returned an errno value, invfo_poll()
uses poll_no_poll() to return an appropriate event mask). Home-grown
ioctl routines also tended to return an incorrect errno (invfo_ioctl
returns ENOTTY).
- Use the invfo_*() functions instead of local versions for
unsupported file operations.
- Reorder fileops members to match the order in the structure definition
to make it easier to spot missing members.
- Add several missing methods to linuxfileops used by the OFED shim
layer: fo_write(), fo_truncate(), fo_kqfilter(), and fo_stat(). Most
of these used invfo_*(), but a dummy fo_stat() implementation was
added.
1ac724b05e688dc4ddb61cf32a78e861ceab2383 26-Aug-2014 glebius <glebius@FreeBSD.org> - Remove socket file operations declaration from sys/file.h.
- Make them static in sys_socket.c.
- Provide generic invfo_truncate() instead of soo_truncate().

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
c5c0a26f761859a952b782533f0d2ec25a1ec748 26-Aug-2014 mjg <mjg@FreeBSD.org> Fix up races with f_seqcount handling.

It was possible that the kernel would overwrite user-supplied hint.

Abuse vnode lock for this purpose.

In collaboration with: kib
MFC after: 1 week
eb1a5f8de9f7ea602c373a710f531abbf81141c4 21-Feb-2014 gjb <gjb@FreeBSD.org> Move ^/user/gjb/hacking/release-embedded up one directory, and remove
^/user/gjb/hacking since this is likely to be merged to head/ soon.

Sponsored by: The FreeBSD Foundation
6b01bbf146ab195243a8e7d43bb11f8835c76af8 27-Dec-2013 gjb <gjb@FreeBSD.org> Copy head@r259933 -> user/gjb/hacking/release-embedded for initial
inclusion of (at least) arm builds with the release.

Sponsored by: The FreeBSD Foundation
86274dd213a33a0a9012f50b3209adc1d5a20bed 01-Dec-2013 adrian <adrian@FreeBSD.org> Migrate the sendfile_sync structure into a public(ish) API in preparation
for extending and reusing it.

The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper,
used to indicate that the backing store for a group of mbufs has completed.
It's only being used by sendfile for now and it's only implementing a
sleep/wakeup rendezvous. However, there are other potential signaling
paths (kqueue) and other potential uses (socket zero-copy write) where the
same mechanism would also be useful.

So, with that in mind:

* extract the sendfile_sync code out into sf_sync_*() methods
* teach the sf_sync_alloc method about the current config flag -
it will eventually know about kqueue.
* move the sendfile_sync code out of do_sendfile() - the only thing
it now knows about is the sfs pointer. The guts of the sync
rendezvous (setup, rendezvous/wait, free) is now done in the
syscall wrapper.
* .. and teach the 32-bit compat sendfile call the same.

This should be a no-op. It's primarily preparation work for teaching
the sendfile_sync about kqueue notification.

Tested:

* Peter Holm's sendfile stress / regression scripts

Sponsored by: Netflix, Inc.
fae003a06921a2c437637783cab8833afbf91fa6 18-Sep-2013 rdivacky <rdivacky@FreeBSD.org> Revert r255672, it has some serious flaws, leaking file references etc.

Approved by: re (delphij)
d57db3eeadba8647b04947a065402b4fa3a37b23 18-Sep-2013 rdivacky <rdivacky@FreeBSD.org> Implement epoll support in Linuxulator. This is a tiny wrapper around kqueue
to implement epoll subset of functionality. The kqueue user data are 32bit
on i386 which is not enough for epoll user data so this patch overrides
kqueue fileops to maintain enough space in struct file.

Initial patch developed by me in 2007 and then extended and finished
by Yuri Victorovich.

Approved by: re (delphij)
Sponsored by: Google Summer of Code
Submitted by: Yuri Victorovich <yuri at rawbw dot com>
Tested by: Yuri Victorovich <yuri at rawbw dot com>
029a6f5d92dc57925b5f155d94d6e01fdab7a45d 05-Sep-2013 pjd <pjd@FreeBSD.org> Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation
6a459eb27c6a215b645e8673b5275f94003510a4 21-Aug-2013 kib <kib@FreeBSD.org> Make the seek a method of the struct fileops.

Tested by: pho
Sponsored by: The FreeBSD Foundation
408a640438d54402be29b92aefda7ba7b30eea12 16-Aug-2013 kib <kib@FreeBSD.org> Restore the previous sendfile(2) behaviour on the block devices.
Provide valid .fo_sendfile method for several missed struct fileops.

Reviewed by: glebius
Sponsored by: The FreeBSD Foundation
722a1a5e5d54a4935a4136368f443f6c88ca0d71 15-Aug-2013 glebius <glebius@FreeBSD.org> Make sendfile() a method in the struct fileops. Currently only
vnode backed file descriptors have this method implemented.

Reviewed by: kib
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
f07ebb8888ea42f744890a727e8f6799a1086915 02-Mar-2013 pjd <pjd@FreeBSD.org> Merge Capsicum overhaul:

- Capability is no longer separate descriptor type. Now every descriptor
has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
cap_new(2), which limits capability rights of the given descriptor
without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
ioctls can be retrived with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
that can be used with the new cap_fcntls_limit(2) syscall and retrive
them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
heavly modified.

- The audit subsystem, kdump and procstat tools were updated to
recognize new syscalls.

- Capability rights were revised and eventhough I tried hard to provide
backward API and ABI compatibility there are some incompatible changes
that are described in detail below:

CAP_CREATE old behaviour:
- Allow for openat(2)+O_CREAT.
- Allow for linkat(2).
- Allow for symlinkat(2).
CAP_CREATE new behaviour:
- Allow for openat(2)+O_CREAT.

Added CAP_LINKAT:
- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
- Allow to be target for renameat(2).

Added CAP_SYMLINKAT:
- Allow for symlinkat(2).

Removed CAP_DELETE. Old behaviour:
- Allow for unlinkat(2) when removing non-directory object.
- Allow to be source for renameat(2).

Removed CAP_RMDIR. Old behaviour:
- Allow for unlinkat(2) when removing directory.

Added CAP_RENAMEAT:
- Required for source directory for the renameat(2) syscall.

Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
- Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this
call.

Removed CAP_MAPEXEC.

CAP_MMAP old behaviour:
- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
PROT_WRITE.
CAP_MMAP new behaviour:
- Allow for mmap(2)+PROT_NONE.

Added CAP_MMAP_R:
- Allow for mmap(PROT_READ).
Added CAP_MMAP_W:
- Allow for mmap(PROT_WRITE).
Added CAP_MMAP_X:
- Allow for mmap(PROT_EXEC).
Added CAP_MMAP_RW:
- Allow for mmap(PROT_READ | PROT_WRITE).
Added CAP_MMAP_RX:
- Allow for mmap(PROT_READ | PROT_EXEC).
Added CAP_MMAP_WX:
- Allow for mmap(PROT_WRITE | PROT_EXEC).
Added CAP_MMAP_RWX:
- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

Renamed CAP_MKDIR to CAP_MKDIRAT.
Renamed CAP_MKFIFO to CAP_MKFIFOAT.
Renamed CAP_MKNODE to CAP_MKNODEAT.

CAP_READ old behaviour:
- Allow pread(2).
- Disallow read(2), readv(2) (if there is no CAP_SEEK).
CAP_READ new behaviour:
- Allow read(2), readv(2).
- Disallow pread(2) (CAP_SEEK was also required).

CAP_WRITE old behaviour:
- Allow pwrite(2).
- Disallow write(2), writev(2) (if there is no CAP_SEEK).
CAP_WRITE new behaviour:
- Allow write(2), writev(2).
- Disallow pwrite(2) (CAP_SEEK was also required).

Added convinient defines:

#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE

#define CAP_SOCK_CLIENT \
(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
CAP_SETSOCKOPT | CAP_SHUTDOWN)

Added defines for backward API compatibility:

#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by: The FreeBSD Foundation
Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with: rwatson, benl, jonathan
ABI compatibility discussed with: kib
c6bad3bef79216ab7313dba42827d02cecb65b47 23-Dec-2012 kib <kib@FreeBSD.org> Do not force a writer to the devfs file to drain the buffer writes.

Requested and tested by: Ian Lepore <freebsd@damnhippie.dyndns.org>
MFC after: 2 weeks
9cc8fb25a6c747a4ce4eff0392c16c81d2135af8 08-Jul-2012 attilio <attilio@FreeBSD.org> MFC
1e0d3fae01871e2dc80f05ead912d9a1d4f56f15 08-Jul-2012 mjg <mjg@FreeBSD.org> Unbreak handling of descriptors opened with O_EXEC by fexecve(2).

While here return EBADF for descriptors opened for writing (previously it was ETXTBSY).

Add fgetvp_exec function which performs appropriate checks.

PR: kern/169651
In collaboration with: kib
Approved by: trasz (mentor)
MFC after: 1 week
53224f018aac13056c11af9d1233317b1754149c 02-Jul-2012 kib <kib@FreeBSD.org> Extend the KPI to lock and unlock f_offset member of struct file. It
now fully encapsulates all accesses to f_offset, and extends f_offset
locking to other consumers that need it, in particular, to lseek() and
variants of getdirentries().

Ensure that on 32bit architectures f_offset, which is 64bit quantity,
always read and written under the mtxpool protection. This fixes
apparently easy to trigger race when parallel lseek()s or lseek() and
read/write could destroy file offset.

The already broken ABI emulations, including iBCS and SysV, are not
converted (yet).

Tested by: pho
No objections from: jhb
MFC after: 3 weeks
571562fffb435a478f0591879dd19fe9227cb52e 19-Jun-2012 jhb <jhb@FreeBSD.org> Further refine the implementation of POSIX_FADV_NOREUSE.

First, extend the changes in r230782 to better handle the common case
of using NOREUSE with sequential reads. A NOREUSE file descriptor
will now track the last implicit DONTNEED request it made as a result
of a NOREUSE read. If a subsequent NOREUSE read is adjacent to the
previous range, it will apply the DONTNEED request to the entire range
of both the previous read and the current read. The effect is that
each read of a file accessed sequentially will apply the DONTNEED
request to the entire range that has been read. This allows NOREUSE
to properly handle misaligned reads by flushing each buffer to cache
once it has been completely read.

Second, apply the same changes made to read(2) by r230782 and this
change to writes. This provides much better performance in the
sequential write case as it allows writes to still be clustered. It
also provides much better performance for misaligned writes. It does
mean that NOREUSE will be generally ineffective for non-sequential
writes as the current implementation relies on a future NOREUSE
write's implicit DONTNEED request to flush the dirty buffer from the
current write.

MFC after: 2 weeks
6e9e854884e3fcfa127843dd7969c7d3015c76be 08-Nov-2011 attilio <attilio@FreeBSD.org> MFC
637fddd99966303c683109ca5fe108ce7241799d 06-Nov-2011 ed <ed@FreeBSD.org> Remove MALLOC_DECLAREs of nonexisting malloc-pools.

After careful grepping, it seems none of these pools can be found in our
source tree. They are not in use, nor are they defined.
78c075174e74e727279365476d0d076d6c3e3075 04-Nov-2011 jhb <jhb@FreeBSD.org> Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region. It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types. The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are
thus filesystem independent. Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used. These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request. This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode. This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue. The second change adds
a new vm_object_page_cache() method. This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by: jilles, kib, arch@
Approved by: re (kib)
MFC after: 1 month
5ecd1c9d4080f3ae8a48c02523542b308b562160 18-Aug-2011 jonathan <jonathan@FreeBSD.org> Add experimental support for process descriptors

A "process descriptor" file descriptor is used to manage processes
without using the PID namespace. This is required for Capsicum's
Capability Mode, where the PID namespace is unavailable.

New system calls pdfork(2) and pdkill(2) offer the functional equivalents
of fork(2) and kill(2). pdgetpid(2) allows querying the PID of the remote
process for debugging purposes. The currently-unimplemented pdwait(2) will,
in the future, allow querying rusage/exit status. In the interim, poll(2)
may be used to check (and wait for) process termination.

When a process is referenced by a process descriptor, it does not issue
SIGCHLD to the parent, making it suitable for use in libraries---a common
scenario when using library compartmentalisation from within large
applications (such as web browsers). Some observers may note a similarity
to Mach task ports; process descriptors provide a subset of this behaviour,
but in a UNIX style.

This feature is enabled by "options PROCDESC", but as with several other
Capsicum kernel features, is not enabled by default in GENERIC 9.0.

Reviewed by: jhb, kib
Approved by: re (kib), mentor (rwatson)
Sponsored by: Google Inc
011f42054d1f861cd2435866ba646fa0cf752103 16-Aug-2011 kib <kib@FreeBSD.org> Add the fo_chown and fo_chmod methods to struct fileops and use them
to implement fchown(2) and fchmod(2) support for several file types
that previously lacked it. Add MAC entries for chown/chmod done on
posix shared memory and (old) in-kernel posix semaphores.

Based on the submission by: glebius
Reviewed by: rwatson
Approved by: re (bz)
4af919b491560ff051b65cdf1ec730bdeb820b2e 11-Aug-2011 rwatson <rwatson@FreeBSD.org> Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc
6abbb93d5fb70390974d5b0bb73e75616bd9c39a 05-Jul-2011 jonathan <jonathan@FreeBSD.org> Rework _fget to accept capability parameters.

This new version of _fget() requires new parameters:
- cap_rights_t needrights
the rights that we expect the capability's rights mask to include
(e.g. CAP_READ if we are going to read from the file)

- cap_rights_t *haverights
used to return the capability's rights mask (ignored if NULL)

- u_char *maxprotp
the maximum mmap() rights (e.g. VM_PROT_READ) that can be permitted
(only used if we are going to mmap the file; ignored if NULL)

- int fget_flags
FGET_GETCAP if we want to return the capability itself, rather than
the underlying object which it wraps

Approved by: mentor (rwatson), re (Capsicum blanket)
Sponsored by: Google Inc
d7717a507b09f7de9731276fb39cd69e7407a3bc 01-Jul-2011 jonathan <jonathan@FreeBSD.org> Define cap_rights_t and DTYPE_CAPABILITY, which are required to
implement Capsicum capabilities.

Approved by: mentor (rwatson), re (bz)
2d7d8c05e7404fbebf1f0fe24c13bc5bb58d2338 21-Mar-2011 jeff <jeff@FreeBSD.org> - Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.
09f9c897d33c41618ada06fbbcf1a9b3812dee53 19-Oct-2010 jamie <jamie@FreeBSD.org> A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.
4c502da7ff2e402a32fac30f893cc0fdbc0219bb 11-Jun-2010 jkim <jkim@FreeBSD.org> Apply band-aid around function-like macro fdrop() without turning it into
a real (inline) function or applying void casting for all its consumers.
In most of places, the "return value" is not checked nor assigned, which
causes too many warnings for some smart compilers, i.e., clang.

Found by: clang
f1216d1f0ade038907195fc114b7e630623b402c 19-Mar-2010 delphij <delphij@FreeBSD.org> Create a custom branch where I will be able to do the merge.
ce942064c6a24172c0d5b08847b0f672844f2509 28-Dec-2009 ed <ed@FreeBSD.org> Use ANSI declarations instead of K&R.
b3e5e62b89b5aaa393ce9895dc25552e19fb0254 01-Jan-2009 obrien <obrien@FreeBSD.org> style(9)

Verfied with: svn diff -x -Bbw file.h showing empty diff
19b6af98ec71398e77874582eb84ec5310c7156f 22-Nov-2008 dfr <dfr@FreeBSD.org> Clone Kip's Xen on stable/6 tree so that I can work on improving FreeBSD/amd64
performance in Xen's HVM mode.
cf5320822f93810742e3d4a1ac8202db8482e633 19-Oct-2008 lulf <lulf@FreeBSD.org> - Import the HEAD csup code which is the basis for the cvsmode work.
cc3116a9380fe32a751b584f3d8083698ccfba15 20-Aug-2008 ed <ed@FreeBSD.org> Integrate the new MPSAFE TTY layer to the FreeBSD operating system.

The last half year I've been working on a replacement TTY layer for the
FreeBSD kernel. The new TTY layer was designed to improve the following:

- Improved driver model:

The old TTY layer has a driver model that is not abstract enough to
make it friendly to use. A good example is the output path, where the
device drivers directly access the output buffers. This means that an
in-kernel PPP implementation must always convert network buffers into
TTY buffers.

If a PPP implementation would be built on top of the new TTY layer
(still needs a hooks layer, though), it would allow the PPP
implementation to directly hand the data to the TTY driver.

- Improved hotplugging:

With the old TTY layer, it isn't entirely safe to destroy TTY's from
the system. This implementation has a two-step destructing design,
where the driver first abandons the TTY. After all threads have left
the TTY, the TTY layer calls a routine in the driver, which can be
used to free resources (unit numbers, etc).

The pts(4) driver also implements this feature, which means
posix_openpt() will now return PTY's that are created on the fly.

- Improved performance:

One of the major improvements is the per-TTY mutex, which is expected
to improve scalability when compared to the old Giant locking.
Another change is the unbuffered copying to userspace, which is both
used on TTY device nodes and PTY masters.

Upgrading should be quite straightforward. Unlike previous versions,
existing kernel configuration files do not need to be changed, except
when they reference device drivers that are listed in UPDATING.

Obtained from: //depot/projects/mpsafetty/...
Approved by: philip (ex-mentor)
Discussed: on the lists, at BSDCan, at the DevSummit
Sponsored by: Snow B.V., the Netherlands
dcons(4) fixed by: kan
22ff03f33782b6e03626ec22f322eddfdf99999d 19-Aug-2008 kib <kib@FreeBSD.org> MFC r179175:
Implement the per-open file data for the cdev.
The td_fpop member of the struct thread is appended to the end, and cleared
in the thread allocator to keep struct thread KBI-compatible on RELENG_7.

MFC r181635:
Remove unnecessary locking around pointer fetch.
411d06839511bd58b0df613c30e0ff8b09022ed1 27-Jun-2008 jhb <jhb@FreeBSD.org> Rework the lifetime management of the kernel implementation of POSIX
semaphores. Specifically, semaphores are now represented as new file
descriptor type that is set to close on exec. This removes the need for
all of the manual process reference counting (and fork, exec, and exit
event handlers) as the normal file descriptor operations handle all of
that for us nicely. It is also suggested as one possible implementation
in the spec and at least one other OS (OS X) uses this approach.

Some bugs that were fixed as a result include:
- References to a named semaphore whose name is removed still work after
the sem_unlink() operation. Prior to this patch, if a semaphore's name
was removed, valid handles from sem_open() would get EINVAL errors from
sem_getvalue(), sem_post(), etc. This fixes that.
- Unnamed semaphores created with sem_init() were not cleaned up when a
process exited or exec'd. They were only cleaned up if the process
did an explicit sem_destroy(). This could result in a leak of semaphore
objects that could never be cleaned up.
- On the other hand, if another process guessed the id (kernel pointer to
'struct ksem' of an unnamed semaphore (created via sem_init)) and had
write access to the semaphore based on UID/GID checks, then that other
process could manipulate the semaphore via sem_destroy(), sem_post(),
sem_wait(), etc.
- As part of the permission check (UID/GID), the umask of the proces
creating the semaphore was not honored. Thus if your umask denied group
read/write access but the explicit mode in the sem_init() call allowed
it, the semaphore would be readable/writable by other users in the
same group, for example. This includes access via the previous bug.
- If the module refused to unload because there were active semaphores,
then it might have deregistered one or more of the semaphore system
calls before it noticed that there was a problem. I'm not sure if
this actually happened as the order that modules are discovered by the
kernel linker depends on how the actual .ko file is linked. One can
make the order deterministic by using a single module with a mod_event
handler that explicitly registers syscalls (and deregisters during
unload after any checks). This also fixes a race where even if the
sem_module unloaded first it would have destroyed locks that the
syscalls might be trying to access if they are still executing when
they are unloaded.

XXX: By the way, deregistering system calls doesn't do any blocking
to drain any threads from the calls.
- Some minor fixes to errno values on error. For example, sem_init()
isn't documented to return ENFILE or EMFILE if we run out of semaphores
the way that sem_open() can. Instead, it should return ENOSPC in that
case.

Other changes:
- Kernel semaphores now use a hash table to manage the namespace of
named semaphores nearly in a similar fashion to the POSIX shared memory
object file descriptors. Kernel semaphores can now also have names
longer than 14 chars (up to MAXPATHLEN) and can include subdirectories
in their pathname.
- The UID/GID permission checks for access to a named semaphore are now
done via vaccess() rather than a home-rolled set of checks.
- Now that kernel semaphores have an associated file object, the various
MAC checks for POSIX semaphores accept both a file credential and an
active credential. There is also a new posixsem_check_stat() since it
is possible to fstat() a semaphore file descriptor.
- A small set of regression tests (using the ksem API directly) is present
in src/tools/regression/posixsem.

Reported by: kris (1)
Tested by: kris
Reviewed by: rwatson (lightly)
MFC after: 1 month
268a4c430f25bc479b5e1e097f51394aa0f1fffb 26-May-2008 pjd <pjd@FreeBSD.org> Use _WANT_FILE to make struct file visible from userland. This is
similar to _WANT_UCRED and _WANT_PRISON and seems to be much nicer than
defining _KERNEL.
It is also needed for my sys/refcount.h change going in soon.
4d240aa98e28b74ea9808f1ed3cd852399f4c1ba 25-May-2008 attilio <attilio@FreeBSD.org> Replace direct atomic operation for the file refcount witht the
refcount interface.
It also introduces the correct usage of memory barriers, as sometimes
fdrop() and fhold() are used with shared locks, which don't use any
release barrier.
5971791c189c2ddb0bee3aadd934e064d60bf299 21-May-2008 kib <kib@FreeBSD.org> Implement the per-open file data for the cdev.

The patch does not change the cdevsw KBI. Management of the data is
provided by the functions
int devfs_set_cdevpriv(void *priv, cdevpriv_dtr_t dtr);
int devfs_get_cdevpriv(void **datap);
void devfs_clear_cdevpriv(void);
All of the functions are supposed to be called from the cdevsw method
contexts.

- devfs_set_cdevpriv assigns the priv as private data for the file
descriptor which is used to initiate currently performed driver
operation. dtr is the function that will be called when either the
last refernce to the file goes away, the device is destroyed or
devfs_clear_cdevpriv is called.
- devfs_get_cdevpriv is the obvious accessor.
- devfs_clear_cdevpriv allows to clear the private data for the still
open file.

Implementation keeps the driver-supplied pointers in the struct
cdev_privdata, that is referenced both from the struct file and struct
cdev, and cannot outlive any of the referee.

Man pages will be provided after the KPI stabilizes.

Reviewed by: jhb
Useful suggestions from: jeff, antoine
Debugging help and tested by: pho
MFC after: 1 month
8cd9437636744162d1427275b2fe66cf8ccef25c 08-Jan-2008 jhb <jhb@FreeBSD.org> Add a new file descriptor type for IPC shared memory objects and use it to
implement shm_open(2) and shm_unlink(2) in the kernel:
- Each shared memory file descriptor is associated with a swap-backed vm
object which provides the backing store. Each descriptor starts off with
a size of zero, but the size can be altered via ftruncate(2). The shared
memory file descriptors also support fstat(2). read(2), write(2),
ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared
memory file descriptors.
- shm_open(2) and shm_unlink(2) are now implemented as system calls that
manage shared memory file descriptors. The virtual namespace that maps
pathnames to shared memory file descriptors is implemented as a hash
table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash
of the pathname.
- As an extension, the constant 'SHM_ANON' may be specified in place of the
path argument to shm_open(2). In this case, an unnamed shared memory
file descriptor will be created similar to the IPC_PRIVATE key for
shmget(2). Note that the shared memory object can still be shared among
processes by sharing the file descriptor via fork(2) or sendmsg(2), but
it is unnamed. This effectively serves to implement the getmemfd() idea
bandied about the lists several times over the years.
- The backing store for shared memory file descriptors are garbage
collected when they are not referenced by any open file descriptors or
the shm_open(2) virtual namespace.

Submitted by: dillon, peter (previous versions)
Submitted by: rwatson (I based this on his version)
Reviewed by: alc (suggested converting getmemfd() to shm_open())
f8a246b9791d1450cf4945cc7b38f651a3a456ee 07-Jan-2008 jhb <jhb@FreeBSD.org> Make ftruncate a 'struct file' operation rather than a vnode operation.
This makes it possible to support ftruncate() on non-vnode file types in
the future.
- 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on
a given file descriptor.
- ftruncate() moves to kern/sys_generic.c and now just fetches a file
object and invokes fo_truncate().
- The vnode-specific portions of ftruncate() move to vn_truncate() in
vfs_vnops.c which implements fo_truncate() for vnode file types.
- Non-vnode file types return EINVAL in their fo_truncate() method.

Submitted by: rwatson
ce1863880500c459eb1395c1d6f81819e02e6608 30-Dec-2007 jeff <jeff@FreeBSD.org> Remove explicit locking of struct file.
- Introduce a finit() which is used to initailize the fields of struct file
in such a way that the ops vector is only valid after the data, type,
and flags are valid.
- Protect f_flag and f_count with atomic operations.
- Remove the global list of all files and associated accounting.
- Rewrite the unp garbage collection such that it no longer requires
the global list of all files and instead uses a list of all unp sockets.
- Mark sockets in the accept queue so we don't incorrectly gc them.

Tested by: kris, pho
82609b5afe99447dbb5bccaca2572497a1bb4c88 12-Jan-2007 jhb <jhb@FreeBSD.org> MFC: Close a race between UNIX domain pcb garbage collection (unp_gc()) and
file descriptor teardown (fdrop()) by adding a new garbage collection flag
FWAIT.
256d3cdbaf7ae58daa236a7354b03d5b94182a94 05-Jan-2007 jhb <jhb@FreeBSD.org> - Close a race between enumerating UNIX domain socket pcb structures via
sysctl and socket teardown by adding a reference count to the UNIX domain
pcb object and fixing the sysctl that enumerates unpcbs to grab a
reference on each unpcb while it builds the list to copy out to userland.
- Close a race between UNIX domain pcb garbage collection (unp_gc()) and
file descriptor teardown (fdrop()) by adding a new garbage collection
flag FWAIT. unp_gc() sets FWAIT while it walks the message buffers
in a UNIX domain socket looking for nested file descriptor references
and clears the flag when it is finished. fdrop() checks to see if the
flag is set on a file descriptor whose refcount just dropped to 0 and
waits for unp_gc() to clear the flag before completely destroying the
file descriptor.

MFC after: 1 week
Reviewed by: rwatson
Submitted by: ups
Hopefully makes the panics go away: mx1
c1361e659d6e2cdcb4b1de38ff151b53bc489658 30-May-2006 ps <ps@FreeBSD.org> MFC: Allow concurrent read(2)/readv(2) access to a file. Lock file
offset against multiple read calls.
83af4cc4336ad4d5435cc793284c93d7f3a3d3df 16-May-2006 ps <ps@FreeBSD.org> Allow concurrent read(2)/readv(2) access to a file.
Lock file offset against multiple read calls.

Submitted by: ups
Obtained from: Yahoo!
MFC after: 2 weeks
dac7c81b6209bb2637aa4305624463372d836b5b 26-Nov-2005 davidxu <davidxu@FreeBSD.org> Bring in experimental kernel support for POSIX message queue.
dc9f809dd574faccecacee96f8fafcefeb7151aa 10-Feb-2005 phk <phk@FreeBSD.org> Make some file/filedesc related functions static
d2c9006fb572d62b689561459965dc5b05674746 02-Feb-2005 rwatson <rwatson@FreeBSD.org> Add a place-holder f_label void * for a future struct label pointer
required for the port of SELinux FLASK/TE to FreeBSD using the MAC
Framework.
f0bf889d0d2ea7d83fd3b67266a98c89cdf14853 07-Jan-2005 imp <imp@FreeBSD.org> /* -> /*- for license, minor formatting changes
9b4cd725f10562da76638ddca6a4eebdb665764c 01-Dec-2004 phk <phk@FreeBSD.org> "nfiles" is a bad name for a global variable. Call it "openfiles" instead
as this is more correct and matches the sysctl variable.
82ae7820c58e438586346a2d62e6e5a3af3dad2c 08-Nov-2004 phk <phk@FreeBSD.org> Save a pointless FILE_LOCK_ASSERT() call in fhold.
9695efd3c3122859d53062407f5db55fd4d762d8 19-Jun-2004 phk <phk@FreeBSD.org> Add the f_vnode pointer to struct xfile, shortly it will no longer be
identical to f_data by definition.
9e1925f5605eb84767aaedbc35ab643a8f9f90a5 07-Apr-2004 imp <imp@FreeBSD.org> Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core
c81c59299bd255eb5ab7510c9e84e9c54b8003d0 22-Jun-2003 phk <phk@FreeBSD.org> Add a f_vnode field to struct file.

Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.
aca3c4a9f0555df7106b9e3e1b59a9864ea4c08c 20-Jun-2003 phk <phk@FreeBSD.org> Move FMARK and FDEFER til sys/file.h where they belong.

Order the fields in struct file in sections after their scope.
a81d7fdac76035cdd1a7cedd1187294b1688f7c0 18-Jun-2003 phk <phk@FreeBSD.org> Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that
rather than assume that only DTYPE_VNODE is seekable.
29fb7c2bce463e61580d0653a141df43ac982c4f 15-Feb-2003 alfred <alfred@FreeBSD.org> Do not allow kqueues to be passed via unix domain sockets.
ccd5574cc6e61b8fbf6b5ed907375f42e19b54f8 13-Jan-2003 dillon <dillon@FreeBSD.org> Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.
ddf9ef103e0a611c9a01425a28baf8a612b0d114 12-Jan-2003 dillon <dillon@FreeBSD.org> Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.
22ca3b530e0ece1ccb72f0d5bd2b07d9356e9eb0 24-Dec-2002 phk <phk@FreeBSD.org> White-space changes.
26984a400376cf63b49953e6661acbf02e70c62d 23-Dec-2002 phk <phk@FreeBSD.org> Move the declaration of the socket fileops from socketvar.h to file.h.
This allows us to use the new typedefs and removes the needs for a number
of forward struct declarations in socketvar.h
b9e78196906ce4e1ddcc9226147152ae434299af 23-Dec-2002 phk <phk@FreeBSD.org> Detediousficate declaration of fileops array members by introducing
typedefs for them.
0196f79e00ff24f3920f16802c98c418bb637a67 04-Oct-2002 sam <sam@FreeBSD.org> add DTYPE_CRYPTO for use by /dev/crypto support
024665d45e967fede40c4d15da08470800f4bd63 17-Sep-2002 mike <mike@FreeBSD.org> Include <sys/types.h> directly rather than depending on <sys/fcntl.h>
to include it. This could be avoided by adding the necessary typedefs
here, or by making users of <sys/file.h> include <sys/types.h> first.
3246fbf45f089a96288563f2d5071bfbde5f99df 17-Aug-2002 rwatson <rwatson@FreeBSD.org> In continuation of early fileop credential changes, modify fo_ioctl() to
accept an 'active_cred' argument reflecting the credential of the thread
initiating the ioctl operation.

- Change fo_ioctl() to accept active_cred; change consumers of the
fo_ioctl() interface to generally pass active_cred from td->td_ucred.
- In fifofs, initialize filetmp.f_cred to ap->a_cred so that the
invocations of soo_ioctl() are provided access to the calling f_cred.
Pass ap->a_td->td_ucred as the active_cred, but note that this is
required because we don't yet distinguish file_cred and active_cred
in invoking VOP's.
- Update kqueue_ioctl() for its new argument.
- Update pipe_ioctl() for its new argument, pass active_cred rather
than td_ucred to MAC for authorization.
- Update soo_ioctl() for its new argument.
- Update vn_ioctl() for its new argument, use active_cred rather than
td->td_ucred to authorize VOP_IOCTL() and the associated VOP_GETATTR().

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
2b82cd24f10c789221e2b4edc59b96a7733b9e71 16-Aug-2002 rwatson <rwatson@FreeBSD.org> Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential. Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential. Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument. This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL()
to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
and consumers so that this distinction is maintained at the VFS
as well as 'struct file' layer. Pass active_cred instead of
td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
credential is properly initialized and can be used in the socket
code if desired. Pass ap->a_td->td_ucred as the active
credential to soo_poll(). If we teach the vnop interface about
the distinction between file and active credentials, we would use
the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained. It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
e75c68c9502ec3d258fa4b4df5c672b7f11b4490 15-Aug-2002 rwatson <rwatson@FreeBSD.org> For some reason, the flags and td arguments in the fo_read prototype
were reversed. Correct this with no functional change.
44404e4547aee87b255582d4e6395551869e29b1 15-Aug-2002 rwatson <rwatson@FreeBSD.org> In order to better support flexible and extensible access control,
make a series of modifications to the credential arguments relating
to file read and write operations to cliarfy which credential is
used for what:

- Change fo_read() and fo_write() to accept "active_cred" instead of
"cred", and change the semantics of consumers of fo_read() and
fo_write() to pass the active credential of the thread requesting
an operation rather than the cached file cred. The cached file
cred is still available in fo_read() and fo_write() consumers
via fp->f_cred. These changes largely in sys_generic.c.

For each implementation of fo_read() and fo_write(), update cred
usage to reflect this change and maintain current semantics:

- badfo_readwrite() unchanged
- kqueue_read/write() unchanged
pipe_read/write() now authorize MAC using active_cred rather
than td->td_ucred
- soo_read/write() unchanged
- vn_read/write() now authorize MAC using active_cred but
VOP_READ/WRITE() with fp->f_cred

Modify vn_rdwr() to accept two credential arguments instead of a
single credential: active_cred and file_cred. Use active_cred
for MAC authorization, and select a credential for use in
VOP_READ/WRITE() based on whether file_cred is NULL or not. If
file_cred is provided, authorize the VOP using that cred,
otherwise the active credential, matching current semantics.

Modify current vn_rdwr() consumers to pass a file_cred if used
in the context of a struct file, and to always pass active_cred.
When vn_rdwr() is used without a file_cred, pass NOCRED.

These changes should maintain current semantics for read/write,
but avoid a redundant passing of fp->f_cred, as well as making
it more clear what the origin of each credential is in file
descriptor read/write operations.

Follow-up commits will make similar changes to other file descriptor
operations, and modify the MAC framework to pass both credentials
to MAC policy modules so they can implement either semantic for
revocation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
43aa4c758e77b281df2f306b5025c1ea7e0030c0 31-Jul-2002 des <des@FreeBSD.org> Add struct xfile, which will be used instead of struct file for sysctl
purposes.

Sponsored by: DARPA, NAI Labs
d1cbf6a1d1f96288005329dfaca2aaffbd884d81 29-Jun-2002 alfred <alfred@FreeBSD.org> More caddr_t removal, make fo_ioctl take a void * instead of a caddr_t.
386db368a001c218d07991234f444f700d59eaca 28-Jun-2002 alfred <alfred@FreeBSD.org> change f_data field in struct file from caddr_t to void *.
22846f623a1d04d46be5fbaac2b93e7635e43086 17-Jun-2002 alfred <alfred@FreeBSD.org> remove bogus comment, select/poll do NOT need to fhold as they hold the
filedesc lock.
style(9) fixes, add blank line at start of functions with no local variables.
8362010d3337dbf98da10017f6603904e740724b 26-Mar-2002 bde <bde@FreeBSD.org> Removed some namespace pollution (unnecessary nested includes).
d27d5e3b44ff619a0151aa86993697a211e61bc7 23-Mar-2002 bde <bde@FreeBSD.org> Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.
3b2d03b60a11ce28e58a87212bcccedd306f2c81 19-Mar-2002 alfred <alfred@FreeBSD.org> Remove __P
c1ec96c2c136d84837e335789e3e768b7731fe5e 23-Feb-2002 des <des@FreeBSD.org> Fix namespace pollution introduced in previous commit.
a09da298590e8c11ebafa37f79e0046814665237 23-Feb-2002 tanimura <tanimura@FreeBSD.org> Lock struct pgrp, session and sigio.

New locks are:

- pgrpsess_lock which locks the whole pgrps and sessions,
- pg_mtx which protects the pgrp members, and
- s_mtx which protects the session members.

Please refer to sys/proc.h for the coverage of these locks.

Changes on the pgrp/session interface:

- pgfind() needs the pgrpsess_lock held.

- The caller of enterpgrp() is responsible to allocate a new pgrp and
session.

- Call enterthispgrp() in order to enter an existing pgrp.

- pgsignal() requires a pgrp lock held.

Reviewed by: jhb, alfred
Tested on: cvsup.jp.FreeBSD.org
(which is a quad-CPU machine running -current)
1d6432ede3d9347bff05575e2e59a970c49ba052 20-Jan-2002 alfred <alfred@FreeBSD.org> use mutex pools for "struct file" locking.
fix indentation of FILE_LOCK/UNLOCK macros while I'm here.
03877ff1562f87f8967af3bd0a43560e3b228820 14-Jan-2002 alfred <alfred@FreeBSD.org> Remove requirement for queue.h by consumers by moving its inclusion
before other headers that require it.

Pointed out by: ru, bde
1f82bc18d1d1e906cd9ed68039acb051fa6e11cf 14-Jan-2002 alfred <alfred@FreeBSD.org> Replace ffind_* with fget calls.

Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().
5e2f4cf200203b968666acb6979aaccf7d62bbf5 13-Jan-2002 alfred <alfred@FreeBSD.org> Include sys/_lock.h and sys/_mutex.h to reduce namespace pollution.

Requested by: jhb
2344f261e7662c2f3f31fa8a99b424aa4e498a63 13-Jan-2002 alfred <alfred@FreeBSD.org> Remove file locking debug cruft.
844237b3960bfbf49070d6371a84f67f9e3366f6 13-Jan-2002 alfred <alfred@FreeBSD.org> SMP Lock struct file, filedesc and the global file list.

Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.

1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file * fhold(struct file *fp);
/* increments reference count on a file */

struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */

struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */

struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.
86ed17d675cb503ddb3f71f8b6f7c3af530bb29a 17-Nov-2001 dillon <dillon@FreeBSD.org> Give struct socket structures a ref counting interface similar to
vnodes. This will hopefully serve as a base from which we can
expand the MP code. We currently do not attempt to obtain any
mutex or SX locks, but the door is open to add them when we nail
down exactly how that part of it is going to work.
e3b965f7d57557c7273b062793ee6de6ff40223d 14-Nov-2001 dillon <dillon@FreeBSD.org> remove holdfp()

Replace uses of holdfp() with fget*() or fgetvp*() calls as appropriate

introduce fget(), fget_read(), fget_write() - these functions will take
a thread and file descriptor and return a file pointer with its ref
count bumped.

introduce fgetvp(), fgetvp_read(), fgetvp_write() - these functions will
take a thread and file descriptor and return a vref()'d vnode.

*_read() requires that the file pointer be FREAD, *_write that it be
FWRITE.

This continues the cleanup of struct filedesc and struct file access
routines which, when are all through with it, will allow us to then
make the API calls MP safe and be able to move Giant down into the fo_*
functions.
1d2d921ea8dc1db60c8ed47fe6c7ce2048325f76 13-Sep-2001 obrien <obrien@FreeBSD.org> Re-apply rev 1.178 -- style(9) the structure definitions.
I have to wonder how many other changes were lost in the KSE mildstone 2 merge.
5596676e6c6c1e81e899cd0531f9b1c28a292669 12-Sep-2001 julian <julian@FreeBSD.org> KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha
a179ee09ab9ca2d9d1d09dc4752c53a13609f5e9 24-May-2001 dillon <dillon@FreeBSD.org> This patch implements O_DIRECT about 80% of the way. It takes a patchset
Tor created a while ago, removes the raw I/O piece (that has cache coherency
problems), and adds a buffer cache / VM freeing piece.

Essentially this patch causes O_DIRECT I/O to not be left in the cache, but
does not prevent it from going through the cache, hence the 80%. For
the last 20% we need a method by which the I/O can be issued directly to
buffer supplied by the user process and bypass the buffer cache entirely,
but still maintain cache coherency.

I also have the code working under -stable but the changes made to sys/file.h
may not be MFCable, so an MFC is not on the table yet.

Submitted by: tegge, dillon
11781a7431fab609cd00058a63ac09ccddb16854 15-Feb-2001 jlemon <jlemon@FreeBSD.org> Extend kqueue down to the device layer.

Backwards compatible approach suggested by: peter
961b97d43458f3c57241940cabebb3bedf7e4c00 26-May-2000 jake <jake@FreeBSD.org> Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others
d93fbc99166053b75c2eeb69b5cb603cfaf79ec0 23-May-2000 jake <jake@FreeBSD.org> Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd
1adaa9f038cbd99c89f9cf7c103251180bae45ff 15-May-2000 dillon <dillon@FreeBSD.org> Change f_count and f_msgcount from short to int, solving the rollover
problem demonstrated by the PR. MFC to follow.

PR: kern/18346
c41c876463ee8c302f06554537e0fb22a3fcdca4 16-Apr-2000 jlemon <jlemon@FreeBSD.org> Introduce kqueue() and kevent(), a kernel event notification facility.
b42951578188c5aab5c9f8cbcde4a743f8092cdc 02-Apr-2000 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'ALSA'.
057e33d02ca1d636be1b99e212ebb7911cf4fc62 02-Apr-2000 dillon <dillon@FreeBSD.org> Change the write-behind code to take more care when starting
async I/O's. The sequential read heuristic has been extended to
cover writes as well. We continue to call cluster_write() normally,
thus blocks in the file will still be reallocated for large (but still
random) I/O's, but I/O will only be initiated for truely sequential
writes.

This solves a number of annoying situations, especially with DBM (hash
method) writes, and also has the side effect of fixing a number of
(stupid) benchmarks.

Reviewed-by: mckusick
15b9bcb121e1f3735a2c98a11afdb52a03301d7e 29-Dec-1999 peter <peter@FreeBSD.org> Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.
f004a3adc81b470b4c732b91c5dcbcac24dc398f 08-Nov-1999 peter <peter@FreeBSD.org> Create a fileops fo_stat() entry point. This will enable collection
of a bunch of duplicated code that breaks (read: panic) when a new
file type is added and some switch/case entries are missed.
140cb4ff83b0061eeba0756f708f3f7c117e76e5 19-Sep-1999 green <green@FreeBSD.org> This is what was "fdfix2.patch," a fix for fd sharing. It's pretty
far-reaching in fd-land, so you'll want to consult the code for
changes. The biggest change is that now, you don't use
fp->f_ops->fo_foo(fp, bar)
but instead
fo_foo(fp, bar),
which increments and decrements the fp refcount upon entry and exit.
Two new calls, fhold() and fdrop(), are provided. Each does what it
seems like it should, and if fdrop() brings the refcount to zero, the
fd is freed as well.

Thanks to peter ("to hell with it, it looks ok to me.") for his review.
Thanks to msmith for keeping me from putting locks everywhere :)

Reviewed by: peter
3b842d34e82312a8004a7ecd65ccdb837ef72ac1 28-Aug-1999 peter <peter@FreeBSD.org> $Id$ -> $FreeBSD$
c03366a55d4ace981b016ae999ae67675c486cdd 04-Aug-1999 green <green@FreeBSD.org> Fix fd race conditions (during shared fd table usage.) Badfileops is
now used in f_ops in place of NULL, and modifications to the files
are more carefully ordered. f_ops should also be set to &badfileops
upon "close" of a file.

This does not fix other problems mentioned in this PR than the first
one.

PR: 11629
Reviewed by: peter
f13dd5fa6d2de09245e582c2792228f2653fd2a8 04-Apr-1999 dt <dt@FreeBSD.org> Add standard padding argument to pread and pwrite syscall. That should make them
NetBSD compatible.

Add parameter to fo_read and fo_write. (The only flag FOF_OFFSET mean that
the offset is set in the struct uio).

Factor out some common code from read/pread/write/pwrite syscalls.
1d5f38ac2264102518a09c66a7b285f57e81e67e 07-Jun-1998 dfr <dfr@FreeBSD.org> This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
0506343883d62f6649f7bbaf1a436133cef6261d 11-Jan-1998 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'jb'.
7c6e96080c4fb49bf912942804477d202a53396c 10-Jan-1998 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'JB'.
6c224bdd6e4fe8f6e79f250fa8e78e3f70ec5ed5 23-Nov-1997 bde <bde@FreeBSD.org> Fixed duplicate definitions of M_FILE (one static).
a29a9749e55aaa01b56952729af076163aa661ed 14-Sep-1997 peter <peter@FreeBSD.org> Update interfaces for poll()
0d3591bdbd3a026b25e6400f8cc2717489321ee5 23-Mar-1997 bde <bde@FreeBSD.org> Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.
94b6d727947e1242356988da003ea702d41a97de 22-Feb-1997 peter <peter@FreeBSD.org> Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
808a36ef658c1810327b5d329469bcf5dad24b28 14-Jan-1997 jkh <jkh@FreeBSD.org> Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
04a0ee9a2a23ea1925b6625e3268da60ccc7944b 29-Dec-1996 dyson <dyson@FreeBSD.org> This commit is the embodiment of some VFS read clustering improvements.
Firstly, now our read-ahead clustering is on a file descriptor basis and not
on a per-vnode basis. This will allow multiple processes reading the
same file to take advantage of read-ahead clustering. Secondly, there
previously was a problem with large reads still using the ramp-up
algorithm. Of course, that was bogus, and now we read the entire
"chunk" off of the disk in one operation. The read-ahead clustering
algorithm should use less CPU than the previous also (I hope :-)).

NOTE: THAT LKMS MUST BE REBUILT!!!
d8370232258e75a57361cafc2243862df35d18f1 19-Dec-1996 bde <bde@FreeBSD.org> Fixed lseek() on named pipes. It always succeeded but should always fail.
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno). This will be fixed later.

The fix involves negative logic. Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).
51ff523803ffa72907ef0cb5d9f95e36ac2b1b73 03-Sep-1996 bde <bde@FreeBSD.org> Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.
73a498e93ef77f792f958b4a1ea0d9ad0490888a 11-Mar-1996 peter <peter@FreeBSD.org> Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[new sys/syscallargs.h file, to be "cvs rm"ed]
4ce83d9416d326466a861477103572379108d59d 11-Mar-1996 hsu <hsu@FreeBSD.org> Merge in Lite2: LIST replacement for f_filef, f_fileb, and filehead.
Did not accept change of second argument to ioctl from int to u_long.
Reviewed by: davidg & bde
894b801eee9d8be5ef1af76d3e74a844a61ff35d 28-Jan-1996 dyson <dyson@FreeBSD.org> Enable the new fast pipe code. The old pipes can be used with the
"OLD_PIPE" config option.
86f1bc4514fdcfd255f37f3218fe234bdc3664fc 05-Nov-1995 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'LINUX'.
289f11acb49b6dbb3081e09bf94a86f008f55814 16-Mar-1995 bde <bde@FreeBSD.org> Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.
2e14d9ebc3d3592c67bdf625af9ebe0dfc386653 14-Mar-1995 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'MATT_THOMAS'.
a5eaebecd08159cf578453b271353bd78121b55e 20-Feb-1995 guido <guido@FreeBSD.org> Implement maxprocperuid and maxfilesperproc. They are tunable
via sysctl(8). The initial value of maxprocperuid is maxproc-1,
that of maxfilesperproc is maxfiles (untill maxfile will disappear)

Now it is at least possible to prohibit one user opening maxfiles

-Guido

Submitted by:
Obtained from:
34cd81d75f398ee455e61969b118639dacbfd7a6 23-Sep-1994 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'MACKERRAS'.
1f5dfa1be240017a7a4d938578b9a01ce1082391 21-Aug-1994 paul <paul@FreeBSD.org> Made them all idempotent.
Reviewed by:
Submitted by:
e16baf7a5fe7ac1453381d0017ed1dcdeefbc995 07-Aug-1994 cvs2svn <cvs2svn@FreeBSD.org> This commit was manufactured by cvs2svn to create branch 'SUNRPC'.
8d205697aac53476badf354623abd4e1c7bc5aff 02-Aug-1994 dg <dg@FreeBSD.org> Added $Id$
8fb65ce818b3e3c6f165b583b910af24000768a5 24-May-1994 rgrimes <rgrimes@FreeBSD.org> BSD 4.4 Lite Kernel Sources