Add an external mbuf buffer type that holds

multiple unmapped pages.

Unmapped mbufs allow sendfile to carry multiple pages of data in a
single mbuf, without mapping those pages.  It is a requirement for
Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web
serving workloads when used by sendfile, due to effectively
compressing socket buffers by an order of magnitude, and hence
reducing cache misses.

For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer
now points to a struct mbuf_ext_pgs structure instead of a data
buffer.  This structure contains an array of physical addresses (this
reduces cache misses compared to an earlier version that stored an
array of vm_page_t pointers).  It also stores additional fields needed
for in-kernel TLS such as the TLS header and trailer data that are
currently unused.  To more easily detect these mbufs, the M_NOMAP flag
is set in m_flags in addition to M_EXT.

Various functions like m_copydata() have been updated to safely access
packet contents (using uiomove_fromphys()), to make things like BPF
safe.

NIC drivers advertise support for unmapped mbufs on transmit via a new
IFCAP_NOMAP capability.  This capability can be toggled via the new
'nomap' and '-nomap' ifconfig(8) commands.  For NIC drivers that only
transmit packet contents via DMA and use bus_dma, adding the
capability to if_capabilities and if_capenable should be all that is
required.

If a NIC does not support unmapped mbufs, they are converted to a
chain of mapped mbufs (using sf_bufs to provide the mapping) in
ip_output or ip6_output.  If an unmapped mbuf requires software
checksums, it is also converted to a chain of mapped mbufs before
computing the checksum.

Submitted by:	gallatin (earlier version)
Reviewed by:	gallatin, hselasky, rrs
Discussed with:	ae, kp (firewalls)
Relnotes:	yes
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D20616
This commit is contained in:
jhb 2019-06-29 00:48:33 +00:00 committed by Sebastian Huber
parent d41e144869
commit 2f55e1fa06
1 changed files with 1 additions and 0 deletions

View File

@ -246,6 +246,7 @@ struct if_data {
#define IFCAP_HWSTATS 0x800000 /* manages counters internally */
#define IFCAP_TXRTLMT 0x1000000 /* hardware supports TX rate limiting */
#define IFCAP_HWRXTSTMP 0x2000000 /* hardware rx timestamping */
#define IFCAP_NOMAP 0x4000000 /* can TX unmapped mbufs */
#define IFCAP_HWCSUM_IPV6 (IFCAP_RXCSUM_IPV6 | IFCAP_TXCSUM_IPV6)