Verified Commit b9c7e7e2 authored by micah

git subrepo pull float

subrepo:
  subdir:   "float"
  merged:   "2cbbc5b"
upstream:
  origin:   "https://git.autistici.org/ai3/float.git"
  branch:   "master"
  commit:   "2cbbc5b"
git-subrepo:
  version:  "0.4.1"
  origin:   "https://github.com/ingydotnet/git-subrepo"
  commit:   "a04d8c2"
parent 94c22c4a
@@ -6,7 +6,7 @@
[subrepo]
remote = https://git.autistici.org/ai3/float.git
branch = master
commit = 61177d5be20b4f84efaa307e2de26b9bc225b091
parent = ed7b87e3a8bae8689e21021c6247ad067a848360
commit = 2cbbc5b2199d1a659e9ea177a7cbc5b59e8fbaef
parent = 94c22c4abe4a4277a48f0b71af23a70b173798b8
cmdver = 0.4.1
method = merge
@@ -65,7 +65,7 @@ and in README files for individual Ansible roles:
### General Documentation
* [Quick start guide](docs/quickstart.md)
* [Reference](docs/reference.md)
* [Reference](docs/reference.md) ([PDF](docs/reference.pdf))
* [Testing](docs/testing.md)
# Requirements
......
@@ -2,7 +2,7 @@ DOTS = $(wildcard *.dot)
SVGS = $(DOTS:%.dot=%.svg)
PNGS = $(DOTS:%.dot=%.png)
all: $(SVGS) $(PNGS)
all: $(SVGS) $(PNGS) reference.pdf
%.svg: %.dot
	dot -Tsvg -o$@ $<
@@ -10,3 +10,6 @@ all: $(SVGS) $(PNGS)
%.png: %.dot
	dot -Tpng -o$@ $<

%.pdf: %.md
	awk '/^# Services/,/EOF/ {print}' $< \
	| pandoc -V 'title:Float Reference' --from=gfm --output=$@ --toc
@@ -405,12 +405,14 @@ myservice:
- template:
    src: myservice.conf.j2
    dest: /etc/myservice.conf
    group: docker-myservice
    mode: 0640
```
*roles/myservice/templates/myservice.conf.j2*
```yaml
# just an example
# Just an example of an Ansible template, with no particular meaning.
domain={{ domain }}
```
@@ -425,15 +427,65 @@ float can't automatically generate this association itself):
```
This takes advantage of the fact that float defines an Ansible group
for each service, which includes the hosts that the service instances
have been scheduled on.
for each service (with the same name as the service itself), which
includes the hosts that the service instances have been scheduled
on. **Note** that since Ansible 2.9, the group names will be
"normalized" according to the rules for Python identifiers,
i.e. dashes will be turned into underscores.
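For example, a playbook for a hypothetical service named *my-service* would target the group under its normalized name (this is only an illustrative sketch, not something shipped with float):
```yaml
# Hypothetical play: the service is called "my-service" in the service
# description, so the auto-generated group is "my_service" once dashes
# are normalized to underscores (Ansible >= 2.9).
- hosts: my_service
  roles:
    - my-service
```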
### On the Ansible requirement
Does the above mean you have to learn Ansible in order to use float?
Should you be concerned about investing effort into writing a
configuration for your service in yet another configuration management
system's language? The answer is *yes*, but to a very limited extent:
* You do need knowledge of how to set up an Ansible environment: the
role of `ansible.cfg`, how to structure `group_vars` etc. Writing a
dedicated configuration push system for float was surely an option,
but we preferred relying on a popular existing ecosystem for this,
both for convenience of implementation and to allow a migration
path of co-existence for legacy systems. To counterbalance this, float
tries to keep its usage of Ansible as limited as possible, to allow
eventual replacement.
* Most services will only need an extremely simple Ansible role to
generate the service configuration, normally a mix of *template* and
*copy* tasks, which are possibly the most basic functionality of any
configuration management system. This should guarantee a certain
*ease of portability* to other mechanisms, should one decide to
migrate away from float. Besides, it is a good sanity check: if your
service requires complicated setup steps, it might be
possible to move some of that complexity *inside* the service
containers.
To emphasize portability, it might be wise to adhere to the following
rules when writing Ansible roles:
* Try to use only *copy*, *file* and *template* tasks, rather than
complex Ansible modules;
* avoid using complex conditional logic or loops in your Ansible tasks;
* keep the configuration "local" to the service: do not reference
other services except using the proper service discovery APIs (DNS),
do not try to look up configuration attributes for other services
(instead make those into global configuration variables);
* do not use facts from other hosts that need to be discovered (these
break if you are not using a fact cache when doing partial runs):
instead, define whatever host attributes you need, explicitly, in
the inventory;
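As a minimal sketch of a role that follows these rules (the file names and paths are purely illustrative):
```yaml
# roles/myservice/tasks/main.yml -- illustrative sketch only: just
# simple template/copy tasks, no loops and no cross-host lookups.
- name: Install myservice configuration
  template:
    src: myservice.conf.j2
    dest: /etc/myservice.conf
    mode: 0644
- name: Install a static support file
  copy:
    src: myservice-extra.conf
    dest: /etc/myservice-extra.conf
    mode: 0644
```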
More generally, the integration with Ansible as the underlying
configuration management engine is the "escape hatch" that allows the
implementation of setups that are not explicitly modeled by float
itself.
# Infrastructure Part 1: Base Layer
We can subdivide what is done by float in two separate sections:
operations and services done on each host, the so-called "base" layer
of infrastructure, and then the fundamental services that are part of
the "cluster-level" infrastructure (logging, monitoring,
operations and services affecting every host, the so-called "base"
layer of infrastructure, and then the fundamental services that are
part of the "cluster-level" infrastructure (logging, monitoring,
authentication, etc): the latter are part of float but run on the base
layer itself as proper services, with their own descriptions and
Ansible roles to configure them.
@@ -923,7 +975,7 @@ delegated to the external automation.
## Authentication and Identity
The fkiat infrastructure provides a full AAA solution that is used by
The float infrastructure provides a full AAA solution that is used by
all the built-in services, and that can be easily integrated with your
services (or at least that would be the intention). It aims to
implement modern solutions, and support moderately complex scenarios,
@@ -1184,11 +1236,8 @@ infrastructure:
* it is possible to separate short-term and long-term metrics storage
by using the *prometheus-lts* service to scrape the other Prometheus
instances and retain metrics long term. The Thanos layer will again
transparently support this configuration.
To enable long-term metrics storage, include
*services.prometheus-lts.yml* in your service definitions, and add the
corresponding *playbooks/prometheus-lts.yml* playbook to your own.
transparently support this configuration. See the *Scaling up the
monitoring infrastructure* section below for details.
Monitoring dashboards are provided by Grafana.
@@ -1234,6 +1283,42 @@ alerting less noisy):
* `scope` should be one of *host* (for prober-based alerts),
*instance* (for all other targets), or *global*.
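As an illustration (the alert name, expression and threshold below are invented; only the `scope` label follows the convention above), a custom Prometheus alerting rule might look like this:
```yaml
groups:
  - name: myservice-alerts
    rules:
      - alert: MyServiceHighErrorRatio
        # Hypothetical expression: adapt it to the metrics your service exports.
        expr: rate(myservice_errors_total[5m]) / rate(myservice_requests_total[5m]) > 0.1
        for: 10m
        labels:
          scope: instance
```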
### Scaling up the monitoring infrastructure
Float upholds the philosophy that collecting lots and lots of metrics
is actually a good thing, because it enables post-facto diagnosis of
issues. However, even with relatively small numbers of services and
machines, the amount of timeseries data that needs to be stored will
grow very quickly.
Float allows you to split the monitoring data collection into two
logical "parts" (which themselves can consist of multiple identical
instances for redundancy purposes), let's call them *environments* to
avoid overloading the term *instance*:
* A *short-term* Prometheus environment that scrapes all the service
targets at high frequency, evaluates alerts, but has a short
retention time (hours / days, depending on storage
requirements). Storage requirements for this environment are
bounded, for a given set of services and targets.
* A *long-term* Prometheus environment that scrapes data from the
short-term environment at a lower frequency, discarding
high-cardinality metrics for which aggregates are available. Its storage
requirement grows much more slowly over time than that of the short-term
environment. Float calls this service *prometheus-lts* (long-term
storage).
This effectively implements a two-tiered (high-resolution /
low-resolution) timeseries database, which is then reconciled
transparently when querying through the Thanos service layer.
To enable long-term metrics storage, include
*services.prometheus-lts.yml* in your service definitions, and add the
corresponding *playbooks/prometheus-lts.yml* playbook to your own.
You will also need to set *prometheus_tsdb_retention* and
*prometheus_lts_tsdb_retention* variables appropriately.
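For example (the values below are arbitrary and should be tuned to your actual storage budget), the corresponding Ansible variables might be set as follows:
```yaml
# Illustrative retention settings, e.g. in a group_vars file.
# Keep high-resolution data for a week only...
prometheus_tsdb_retention: 7d
# ...and the low-frequency, long-term data for a year.
prometheus_lts_tsdb_retention: 365d
```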
## Log Collection and Analysis
@@ -1264,10 +1349,12 @@ added:
### Metric extraction
It is often useful to extract real-time metrics from logs, for
instance this is how we compute real-time HTTP statistics. Float runs
an instance of [mtail](https://github.com/google/mtail) on every host
to process the local logs and compute metrics based on them.
It is often useful to extract real-time metrics from logs, most often
when dealing with software that does not export its own metrics. An
example is NGINX, where logs are parsed in order to compute real-time
access metrics. Float runs an instance of
[mtail](https://github.com/google/mtail) on every host to process the
local logs and compute metrics based on them.
Custom rules can be added simply by dropping mtail programs in
*/etc/mtail*. This would generally be done by the relevant
@@ -1277,14 +1364,33 @@ service-specific Ansible role.
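For instance, such a role might install its own mtail program with a plain copy task (the file names are illustrative):
```yaml
# Illustrative task in a service-specific role: drop an mtail program
# into /etc/mtail for the local mtail instance to load.
- name: Install mtail program for myservice
  copy:
    src: myservice.mtail
    dest: /etc/mtail/myservice.mtail
    mode: 0644
```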
Syslog logs received by the log-collector will be subject to further
processing in order to extract metadata fields that will be stored and
indexed.
The implementation uses
the
[mmnormalize](https://www.rsyslog.com/doc/v8-stable/configuration/modules/mmnormalize.html) rsyslog
module, which parses logs using
the [liblognorm](http://www.liblognorm.com/files/manual/index.html)
engine to extract metadata fields.
indexed. Metadata extracted from logs is useful for searching and
filtering, even though those cases are already well served by full-text
search (or *grep*), and most importantly for aggregation purposes: the
extracted fields can be used either for visualizations (dashboards) or
for analytical queries that would be difficult to answer using the
coarse view provided by monitoring metrics.
An example may best illustrate the relation between metadata-annotated
logs and monitoring metrics, especially log-derived metrics, since both
come from the same source. Consider the canonical example of the HTTP
access logs of a website that is having problems: the monitoring system
can tell which fraction of the incoming requests is returning, say, an
error 500, while properly annotated logs can answer more detailed
queries such as "the top 10 URLs that have returned an error 500 in the
last day". The extremely large cardinality of the URL field (which is
user-controlled) makes it impractical to use for monitoring purposes,
but the monitoring metric is cheap to compute and easy to alert on in
real time, while the metadata-annotated logs provide the detailed, but
more expensive to compute, analytical view.
The implementation uses the
[mmnormalize](https://www.rsyslog.com/doc/v8-stable/configuration/modules/mmnormalize.html)
rsyslog module, which parses logs with the
[liblognorm](http://www.liblognorm.com/files/manual/index.html) engine
to extract metadata fields.
Liblognorm rulebase files are a bit verbose but relatively simple to
write. Rules can be manually tested using the *lognormalizer* utility,
@@ -1354,7 +1460,12 @@ sane replication options.
# Configuration
Float is an Ansible plugin with its own configuration that replaces
the native Ansible inventory configuration.
the native Ansible inventory configuration. You will still be running
Ansible (`ansible-playbook` or whatever frontend you prefer) to apply
your configuration to your production environment. Float only provides
its own roles and plugins; it does not interfere with the rest of the
Ansible configuration (playbooks, host and group variables, etc.),
which will still have to be present for a functional setup.
The toolkit configuration is split into two parts, the *service
description metadata*, containing definitions of the known services,
@@ -1363,7 +1474,8 @@ same information you would have in a normal Ansible inventory). A
number of global Ansible variables are also required to customize the
infrastructure for your application.
All files are YAML-encoded and should usually have a *.yml* extension.
All configuration files are YAML-encoded and should usually have a
*.yml* extension.
Float is controlled by a top-level configuration file, which you
should pass to the ansible command-line tool as the inventory with the
@@ -1378,6 +1490,9 @@ credentials_dir: credentials/
plugin: float
```
This file **must** exist and must contain, at the very least, the
`plugin: float` directive.
The attributes supported are:
`services_file` points at the location of the file containing the
@@ -1695,6 +1810,9 @@ publicly exported (at least in the current implementation), which
unfortunately means that the service itself shouldn't be running on
*frontend* nodes.
`use_proxy_protocol`: When true, enable the HAProxy proxy protocol for
the service, to propagate the original client IP to the backends.
#### Other endpoints
Other endpoints are used when the service runs their own reverse
@@ -1806,8 +1924,10 @@ attempt to restore it on new servers: the idea is that for sharded
datasets, the application layer is responsible for data management.
This attribute is false by default.
`owner`: For filesystem paths, the user that will own the files upon
restore.
`owner`, `group`, `mode`: For filesystem-backed datasets, float will
create the associated directory if it does not exist; these parameters
specify ownership and permissions. These permissions will also be
reset upon restore.
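Purely as a hypothetical sketch (only *owner*, *group* and *mode* are taken from the description above; the other attribute names are assumptions to be checked against the dataset documentation):
```yaml
datasets:
  # Hypothetical filesystem-backed dataset definition.
  - name: data
    path: /var/lib/myservice
    owner: docker-myservice
    group: docker-myservice
    mode: 0750
```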
### Volumes
@@ -2051,6 +2171,16 @@ associated with the alerts.
Prometheus instances (default 90d). Set it to a shorter value when
enabling long-term storage mode.
`prometheus_lts_tsdb_retention` controls the time horizon of the
long-term Prometheus instances (default 1 year), when they are
enabled.
`prometheus_scrape_interval` sets how often the primary Prometheus
instances should scrape their targets (default 10s).
`prometheus_lts_scrape_interval` sets how often the long-term
Prometheus instances should scrape the primary ones (default 1m).
`prometheus_external_targets` allows adding additional targets to
Prometheus beyond those that are described by the service metadata. It
is a list of entries with *name* and *targets* attributes, where
......
@@ -10,7 +10,7 @@ with user-defined roles. They are roughly grouped into sections:
top of the *base* layer, i.e. within containers etc).
* *util* for internal roles that are included by other roles, either to
expose common functionality to user roles (geoip, mariadb instances),
or to handle Ansible-related logic shared by multiple roles.
expose common functionality to user roles (geoip), or to handle
Ansible-related logic shared by multiple roles.
# The containers configuration file specifies all of the available configuration
# command-line options/flags for container engine tools like Podman & Buildah,
# but in a TOML format that can be easily modified and versioned.
# Please refer to containers.conf(5) for details of all configuration options.
# Not all container engines implement all of the options.
# All of the options have hard coded defaults and these options will override
# the built in defaults. Users can then override these options via the command
# line. Container engines will read containers.conf files in up to three
# locations in the following order:
# 1. /usr/share/containers/containers.conf
# 2. /etc/containers/containers.conf
# 3. $HOME/.config/containers/containers.conf (Rootless containers ONLY)
# Items specified in the latter containers.conf, if they exist, override the
# previous containers.conf settings, or the default settings.
[containers]
# List of devices. Specified as
# "<device-on-host>:<device-on-container>:<permissions>", for example:
# "/dev/sdc:/dev/xvdc:rwm".
# If it is empty or commented out, only the default devices will be used
#
# devices = []
# List of volumes. Specified as
# "<directory-on-host>:<directory-in-container>:<options>", for example:
# "/db:/var/lib/db:ro".
# If it is empty or commented out, no volumes will be added
#
# volumes = []
# Used to change the name of the default AppArmor profile of container engine.
#
# apparmor_profile = "container-default"
# List of annotations. Specified as
# "key=value"
# If it is empty or commented out, no annotations will be added
#
# annotations = []
# Default way to create a cgroup namespace for the container
# Options are:
# `private` Create private Cgroup Namespace for the container.
# `host` Share host Cgroup Namespace with the container.
#
# cgroupns = "private"
# Control container cgroup configuration
# Determines whether the container will create CGroups.
# Options are:
# `enabled` Enable cgroup support within container
# `disabled` Disable cgroup support, will inherit cgroups from parent
# `no-conmon` Container engine runs without conmon
#
# cgroups = "enabled"
# List of default capabilities for containers. If it is empty or commented out,
# the default capabilities defined in the container engine will be added.
#
# default_capabilities = [
# "AUDIT_WRITE",
# "CHOWN",
# "DAC_OVERRIDE",
# "FOWNER",
# "FSETID",
# "KILL",
# "MKNOD",
# "NET_BIND_SERVICE",
# "NET_RAW",
# "SETGID",
# "SETPCAP",
# "SETUID",
# "SYS_CHROOT",
# ]
# A list of sysctls to be set in containers by default,
# specified as "name=value",
# for example:"net.ipv4.ping_group_range = 0 1000".
#
# default_sysctls = [
# "net.ipv4.ping_group_range=0 1000",
# ]
# A list of ulimits to be set in containers by default, specified as
# "<ulimit name>=<soft limit>:<hard limit>", for example:
# "nofile=1024:2048"
# See setrlimit(2) for a list of resource names.
# Any limit not specified here will be inherited from the process launching the
# container engine.
# Ulimits has limits for non privileged container engines.
#
# default_ulimits = [
# "nofile"="1280:2560",
# ]
# List of default DNS options to be added to /etc/resolv.conf inside of the container.
#
# dns_options = []
# List of default DNS search domains to be added to /etc/resolv.conf inside of the container.
#
# dns_searches = []
# Set default DNS servers.
# This option can be used to override the DNS configuration passed to the
# container. The special value “none” can be specified to disable creation of
# /etc/resolv.conf in the container.
# The /etc/resolv.conf file in the image will be used without changes.
#
# dns_servers = []
# Environment variable list for the conmon process; used for passing necessary
# environment variables to conmon or the runtime.
#
# env = [
# "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
# ]
# Pass all host environment variables into the container.
#
# env_host = false
# Path to OCI hooks directories for automatically executed hooks.
#
# hooks_dir = [
# "/usr/share/containers/oci/hooks.d",
# ]
# Default proxy environment variables passed into the container.
# The environment variables passed in include:
# http_proxy, https_proxy, ftp_proxy, no_proxy, and the upper case versions of
# these. This option is needed when the host system uses a proxy but the container
# should not use it. Proxy environment variables specified for the container
# in any other way will override the values passed from the host.
#
# http_proxy = true
# Run an init inside the container that forwards signals and reaps processes.
#
# init = false
# Container init binary, if init=true, this is the init binary to be used for containers.
#
# init_path = "/usr/libexec/podman/catatonit"
# Default way to create an IPC namespace (POSIX SysV IPC) for the container
# Options are:
# `private` Create private IPC Namespace for the container.
# `host` Share host IPC Namespace with the container.
#
# ipcns = "private"
# Flag tells the container engine whether to use container separation using
# MAC (SELinux) labeling or not.
# Flag is ignored on label disabled systems.
#
# label = true
# Logging driver for the container. Available options: k8s-file and journald.
#
# log_driver = "k8s-file"
# Maximum size allowed for the container log file. Negative numbers indicate
# that no size limit is imposed. If positive, it must be >= 8192 to match or
# exceed conmon's read buffer. The file is truncated and re-opened so the
# limit is never exceeded.
#
log_size_max = 65536
# Default way to create a Network namespace for the container
# Options are:
# `private` Create private Network Namespace for the container.
# `host` Share host Network Namespace with the container.
# `none` Containers do not use the network
#
# netns = "private"
# Create /etc/hosts for the container. By default, the container engine manages
# /etc/hosts, automatically adding the container's own IP address.
#
# no_hosts = false
# Maximum number of processes allowed in a container.
#
# pids_limit = 2048
# Default way to create a PID namespace for the container
# Options are:
# `private` Create private PID Namespace for the container.
# `host` Share host PID Namespace with the container.
#
# pidns = "private"
# Path to the seccomp.json profile which is used as the default seccomp profile
# for the runtime.
#
# seccomp_profile = "/usr/share/containers/seccomp.json"
# Size of /dev/shm. Specified as <number><unit>.
# Unit is optional, values:
# b (bytes), k (kilobytes), m (megabytes), or g (gigabytes).
# If the unit is omitted, the system uses bytes.
#
# shm_size = "65536k"
# Default way to create a UTS namespace for the container
# Options are:
# `private` Create private UTS Namespace for the container.
# `host` Share host UTS Namespace with the container.
#
# utsns = "private"
# Default way to create a User namespace for the container
# Options are:
# `auto` Create unique User Namespace for the container.
# `host` Share host User Namespace with the container.
#
# userns = "host"
# Number of UIDs to allocate for the automatic container creation.
# UIDs are allocated from the “container” UIDs listed in
# /etc/subuid & /etc/subgid
#
# userns_size=65536
# The network table contains settings pertaining to the management of
# CNI plugins.
[network]
# Path to directory where CNI plugin binaries are located.
#
# cni_plugin_dirs = ["/usr/libexec/cni"]
# Path to the directory where CNI configuration files are located.
#
# network_config_dir = "/etc/cni/net.d/"
[engine]
# Cgroup management implementation used for the runtime.
# Valid options are "systemd" or "cgroupfs"
#
# cgroup_manager = "systemd"
# Environment variables to pass into conmon
#
# conmon_env_vars = [
# "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
# ]
# Paths to look for the conmon container manager binary
#
# conmon_path = [
# "/usr/libexec/podman/conmon",
# "/usr/local/libexec/podman/conmon",
# "/usr/local/lib/podman/conmon",
# "/usr/bin/conmon",
# "/usr/sbin/conmon",
# "/usr/local/bin/conmon",
# "/usr/local/sbin/conmon"
# ]
# Specify the keys sequence used to detach a container.
# Format is a single character [a-Z] or a comma separated sequence of
# `ctrl-<value>`, where `<value>` is one of:
# `a-z`, `@`, `^`, `[`, `\`, `]`, `^` or `_`
#
# detach_keys = "ctrl-p,ctrl-q"
# Determines whether engine will reserve ports on the host when they are
# forwarded to containers. When enabled, when ports are forwarded to containers,
# ports are held open for as long as the container is running, ensuring that
# they cannot be reused by other programs on the host. However, this can cause
# significant memory usage if a container has many ports forwarded to it.
# Disabling this can save memory.
#
# enable_port_reservation = true
# Selects which logging mechanism to use for container engine events.
# Valid values are `journald`, `file` and `none`.
#
# events_logger = "journald"
# Default transport method for pulling and pushing images
#
# image_default_transport = "docker://"
# Default command to run the infra container
#
# infra_command = "/pause"
# Infra (pause) container image name for pod infra containers. When running a
# pod, we start a `pause` process in a container to hold open the namespaces
# associated with the pod. This container does nothing other than sleep,
# reserving the pod's resources for the lifetime of the pod.
#
# infra_image = "k8s.gcr.io/pause:3.2"
# Specify the locking mechanism to use; valid values are "shm" and "file".
# Change the default only if you are sure of what you are doing, in general