139 lines
5.0 KiB
Markdown
139 lines
5.0 KiB
Markdown
|
## Background
|
||
|
|
||
|
This is based on a real-world problem we encountered early on in EdgeDB Cloud
|
||
|
(but to be clear, we have a working solution for this already in place, we are
|
||
|
not asking you to do any work on our actual cloud as part of this interview
|
||
|
process)
|
||
|
|
||
|
## Problem overview
|
||
|
|
||
|
We run Consul in our internal cloud infrastructure.
|
||
|
|
||
|
Consul limits the number of open HTTP connections from clients,
|
||
|
[defaulting to 200](https://developer.hashicorp.com/consul/docs/agent/config/config-files#http_max_conns_per_client).
|
||
|
|
||
|
Consul also [exposes telemetry](https://developer.hashicorp.com/consul/docs/agent/monitor/telemetry)
|
||
|
in standard Prometheus format.
|
||
|
|
||
|
However, "current count of open HTTP connections" is not part of Consul's
|
||
|
telemetry, so we have no way of knowing how close we are to this 200-connection
|
||
|
limit.
|
||
|
|
||
|
The goal here is to gather this count of open connections ourselves and send it
|
||
|
to our Prometheus metrics server, where we can graph it or alert on it alongside
|
||
|
the other metrics that Consul exposes natively.
|
||
|
|
||
|
## Technical details
|
||
|
|
||
|
- All of the connections that we care about are on TCP port 8500 (Consul's
|
||
|
primary service port).
|
||
|
|
||
|
- All of the connections we currently have are using IPv4, but we try to leave
|
||
|
ourselves open to IPv6 compatibility. It's up to you whether you want to
|
||
|
support IPv6 or leave that as a future TODO.
|
||
|
|
||
|
- We run the Prometheus node_exporter on all hosts that run Consul, and have
|
||
|
its [textfile collector](https://github.com/prometheus/node_exporter?tab=readme-ov-file#textfile-collector)
|
||
|
enabled with `--collector.textfile.directory=/tmp/node-exporter`. It's up to
|
||
|
you whether you want to write a full Prometheus metrics collector
|
||
|
implementation, or write to the node_exporter's textfile directory.
|
||
|
|
||
|
- It's up to you how to collect the metric value itself - calling `netstat` and
|
||
|
looking at its output, as in the example below; implementing it yourself by
|
||
|
looking at files under /proc or /sys; using a 3rd-party library that exposes
|
||
|
the value, etc.
|
||
|
|
||
|
- We've provided a test script in Python, but there is absolutely no requirement
|
||
|
that your solution be in Python. Use any language you feel appropriate.
|
||
|
|
||
|
## Working example
|
||
|
|
||
|
Setting up a full Consul installation to repro this problem would be non-trivial
|
||
|
and outside the scope of this interview question, so we have provided a simple
|
||
|
Python script that simulates the behavior, by opening a server on a specified
|
||
|
port, then opening a specified number of client connections to the server, and
|
||
|
holding them open until the script is killed.
|
||
|
|
||
|
The script should run on any Python higher than 3.7 and uses the stdlib only
|
||
|
(does not require a virtualenv or `pip install` or anything similar).
|
||
|
|
||
|
```
|
||
|
$ ./port-opener.py -h
|
||
|
usage: port-opener.py [-h] [--port PORT] [--num-connections NUM_CONNECTIONS] [--ipv {4,6}] [--verbose]
|
||
|
|
||
|
options:
|
||
|
-h, --help show this help message and exit
|
||
|
--port PORT, -p PORT port to listen and make connections on
|
||
|
--num-connections NUM_CONNECTIONS, -n NUM_CONNECTIONS
|
||
|
number of connections to open
|
||
|
--ipv {4,6} IP version (4 or 6) to use
|
||
|
--verbose, -v enable verbose logging
|
||
|
```
|
||
|
|
||
|
To see it in action:
|
||
|
|
||
|
```
|
||
|
$ ./port-opener.py
|
||
|
started server on 127.0.0.1:8500
|
||
|
opened 200 client connections
|
||
|
|
||
|
```
|
||
|
|
||
|
Then, in a separate terminal window:
|
||
|
|
||
|
```
|
||
|
$ netstat -tn
|
||
|
Active Internet connections (w/o servers)
|
||
|
Proto Recv-Q Send-Q Local Address Foreign Address State
|
||
|
...
|
||
|
tcp 0 0 127.0.0.1:8500 127.0.0.1:34980 ESTABLISHED
|
||
|
tcp 0 0 127.0.0.1:8500 127.0.0.1:35006 ESTABLISHED
|
||
|
...
|
||
|
```
|
||
|
|
||
|
With those 200 connections open, your solution to this problem should emit a
|
||
|
Prometheus metric that looks something like `consul_open_http_connections 200`
|
||
|
or `open_tcp_conns{port=8500} 200` or something similar.
|
||
|
|
||
|
Note that if you run the port-opener script and then kill it, this will close
|
||
|
the connections, but for a few minutes afterwards netstat will still list them
|
||
|
in `TIME_WAIT` state:
|
||
|
|
||
|
```
|
||
|
> netstat -tn
|
||
|
Active Internet connections (w/o servers)
|
||
|
Proto Recv-Q Send-Q Local Address Foreign Address State
|
||
|
...
|
||
|
tcp 0 0 127.0.0.1:55036 127.0.0.1:8500 TIME_WAIT
|
||
|
tcp 0 0 127.0.0.1:54748 127.0.0.1:8500 TIME_WAIT
|
||
|
tcp 0 0 127.0.0.1:54690 127.0.0.1:8500 TIME_WAIT
|
||
|
...
|
||
|
```
|
||
|
|
||
|
We don't care about any `TIME_WAIT` connections, because from Consul's point of
|
||
|
view they don't count towards the 200-connection limit.
|
||
|
|
||
|
## Your solution
|
||
|
|
||
|
Send us:
|
||
|
|
||
|
- Your code implementing the Prometheus metric collection
|
||
|
|
||
|
- A readme with:
|
||
|
|
||
|
- How to run your code, if there's any non-obvious steps
|
||
|
|
||
|
- Any design decisions or tradeoffs you made
|
||
|
|
||
|
- Any test files or scripts you wrote, or modifications to our port-opener
|
||
|
script
|
||
|
|
||
|
- This take-home question is also new-ish for us, so we would also appreciate
|
||
|
any feedback you have about:
|
||
|
|
||
|
- Any challenges you had understanding our description of the problem, or
|
||
|
getting our test script to run, etc
|
||
|
|
||
|
- Any feedback you have on this question that we can use to improve it for
|
||
|
other candidates
|