From 34481f278c3c9888356006c9eea4f592abc5e100 Mon Sep 17 00:00:00 2001 From: Michael Francis Date: Tue, 9 Jul 2024 14:15:16 -0400 Subject: [PATCH] initial commit --- Dockerfile | 8 +++ README.md | 57 ++++++++++++++++++++ description.md | 138 +++++++++++++++++++++++++++++++++++++++++++++++++ flake.lock | 61 ++++++++++++++++++++++ flake.nix | 34 ++++++++++++ go.mod | 19 +++++++ go.sum | 36 +++++++++++++ main.go | 88 +++++++++++++++++++++++++++++++ port-opener.py | 105 +++++++++++++++++++++++++++++++++++++ test.sh | 65 +++++++++++++++++++++++ 10 files changed, 611 insertions(+) create mode 100644 Dockerfile create mode 100644 README.md create mode 100644 description.md create mode 100644 flake.lock create mode 100644 flake.nix create mode 100644 go.mod create mode 100644 go.sum create mode 100644 main.go create mode 100755 port-opener.py create mode 100755 test.sh diff --git a/Dockerfile b/Dockerfile new file mode 100644 index 0000000..ae5745b --- /dev/null +++ b/Dockerfile @@ -0,0 +1,8 @@ +FROM nixpkgs/nix-flakes:nixos-24.05 +WORKDIR /app +COPY . . + +# Pre-build the project to cache dependencies +RUN nix develop -c true + +ENTRYPOINT ["nix", "develop", "-c", "./test.sh"] diff --git a/README.md b/README.md new file mode 100644 index 0000000..5ad3050 --- /dev/null +++ b/README.md @@ -0,0 +1,57 @@ +## Testing + +Easiest way is to run `docker run --rm -it $(docker build -q .)` on a system with docker installed. +Second easiest way is to run `nix develop -c ./test.sh` on a system with nix installed. 
+Hardest way is to install `go`, `prom2json`, the Prometheus `node_exporter`, `curl`, and `python` yourself.
+
+The test output is kind of messy since I essentially copied the commands I ran into a shell script and added some colors, but I didn't hide the application output :)
+
+## Design / Rationale
+
+I decided to time-box this task to a work day since a time limit wasn't given, and I figured that if this were a real problem in production, getting something that works quickly would be more valuable than trying to make "the perfect code" on the first pass.
+
+I decided to build the app as a binary that exports the Consul metrics in the Prometheus text format to standard out, for two main reasons:
+
+1. Since the description mentions that the current system uses the textfile collector, it seemed like the pragmatic choice to build something that emulates the existing behavior, rather than something that would require reconfiguring other systems to actually work.
+2. By outputting to standard out instead of having the application write the textfile itself, it maintains the Unix philosophy of gluing simple tools together. For example, this tool could be orchestrated by a cronjob, systemd, Kubernetes, etc., and the output can easily be viewed by a human or, of course, piped to the correct file.
+
+I chose Go despite having limited experience with it because, although I think I could have done a better job in Rust (my language of choice these days), I knew that EdgeDB is mostly a Go/Python shop and I wanted to demonstrate my willingness to learn and my adaptability to unfamiliar tools.
+
+I chose the go-netstat library because I wanted to be pragmatic. My original thought was to use the sock_diag system call directly, but after reading its documentation I figured that if this were a real problem I faced on the job, I'd rather use something tested that already handled various edge cases I hadn't thought of or encountered.
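To make the "gluing" concrete, here is a hypothetical crontab entry (a sketch, not part of the submission: the `port_exporter` name matches this repo's flake output, and the directory matches the `--collector.textfile.directory` flag from description.md; the write-then-rename step keeps node_exporter from reading a half-written file):

```
# Refresh the metric once a minute; rename makes the update atomic.
* * * * * port_exporter --port 8500 > /tmp/node-exporter/consul.prom.tmp && mv /tmp/node-exporter/consul.prom.tmp /tmp/node-exporter/consul.prom
```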
+
+Otherwise, there is nothing particularly interesting about the code: it configures the Prometheus library with a gauge, configures the netstat library to scan the open ports, and runs them through some simple filters. The function returns the count, sets the value in the Prometheus registry, then prints the metrics to standard out using the Prometheus text-formatting library.
+
+I used Nix/Docker to build and run this project as I'm on a Mac and the syscalls I need are somewhat specific to Linux. I'm aware that Consul supports macOS as well (in fact, I run Consul on my Mac GitHub Actions build farm); however, again being pragmatic, I believe EdgeDB only runs Consul on Linux, so I didn't attempt to make this tool cross-platform. This is a potential point of future improvement.
+
+## Future Improvements
+
+- Everything
+
+Personally, I think the answer to this makes a great topic of discussion after the technical interview submission. I would highlight that, both in the tests and the code itself, there is no error handling or proper logging. If I were doing this for a production app, I'd definitely instrument the application itself to know whether its output could be trusted (i.e., what if it crashes and writes garbage to the output?) and to monitor for any performance issues (is the syscall + context switch faster/safer than reading via procfs?).
+
+I didn't attempt to test IPv6 support because (unfortunately) it's disabled on my own system, but I think it wouldn't take long to fix that.
+
+There are a few hardcoded values, especially in the tests: they rely on the script creating 100 connections by default, and on there being 0 once it exits. They also assume Consul is running on port 8500, whereas it might be on a different port, or there may be multiple instances of Consul on the same box. Easy to fix, but omitted since I focused on getting it working first.
+
+## Overall feedback
+
+I enjoyed this exercise; it was fun to have a practical task that I could iterate on and see working as I went along.
+
+I didn't end up changing the port-opener script at all; however, I did notice and start to investigate that it seems to handle only `SIGINT` to exit, and it doesn't print any output until the loop exits. The latter made it difficult to debug why it would sometimes "hang" without printing any output, only to find out the server couldn't start on the defined port.
+
+The biggest problem I had was distribution: how to package this problem up for review. Ideally, I would have delivered a VM with systemd, Prometheus, node_exporter, the cronjob, and maybe a wrapper to start and stop the port opener with different values on some schedule.
+
+In practice this is difficult, mainly because making assumptions about the runtime environment of the person reviewing this is hard: something like Packer/Vagrant/Ansible/Kubernetes can't be assumed, and they're all relatively heavyweight tools to ask someone to configure.
+
+I settled on Nix + Docker because I've heard there are a few Nix fans on the team, and it was as close as I could get to controlling the reviewer's environment without resorting to Docker hacks (like Docker-in-Docker, or running systemd inside Docker), while still providing a one-command way to run the tests that requires only one (fairly common) dependency.
+
+### Thoughts for future candidates
+
+Having been on both sides of the table, I can say interviewing is tough in general. For me personally, while I enjoyed the exercise, I would have appreciated more guidance on the expected result.
+
+On one hand, as I of course want to get hired, I want(ed) to impress the reviewer(s) by delivering a really polished solution using the tools I know best (Rust/Kubernetes/Nix); however, I don't know what is weighted more heavily: delivering something that works in a short amount of time, or delivering something polished. I of course chose the former, and I hope it works out for me, but I can imagine future candidates would appreciate a few words on what to prioritize.
+
+Depending on priorities, I'd do one of two things (or even both):
+
+1) Have a really structured exercise that can be tested automatically, a la HackerRank, or "deploy an instance of EdgeDB that's available to the internet with some test data" - this weeds out people who can't do the job at all, while being realistic and having a low review burden.
+2) Make the question open-ended/collaborative with one of the engineers. I have really liked the "given a shell, debug some issue" style of question. They're time-boxed and practical, but with a higher burden on the team.
diff --git a/description.md b/description.md
new file mode 100644
index 0000000..303ad7e
--- /dev/null
+++ b/description.md
@@ -0,0 +1,138 @@
+## Background
+
+This is based on a real-world problem we encountered early on in EdgeDB Cloud
+(but to be clear, we have a working solution for this already in place, we are
+not asking you to do any work on our actual cloud as part of this interview
+process)
+
+## Problem overview
+
+We run Consul in our internal cloud infrastructure.
+
+Consul limits the number of open HTTP connections from clients,
+[defaulting to 200](https://developer.hashicorp.com/consul/docs/agent/config/config-files#http_max_conns_per_client).
+
+Consul also [exposes telemetry](https://developer.hashicorp.com/consul/docs/agent/monitor/telemetry)
+in standard Prometheus format.
+ +However, "current count of open HTTP connections" is not part of Consul's +telemetry, so we have no way of knowing how close we are to this 200-connection +limit. + +The goal here is to gather this count of open connections ourselves and send it +to our Prometheus metrics server, where we can graph it or alert on it alongside +the other metrics that Consul exposes natively. + +## Technical details + +- All of the connections that we care about are on TCP port 8500 (Consul's + primary service port). + +- All of the connections we currently have are using IPv4, but we try to leave + ourselves open to IPv6 compatibility. It's up to you whether you want to + support IPv6 or leave that as a future TODO. + +- We run the Prometheus node_exporter on all hosts that run Consul, and have + its [textfile collector](https://github.com/prometheus/node_exporter?tab=readme-ov-file#textfile-collector) + enabled with `--collector.textfile.directory=/tmp/node-exporter`. It's up to + you whether you want to write a full Prometheus metrics collector + implementation, or write to the node_exporter's textfile directory. + +- It's up to you how to collect the metric value itself - calling `netstat` and + looking at its output, as in the example below; implementing it yourself by + looking at files under /proc or /sys; using a 3rd-party library that exposes + the value, etc. + +- We've provided a test script in Python, but there is absolutely no requirement + that your solution be in Python. Use any language you feel appropriate. + +## Working example + +Setting up a full Consul installation to repro this problem would be non-trivial +and outside the scope of this interview question, so we have provided a simple +Python script that simulates the behavior, by opening a server on a specified +port, then opening a specified number of client connections to the server, and +holding them open until the script is killed. 
+ +The script should run on any Python higher than 3.7 and uses the stdlib only +(does not require a virtualenv or `pip install` or anything similar). + +``` +$ ./port-opener.py -h +usage: port-opener.py [-h] [--port PORT] [--num-connections NUM_CONNECTIONS] [--ipv {4,6}] [--verbose] + +options: + -h, --help show this help message and exit + --port PORT, -p PORT port to listen and make connections on + --num-connections NUM_CONNECTIONS, -n NUM_CONNECTIONS + number of connections to open + --ipv {4,6} IP version (4 or 6) to use + --verbose, -v enable verbose logging +``` + +To see it in action: + +``` +$ ./port-opener.py +started server on 127.0.0.1:8500 +opened 200 client connections + +``` + +Then, in a separate terminal window: + +``` +$ netstat -tn +Active Internet connections (w/o servers) +Proto Recv-Q Send-Q Local Address Foreign Address State +... +tcp 0 0 127.0.0.1:8500 127.0.0.1:34980 ESTABLISHED +tcp 0 0 127.0.0.1:8500 127.0.0.1:35006 ESTABLISHED +... +``` + +With those 200 connections open, your solution to this problem should emit a +Prometheus metric that looks something like `consul_open_http_connections 200` +or `open_tcp_conns{port=8500} 200` or something similar. + +Note that if you run the port-opener script and then kill it, this will close +the connections, but for a few minutes afterwards netstat will still list them +in `TIME_WAIT` state: + +``` +> netstat -tn +Active Internet connections (w/o servers) +Proto Recv-Q Send-Q Local Address Foreign Address State +... +tcp 0 0 127.0.0.1:55036 127.0.0.1:8500 TIME_WAIT +tcp 0 0 127.0.0.1:54748 127.0.0.1:8500 TIME_WAIT +tcp 0 0 127.0.0.1:54690 127.0.0.1:8500 TIME_WAIT +... +``` + +We don't care about any `TIME_WAIT` connections, because from Consul's point of +view they don't count towards the 200-connection limit. 
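As an illustrative sketch of the "looking at files under /proc" option mentioned above (not a required implementation — it assumes Linux's `/proc/net/tcp` layout and handles IPv4 only), the ESTABLISHED-only count can be derived directly, since `TIME_WAIT` entries carry a different state code:

```python
#!/usr/bin/env python3
# Sketch only: count ESTABLISHED TCP connections with a given local port
# by parsing /proc/net/tcp (Linux-specific, IPv4 only).

TCP_ESTABLISHED = "01"  # kernel socket-state code; TIME_WAIT is "06"

def count_established(port, table):
    hex_port = format(port, "04X")
    count = 0
    for line in table.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_addr, state = fields[1], fields[3]
        # local_addr looks like "0100007F:2134" (hex IP, then hex port)
        if state == TCP_ESTABLISHED and local_addr.endswith(":" + hex_port):
            count += 1
    return count

if __name__ == "__main__":
    import os
    if os.path.exists("/proc/net/tcp"):
        with open("/proc/net/tcp") as f:
            print("consul_open_http_connections", count_established(8500, f.read()))
```

Counting only the local (server-side) port avoids double-counting loopback connections, which each appear a second time as a client-side socket with 8500 as the *remote* port.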
+ +## Your solution + +Send us: + +- Your code implementing the Prometheus metric collection + +- A readme with: + + - How to run your code, if there's any non-obvious steps + + - Any design decisions or tradeoffs you made + + - Any test files or scripts you wrote, or modifications to our port-opener + script + +- This take-home question is also new-ish for us, so we would also appreciate + any feedback you have about: + + - Any challenges you had understanding our description of the problem, or + getting our test script to run, etc + + - Any feedback you have on this question that we can use to improve it for + other candidates diff --git a/flake.lock b/flake.lock new file mode 100644 index 0000000..e388d21 --- /dev/null +++ b/flake.lock @@ -0,0 +1,61 @@ +{ + "nodes": { + "flake-utils": { + "inputs": { + "systems": "systems" + }, + "locked": { + "lastModified": 1710146030, + "narHash": "sha256-SZ5L6eA7HJ/nmkzGG7/ISclqe6oZdOZTNoesiInkXPQ=", + "owner": "numtide", + "repo": "flake-utils", + "rev": "b1d9ab70662946ef0850d488da1c9019f3a9752a", + "type": "github" + }, + "original": { + "owner": "numtide", + "repo": "flake-utils", + "type": "github" + } + }, + "nixpkgs": { + "locked": { + "lastModified": 1720418205, + "narHash": "sha256-cPJoFPXU44GlhWg4pUk9oUPqurPlCFZ11ZQPk21GTPU=", + "owner": "nixos", + "repo": "nixpkgs", + "rev": "655a58a72a6601292512670343087c2d75d859c1", + "type": "github" + }, + "original": { + "owner": "nixos", + "ref": "nixos-unstable", + "repo": "nixpkgs", + "type": "github" + } + }, + "root": { + "inputs": { + "flake-utils": "flake-utils", + "nixpkgs": "nixpkgs" + } + }, + "systems": { + "locked": { + "lastModified": 1681028828, + "narHash": "sha256-Vy1rq5AaRuLzOxct8nz4T6wlgyUR7zLU309k9mBC768=", + "owner": "nix-systems", + "repo": "default", + "rev": "da67096a3b9bf56a91d16901293e51ba5b49a27e", + "type": "github" + }, + "original": { + "owner": "nix-systems", + "repo": "default", + "type": "github" + } + } + }, + "root": "root", + "version": 7 +} 
diff --git a/flake.nix b/flake.nix new file mode 100644 index 0000000..9f8abc6 --- /dev/null +++ b/flake.nix @@ -0,0 +1,34 @@ +{ + description = "A very basic flake"; + + inputs = { + nixpkgs.url = "github:nixos/nixpkgs?ref=nixos-unstable"; + flake-utils.url = "github:numtide/flake-utils"; + }; + + outputs = { self, nixpkgs, flake-utils }: + flake-utils.lib.eachDefaultSystem (system: + let + pkgs = nixpkgs.legacyPackages.${system}; + in rec { + port_exporter = pkgs.buildGoModule { + name = "port_exporter"; + src = ./.; + vendorHash = "sha256-pQaZCAVdlisMUuyyKAkudgHDthZ4o/JglWgo0DXn5t4="; + doCheck = false; + }; + devShell = pkgs.mkShell { + buildInputs = with pkgs; [ + python3 + prometheus-node-exporter + coreutils + curl + jq + prometheus.cli + prom2json + port_exporter + ]; + }; + } + ); +} diff --git a/go.mod b/go.mod new file mode 100644 index 0000000..8ec4b4d --- /dev/null +++ b/go.mod @@ -0,0 +1,19 @@ +module edude03/port_exporter + +go 1.22.3 + +require github.com/elastic/gosigar v0.14.3 + +require ( + github.com/beorn7/perks v1.0.1 // indirect + github.com/cespare/xxhash/v2 v2.2.0 // indirect + github.com/nberlee/go-netstat v0.1.2 // indirect + github.com/pkg/errors v0.9.1 // indirect + github.com/prometheus/client_golang v1.19.1 // indirect + github.com/prometheus/client_model v0.5.0 // indirect + github.com/prometheus/common v0.48.0 // indirect + github.com/prometheus/procfs v0.12.0 // indirect + github.com/sirupsen/logrus v1.9.3 // indirect + golang.org/x/sys v0.17.0 // indirect + google.golang.org/protobuf v1.33.0 // indirect +) diff --git a/go.sum b/go.sum new file mode 100644 index 0000000..16fd2f7 --- /dev/null +++ b/go.sum @@ -0,0 +1,36 @@ +github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM= +github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw= +github.com/cespare/xxhash/v2 v2.2.0 h1:DC2CZ1Ep5Y4k3ZQ899DldepgrayRUGE6BBZ/cd9Cj44= +github.com/cespare/xxhash/v2 v2.2.0/go.mod 
h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs= +github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/elastic/gosigar v0.14.3 h1:xwkKwPia+hSfg9GqrCUKYdId102m9qTJIIr7egmK/uo= +github.com/elastic/gosigar v0.14.3/go.mod h1:iXRIGg2tLnu7LBdpqzyQfGDEidKCfWcCMS0WKyPWoMs= +github.com/nberlee/go-netstat v0.1.2 h1:wgPV1YOUo+kDFypqiKgfxMtnSs1Wb42c7ahI4qyEUJc= +github.com/nberlee/go-netstat v0.1.2/go.mod h1:GvDCRLsUKMRN1wULkt7tpnDmjSIE6YGf5zeVq+mBO64= +github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= +github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= +github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= +github.com/prometheus/client_golang v1.19.1 h1:wZWJDwK+NameRJuPGDhlnFgx8e8HN3XHQeLaYJFJBOE= +github.com/prometheus/client_golang v1.19.1/go.mod h1:mP78NwGzrVks5S2H6ab8+ZZGJLZUq1hoULYBAYBw1Ho= +github.com/prometheus/client_model v0.5.0 h1:VQw1hfvPvk3Uv6Qf29VrPF32JB6rtbgI6cYPYQjL0Qw= +github.com/prometheus/client_model v0.5.0/go.mod h1:dTiFglRmd66nLR9Pv9f0mZi7B7fk5Pm3gvsjB5tr+kI= +github.com/prometheus/common v0.48.0 h1:QO8U2CdOzSn1BBsmXJXduaaW+dY/5QLjfB8svtSzKKE= +github.com/prometheus/common v0.48.0/go.mod h1:0/KsvlIEfPQCQ5I2iNSAWKPZziNCvRs5EC6ILDTlAPc= +github.com/prometheus/procfs v0.12.0 h1:jluTpSng7V9hY0O2R9DzzJHYb2xULk9VTR1V1R/k6Bo= +github.com/prometheus/procfs v0.12.0/go.mod h1:pcuDEFsWDnvcgNzo4EEweacyhjeA9Zk3cnaOZAZEfOo= +github.com/sirupsen/logrus v1.9.3 h1:dueUQJ1C2q9oE3F7wvmSGAaVtTmUizReu6fjN8uqzbQ= +github.com/sirupsen/logrus v1.9.3/go.mod h1:naHLuLoDiP4jHNo9R0sCBMtWGeIprob74mVsIT4qYEQ= +github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= +github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4= +github.com/stretchr/testify v1.7.0/go.mod 
h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= +golang.org/x/sys v0.0.0-20180810173357-98c5dad5d1a0/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= +golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8 h1:0A+M6Uqn+Eje4kHMK80dtF3JCXC4ykBgQG4Fe06QRhQ= +golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg= +golang.org/x/sys v0.17.0 h1:25cE3gD+tdBA7lp7QfhuV+rJiE9YXTcS3VG1SqssI/Y= +golang.org/x/sys v0.17.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= +google.golang.org/protobuf v1.33.0 h1:uNO2rsAINq/JlFpSdYEKIZ0uKD/R9cpdv0T+yoGwGmI= +google.golang.org/protobuf v1.33.0/go.mod h1:c6P6GXX6sHbq/GpV6MGZEdwhWPcYBgnhAHhKbcUYpos= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= +gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= diff --git a/main.go b/main.go new file mode 100644 index 0000000..0ce7c58 --- /dev/null +++ b/main.go @@ -0,0 +1,88 @@ +package main + +import ( + "context" + "flag" + "fmt" + "os" + + "github.com/nberlee/go-netstat/netstat" + "github.com/prometheus/client_golang/prometheus" + "github.com/prometheus/common/expfmt" +) + +var ( + maybe_port = flag.Uint("port", 8500, "Port to monitor") +) + +func main() { + flag.Parse() + + // Check port number before attempting to cast it to a uint16 + if *maybe_port < 1 || *maybe_port > 65535 { + fmt.Fprintln(os.Stderr, "Invalid port number") + os.Exit(1) + } + + port := uint16(*maybe_port) + + reg := prometheus.NewRegistry() + + open_consul_connections := prometheus.NewGauge(prometheus.GaugeOpts{ + Subsystem: "consul", + Name: "open_tcp_connections", + Help: "Number of open consul tcp connections", + ConstLabels: prometheus.Labels{ + "port": fmt.Sprintf("%d", port), + }, + }) + + reg.MustRegister(open_consul_connections) + + 
open_consul_connections.Set(float64(ConsulConnectionCount(port)))
+
+	metricFamilies, err := reg.Gather()
+	if err != nil {
+		panic(err)
+	}
+
+	for _, mf := range metricFamilies {
+		if _, err := expfmt.MetricFamilyToText(os.Stdout, mf); err != nil {
+			panic(err)
+		}
+	}
+
+}
+
+// Filters
+func PortFilter(port uint16) func(s *netstat.SockTabEntry) bool {
+	return func(s *netstat.SockTabEntry) bool {
+		return s.LocalEndpoint.Port == port
+	}
+}
+
+func StateFilter(s *netstat.SockTabEntry) bool {
+	return s.State == netstat.Established
+}
+
+func ConsulConnectionCount(port uint16) int {
+	ctx := context.Background()
+	// Only TCP sockets matter for this count: connected UDP sockets also
+	// report the "established" state code and would be miscounted, and the
+	// PID lookup is unused, so those features are left disabled.
+	features := netstat.EnableFeatures{
+		TCP:  true,
+		TCP6: true,
+	}
+
+	targetFilterFn := PortFilter(port)
+
+	resp, err := netstat.Netstat(ctx, features, func(ste *netstat.SockTabEntry) bool {
+		return targetFilterFn(ste) && StateFilter(ste)
+	})
+
+	if err != nil {
+		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
+		return 0
+	}
+
+	return len(resp)
+}
diff --git a/port-opener.py b/port-opener.py
new file mode 100755
index 0000000..c3b8ea8
--- /dev/null
+++ b/port-opener.py
@@ -0,0 +1,105 @@
+#!/usr/bin/env python3
+
+import argparse
+import asyncio
+import logging
+import random
+import socket
+import signal
+import functools
+
+async def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--port', '-p', type=int, default=8500, help="port to listen and make connections on")
+    parser.add_argument('--num-connections', '-n', type=int, default=100, help="number of connections to open")
+    parser.add_argument('--ipv', type=int, default=4, choices=[4, 6], help="IP version (4 or 6) to use")
+    parser.add_argument('--verbose', '-v', action='store_true', help="enable verbose logging")
+    args = parser.parse_args()
+
+    level = logging.DEBUG if args.verbose else logging.INFO
+    logging.basicConfig(format="%(message)s", level=level)
+
+    if args.ipv == 4:
+        host = "127.0.0.1"
+        family = socket.AF_INET
+    elif args.ipv == 6:
+        host = "::1"
+        family = 
socket.AF_INET6 + else: + raise ValueError(args.ipv) + + loop = asyncio.get_running_loop() + tasks = [] + + started_event = asyncio.Event() + server_task = asyncio.create_task(start_server(loop, host, args.port, family, started_event)) + tasks.append(server_task) + + await started_event.wait() + logging.info(f"started server on {host}:{args.port}") + + for index in range(args.num_connections): + client_task = asyncio.create_task(start_client(loop, index, host, args.port, family)) + tasks.append(client_task) + + logging.info(f"opened {args.num_connections} client connections") + + try: + await asyncio.gather(*tasks) + except asyncio.exceptions.CancelledError: + logging.info("exiting") + return + + +async def start_server(loop, host, port, family, started_event): + server = await loop.create_server( + lambda: ServerProtocol(), + host=host, port=port, family=family) + + started_event.set() + + async with server: + await server.serve_forever() + + +class ServerProtocol(asyncio.Protocol): + def connection_made(self, transport): + self.transport = transport + peer = transport.get_extra_info("peername") + logging.debug(f"server received connection: {peer}") + + def data_received(self, data): + message = data.decode() + logging.debug(f"server received message: {message}") + self.transport.write(b"pong") + + +async def start_client(loop, client_idx, host, port, family): + transport, protocol = await loop.create_connection( + lambda: ClientProtocol(client_idx), + host=host, port=port, family=family) + + message = f"ping {client_idx}".encode() + try: + while True: + transport.write(message) + delay = 10 + (random.random() * 10) + await asyncio.sleep(delay) + finally: + transport.close() + + +class ClientProtocol(asyncio.Protocol): + def __init__(self, client_idx): + self.client_idx = client_idx + + def connection_made(self, transport): + peer = transport.get_extra_info("peername") + logging.debug(f"client {self.client_idx} made connection: {peer}") + + def 
connection_lost(self, exc):
+        logging.debug(f"client {self.client_idx} closed connection: {exc}")
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
diff --git a/test.sh b/test.sh
new file mode 100755
index 0000000..c9e9295
--- /dev/null
+++ b/test.sh
@@ -0,0 +1,65 @@
+#! /usr/bin/env bash
+
+set -euo pipefail
+set -m # Enable job control
+shopt -u huponexit
+
+RED="\e[31m"
+GREEN="\e[32m"
+MAGENTA="\e[35m"
+ENDCOLOR="\e[0m"
+
+open_connections() {
+	METRICS=$(curl -s localhost:9100/metrics | prom2json)
+	VALUE=$(echo "$METRICS" | jq -r '.[] | select(.name == "consul_open_tcp_connections") | .metrics[] | select(.labels.port == "8500") | .value')
+	echo "$VALUE"
+}
+
+
+TMP_DIR=$(mktemp -d)
+PIDS=()
+
+# Start the port-opener, and save the PID
+./port-opener.py --verbose &
+OPENER_PID=$!
+PIDS+=("$OPENER_PID")
+
+# Wait for the port opener to start
+sleep 1
+
+# Start the node exporter
+node_exporter --collector.textfile.directory="$TMP_DIR" &
+NODE_EXPORTER_PID=$!
+PIDS+=("$NODE_EXPORTER_PID")
+
+# Run the port exporter
+port_exporter > "$TMP_DIR/consul.prom"
+
+sleep 1
+
+echo -e "${MAGENTA}Testing with 100 connections${ENDCOLOR}"
+VALUE=$(open_connections)
+if [[ $VALUE -ne 100 ]]; then
+	echo -e "${RED}Expected 100 connections, got $VALUE${ENDCOLOR}"
+	exit 1
+else
+	echo -e "${GREEN}Test passed${ENDCOLOR}"
+fi
+
+kill -s SIGINT $OPENER_PID
+
+sleep 2
+
+port_exporter > "$TMP_DIR/consul.prom"
+
+echo -e "${MAGENTA}Testing with 0 connections${ENDCOLOR}"
+
+VALUE=$(open_connections)
+if [[ $VALUE -ne 0 ]]; then
+	echo -e "${RED}Expected 0 connections, got $VALUE${ENDCOLOR}"
+	exit 1
+else
+	echo -e "${GREEN}Test passed${ENDCOLOR}"
+fi
+
+kill $NODE_EXPORTER_PID