Testing
The easiest way is to run docker run --rm -it $(docker build -q .) on a system with docker installed.
The second easiest way is to run nix develop -c ./test.sh on a system with nix installed.
The hardest way is to install go, prom2json, prometheus, node_exporter, curl, and python yourself and run ./test.sh directly.
The test output is kind of messy since I essentially copied the commands I ran into a shell script and added some colors, but I didn't hide the application output :)
Design / Rationale
I decided to time box this task to a work day since a time limit wasn't given, and I figured that if this were a real problem in production, getting something that works quickly would be more valuable than trying to write "the perfect code" on the first pass.
I decided to build the app as a binary that exports the consul metrics in the prometheus text format to standard out for two main reasons:
- Since the description mentions that the current system uses textfile, it seemed pragmatic to build something that emulates the existing behavior rather than something that would require configuring other systems to work in practice.
- By writing to standard out instead of having the application write the textfile, it maintains the unix philosophy of gluing simple tools together. For example, this tool could be orchestrated by a cronjob, systemd, kubernetes, etc., and the output can be easily viewed by a human or, of course, piped to the correct file.
I chose golang despite having limited experience with it. Although I think I could have done a better job in rust (my language of choice these days), I knew that edgedb is mostly a golang/python shop, and I wanted to demonstrate my willingness to learn and my adaptability to unfamiliar tools.
I chose to use the go-netstat library because I wanted to be pragmatic. My original thought was to use the sock_diag netlink interface directly, but after reading the documentation for it I figured that if this were a real problem I faced on the job, I'd rather use something that was tested and already handled various edge cases I hadn't thought of or encountered.
Otherwise, there is nothing particularly interesting about the code. It configures the prometheus library with a gauge, configures the netstat library to scrape the open ports, then runs them through some simple filters. The function returns the count, sets the value in the prom registry, then prints the metrics to standard out using the prom text formatting library.
I used nix/docker to run/build this project as I'm on a mac and the syscalls I need are somewhat specific to linux. I'm aware that consul supports OSX as well (in fact, I run consul on my mac github action build farm). However, again being pragmatic, I believe edgedb only runs consul on linux, so I didn't attempt to make this script cross platform. This is a potential point of future improvement.
Future Improvements
- Everything
Personally, I think the answer to this makes a great topic of discussion after the technical interview submission. I would highlight that, in both the tests and the code itself, there is no error handling or proper logging. If I were to do this for a production app, I'd definitely instrument the application itself to know whether its output could be trusted (i.e., what if it crashes and writes garbage to the output?) and to monitor for any performance issues (is the syscall + context switch faster/safer than reading via procfs?).
I didn't attempt to test ipv6 support because (unfortunately) it's disabled on my own system, but I think it wouldn't take long to fix that.
There are a few hardcoded values, especially in the tests, which rely on the script creating 100 connections by default and having 0 once it exits. The code also assumes consul is running on port 8500, whereas it might be on a different port, or there may be multiple instances of consul on the same box. These are easy to fix, but I omitted them since I focused on getting it working first.
Overall feedback
I enjoyed this exercise. It was fun to have a practical problem that I could iterate on and see working as I went along.
I didn't end up changing the port-opener script at all. However, I did notice, and start to investigate, that it seems to exit only on SIGINT and that it doesn't print any output until the loop exits. The latter made it difficult to debug why it would sometimes "hang" without printing anything, only for me to find out that the server couldn't start on the defined port.
The biggest problem I had was task control / distribution for this problem. Ideally, I would have delivered a VM with systemd, prometheus, node_exporter, the cronjob, and maybe a wrapper to start and stop the port opener with different values on some schedule.
In practice, doing this is difficult, mainly because it's hard to make assumptions about the runtime environment of the person reviewing this. Something like packer/vagrant/ansible/kubernetes can't be assumed, and they're all relatively heavyweight tools to ask someone to configure.
I settled on using nix + docker because I've heard there are a few nix fans on the team, and it was as close as I could get to controlling the reviewer's environment without using docker hacks (like docker in docker, or running systemd inside docker), while still providing a one-command way to run the tests that requires only one (fairly common) dependency.
Thoughts for future candidates
Having been on both sides of the table I can say interviewing is tough in general. For me personally, while I enjoyed the exercise, I would have appreciated more guidance on the expected result.
On one hand, since I of course want to get hired, I want(ed) to impress the reviewer(s) by delivering a really polished solution using the tools I know best (rust/kubernetes/nix). On the other hand, I don't know what is weighted more heavily: delivering something that works in a short amount of time, or delivering something polished. I chose the former and I hope it works out for me, but I can imagine future candidates would appreciate a few words on what to prioritize.
Depending on priorities, I'd do one of two things (or even both):
- Have a really structured exercise that can be tested automatically a la hackerrank or "deploy an instance of edgedb that's available to the internet with some test data" - this weeds out people who can't do the job at all, while being realistic and having a low review burden.
- Make the question open ended/collaborative w/ one of the engineers. I have really liked the "given a shell, debug some issue" style questions. They're time boxed and practical, but with a higher burden on the team.
