Over the last five years, I have managed multiple teams working on distributed systems, predominantly in Go, JavaScript, and .NET. These teams are tasked with building out many new services, which poses some interesting challenges.
From a management perspective, the ability to reallocate engineers across teams without friction is essential for agility. Onboarding engineers is easier if services share some kind of common standard / contract, especially when the service catalogue is large. On the other hand, I want to give engineers as much flexibility as possible to choose the best tools for the job. In most cases, it is best to let the market decide (cathedral/bazaar).
Managers need to think very carefully about where they draw the “standardisation” line. As a result, this document starts with a small contract, and the rest focuses on “best practices”. It covers application NFRs and developer Quality of Life (QoL) in roughly equal measure.
Maybe this won’t work for your ecosystem; your mileage may vary.
The Application Entrypoint
All apps contain the following three files. Together they act as an engineer’s very first entry point into a service, so I recommend them as a contract across all services. This is the only contract in this document; everything else is best practice.
- README.md
- Makefile
- a programmatic entry point into the app
- source-controlled
- language-agnostic
- not everything will be in the app language (scripts, migrations, compiled dependencies)
- what language-agnostic task runner will you use? something at the OS level?
- `Make` has been around since 1976; it’s available in just about every Linux distribution out there. It’s also not particularly Windows-portable (which may be strategically useful, when you think about it)
- .env.example
- an explicit declaration of all environment variables that the app requires to function
- programmatic, human-readable, source-controlled
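To make the contract concrete, here is a minimal sketch of what the `Makefile` and `.env.example` might look like for a hypothetical Go service. Target names, scripts, and variables are illustrative, not part of the contract (and note that Make recipes must be indented with tabs):

```makefile
# Makefile — the language-agnostic entry point into the app
.PHONY: build test migrate run

build:        ## compile the app
	go build -o bin/app ./cmd/app

test:         ## run unit tests
	go test ./...

migrate:      ## run database migrations (not in the app language)
	./scripts/migrate.sh

run: build    ## start the app locally
	./bin/app
```

```
# .env.example — an explicit declaration of every env var the app needs
DATABASE_URL=postgres://localhost:5432/app_dev
KAFKA_BROKERS=localhost:9092
LOG_LEVEL=debug
```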
The Local Developer Experience
The most important subject to me. A new engineer’s first impression; a seconded engineer’s onboarding speed. The cost of a poor developer experience scales linearly, O(n), with the number of engineers, and every engineering manager should sweat when they read that. Some high-level considerations:
- do devs have short feedback loops?
- can devs deploy prod-like infra locally?
- if I hear “it works on *my* machine” one more time…
- can devs incrementally upgrade their services?
- can devs run multiple versions of javascript/postgres/etc locally?
- can devs profile apps with prod-like accuracy?
- wildly helpful if done right
- requires performance tests
App Infra
As above, can devs deploy prod-like infra locally? How do you spin up your application’s infrastructure for local development? For example:
- db (postgres, etc)
- messaging (kafka, zookeeper, kowl, etc)
- containerisation/orchestration/mesh (docker, k8s, istio, helm, etc)
- observability (ELK, grafana, etc)
- cicd (jenkins, etc)
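One common way to answer “prod-like infra locally” is a compose file checked into each repo. A minimal sketch, where the images, versions, and settings below are purely illustrative and should mirror whatever your production estate actually runs:

```yaml
# docker-compose.yml — local, prod-like infrastructure (illustrative versions)
services:
  db:
    image: postgres:12
    environment:
      POSTGRES_PASSWORD: dev
    ports: ["5432:5432"]
  zookeeper:
    image: bitnami/zookeeper:3.8
    environment:
      ALLOW_ANONYMOUS_LOGIN: "yes"
  kafka:
    image: bitnami/kafka:3.4
    environment:
      KAFKA_CFG_ZOOKEEPER_CONNECT: zookeeper:2181
      ALLOW_PLAINTEXT_LISTENER: "yes"
    ports: ["9092:9092"]
  grafana:
    image: grafana/grafana:9.5.2
    ports: ["3000:3000"]
```

Pinning versions in this file is also what lets devs run multiple postgres/kafka versions side by side, and incrementally upgrade a service’s infra by bumping one image tag.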
Unit Tests
- ensure a high coverage (>85% or >90% as a baseline)
- a DI container will likely help
- introduce a gate into your CI build pipeline that fails a build if coverage goes down
- this is a great way to reduce the chance of incremental deterioration in software quality
- this can be done by outputting a coverage report from your unit testing harness, and then performing a delta in your pipeline
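A sketch of that coverage gate, assuming a Go toolchain and a hypothetical `coverage-baseline.txt` file committed to the repo; the comparison itself is plain awk and works for any stack that can emit a coverage percentage:

```shell
#!/bin/sh
# Fail the build if unit-test coverage drops below the committed baseline.

# Returns success (0) when current coverage >= baseline.
coverage_ok() {
  baseline="$1"; current="$2"
  awk -v b="$baseline" -v c="$current" 'BEGIN { exit (c + 0 >= b + 0) ? 0 : 1 }'
}

# Illustrative wiring for a Go service:
#   go test ./... -coverprofile=cover.out
#   current=$(go tool cover -func=cover.out | awk '/^total:/ { sub("%", "", $3); print $3 }')
#   coverage_ok "$(cat coverage-baseline.txt)" "$current" \
#     || { echo "coverage dropped below baseline"; exit 1; }
```

A ratchet variant (overwrite the baseline whenever coverage goes up) gives you the “no incremental deterioration” property without anyone maintaining the number by hand.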
Component Tests
Aka integration tests, aka acceptance tests. Basically small-scale integration tests where all but immediate dependencies are mocked. For example, the app, its database, and kafka are concrete, whereas any external APIs/etc are mocked. A lot of bugs are caught here that are hard to unit test, e.g. did you remember to update your migrations?
- you can treat component tests as unit tests with no mocks
- i.e. you can use your existing testing harness if you don’t want to add moving parts
- containerised dependencies
- containers can be executed inside your CI pipeline
- think about whether two builds can run in parallel
- this is trivial with testcontainers
- it’s a little trickier with docker-compose, you need to understand dind (docker-in-docker), but I like how transparent it is (less magic under the hood than testcontainers)
- however, I suspect tools like testcontainers can help you to combine coverage numbers across unit & component tests, which would be cool
- sanity load tests
- if you can easily spin up your app and its infra, what stops you from running a 30s sanity load test?
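If you go the docker-compose route, giving each CI build its own compose project name is what lets two builds run in parallel on one agent, since each project gets isolated networks and volumes. A sketch, where `BUILD_ID` is a hypothetical CI-provided variable and the make target is whatever your existing harness exposes:

```shell
#!/bin/sh
# Run component tests against containerised dependencies, isolated per CI build.

# Derive a unique compose project name from the CI build id.
project_name() {
  echo "myapp-ci-$1"
}

run_component_tests() {
  p="$(project_name "${BUILD_ID:-local}")"
  docker compose -p "$p" up -d                 # isolated networks/volumes per project
  trap 'docker compose -p "$p" down -v' EXIT   # always tear down, even on failure
  make test-component                          # existing test harness, no mocks
}
```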
Jenkins
- linting, formatting, unit testing
- shared .editorconfig file (nuget package?)
- publish coverage numbers to jenkins
- track changes
- static checks
- sonarqube
- config (sonar-project.properties)
- fail on critical severity
- image scanning
- snyk/trivy
- fail on critical severity
- component tests
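Pulled together, the stages above might look something like this in a declarative Jenkinsfile. This is only a sketch of the shape, not a mandated layout; the make targets are hypothetical and the scanner invocations assume tools are already on the agent:

```groovy
// Jenkinsfile — illustrative stage layout, not a standard
pipeline {
  agent any
  stages {
    stage('Lint & Format')   { steps { sh 'make lint' } }
    stage('Unit Tests')      { steps { sh 'make test-unit' } }       // publish coverage after this
    stage('Static Analysis') { steps { sh 'sonar-scanner' } }        // reads sonar-project.properties
    stage('Image Scan')      { steps { sh 'trivy image --exit-code 1 --severity CRITICAL myapp:latest' } }
    stage('Component Tests') { steps { sh 'make test-component' } }
  }
}
```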
Load Tests
- k6
- great for API tests
- kafka
- but what about asynchronous services (i.e. no API)?
- k6 can do “kafka” load tests, but these tests exercise the kafka cluster, not your app
- if you can create an ephemeral topic and an ephemeral consumer, it’s pretty easy to load it up with messages and then track the consumer lag
- once the lag hits zero, your load test is complete
- history
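A sketch of the consumer-lag approach: pump messages into the ephemeral topic, then poll the consumer group until total lag hits zero. The parsing assumes the standard column layout of `kafka-consumer-groups.sh --describe` (LAG is the sixth column); the broker address and group name are illustrative:

```shell
#!/bin/sh
# Sum the LAG column of `kafka-consumer-groups.sh --describe --group <g>` output.
# The first line is a header, so skip it.
total_lag() {
  awk 'NR > 1 { sum += $6 } END { print sum + 0 }'
}

# Illustrative polling loop: the load test is complete when lag reaches zero.
#   while :; do
#     lag=$(kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
#             --describe --group my-ephemeral-group | total_lag)
#     [ "$lag" -eq 0 ] && break
#     sleep 1
#   done
```

Recording the wall-clock time from first publish to zero lag gives you a throughput number you can track across builds.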
Sitrus
- can you automatically do ephemeral E2E (or big integration) tests?
- sitrus
- sitrus can create an ephemeral k8s namespace in the sbx cluster
- e.g. a `sit-123-rewards` namespace underneath the sbx namespace
- sitrus will then trigger each app to deploy itself
- this way sitrus doesn’t need to know about each app’s dependencies (postgres/couchbase/elastic/etc)
- can also be used for DPT
- similar to above: if you can easily spin up your app and its infra, what stops you from running a load test?
- database clusters have been split => so some tests can run concurrently
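Without knowing sitrus’s internals, the general pattern is easy to sketch: derive a namespace name from the ticket and service, create it, let each app deploy itself into it, and tear it all down afterwards. All names and commands below are illustrative:

```shell
#!/bin/sh
# Ephemeral E2E namespace lifecycle (illustrative; sitrus presumably does more).

# k8s namespace names must be lowercase RFC 1123 labels.
ns_name() {
  echo "sit-$1-$2" | tr '[:upper:]' '[:lower:]'
}

# Illustrative lifecycle:
#   ns="$(ns_name 123 rewards)"
#   kubectl create namespace "$ns"
#   helm upgrade --install myapp ./chart -n "$ns"   # each app deploys itself here
#   make test-e2e E2E_NAMESPACE="$ns"
#   kubectl delete namespace "$ns"                  # deleting the ns removes everything in it
```

Deleting the namespace at the end is what makes the whole environment genuinely ephemeral: every deployment, service, and volume claim inside it goes with it.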
Releasing
- semantic versioning
- use the Conventional Commit standard
- use the “Merge Check” bitbucket plugin to enforce the standard (copy dropbears regex)
- use the Semantic Release library to auto-generate a semver and release notes, and generate a jenkins release object
- i.e. adding a “version” tag will lead to a semver and release notes being automagically generated, along with a jenkins release object
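I don’t have dropbear’s exact regex to hand, so here is only a simplified sketch of the kind of check the merge plugin would enforce; the type list and pattern are illustrative:

```shell
#!/bin/sh
# Simplified Conventional Commits check — a sketch, not dropbear's actual regex.
CC_REGEX='^(feat|fix|docs|style|refactor|perf|test|build|ci|chore|revert)(\([a-z0-9-]+\))?(!)?: .+'

is_conventional() {
  echo "$1" | grep -Eq "$CC_REGEX"
}
```

Under semantic-release’s common defaults, `fix:` commits bump the patch version, `feat:` commits bump the minor, and a breaking-change marker bumps the major, which is what makes the semver fully automatic.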
Miscellaneous tooling that helps keep local environments isolated: node dotenv, python venvs, golang vendoring, dockerised infra (e.g. a pg12 container), etc.