get-into-teaching-app

Monitoring

Logs

We use logit.io to host a Kibana instance for our logs. The logs persist for 14 days and contain logs for all our production and test instances. You can filter to a specific instance using the cf.app field.

Metrics

We use Prometheus to collect our metrics into an InfluxDB instance. The metrics are presented using Grafana. All the configuration/infrastructure is currently configured in the GiT API terraform files. The metrics are advertised on the /metrics endpoint of the application.

Note that if you change the Grafana dashboard it will not persist and you need to instead export the dashboard and updated it in the GitHub repository. These are re-applied on API deployment.

Alerts

We use Prometheus Alert Manager to notify us when something has gone wrong. It will post to the relevant Slack channel and contain a link to the appropriate Grafana dashboard and/or runbook.

You can add/configure alerts in the GiT API repository.

All the runbooks are also hosted in the GiT API repository.

Error reporting

We use Sentry to capture application errors. They will be posted to the relvant Slack channel when they first occur.

External links are now tested using Lychee.

Install it using your package manager of choice or download the binary from GitHub and make it executable.

lychee --insecure --exclude-mail app/views/content

We run this on a nightly schedule in a GitHub Action and broken links are reported via Slack.