brily
Module 01 · Monitor

Uptime monitoring
that earns its pages.

Most uptime tools answer the easy question: is the homepage returning 200? Brily answers the one that matters: is the thing your users pay you for actually working, right now, from where they live?

app.brily.app / projects / ship-os / monitors · live

/api/checkout · 30-day uptime 99.98% (30d ago → now)

  p50 188ms · p95 421ms · p99 812ms

  Quorum · 2 of 3
    FRA ok · HEL slow · AMS ok

  TLS expiry · 64 days · cert healthy

  Recent checks (last 60s)
    • homepage          FRA   312ms
    • api/auth/login    FRA   188ms
    • api/checkout      HEL   421ms
    • webhook/stripe    FRA   1.2s
    • cron/digest       any   heartbeat

Multi-region HTTP and HTTPS checks

Probes across three continents run your checks on the interval you pick. An incident fires only when multiple regions agree. Single-probe blips are discarded before they wake anyone.

Response assertions that mean something

A 200 status is the weakest possible signal. Assert on status codes, headers, JSON body content, and latency percentiles. A page rendering 'Service unavailable' at HTTP 200 still triggers an alert.
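As a sketch of what "assertions that mean something" can look like, here is a hypothetical evaluator over a captured response. The `Response` shape and assertion set are illustrative assumptions, not Brily's actual API:

```python
from dataclasses import dataclass

@dataclass
class Response:
    status: int
    headers: dict
    body: str
    latency_ms: float

def evaluate(resp: Response) -> list[str]:
    """Return failed-assertion messages; an empty list means healthy."""
    failures = []
    if resp.status != 200:
        failures.append(f"expected status 200, got {resp.status}")
    if resp.headers.get("content-type", "").split(";")[0] != "application/json":
        failures.append("content-type is not application/json")
    if "Service unavailable" in resp.body:
        failures.append("body contains an error banner despite the status code")
    if resp.latency_ms > 500:
        failures.append(f"latency {resp.latency_ms}ms exceeds 500ms budget")
    return failures

# A 200 that renders an error page still fails two assertions:
broken = Response(200, {"content-type": "text/html"}, "Service unavailable", 120.0)
print(evaluate(broken))
```

The point is that the status code is just one assertion among several; the body and latency checks catch the failures a bare 200 hides.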

TLS expiry and certificate health

We track certificate expiry, chain validity, and SAN coverage. Warn at 30 days, escalate at 7. No more 2am TLS pages.
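The warn-at-30, escalate-at-7 ladder reduces to a small threshold check. A minimal sketch, assuming the tier names and a parsed certificate expiry timestamp (how the cert is fetched is out of scope here):

```python
from datetime import datetime, timezone

WARN_DAYS = 30      # first nudge, plenty of runway
ESCALATE_DAYS = 7   # someone needs to act this week

def cert_alert_level(not_after: datetime, now: datetime) -> str:
    """Map days until certificate expiry onto an alert tier."""
    days_left = (not_after - now).days
    if days_left <= ESCALATE_DAYS:
        return "escalate"
    if days_left <= WARN_DAYS:
        return "warn"
    return "healthy"

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
# 64 days out, like the dashboard example above:
print(cert_alert_level(datetime(2025, 3, 6, tzinfo=timezone.utc), now))
```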

Heartbeat monitoring for jobs

Point your cron jobs, scheduled workers, or backup scripts at a heartbeat URL. We page you when they stop checking in, not when something logs a warning.
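"Page when they stop checking in" is an absence check: a job is overdue once its expected interval plus a grace period has elapsed since the last ping. A sketch with illustrative interval and grace values:

```python
from datetime import datetime, timedelta, timezone

def heartbeat_overdue(last_ping: datetime, interval: timedelta,
                      grace: timedelta, now: datetime) -> bool:
    """True once a full interval plus grace has passed without a check-in."""
    return now - last_ping > interval + grace

now = datetime(2025, 1, 1, 3, 0, tzinfo=timezone.utc)
nightly = timedelta(hours=24)
grace = timedelta(minutes=30)

# A nightly backup that last pinged 25 hours ago is past 24h + 30m: page.
print(heartbeat_overdue(now - timedelta(hours=25), nightly, grace, now))
```

The grace window is what separates "the job runs a few minutes late sometimes" from "the job is dead".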

Alert routing that respects on-call

Route by severity, time of day, and escalation window. Daytime latency elevations go to Slack. 3am full outages page the on-call engineer. P3 issues wait until morning.
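The routing rules above can be read as a small decision function. This is a sketch under assumptions: the channel names, the business-hours window, and the treatment of off-hours P2s are all illustrative, not Brily's actual routing table:

```python
from datetime import time

def route(severity: str, at: time) -> str:
    """Pick a destination from severity and local time of day."""
    daytime = time(8, 0) <= at <= time(20, 0)
    if severity == "P1":
        return "page-oncall"        # full outages page regardless of hour
    if severity == "P2" and daytime:
        return "slack:#ops-alerts"  # daytime latency elevations go to Slack
    return "queue-until-morning"    # P3s (and off-hours P2s, in this sketch) wait

print(route("P1", time(3, 0)))
print(route("P2", time(14, 0)))
print(route("P3", time(3, 0)))
```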

We run Brily on Brily

Our status page, our alerts, and the SLA we promise you: all delivered by the same product you sign up for. Our on-call drill finds our bugs first.

From alert to public incident

One click. Not three tools.

A monitor trips. Slack gets the context: which probes failed, what the last error was, a link to the runbook. You decide it is a real incident. One click promotes the alert to your public status page with the affected component pre-filled and subscribers notified.

slack · #ops-alerts · 02:47 UTC

  P1 · Brily · now
  /api/checkout failing

  2 of 3 probes failing · 503 Service Unavailable for 3m 12s

  checkout · ship-os
  [Promote to incident →]

  UPDATE · Brily · now
  Incident drafted on status page

  Component: Checkout · State: Investigating · Subscribers: 1,284 notified

  status.ship-os.com
  [View on public page →]

What counts as a monitor

One monitor is one check against one endpoint: a homepage, an API route, a webhook URL, a checkout step. You configure the interval (30 seconds on Team and above, 5 minutes on Free), the assertions it runs, and the probe regions.

A monitor is not a project. A project groups monitors that share a status page, a team, and a subscriber list. A typical SaaS product is one project with five to fifteen monitors.

How we keep alerts out of the noise

Every monitor runs from at least two probe regions on Pro and above. Before we declare an incident, the configurable quorum has to agree. The default is two of three regions failing inside the same window. A single probe losing network reachability never pages anyone.
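The quorum rule itself is simple to state in code. A minimal sketch, assuming per-region pass/fail results from the same check window (region codes are illustrative):

```python
def incident_declared(region_results: dict[str, bool], quorum: int = 2) -> bool:
    """region_results maps region code -> did the check fail there?

    An incident fires only when at least `quorum` regions failed
    inside the same window.
    """
    failing = sum(1 for failed in region_results.values() if failed)
    return failing >= quorum

# A single unreachable probe never pages anyone:
print(incident_declared({"FRA": True, "HEL": False, "AMS": False}))
# Two of three agreeing does:
print(incident_declared({"FRA": True, "HEL": True, "AMS": False}))
```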

Integrations in the box

  • Slack. Per-channel routing by severity or project.
  • Telegram. Bot-based alerts to DMs, groups, or channels with inline action buttons.
  • WhatsApp. P1 delivery via WhatsApp Business Cloud API, opt-in compliance handled for you.
  • Webhook. Signed generic webhooks for PagerDuty, Opsgenie, or your own services.
  • Status page. One click turns an alert into a public incident.
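Signed webhooks generally mean the receiver recomputes an HMAC over the raw payload and compares it in constant time. A sketch of the receiving side, assuming an HMAC-SHA256 hex digest scheme; the secret, header name, and payload shape here are hypothetical, so check the docs of whatever you integrate with:

```python
import hashlib
import hmac

def sign(secret: bytes, payload: bytes) -> str:
    """HMAC-SHA256 hex digest over the raw request body."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(secret: bytes, payload: bytes, signature: str) -> bool:
    # compare_digest avoids leaking a timing side channel
    return hmac.compare_digest(sign(secret, payload), signature)

secret = b"whsec_example"  # hypothetical shared secret
payload = b'{"monitor": "api/checkout", "state": "failing"}'
sig = sign(secret, payload)

print(verify(secret, payload, sig))      # genuine delivery
print(verify(secret, b"tampered", sig))  # altered body is rejected
```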

Related reading

Replace your uptime tool.

Free plan covers 5 monitors at 5-minute intervals. No credit card, no trial countdown.