How to Monitor SIP Trunks: A Practical Guide for MSPs
If you manage VoIP for clients, "the phones are down" is the call you dread. This guide covers what to actually monitor on a SIP trunk, the failure modes you'll see in production, sensible alert thresholds, and why a single-location check will mislead you.
The problem with how most people monitor SIP
The default approach in many MSP shops is one of:
- Pinging the PBX IP and calling it "monitored"
- Setting up an HTTP check against the PBX's web UI
- Both of the above + crossing fingers
None of those tell you whether your phone system actually works. The PBX IP can respond to ping while SIP registration is broken. The web UI can return HTTP 200 while the SIP listener has crashed. Real users discover the outage minutes (or hours) before your monitoring does.
The fix is to monitor SIP the same way a phone does: send a real REGISTER request and verify a 200 OK response. Do it from multiple geographic locations so you can distinguish a real outage from a routing issue.
What you should actually be monitoring
Five things, in priority order:
1. SIP REGISTER success
The most basic question: can a phone register against your PBX or carrier right now? A SIP REGISTER round-trip from a probe host with valid credentials will tell you. If it fails, your customers can't make or receive calls. Period.
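A REGISTER round-trip like the one described above can be sketched with nothing but the Python standard library. This is a minimal illustration, not a full SIP stack: it assumes UDP transport, a registrar that accepts RFC 2617 digest authentication without `qop`, and hypothetical names (`probe_register`, `build_register`, `pbx.example.com`) that are not from any particular product. It registers with `Expires: 60` so the test binding ages out quickly.

```python
import hashlib
import re
import socket
import uuid


def _md5(text):
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(user, password, realm, nonce, uri, method="REGISTER"):
    """RFC 2617 digest without qop: MD5(HA1:nonce:HA2). Assumes MD5, no qop."""
    ha1 = _md5(f"{user}:{realm}:{password}")
    ha2 = _md5(f"{method}:{uri}")
    return _md5(f"{ha1}:{nonce}:{ha2}")


def build_register(server, user, local_ip, local_port, call_id, cseq, auth=None):
    """Build a minimal REGISTER; `auth` is an optional Authorization header line."""
    branch = "z9hG4bK" + uuid.uuid4().hex[:16]
    headers = [
        f"REGISTER sip:{server} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_ip}:{local_port};branch={branch};rport",
        "Max-Forwards: 70",
        f"From: <sip:{user}@{server}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{user}@{server}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} REGISTER",
        f"Contact: <sip:{user}@{local_ip}:{local_port}>",
        "Expires: 60",
        "User-Agent: MyCompany-StatusProbe/1.0",
    ]
    if auth:
        headers.append(auth)
    headers.append("Content-Length: 0")
    return ("\r\n".join(headers) + "\r\n\r\n").encode()


def parse_status(response):
    """Return the numeric status code from the first line of a SIP response."""
    match = re.match(rb"SIP/2\.0 (\d{3})", response)
    if not match:
        raise ValueError("not a SIP response")
    return int(match.group(1))


def _recv_final(sock):
    """Skip provisional 1xx responses and return the first final response."""
    while True:
        data = sock.recv(4096)
        if parse_status(data) >= 200:
            return data


def probe_register(server, user, password, port=5060, timeout=3.0):
    """Return True if a REGISTER (answering one 401 challenge) ends in 200 OK."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # a silent registrar raises socket.timeout
        sock.connect((server, port))
        local_ip, local_port = sock.getsockname()
        call_id = uuid.uuid4().hex
        sock.send(build_register(server, user, local_ip, local_port, call_id, 1))
        reply = _recv_final(sock)
        if parse_status(reply) == 401:
            text = reply.decode(errors="replace")
            realm = re.search(r'realm="([^"]*)"', text).group(1)
            nonce = re.search(r'nonce="([^"]*)"', text).group(1)
            uri = f"sip:{server}"
            answer = digest_response(user, password, realm, nonce, uri)
            auth = (
                f'Authorization: Digest username="{user}", realm="{realm}", '
                f'nonce="{nonce}", uri="{uri}", response="{answer}", algorithm=MD5'
            )
            sock.send(build_register(server, user, local_ip, local_port, call_id, 2, auth))
            reply = _recv_final(sock)
        return parse_status(reply) == 200
```

A caller would wrap `probe_register("pbx.example.com", "1001", "secret")` in a `try`/`except OSError` and count a timeout or socket error as a failed check. Registrars that insist on `qop=auth` or on TLS would need more than this sketch provides.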
2. SIP REGISTER from multiple regions
If your PBX is in Florida and your only probe is also in Florida, you have a blind spot for any routing issue between your customers' regions and your PBX. A "100% green" status while half your customers can't connect is a familiar disaster. Use probes in at least US-East, US-West, and EU.
3. Response time / registration latency
REGISTER response times that creep upward (for example, from 200 ms to 800 ms over a week) are a leading indicator of carrier capacity issues. Track the trend. Most monitoring tools surface response time as a graph; alert on sustained increases, not on isolated spikes.
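One way to separate a sustained increase from a spike is to compare the median of the most recent samples against the median of an earlier baseline window. The sketch below is an assumption about how you might do that; the function name, the window sizes, and the 2x factor are illustrative rather than recommended values.

```python
from statistics import median


def sustained_increase(samples_ms, baseline_n=50, recent_n=10, factor=2.0):
    """True when the recent median is at least `factor` times the baseline median.

    samples_ms is a list of REGISTER response times in milliseconds, oldest first.
    Medians are used so that a single slow reply does not trip the check.
    """
    if len(samples_ms) < baseline_n + recent_n:
        return False  # not enough history to judge a trend
    baseline = median(samples_ms[-(baseline_n + recent_n):-recent_n])
    recent = median(samples_ms[-recent_n:])
    return recent >= factor * baseline
```

With these defaults, one 800 ms reply among 200 ms replies is ignored, while ten consecutive 800 ms replies after a 200 ms baseline trip the check.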
4. PBX TCP listener (5060/5061)
Sometimes SIP-level checks fail because TLS broke or the auth backend is down. A plain TCP connect to port 5060 (SIP over TCP) or 5061 (SIP over TLS) tells you whether the basic socket is alive, which is useful when triaging "is the box up at all?" Keep in mind that a TCP connect says nothing about a UDP-only listener on 5060.
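A socket-level check of this kind takes only a few lines. The helper name `tcp_listener_up` is hypothetical; the sketch only establishes and closes a TCP connection and sends no SIP traffic.

```python
import socket


def tcp_listener_up(host, port=5060, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False
```

If this returns True while the REGISTER probe fails, the box is up and the problem sits higher in the stack (TLS, authentication, or the SIP module itself).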
5. Trunk-side checks (separately from PBX)
If you use a third-party SIP carrier (Bandwidth, VoIP Innovations, Twilio, etc.), monitor THEIR registrar separately. When customers complain about no inbound calls but outbound works, the trunk is the most common culprit, not the PBX.
Common SIP failure modes
From real production tickets — these are what your alerts will catch (or miss, if poorly configured):
Carrier IP allow-list mismatch
Your carrier rotated their SBC IPs. Your firewall ACL didn't update. Inbound calls drop, outbound still work. A SIP probe from outside your network catches this; an internal-only probe doesn't.
NAT keepalive timeout
The PBX is behind NAT and registration succeeds, but the NAT mapping toward the carrier expires after about 60s of no traffic. Calls coming in during the gap fail with "endpoint not registered." Fix: shorten the registration interval (most PBXs default to 3600s; drop to 60-120s for NAT'd setups) and make sure the interval, or a separate keepalive, is shorter than the NAT timeout, otherwise the gap only gets smaller. Detection: REGISTER works in tests but inbound calls drop randomly.
TLS certificate expired
You forced TLS for SIP, the cert expired, every modern SIP client refuses to connect. UDP-fallback clients still work. You see partial outage. Always monitor SIP-over-TLS in addition to UDP.
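Certificate expiry on the SIP TLS port can be checked ahead of time with Python's `ssl` module. The sketch below assumes the PBX presents a certificate that chains to a public CA and matches the hostname you connect to; the helper names and the 14-day warning window are illustrative choices. An already-expired certificate fails the handshake, which the sketch reports as a rejection.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter string (getpeercert() format)."""
    now = time.time() if now is None else now
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def check_sip_tls_cert(host, port=5061, warn_days=14, timeout=5.0):
    """Return (ok, detail) for the certificate presented on the SIP TLS port."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                remaining = days_until_expiry(tls.getpeercert()["notAfter"])
    except ssl.SSLCertVerificationError as exc:
        # Expired, self-signed, or hostname-mismatched certificates land here.
        return False, f"certificate rejected: {exc.verify_message}"
    except OSError as exc:
        return False, f"connection failed: {exc}"
    if remaining < warn_days:
        return False, f"certificate expires in {remaining:.0f} days"
    return True, f"certificate valid for {remaining:.0f} more days"
```

Running this daily against each TLS-enabled trunk gives you the warning before clients start refusing to connect, rather than after.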
Asterisk/FreePBX module crash
A specific module (e.g. chan_sip or chan_pjsip) hangs while the rest of Asterisk runs. Web UI responds, SIP doesn't. Only a real REGISTER probe catches this.
Carrier maintenance window without notice
Even Tier 1 carriers do this. You'll see correlated failure across all extensions on the trunk while other carriers stay up. Multi-trunk customers benefit from monitoring each carrier separately so you can fail over (or at least know which carrier to call).
Setting alert thresholds without alert fatigue
SIP networks have transient blips. Alerting on every single failed check generates noise that gets ignored. Practical thresholds:
- Single check failure → log it, don't alert. Could be packet loss.
- 2 consecutive failures from any region → potentially open an incident, depending on sensitivity setting.
- 2+ regions failing simultaneously → high-confidence outage, alert immediately.
- All regions failing → critical, page somebody.
- Recovery → 1 successful check is usually enough to mark resolved (don't be conservative on recovery — false positives demoralize the team).
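The thresholds above can be expressed as a small classification function. This is a sketch under stated assumptions: the severity labels (`ok`, `log`, `incident`, `alert`, `critical`) and the `history` structure are made up for illustration, and "failing simultaneously" is read as "the latest check from that region failed".

```python
def classify(history, consecutive=2):
    """Map per-region check results to a severity label.

    history maps region name -> list of results (True = success), oldest first.
    """
    latest_failed = [
        region for region, results in history.items()
        if results and not results[-1]
    ]
    # Every region failing right now: page somebody.
    if len(history) > 1 and len(latest_failed) == len(history):
        return "critical"
    # Two or more regions failing at once: high-confidence outage.
    if len(latest_failed) >= 2:
        return "alert"
    # One region with N consecutive failures: open an incident.
    for region in latest_failed:
        tail = history[region][-consecutive:]
        if len(tail) == consecutive and not any(tail):
            return "incident"
    # A lone failed check is logged; a single success clears the region.
    return "log" if latest_failed else "ok"
```

For example, `classify({"us-east": [False, False], "us-west": [True, True], "eu": [True, True]})` yields `"incident"`, and one successful check from `us-east` afterwards brings the result back to `"ok"`.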
Where to test from
If you're rolling your own probes, host them on cheap VPS providers in geographically diverse regions:
- US-East: Hetzner Ashburn, OVH Vint Hill or BHS (Beauharnois, Quebec), Vultr Atlanta
- US-West: Hetzner Hillsboro, OVH Hillsboro, Vultr Seattle
- EU: Hetzner Falkenstein, OVH Roubaix, Vultr Frankfurt
One thing to know: some carriers will rate-limit or outright block monitoring probe traffic. Authenticate with your real SIP credentials (not throwaways) so the carrier sees you as a registered endpoint, not a scanner. Set your User-Agent to something identifying ("MyCompany-StatusProbe/1.0").
If you don't want to roll your own
StatusCore is one of the few uptime monitoring platforms with native SIP support across UDP, TCP, and TLS. We probe from US-East, US-West, and EU every check interval, send real REGISTER requests with your credentials, and surface per-region status separately so you can distinguish carrier issues from PBX issues.
Setup takes about 5 minutes per trunk: enter the SIP server, port, username, password, and protocol. Multi-region check happens automatically. Free trial — no credit card.
Quick recap
- Ping isn't monitoring. HTTP isn't monitoring. Send a real SIP REGISTER.
- Use multiple geographic probes — single-location checks miss regional issues.
- Monitor TLS separately from UDP — the cert can expire while UDP keeps working.
- Track response-time trends; sustained slowdowns predict outages.
- Don't alert on single failures. Two consecutive failures, or two regions failing at once, is a real signal.
