Episode
Telemetry & Observability for Elixir Apps at Cars.com with Zack Kayser & Ethan Gunderson
- Podcast: Elixir Wizards
- Published: Dec 12, 2024
- Duration: 2559 seconds (42:39)
Summary
Learn how to implement effective observability in high-traffic Elixir environments using Telemetry and OpenTelemetry. Engineers from Cars.com share practical strategies for managing large-scale system visibility and avoiding deployment-driven traffic spikes.
Topics
- Elixir
- Telemetry
- OpenTelemetry
- Observability
- Phoenix LiveView
- Distributed Tracing
- Microservices
- System Monitoring
Highlights
- Main idea: Observability should enable developers to ask unplanned questions of a system to diagnose incidents and prevent recurrence
- Practical takeaway: Use OpenTelemetry instrumentation libraries to easily add vendor-agnostic tracing and spans to your Elixir applications
- Failure mode: Relying on Phoenix LiveView's default auto-recovery during deployments can trigger massive, redundant downstream database or search engine queries
- Practical takeaway: Leverage the Elixir Telemetry ecosystem to hook into events from libraries like Oban without needing to modify their internal source code
- Trade-off: Balancing high-resolution data collection with the storage costs and performance overhead of high-volume telemetry spans
Chapters
- 1:00 Introduction to Cars.com Scale: The guests discuss their experience transitioning from small-scale Elixir apps to managing high-throughput production environments at Cars.com.
- 4:15 The High-Stakes Switch: A look at the technical pressure and challenges of migrating traffic from legacy stacks to new Elixir-based infrastructure.
- 7:20 The Value of Contextual Tracing: Why simple log lines are insufficient for triaging incidents and how tracing allows you to follow a specific user's journey through downstream services.
- 10:35 Defining Observability Goals: Moving beyond simple incident diagnosis to using telemetry for proactive system understanding.
- 13:50 Managing Data Volume and Sampling: The challenges of handling massive amounts of telemetry data and the necessity of sampling strategies to manage costs.
- 16:50 LiveView and WebSocket Challenges: How Phoenix LiveView socket reconnections during deployments can create significant downstream load on services like Elasticsearch.
- 23:30 Scaling Instrumentation: Strategies for instrumenting large-scale applications and the importance of using standardized libraries like OpenTelemetry.
- 39:20 The Future of Elixir Telemetry: How the growing ecosystem of Telemetry-enabled libraries simplifies the burden of building custom observability tools.