DevOps Barcelona 2024

Supercharged GitHub Actions monitoring with OpenTelemetry at Elastic by Victor E. Martínez Rubio

Ready to observe your GitHub Actions from a central repository? At Elastic, we implemented our own custom OpenTelemetry Collector receiver to collect GitHub Actions logs and combined it with the existing traces receiver to observe all workflows in our GitHub organization. Learn about the challenges we encountered, how we solved them, and see how centralized logs, traces, and metrics empower the analysis and visualization of GitHub workflows.

At Elastic, we use GitHub Actions in multiple repositories for our CI/CD pipelines. However, we faced challenges with decentralized logs, which made troubleshooting issues that spanned multiple workflow runs or repositories difficult.

In this session, we explain how we centralized GitHub Actions telemetry using OpenTelemetry Collector and how it helped us improve our analysis and visualization of GitHub workflows.

Initially, we focused on scanning logs to detect security vulnerabilities and creating a unified platform for searching, analyzing, and visualizing logs, complete with custom alerts and notifications.

As our project progressed, we realized the broader advantages of centralized logs combined with traces and metrics, which we are going to explore with real-world examples.

We will examine how we handled spikes in log volume, navigated GitHub Actions API rate limits, and ensured data integrity while implementing the custom OpenTelemetry Collector receiver for GitHub Actions log collection.

We planned to use OpenTelemetry Collector as the primary log receiver and exporter. To ensure reliability, we intended to queue webhook events with a proxy service, which sends them to the collector at a controlled pace and retries failed requests.
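The planned proxy behavior (forwarding queued webhook events at a controlled pace and retrying failed requests) can be illustrated with a minimal sketch. This is not Elastic's implementation; the abstract gives no code, and the function name `drain_queue`, the `send` callback, and the parameters `min_interval_s` and `max_attempts` are all hypothetical names chosen for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque, List


def drain_queue(
    events: Deque[dict],
    send: Callable[[dict], bool],
    min_interval_s: float = 0.5,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Forward queued events one at a time; return the events that never succeeded.

    `send` is a hypothetical callback that posts one event to the collector
    and returns True on success.
    """
    failed: List[dict] = []
    while events:
        event = events.popleft()
        delivered = False
        for attempt in range(max_attempts):
            if send(event):
                delivered = True
                break
            # Back off before the next attempt, doubling the wait each time.
            if attempt < max_attempts - 1:
                sleep(min_interval_s * (2 ** attempt))
        if not delivered:
            failed.append(event)
        # Pace successive events so a spike is not passed straight through.
        sleep(min_interval_s)
    return failed
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a real proxy would also need persistent storage for the queue, which is out of scope here.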

We will discuss how to fine-tune the receiver for log volume efficiency and optimize the collector's reliability. Visualizations will showcase the impacts of various configuration changes on performance, and we will explain why we did not implement the proxy service.

Finally, we will share real-world examples of how centralized logs, traces, and metrics have strengthened our analysis and visualization capabilities. We will showcase how we used detection rules to find leaked secrets and sensitive information in logs, which made it easier to identify and remediate security vulnerabilities. We will show how we used traces to identify bottlenecks and the most frequently failing runs so that we could optimize our workflows. We will demonstrate how centralized logs helped us measure how often flaky commands occur and prioritize optimization and troubleshooting efforts. And we will share how we built informational dashboards from the collected traces and metrics to find optimization opportunities.
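As a rough illustration of what scanning centralized logs for leaked secrets can look like, here is a minimal sketch. It is not the detection rules used at Elastic, which the abstract does not describe; the function `find_secrets` and the `PATTERNS` table are hypothetical, and the two regular expressions only cover the commonly documented shapes of classic GitHub personal access tokens and AWS access key IDs.

```python
import re
from typing import Dict, List, Tuple

# Illustrative patterns only (an assumption, not the talk's rule set).
PATTERNS: Dict[str, re.Pattern] = {
    "github_classic_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_secrets(log_lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for every log line that matches."""
    hits: List[Tuple[int, str]] = []
    for number, line in enumerate(log_lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


if __name__ == "__main__":
    # Fake values built at runtime so no real-looking token sits in the source.
    sample = ["Run npm ci", "token=ghp_" + "a" * 36, "key AKIA" + "A" * 16 + " used"]
    for line_number, rule in find_secrets(sample):
        print(f"line {line_number}: possible {rule}")
```

Once logs are centralized, a check like this can run across every repository at once instead of per workflow, which is the advantage the paragraph above describes.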

Talk Questions

Questions moderation

All questions must comply with our Code of Conduct. If you don't see your question right after sending it, either it has not been moderated yet or it does not comply with our CoC.

Question 766
Do developers prefer to look at logs in GitHub Actions or in Elastic?
Question 769
How do you enforce the usage of OpenTelemetry in every workflow in GitHub Actions at the organization level?
Question 770
What is the log retention period in Elastic? Considering the high cost of Elastic, why not rely on GitHub’s UI for viewing logs instead?
Question 765
What open source tool would you recommend to monitor OpenTelemetry logs/traces? Grafana? Kibana?
Question 760
We are creating metrics in Datadog manually. With OpenTelemetry, can we create automated metrics for Datadog? Does OpenTelemetry have an extension to do it?
Question 761
What about the dropped logs? How do you know what is happening?
Question 768
Can we get these metrics from GitHub SaaS, or do we need our own workers?
Question 759
You said traces measure the duration of an action and metrics are numeric values, for example CPU consumption. I use metrics to collect process duration; is that a misuse?
Question 767
How much effort do DevOps/SRE teams need to invest to set up an OTel Collector compared to running a managed service like Datadog?
Question 762
Can you trigger alarms based on the metrics in your demo?
Question 763
Can we use Bitbucket instead of GitHub Actions?
Question 764
Can CloudWatch send data to OpenTelemetry?
Question 771
Looks like you're working around a lot of inefficiencies in GitHub that are easily solvable with GitLab, where something like secret detection isn't an afterthought but is avoided in the first place. You should give it a try!
Question 776
What would be a first approach to building a first platform, removing Ingress and adding the Gateway API, like we saw yesterday? How could that increase the complexity of this approach? Thanks

Address

Auditori AXA
Avinguda Diagonal, 547, 08029 Barcelona

Contacts

conference@devops.barcelona
sponsors@devops.barcelona

Link

About Us
Code Of Conduct


