DevOps Barcelona 2024

Drowning on Metrics or why Jack could have fit by Almudena Vivanco

How to properly size a service through performance testing and take those metrics into production. The key lies in Observability.

In the last 5 years, the Lidl Plus product has grown from 2 stores in Zaragoza to 13,000 stores across Europe, and from 100,000 users in 2018 to 90 million in 2024. To carry out this titanic work in an organized and budget-friendly manner, emphasis was placed on two relevant points:

Monitoring and Observability

Performance Testing

Basic monitoring transitioned to a culture of Observability, which provided visibility not only into system metrics but also into the complete flow and user experience. When we talk about observability, we no longer talk about isolated systems but about understanding what happens as a whole.

Performance testing was highly relevant throughout the rollout period: we inferred the volume that each country would bring from the number of tickets coming in from its stores. Performance tests were conducted for each critical product, and end-to-end tests were constantly performed to measure the user experience of the Lidl Plus app.
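As a rough illustration of this kind of inference, the sketch below turns a daily ticket count into a target request rate for a load test. It is not the method presented in the talk: every name and number in it (tickets_per_day, requests_per_ticket, opening_hours, peak_factor, headroom) is a hypothetical assumption chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class WorkloadAssumptions:
    # All fields are hypothetical inputs, not figures from the talk.
    tickets_per_day: int        # tickets issued by one country's stores
    requests_per_ticket: float  # backend calls triggered per ticket
    opening_hours: float        # hours per day the stores are open
    peak_factor: float          # assumed peak-to-average traffic ratio
    headroom: float             # safety margin above the peak (0.25 = 25%)


def target_rps(w: WorkloadAssumptions) -> float:
    """Return the requests per second a load test should aim for."""
    # Average rate while stores are open.
    average_rps = (w.tickets_per_day * w.requests_per_ticket) / (w.opening_hours * 3600)
    # Scale to the assumed peak, then add the safety margin.
    return average_rps * w.peak_factor * (1 + w.headroom)


example = WorkloadAssumptions(
    tickets_per_day=1_000_000,
    requests_per_ticket=3,
    opening_hours=12,
    peak_factor=4,
    headroom=0.25,
)
print(f"Target load: {target_rps(example):.1f} requests/s")
```

Once the product is live, the assumed peak factor and requests per ticket can be replaced with values measured in production, which is the feedback loop the abstract describes.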

We lacked real-time visibility from the application to the backend. Over the past 5 years, we have worked on that traceability to measure the "happiness" of our users, moving from tools like Firebase or Dynatrace to the current solution based on OpenTelemetry.

We will show the current stack and how it lets us infer performance data for a product before it goes into production, validate workload hypotheses, and use feedback from production to improve the tests.

Talk Questions


Question 836
What tools do you recommend for running load tests? Especially free ones.
Question 838
Did you think about implementing some kind of queueing mechanism (e.g. "you will be served in x minutes") for clients to manage peak requests, as many other companies do during Black Friday?
Question 839
What tool do you use for load testing? k6, Locust?
Question 835
Did you consider using the card number to identify if there were repeated users on the same day?
Question 840
Do you apply any special scaling policies or configurations to your infrastructure before peak business periods?
Question 841
Which load testing tool do you prefer?
Question 837
Don't you apply "code freeze" periods before critical times?
Question 842
Instead of dropping metrics, why not store a small percentage of them? For example, 5% to check how things behave, or at least to keep a small sample.
Question 843
Do you use Grafana Cloud or on-premises Grafana?

Address

Auditori AXA
Avinguda Diagonal, 547, 08029 Barcelona

Contacts

conference@devops.barcelona
sponsors@devops.barcelona

