devops barcelona 2023 talks
Talks

Check our awesome lineup!

The State of SQL-Based Observability

by Ryadh Dahimene

Many successful paradigms in engineering and computer science are the result of two distinct approaches colliding with each other, leading to broader and more powerful applications. In this talk, we’ll look at the parallel backgrounds of two established paradigms: SQL and Observability.

We’ll trace the history of both paradigms: how they managed to avoid each other despite SQL being the lingua franca of data manipulation, and how industry standardization, fuelled by open-source innovation, has now propelled SQL back into the game as an observability language. We’ll also highlight case studies and benchmark results to give attendees the elements they need to answer a simple question: is SQL-based observability applicable to my use case? We’ll also cover the current limitations of this approach and leave the conclusions for attendees to draw.

More info at: https://clickhouse.com/blog/the-state-of-sql-based-observability

Ryadh Dahimene

Integrations PM at ClickHouse

Integrations dude at ClickHouse, fascinated by data in all its forms, Ryadh enjoys working closely with users, enabling them to turn massive datasets into actionable insights.


Drowning on Metrics or why Jack could have fit

by Almudena Vivanco

How to properly size a service through performance testing and take those metrics into production. The key lies in Observability.

In the last 5 years, the Lidl Plus product has grown from 2 stores in Zaragoza to 13,000 stores across Europe, and from 100,000 users in 2018 to 90 million in 2024. To carry out this titanic work in an organized and budget-friendly manner, emphasis was placed on two relevant points:

- Monitoring and Observability

- Performance Testing

Basic monitoring transitioned to a culture of Observability, which not only provided visibility into system metrics but also into the complete flow and user experience. When we talk about observability, we no longer talk about isolated systems but about understanding what happens as a whole.

Performance testing was highly relevant throughout the rollout period, inferring the volume that each country would bring based on the number of tickets coming in from the stores. Performance tests were conducted for each critical product, and end-to-end tests were constantly performed to measure the user experience of the Lidl Plus app.

We lacked real-time visibility from the application to the backend. Over the past 5 years, we have worked on that traceability to measure the "happiness" of our users, moving from tools like Firebase or Dynatrace to the current solution based on OpenTelemetry.

We will show the current stack and how it lets us infer performance data for a product before it goes into production, validate workload hypotheses, and use feedback from production to improve the tests.

Almudena Vivanco

Principal "PringaOps" Performance at SCRM - Lidl International Hub

Almudena is a mathematician by vocation and has been dedicated to performance engineering for 20 years. She has worked on high-traffic, high-availability projects, from online television platforms, job portals, and security proxies to, now, a European retailer.

For 17 years, she has been actively involved in the dissemination of DevOps and performance culture in Spain.


Money-saving tips for the frugal serverless developer

by Yan Cui

Join us in this insightful session as we dive into the world of serverless architectures, explore common cost mistakes, and share actionable tips for cutting waste and reducing your AWS bill.

Whether you're looking to cut down on CloudWatch costs or improve cost-efficiency for your serverless application, we've got some helpful tips, just for you.

Yan Cui

Developer Advocate at Lumigo and AWS Serverless Hero

Yan is an experienced engineer who has run production workloads at scale on AWS since 2010. He has been an architect and principal engineer in a variety of industries, ranging from banking, e-commerce, and sports streaming to mobile gaming.

He has worked extensively with AWS Lambda in production since 2015. Yan is also an AWS Serverless Hero and the author of Production-Ready Serverless and co-author of Serverless Architectures on AWS, 2nd Edition, both by Manning.


Generative AI: bliss or miss?

by Diana Todea

GenAI is not a brand new technology, yet it has become a hot topic in the last couple of years. As many organisations adopt it for different business cases, SREs and DevOps engineers have a great deal to say about its best use cases.

This talk puts GenAI on the DevOps map, and deep dives into the GenAI applications within the DevOps/SRE realm.

In this talk, I will revisit concepts and technologies linked to GenAI, such as transformers, LLMs and RAG, and see them applied to observability, in particular in the shape of the AI Assistant, with a focus on some use cases for DevOps engineers. By the end of this talk, we will understand whether GenAI is really a must-try technology.

Diana Todea

Senior Site Reliability Engineer at EQS Group

Diana is a Senior Site Reliability Engineer at EQS Group. She is passionate about serverless, AI and machine learning.


A Tale of Tail Latency: Understanding Kubernetes CPU Requests and Limits For Sustainability and Profit

by Ara Pulido

When deploying an application to Kubernetes, each container in a pod should define CPU requests and limits. It is commonly understood how CPU requests affect the scheduling of your pod and of future pods on the same node. But beyond scheduling, CPU requests and limits also affect how your containers are created and can heavily impact their performance and their energy footprint.

In this talk we will help clarify some misconceptions about CPU requests and limits by explaining, in a developer-friendly way, how they translate to some Linux internals. We will offer some quick tips on how to understand those effects, minimise them, and select good values to reduce your application's energy footprint while ensuring its performance.
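
As a rough illustration of what "translating to Linux internals" can look like, the sketch below converts CPU requests and limits into the cgroup v1 values that the kubelet is commonly documented to set: `cpu.shares` for requests and a CFS quota for limits. It is not taken from the talk; the function names are hypothetical, and the constants assume cgroup v1 with the default 100 ms CFS period.

```python
# Illustrative sketch only: not from the talk. Assumes cgroup v1 and the
# default CFS period; function names are hypothetical.
CFS_PERIOD_US = 100_000   # default CFS period (100 ms)
SHARES_PER_CPU = 1024     # cpu.shares corresponding to one full CPU
MIN_SHARES = 2            # smallest cpu.shares value the kernel accepts
MIN_QUOTA_US = 1_000      # smallest CFS quota the kernel accepts (1 ms)


def request_to_cpu_shares(request_millicores: int) -> int:
    """Relative weight used to divide CPU time when the node is contended."""
    return max(MIN_SHARES, request_millicores * SHARES_PER_CPU // 1000)


def limit_to_cfs_quota_us(limit_millicores: int,
                          period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) the container may use in each CFS period."""
    return max(MIN_QUOTA_US, limit_millicores * period_us // 1000)


if __name__ == "__main__":
    # A container with requests.cpu=250m and limits.cpu=500m
    print("cpu.shares:", request_to_cpu_shares(250))
    print("cfs_quota_us:", limit_to_cfs_quota_us(500))
```

Under these assumptions, a 500m limit gives the container 50 ms of CPU time in every 100 ms period. Once that budget is spent, the container is throttled until the next period starts, which is one way limits can show up as tail latency.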

Ara Pulido

Staff Developer Advocate at Datadog

Ara Pulido is a Staff Developer Advocate at Datadog. Prior to that she worked as an Engineering Manager at Bitnami and Canonical, the company behind Ubuntu. She has more than 15 years of experience working at open-source infrastructure companies.


Scaling: from 0 to 25 million users

by Josip Stuhli

A story of how our infrastructure evolved over time to accommodate an increasing number of users - from on-premise to cloud and back down.

How does one make an infrastructure to handle more than a couple of users? How do you go from 100 to 1000 to 100,000 to tens of millions? What happens when, due to popular demand, hundreds of thousands of users hit your servers at the same time?

I'll tell you a story of how a small team of people managed to move software and services from one server to two, then to dozens in the cloud, and then back to on-premise: what we encountered on the way, where we failed, and how we solved it.

Josip Stuhli

CTO @ Sofascore

Josip has been involved with computers for the better part of his life. He started with web development back in high school and has since moved to backend and DevOps.

He loves security stuff and is obsessed with optimising everything. He works in Zagreb as CTO @ Sofascore.


Streamlining Compliance: Leveraging Open-Source Terraform AWS modules

by Anton Babenko

Are you navigating the complexities of compliance frameworks like SOC2, CIS, and HIPAA and seeking a more efficient path? This talk breaks down these frameworks simply and shows you a time-saving trick, making it perfect for anyone wanting to make their organization's compliance journey much easier.

I'll start by outlining the basics of these frameworks and highlighting the challenges businesses face in implementing them.

As the creator and maintainer of the terraform-aws-modules projects, I'm excited to share how using these open-source Terraform AWS modules can streamline the compliance process. I'll walk you through real-life examples showing how such solutions significantly reduce the effort and time required for compliance.

At the end of the talk, attendees will get actionable insights on using Terraform AWS modules for efficient compliance management.

Anton Babenko

AWS Community Hero / Terraform fanatic

Anton is an AWS Community Hero who helps companies around the globe build solutions using AWS and specializes in infrastructure-as-code, DevOps, and reusable infrastructure components.

He spends much of his time as an open-source contributor on various Terraform & AWS projects, such as Terraform AWS modules (downloaded more than 500 million times), the Terraform best practices ebook, doing serverless with Terraform (serverless.tf), Terraform Weekly (weekly.tf), and Your Weekly Dose of Terraform live.


Is Platform Engineering a hype? Let's build a platform together!

by Fernando Ripoll

I will explore how engineers can build their own Cloud Native Platform without losing their cool. I will take a deep dive into the realm of Platform Engineering. What is it? Is it just a buzzword? Can we learn how to build a platform in 50 minutes and demonstrate its value?

Platform Engineering has become a hot topic in the tech industry. It promises to streamline operations, foster innovation, and accelerate product development. But is it just another industry fad, or is it a game-changer here to stay?

Together, we will embark on an exhilarating journey to build a platform from scratch. This hands-on approach will provide attendees with practical insight into the intricacies of Platform Engineering. We'll break down the processes, discuss the tools, and navigate the best practices to construct a robust and scalable platform.

By exploring the practical aspects of Platform Engineering, we aim to demystify the hype and equip participants with the knowledge and skills to leverage this emerging field effectively. Whether you're a seasoned pro or a curious novice, join us as we uncover the real value of Platform Engineering. Let's build, learn, and hack together, as part of a supportive and collaborative community!

Fernando Ripoll

Solution engineer at Giant Swarm

Hi, I am a solution engineer at Giant Swarm. My beginnings were full-stack development using technologies like Symfony, NodeJS, Golang, and MongoDB, among others. Later, throughout my career, I became more interested in distributed systems, particularly how containers and Kubernetes have changed the developer experience.

Today, I help some big players jump into the Cloud Native world by leading the change from old practices to reveal the benefits of cloud environments.


Fortifying DevOps: Understanding and Fighting Botnet Threats

by Miguel Hernández and Alessandra Rizzo

In the rapidly evolving landscape of cybersecurity, botnets remain a significant threat to Kubernetes and containerized environments. In this talk, we will present a comprehensive overview of our latest research on new groups, delving into their organizational structures, codebases, and tactics. We will explore how these malicious actors share information, select their targets, and offer their services.

By sharing our findings, we hope to raise awareness and facilitate a better understanding of these threats, ultimately contributing to the development of more effective countermeasures.

Botnets represent a significant and evolving threat in the cybersecurity landscape. This presentation aims to shed light on the inner workings of these networks based on extensive research and real-world examples. Attendees will gain insights into:

- Organization and Structure: Understanding how modern botnets are set up and managed.

- Code Analysis: A deep dive into the types of code used by botnet operators to exploit container vulnerabilities.

- Information Sharing: Exploring whether and how these networks share data amongst themselves.

- Target Selection: Analyzing the methods and criteria used by botnets to choose and attack applications.

Our aim is to provide a global view of the current state of botnets, offering valuable knowledge that can aid in the detection, analysis, and mitigation of these threats. This talk is designed for security professionals, researchers, and anyone interested in understanding the complexities and dangers posed by botnets in today’s digital world.

Miguel Hernández

Sr. Threat Research Engineer at Sysdig

Miguel Hernández, Sr. Threat Research Engineer at Sysdig, is a lifelong learner with a passion for innovation.

Over the past decade, Miguel has honed his expertise in security research, leaving his mark at prominent tech companies and fostering a spirit of collaboration through personal open-source initiatives.

Miguel has been a featured speaker at cybersecurity conferences such as HITB, HIP, CCN-CERT, RootedCon, TheStandoff, Bsides Barcelona and Codemotion.

Alessandra Rizzo

Threat Detection Engineer at Sysdig

Alessandra Rizzo, Threat Detection Engineer at Sysdig, is a malware and Advanced Persistent Threat research enthusiast. She gained expertise as a threat intelligence consultant in Italy with some of Europe's premier financial institutions. In her current role, Alessandra conducts investigations into emerging cloud threats and malicious groups' operations.


Tackling Toil: SecOps Automation at Amplify Education

by Johnathan Constance and Calvin Lin

Join us as we discuss innovative methods designed to reduce toil within Security Operations (SecOps) at Amplify Education.

First, we'll detail the use of custom security rules within Datadog, exploring Datadog's built-in scanner detection rules as well as our own methods. Then, we'll discuss a custom tool called IP Blocker that utilizes AWS Web Application Firewall (AWS WAF), Datadog, and other sources to automate blocking of IPs.

Next, we'll discuss the advantages of harnessing Datadog workflows for automating a broad range of SecOps procedures, a strategy that the Amplify DevSecOps team has successfully implemented. Finally, we'll discuss some of the problems that Amplify has run into with our implementation of AWS WAF with a combination of AWS-managed and custom rules.

Johnathan Constance

Director of SecOps at Amplify

Johnathan has over a decade of experience in quality assurance, automation, software engineering, and DevOps, taking on roles as both a leader and an individual contributor.

In 2021, he moved into his current role as the Director of Security Operations at Amplify Education. In this role, Johnathan leads the SecOps team at Amplify, overseeing the introduction of new security technologies and conducting continuous security testing of Amplify's infrastructure.

Calvin Lin

Staff SecOps Engineer at Amplify

Calvin has worked at Amplify for over 6 years, tackling different DevOps- and SecOps-related problems. He has worked on setting up deployment tooling, automation, and WAFs for Amplify.

Recently, Calvin has taken the leadership position on the SecOps team to manage and strengthen Amplify's security posture.


Infrastructure deployments with ArgoCD HA and Crossplane

by Pere Alcoberro

Using the GitOps methodology to deploy Infrastructure as Code with Crossplane and ArgoCD:

- Configuration of Crossplane with the rights to deploy infrastructure from one tooling cluster to the rest of the target accounts.

- Implementation of new infrastructure Compositions in order to deploy infrastructure as CRDs.

- Lifecycle of these Compositions and deployment of the infrastructure as separate tenants.

- Limitations during the maintenance of this methodology.

- Roadmap for the evolution of this toolset.

Pere Alcoberro

Global Retail DevOps TechLead – DevOps&Cloud Architect

Pere Alcoberro works as a Cloud and DevOps Architect at Allianz Technology SE, where he is involved in development for the Global Platform.

He has strong knowledge of Cloud Infrastructure Architectures and Infrastructure as Code.


From Tremors to Saving Lives: A Serverless Kafka Architecture for Timely Earthquake Notifications

by Vlad Onetiu

In our upcoming presentation, we'll explore a cutting-edge architectural solution for real-time SMS and email notifications, particularly geared towards responding to earthquake events. This system is designed to handle rapid data transmission, listening for event changes every second, making it ideal for real-time critical alert scenarios. Central to our discussion will be the integration of Lambda functions and Confluent Kafka, coupled with advanced multithreading techniques and DynamoDB lock strategies.

A focal point of our presentation will be addressing the challenges and innovative solutions involved in integrating Confluent Kafka with Lambda functions to enable serverless operation of both producers and consumers. This is a key element in ensuring the quick and efficient distribution of notifications through parallel methods. Additionally, we will delve into the implementation of an automated scaling mechanism, which is vital for optimising the performance of the Serverless Notification ecosystem.

Our aim is to provide a comprehensive insight into how these technologies can be effectively combined to develop a robust and efficient system, capable of delivering critical real-time alerts for situations like earthquake occurrences, ultimately playing a crucial role in saving human lives.

Vlad Onetiu

DevSecOps / AWS Community Builder

Vlad Onetiu, a DevSecOps and Software Automation Engineer from Romania, is renowned for his expertise in cloud technology, cybersecurity, and software automation.

Since the onset of his career, Vlad has made significant strides in these dynamic and crucial technological domains. His proficiency in cloud computing shines through his innovative work with Serverless technologies, which he has used for sophisticated data extraction combined with system automation architectures, as described on his page @DataIceberg.

Vlad's skills in software automation and cybersecurity are also evident in his efficient management of CI/CD processes and cloud architectures and in his valuable security research, which also help ensure the creation of resilient digital infrastructures.


Supercharged GitHub Actions monitoring with OpenTelemetry at Elastic

by Victor E. Martínez Rubio

Ready to observe your GitHub Actions from a central repository? At Elastic, we implemented our custom OpenTelemetry Collector receiver to collect GitHub Actions logs and combine them with the existing traces receiver to observe all workflows in our GitHub organization. Learn about the challenges we encountered, how we solved them, and see how centralized logs, traces, and metrics empower the analysis and visualization of GitHub workflows.

At Elastic, we use GitHub Actions in multiple repositories for our CI/CD pipelines. However, we faced challenges with decentralized logs, which made troubleshooting issues that spanned multiple workflow runs or repositories difficult.

In this session, we explain how we centralized GitHub Actions telemetry using OpenTelemetry Collector and how it helped us improve our analysis and visualization of GitHub workflows.

Initially, we focused on scanning logs to detect security vulnerabilities and creating a unified platform for searching, analyzing, and visualizing logs, complete with custom alerts and notifications.

As our project progressed, we realized the broader advantages of centralized logs combined with traces and metrics, which we are going to explore with real-world examples.

We will examine how we handled spikes in log volume, navigated GitHub Actions API rate limits, and ensured data integrity while implementing the custom OpenTelemetry Collector receiver for GitHub Actions log collection.

We planned to use OpenTelemetry Collector as the primary log receiver and exporter. To ensure reliability, we intended to queue webhook events with a proxy service, which sends them to the collector at a controlled pace and retries failed requests.

We will discuss how to fine-tune the receiver for log volume efficiency and optimize the collector's reliability. Visualizations will showcase the impacts of various configuration changes on performance, and we will explain why we did not implement the proxy service.

Finally, we will share real-world examples of how centralized logs, traces, and metrics have empowered our analysis and visualization capabilities:

- How we leveraged detection rules to find leaked secrets and sensitive information in logs, making it easier to identify and remediate security vulnerabilities.

- How we used traces to identify bottlenecks and the most frequently failing runs to optimize our workflows.

- How centralized logs helped us identify the frequency of flaky commands and prioritize optimization and troubleshooting efforts.

- How we crafted informational dashboards using the provided traces and metrics to help us find optimization opportunities.

Victor E. Martínez Rubio

Principal Software Engineer at Elastic

Victor is a principal software engineer at Elastic and lives in sunny Spain. Victor works with different teams to build and improve the CI/CD ecosystem.

He has contributed to Jenkins, OpenTelemetry, and other communities for years. He loves riding his bike, eating good food, and having quality time with his family.


Zero Trusting as a True Cloud Native Dev

by Didier Di Cesare

Are your applications really cloud native? As a developer, you must be concerned about who can access resources in your system.

You probably think of authentication and authorization as any other logic: ifs and elses executed before performing critical operations.

Did you know that Kubernetes Role-Based Access Control and authentication can be wisely combined with other cloud native technologies to compose a platform that helps you avoid spaghetti code and implement best practices for application security as a true cloud native developer, while delegating some of the burden to other layers of your system?

Attendees of this session will learn how to leverage Kube to build Zero Trust authorization the cloud native way. The talk will demo use cases of tailor-made data security leveraging cloud native technology, including Envoy and Open Policy Agent, that reclaim security policies as a proper concern, decoupled from the application's code and at the same level as Deployments and Services.

Didier Di Cesare

Principal Software Engineer at Red Hat

Didi has been geeking out with computers since the days when moving one required a team effort. As a developer who rarely says no, fuelled by yerba mate and determination, he's transformed countless lines of code into functional solutions. From TypeScript to Rust, Functional to OO, and On-Premise to Cloud, he has been levelling up his skills inside and out.

Didi is currently brewing up services and Kubernetes operators at the Kuadrant project, crafting secure, efficient solutions mainly for Auth and Rate Limiting.


Managing a Data Lake on AWS

by Alexander Günsche

Transactional infrastructure is not suited for processing large amounts of data for analytics. In this talk, participants will learn about data architecture fundamentals and get deep insights into building an enterprise-grade data lake with a business intelligence frontend on AWS, using AWS analytics services such as Glue, Athena, Lake Formation, Kinesis and QuickSight.

(While the presentation is based on AWS, the fundamental concepts are transferable to other environments.)

Alexander Günsche

Senior Solutions Architect at AWS

Alex is a Senior Solutions Architect at AWS with 20 years of IT experience in expert and leadership roles. He is a strong advocate of agile and DevOps practices, and he enjoys seeing serverless, cloud-native and event-driven architectures deployed at scale.

He has delivered large transformation projects and successfully developed his own and customers’ businesses. He is a trained and experienced speaker, having presented at a wide range of events.


Building Fastly's Platform for Scale: A Journey of Continuous and Incremental Engineering

by Inés Sombra and Patrick Hamann

This talk explores Fastly's journey in building and scaling its platform, highlighting key architectural principles and addressing the inherent challenges of achieving scalable growth. The focus is on understanding how Fastly prioritizes availability, horizontal scaling, data ownership, and a platform-centric approach.

We’ll discuss the critical role of real-time monitoring and user feedback in our engineering cycles, ensuring that our platform evolves in response to actual usage patterns. Through concrete case studies, we’ll illustrate how these practices have led to measurable improvements in performance and user experience.

Join us to learn how Fastly’s dedication to continuous improvement helps create a better internet where all experiences are fast, safe and engaging.

Inés Sombra

VP of Engineering at Fastly

Ines Sombra is a VP of Engineering at Fastly, where she spends her time helping the Web go faster.

Ines holds an M.S. in Computology with an emphasis on Cheesy 80’s Rock Ballads. She has a fondness for steak, fernet, and running after a toddler who won't stay put. Follow Ines @randommood

Patrick Hamann

Sr. Principal Software Engineer at Fastly

Patrick is a Distinguished Engineer in Fastly’s Core Systems engineering group, where he oversees the architecture of their control plane systems. Prior to joining Fastly, he helped architect some of the world’s largest media websites, including The Guardian and the Financial Times.

When not speaking or ranting about performance, he enjoys spending time with his family and discovering new places and food.


Thanks for your kind support!

diamond Sponsors

gold Sponsors

  • Q-tech
  • Redis
  • AXA
  • Fastly
  • CriteoEngineering
  • Tokiota
  • Clever Cloud

silver Sponsors

  • Qualifyze
  • Sysdig

bronze Sponsors

  • lumigo
  • CTO Camp
  • Rigor Alliance
  • Grafana