Many successful paradigms in engineering and computer science are the result of two distinct approaches colliding with each other, leading to broader and more powerful applications. In this talk, we’ll look at the parallel backgrounds of two established paradigms: SQL and Observability.
We’ll trace the history of both paradigms: how they managed to avoid each other despite SQL being the lingua franca of data manipulation, and how industry standardization, fuelled by open-source innovation, has now propelled SQL back into the game as an observability language. We’ll also present case studies and benchmark results to give attendees what they need to answer a simple question: is SQL-based observability applicable to my use case? We’ll also cover the current limitations of this approach and leave the conclusions for attendees to draw.
More info in: https://clickhouse.com/blog/the-state-of-sql-based-observability
Ryadh is an integrations dude at ClickHouse. Fascinated by data in all its forms, he enjoys working closely with users, enabling them to turn massive datasets into actionable insights.
How to properly size a service through performance testing and take those metrics into production. The key lies in Observability
In the last 5 years, the Lidl Plus product has grown from 2 stores in Zaragoza to 13,000 stores across Europe, and from 100,000 users in 2018 to 90 million in 2024. To carry out this titanic work in an organized and budget-friendly manner, emphasis was placed on two relevant points:
Monitoring and Observability
Performance Testing
Basic monitoring transitioned to a culture of Observability, which not only provided visibility into system metrics but also into the complete flow and user experience. When we talk about observability, we no longer talk about isolated systems but about understanding what happens as a whole.
Performance testing was highly relevant throughout the rollout period, inferring the volume that each country would bring based on the number of tickets coming in from the stores. Performance tests were conducted for each critical product, and end-to-end tests were constantly performed to measure the user experience of the Lidl Plus app.
We lacked real-time visibility from the application to the backend. Over the past 5 years, we have worked on that traceability to measure the "happiness" of our users, moving from tools like Firebase or Dynatrace to the current solution based on OpenTelemetry.
We will show the current stack and how it lets us infer performance data for a product before it goes into production, validating workload hypotheses and using feedback to improve the tests once the product is in production.
Almudena is a mathematician by vocation and has been dedicated to performance engineering for 20 years. She has worked on high-traffic, high-availability projects ranging from online television platforms, job portals, and security proxies to, now, a European retailer.
For 17 years, she has been actively involved in the dissemination of DevOps and performance culture in Spain.
Join us in this insightful session as we dive into the world of serverless architectures, explore common cost mistakes, and learn actionable tips for cutting down waste and reducing your AWS bill.
Whether you're looking to cut down on CloudWatch costs or improve cost-efficiency for your serverless application, we've got some helpful tips, just for you.
Yan is an experienced engineer who has run production workloads at scale on AWS since 2010. He has been an architect and principal engineer in a variety of industries, ranging from banking, e-commerce, and sports streaming to mobile gaming.
He has worked extensively with AWS Lambda in production since 2015. Yan is also an AWS Serverless Hero and the author of Production-Ready Serverless and co-author of Serverless Architectures on AWS, 2nd Edition, both by Manning.
GenAI is not a brand-new technology, yet it has become a hot topic in the last couple of years. As many organisations adopt it for different business cases, SREs and DevOps engineers have a great deal to say about its best use cases.
This talk puts GenAI on the DevOps map, and deep dives into the GenAI applications within the DevOps/SRE realm.
In this talk, I will revisit concepts and technologies linked to GenAI, such as transformers, LLMs, and RAG, and see them applied to observability, in particular in the shape of the AI Assistant, focusing on some use cases for DevOps engineers. By the end of this talk, we will understand whether GenAI is really a must-try technology.
Diana is a Senior Site Reliability Engineer at EQS Group. She is passionate about serverless, AI and machine learning.
When deploying an application to Kubernetes, each container in a pod should define CPU requests and limits. It is commonly understood how CPU requests affect the scheduling of your pod and of future pods on the same node. But beyond scheduling, CPU requests and limits also affect how your containers are created, and they can heavily impact performance and energy footprint.
In this talk we will help clarify some misconceptions about CPU requests and limits by explaining, in a developer-friendly way, how they translate to some Linux internals. We will offer some quick tips on how to understand those effects, minimise them, and select good values to reduce your application's energy footprint while ensuring its performance.
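As a rough illustration of how requests and limits might map to Linux internals, the sketch below converts Kubernetes CPU quantities into cgroup v1 style values (`cpu.shares` and `cpu.cfs_quota_us`). It is an assumption-laden approximation of commonly documented kubelet behaviour, not material from the talk: the function names are hypothetical, the constants (1024 shares per CPU, a 100 ms CFS period, and the minimum values) are the usual defaults, and cgroup v2 nodes express the same ideas through `cpu.weight` and `cpu.max` instead.

```python
# Sketch only: approximates how CPU requests/limits are commonly translated
# into cgroup v1 settings. Names and constants are illustrative assumptions.

SHARES_PER_CPU = 1024      # cpu.shares granted per full CPU
CFS_PERIOD_US = 100_000    # default CFS period: 100 ms
MIN_SHARES = 2             # assumed floor for cpu.shares
MIN_QUOTA_US = 1_000       # assumed floor for cpu.cfs_quota_us


def parse_millicpu(quantity: str) -> int:
    """Parse a Kubernetes CPU quantity such as '250m' or '1.5' into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(round(float(quantity) * 1000))


def cpu_shares(request: str) -> int:
    """Relative weight under contention, derived from the CPU request."""
    shares = parse_millicpu(request) * SHARES_PER_CPU // 1000
    return max(MIN_SHARES, shares)


def cfs_quota_us(limit: str, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time allowed per CFS period, derived from the CPU limit."""
    quota = parse_millicpu(limit) * period_us // 1000
    return max(MIN_QUOTA_US, quota)


if __name__ == "__main__":
    print(cpu_shares("250m"))    # weight for a quarter-CPU request
    print(cfs_quota_us("500m"))  # microseconds of CPU per 100 ms period
```

Under these assumptions, a request only sets a relative weight that matters when CPUs are contended, while a limit is a hard cap per period: a container that uses up its quota early in a period is throttled until the next one starts.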
Ara Pulido is a Staff Developer Advocate at Datadog. Prior to that she worked as an Engineering Manager at Bitnami and Canonical, the company behind Ubuntu. She has more than 15 years of experience working at open-source infrastructure companies.
A story of how our infrastructure evolved over time to accommodate an increasing number of users - from on-premise to cloud and back down.
How does one build an infrastructure that handles more than a couple of users? How do you go from 100 to 1000 to 100,000 to tens of millions? What happens when, due to popular demand, hundreds of thousands of users hit your servers at the same time?
I'll tell you a story of how a small team of people managed to move software and services from one server to two, then to dozens in the cloud, and then back to on-premise. I'll cover what we encountered on the way, where we failed, and how we solved it.
Josip has been involved with computers for the better part of his life. Started with web development back in high school. Since then he's moved to backend and DevOps.
Loves security stuff and is obsessed with optimising everything. Works in Zagreb as CTO @ Sofascore.
Are you navigating the complexities of compliance frameworks like SOC2, CIS, and HIPAA and seeking a more efficient path? This talk breaks down these frameworks simply and shows you a time-saving trick, making it perfect for anyone wanting to make their organization's compliance journey much easier.
I'll start by outlining the basics of these frameworks and highlighting the challenges businesses face in implementing them.
As the creator and maintainer of the terraform-aws-modules projects, I'm excited to share how using these open-source Terraform AWS modules can streamline the compliance process. I'll walk you through real-life examples showing how such solutions significantly reduce the effort and time required for compliance.
At the end of the talk, attendees will get actionable insights on using Terraform AWS modules for efficient compliance management.
Anton is an AWS Community Hero who helps companies around the globe build solutions using AWS. He specializes in infrastructure-as-code, DevOps, and reusable infrastructure components.
He spends much of his time as an open-source contributor on various Terraform & AWS projects, such as Terraform AWS modules (downloaded more than 500 million times), the Terraform best practices ebook, doing serverless with Terraform (serverless.tf), Terraform Weekly (weekly.tf), and the Your Weekly Dose of Terraform live streams.
I will explore how engineers can build their own Cloud Native Platform without losing their cool. I will take a deep dive into the realm of Platform Engineering. What is it? Is it just a buzzword? Can we learn how to build a platform in 50 minutes and demonstrate its value?
Platform Engineering has become a hot topic in the tech industry. It promises to streamline operations, foster innovation, and accelerate product development. But is it just another industry fad, or is it a game-changer here to stay?
Together, we will embark on an exhilarating journey to build a platform from scratch. This hands-on approach will provide attendees with practical insight into the intricacies of Platform Engineering. We'll break down the processes, discuss the tools, and navigate the best practices to construct a robust and scalable platform.
By exploring the practical aspects of Platform Engineering, we aim to demystify the hype and equip participants with the knowledge and skills to leverage this emerging field effectively. Whether you're a seasoned pro or a curious novice, join us as we uncover the real value of Platform Engineering. Let's build, learn, and hack together, as part of a supportive and collaborative community!
Hi, I am a solution engineer at Giant Swarm. My beginnings were full-stack development using technologies like Symfony, NodeJS, Golang, and MongoDB, among others. Later, throughout my career, I became more interested in distributed systems, particularly how containers and Kubernetes have changed the developer experience.
Today, I help some big players jump into the Cloud Native world by leading the change from old practices to reveal the benefits of cloud environments.
In the rapidly evolving landscape of cybersecurity, botnets remain a significant threat to Kubernetes and containerized environments. In this talk, we will present a comprehensive overview of our latest research on new groups, delving into their organizational structures, codebases, and tactics. We will explore how these malicious actors share information, select their targets, and offer their services.
By sharing our findings, we hope to raise awareness and facilitate a better understanding of these threats, ultimately contributing to the development of more effective countermeasures.
Botnets represent a significant and evolving threat in the cybersecurity landscape. This presentation aims to shed light on the inner workings of these networks based on extensive research and real-world examples. Attendees will gain insights into:
- Organization and Structure: Understanding how modern botnets are set up and managed.
- Code Analysis: A deep dive into the types of code used by botnet operators to exploit container vulnerabilities.
- Information Sharing: Exploring whether and how these networks share data amongst themselves.
- Target Selection: Analyzing the methods and criteria used by botnets to choose and attack applications.
Our aim is to provide a global view of the current state of botnets, offering valuable knowledge that can aid in the detection, analysis, and mitigation of these threats. This talk is designed for security professionals, researchers, and anyone interested in understanding the complexities and dangers posed by botnets in today’s digital world.
Miguel Hernández, Sr. Threat Research Engineer at Sysdig, is a lifelong learner with a passion for innovation.
Over the past decade, Miguel has honed his expertise in security research, leaving his mark at prominent tech companies and fostering a spirit of collaboration through personal open-source initiatives.
Miguel has been a featured speaker at cybersecurity conferences such as HITB, HIP, CCN-CERT, RootedCon, TheStandoff, BSides Barcelona and Codemotion.
Alessandra Rizzo, Threat Detection Engineer at Sysdig, is a malware and Advanced Persistent Threat research enthusiast. She gained expertise as a threat intelligence consultant in Italy with some of Europe's premier financial institutions. In her current role, Alessandra conducts investigations into emerging cloud threats and malicious groups' operations.
Join us as we discuss innovative methods designed to reduce toil within Security Operations (SecOps) at Amplify Education.
First, we'll detail the use of custom security rules within Datadog, exploring tools such as GuardDuty, CloudTrail, and our own Scanner Detection methods. Then, we'll discuss a custom tool called IP Blocker that utilizes AWS Web Application Firewall (AWS WAF), Datadog, and other sources to automate blocking of IPs.
Next, we'll discuss the advantages of harnessing Datadog workflows for automating a broad range of SecOps procedures, a strategy that the Amplify DevSecOps team has successfully implemented. Finally, we'll discuss some of the problems that Amplify has run into with our implementation of AWS WAF with a combination of AWS-managed and custom rules.
Johnathan has over a decade of experience in quality assurance, automation, software engineering, and DevOps, taking on roles as both a leader and an individual contributor.
In 2021, he moved into his current role as the Director of Security Operations at Amplify Education. In this role, Johnathan leads the SecOps team at Amplify, overseeing the introduction of new security technologies and conducting continuous security testing of Amplify's infrastructure.
Calvin has worked at Amplify for over 6 years, tackling different DevOps and SecOps related problems. He has worked on setting up deployment tooling, automation, and WAFs for Amplify.
Recently, Calvin has taken the leadership position on the SecOps team to manage and strengthen Amplify's security posture.
We are building platforms to enable developers, we are delivering to production faster, and we are doing it with golden paths in mind. We are using Kubernetes, GitOps, maybe even an IDP, but what about test environments?
Often forgotten during the software development lifecycle, rapid and reliable testing environments are much needed to ensure product quality.
Traditional testing environments can fall short in scalability or be a complete waste of resources, inevitably leading to bottlenecks in the development pipeline.
This session aims to showcase, through a live demo, our chosen approach to spawning on-demand ephemeral environments triggered by pull requests, leveraging the power of Flux and GitHub Actions.
Community leader and CNCF ambassador, Alessandro has spent the last few years building cloud native infrastructures for Microsoft customers, animating the Dutch community, and training others to pass the CKx exams.
He has a passion for all things cloud native, has been around open source for 25 years, and recently started building a new concept around namespace-as-a-service called Kubespaces.io.
In our upcoming presentation, we'll explore a cutting-edge architectural solution for real-time SMS and email notifications, particularly geared towards responding to earthquake events. This system is designed to handle rapid data transmission, listening for event changes every second, making it ideal for real-time critical alert scenarios. Central to our discussion will be the integration of Lambda functions and Confluent Kafka, coupled with advanced multithreading techniques and DynamoDB lock strategies.
A focal point of our presentation will be addressing the challenges and innovative solutions involved in integrating Confluent Kafka with Lambda functions to enable serverless operation of both producers and consumers. This is a key element in ensuring the quick and efficient distribution of notifications through parallel methods. Additionally, we will delve into the implementation of an automated scaling mechanism, which is vital for optimising the performance of the Serverless Notification ecosystem.
Our aim is to provide a comprehensive insight into how these technologies can be effectively combined to develop a robust and efficient system, capable of delivering critical real-time alerts for situations like earthquake occurrences, ultimately playing a crucial role in saving human lives.
Vlad Onetiu, a DevSecOps and Software Automation Engineer from Romania, is renowned for his expertise in cloud technology, cybersecurity, and software automation.
Since the onset of his career, Vlad has made significant strides in these dynamic and crucial technological domains. His proficiency in cloud computing shines through his innovative work with Serverless technologies, where he has adeptly utilised these platforms for sophisticated data extraction combined with system automation architectures, documented on his page @DataIceberg.
Vlad's skills in software automation and cybersecurity are also evident in his efficient management of CI/CD processes and cloud architectures and in his valuable security research, which also helps ensure the creation of resilient digital infrastructures.
Ready to observe your GitHub Actions from a central repository? At Elastic, we implemented our custom OpenTelemetry Collector receiver to collect GitHub Actions logs and combined it with the existing traces receiver to observe all workflows in our GitHub organization. Learn about the challenges we encountered, how we solved them, and see how centralized logs, traces, and metrics empower the analysis and visualization of GitHub workflows.
At Elastic, we use GitHub Actions in multiple repositories for our CI/CD pipelines. However, we faced challenges with decentralized logs, which made troubleshooting issues that spanned multiple workflow runs or repositories difficult.
In this session, we explain how we centralized GitHub Actions telemetry using OpenTelemetry Collector and how it helped us improve our analysis and visualization of GitHub workflows.
Initially, we focused on scanning logs to detect security vulnerabilities and creating a unified platform for searching, analyzing, and visualizing logs, complete with custom alerts and notifications.
As our project progressed, we realized the broader advantages of centralized logs combined with traces and metrics, which we are going to explore with real-world examples.
We will examine how we handled spikes in log volume, navigated GitHub Actions API rate limits, and ensured data integrity while implementing the custom OpenTelemetry Collector receiver for GitHub Actions log collection.
We planned to use OpenTelemetry Collector as the primary log receiver and exporter. To ensure reliability, we intended to queue webhook events with a proxy service, which sends them to the collector at a controlled pace and retries failed requests.
We will discuss how to fine-tune the receiver for log volume efficiency and optimize the collector's reliability. Visualizations will showcase the impacts of various configuration changes on performance, and we will explain why we did not implement the proxy service.
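To make the proxy idea described above concrete (the abstract notes it was ultimately not implemented), here is a minimal sketch of paced forwarding with retries. Everything in it is a hypothetical illustration rather than Elastic's design: `drain_paced`, the `send` callback, and the retry policy are assumptions, and a real service would persist the queue and post to the collector over HTTP.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Tuple


def drain_paced(
    events: List[dict],
    send: Callable[[dict], bool],
    interval_s: float = 0.5,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Forward queued events one at a time and return the ones that were dropped.

    `send` is a hypothetical callback that delivers one event to the collector
    and returns True on success. Failed events are re-queued behind newer ones
    until `max_attempts` deliveries have been tried.
    """
    queue: Deque[Tuple[dict, int]] = deque((event, 1) for event in events)
    dropped: List[dict] = []
    while queue:
        event, attempt = queue.popleft()
        if not send(event):
            if attempt < max_attempts:
                queue.append((event, attempt + 1))  # retry later
            else:
                dropped.append(event)  # give up after the last attempt
        sleep(interval_s)  # controlled pace between deliveries
    return dropped
```

Passing `sleep` as a parameter keeps the pacing testable without real delays; a production version would more likely use exponential backoff and a durable queue.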
Finally, we will share real-world examples of how centralized logs, traces, and metrics have empowered our analysis and visualization capabilities:
- How we leveraged detection rules to find leaked secrets and sensitive information in logs, making it easier to identify and remediate security vulnerabilities.
- How we used traces to identify bottlenecks and the most frequently failing runs to optimize our workflows.
- How centralized logs helped us identify the frequency of flaky commands and prioritize optimization and troubleshooting efforts.
- How we crafted informational dashboards using the provided traces and metrics to help us find optimization opportunities.
Victor is a principal software engineer at Elastic and lives in sunny Spain. Victor works with different teams to build and improve the CI/CD ecosystem.
He has contributed to Jenkins, OpenTelemetry, and other communities for years. He loves riding his bike, eating good food, and having quality time with his family.
Are your applications really cloud native? As a developer, you must be concerned about who can access resources in your system.
You probably think of authentication and authorization as any other logic: ifs and elses executed before performing critical operations.
Did you know that Kubernetes Role-Based Access Control and authentication can be wisely combined with other cloud native technologies to compose a platform that helps you avoid spaghetti code and implement best practices for application security as a true cloud native developer, while delegating some of the burden to other layers of your system?
Attendees of this session will learn how to leverage Kube to build Zero Trust authorization the cloud native way. The talk will demo use cases of tailor-made data security leveraging cloud native technology, including Envoy and Open Policy Agent, that reclaim security policies as a proper concern, decoupled from the application's code, at the same level as Deployments and Services.
Principal Software Engineer at Red Hat, core member of the Kuadrant Project, developer and maintainer of Authorino.
Transactional infrastructure is not suited for processing large amounts of data for analytics. In this talk, participants will learn about data architecture fundamentals and get deep insights into building an enterprise-grade data lake with a business intelligence frontend on AWS, using AWS analytics services such as Glue, Athena, Lake Formation, Kinesis and QuickSight.
(While the demo is based on AWS, the fundamental concepts are transferable to other environments.)
Alex is a Senior Solutions Architect at AWS with 20 years of IT experience in expert and leadership roles. He is a strong advocate of agile and DevOps practices, and he enjoys seeing serverless, cloud-native and event-driven architectures deployed at scale.
He has delivered large transformation projects and successfully developed his own and customers’ businesses. He is a trained and experienced speaker, having presented at a wide range of events.