Intelligent models often work very well on laboratory or "controlled" datasets, but when we test them against big, real-world datasets we often suffer from a lack of maturity on the technical side of things: we need infrastructure, QA processes, automation pipelines, constant data ingestion, a way to deploy new versions of the model... to name a few.
In this talk we will discuss MLOps (what it is, why we need multiple software development profiles...), where the most common bottlenecks are, and some open-source initiatives that ease the "go to production" phase, such as Hugging Face's Gradio, Streamlit or Kubeflow.
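As a flavour of that "go to production" step (a minimal sketch, not from the talk), a library like Gradio can turn a model into a shareable web demo with a few lines; the keyword-based "model" below is a made-up stand-in for a real trained model:

```python
# Minimal sketch: exposing a (stand-in) model as a quick web demo with Gradio.
import gradio as gr

def predict(text: str) -> str:
    # Placeholder for model.predict(): a naive keyword rule instead of a real model.
    return "positive" if "good" in text.lower() else "negative"

demo = gr.Interface(fn=predict, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch()  # serves a local web UI; share=True creates a temporary public link
```

Streamlit and Kubeflow cover the same "last mile" at different scales: quick data apps and full ML pipelines on Kubernetes, respectively.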
Nerea Luis has a PhD in Computer Science and works as Artificial Intelligence Lead at Sngular where she leads projects related to Machine Learning and Computer Vision, among others. She is passionate about outreach, artificial intelligence, and robotics.
Nerea is a co-founder of T3chFest and was a Google Women Techmakers Scholar in 2016. In 2018, the COTEC Foundation selected her as an expert in Technology, Talent and Gender within its “Los 100 de Cotec” network. She has also won the Innovative ICT award from Fundación Cibervoluntarios. In 2019, Nerea was decorated by the Spanish Royal Household with the Order of Civil Merit, and she was selected among the Top 100 women leaders in Spain in the revelation category.
In 2020 she was included in the Future Leaders ranking prepared by LlyC and among the 21 changemakers of Forbes Spain.
Communication between different parts of your distributed application can become very complex if you don’t have good architecture practices.
In this session, you will learn how and when to apply two architectural patterns in your distributed applications: orchestration and choreography. In addition, you will learn how to orchestrate complex workflows with state machines and how an event bus can help you choreograph micro services.
You will also learn good practices for using them to keep your application scalable and maintainable.
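As a rough illustration of the difference (not part of the talk, and assuming AWS Step Functions as the orchestrator and Amazon EventBridge as the event bus, with made-up ARNs and payloads):

```python
# Sketch contrasting orchestration and choreography with boto3 (AWS SDK for Python).
# The state machine ARN, bus name and payloads are made-up placeholders.
import json
import boto3

# Orchestration: a central state machine explicitly drives each step of the workflow.
sfn = boto3.client("stepfunctions")
sfn.start_execution(
    stateMachineArn="arn:aws:states:eu-west-1:123456789012:stateMachine:order-workflow",
    input=json.dumps({"orderId": "1234", "amount": 42}),
)

# Choreography: services publish events to a shared bus and react to each other's events.
events = boto3.client("events")
events.put_events(
    Entries=[{
        "EventBusName": "orders",
        "Source": "shop.checkout",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": "1234", "amount": 42}),
    }]
)
```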
Marcia is a Developer Advocate for AWS and the host of FooBar, a YouTube channel (http://bit.ly/foobar-youtube) where she publishes weekly content related to serverless and the cloud.
She has been designing and developing software professionally for 15 years and has worked at all the different stages of building scalable and performant software. She has deep knowledge of building applications in the cloud and using DevOps processes.
DevOps engineers tend to be obsessed with their favorite tools and platforms. That could be Docker, Kubernetes, Terraform, Prometheus, Grafana, Crossplane, or any other among the myriad of tools labeled as "DevOps".
However, that often misses the point of what we're trying to accomplish. The goal should be to enable everyone to be in full control of their applications, including dependent services and infrastructure. DevOps is about having self-sufficient teams, and the only way to accomplish that is by providing services that everyone can consume. Instead of waiting for requests to create a cluster, perform security scanning, deploy an application, and so on, ops and other specialized teams should be enabling others to do those operations.
That enablement is best accomplished by creating an Internal Developer Platform (IDP).
In this session, we'll explore the architecture and the key ingredients needed for an IDP. We'll also discuss the key benefits of an IDP and we'll see, through a demo, how we could build one.
We'll combine tools like Backstage, Argo CD, Crossplane, and quite a few others into a platform that everyone can use, no matter their experience level.
Viktor Farcic is a Developer Advocate at Upbound, a member of the Google Developer Experts, CDF Ambassadors, and Docker Captains groups, and a published author.
His big passions are DevOps, GitOps, Microservices, Continuous Integration, Delivery and Deployment (CI/CD), and Test-Driven Development (TDD).
He is a host of the YouTube channel DevOps Toolkit and a co-host of DevOps Paradox. He published The DevOps Toolkit Series and Test-Driven Java Development.
My team and I are SREs and run one of the most important service stacks at Google: authentication. Our role is to make our services reliable and secure through engineering projects. But we sometimes fail, even if our SLO can be as high as 99.9999% uptime (31.56 seconds downtime per year) in some cases.
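To put that figure in perspective, the error-budget arithmetic behind it is simple (a back-of-the-envelope sketch, not part of the talk):

```python
# Back-of-the-envelope downtime budget for a 99.9999% availability SLO.
slo = 0.999999
seconds_per_year = 365.25 * 24 * 3600          # ~31,557,600 seconds
allowed_downtime = (1 - slo) * seconds_per_year
print(f"{allowed_downtime:.2f} seconds of downtime per year")  # ~31.56
```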
Whenever our services are down, nearly every Google product is down: our billions of customers can't access the data they have entrusted us with, and the outage quickly makes it into the press. Businesses across the globe can be affected.
When you are on call and get one of those pages that makes you think "oh gosh", what really happens behind the scenes? How do you go from a potentially cryptic alert message to a full-blown incident response team coordinating tens of engineers?
After mitigation, the complete repair starts, and a forensics-style root cause analysis needs to establish what happened and how to prevent that failure class forever. We also need to travel back in time: outages do not happen randomly; they are triggered by a broken process, a system interaction, a small piece of code.
In this talk, we'll go through the beautiful process of failure and recovery, examining real outages that have affected hundreds of millions of customers and seeing what happened, how we approached it and what we learned. We'll deep dive into some of the responses and how they can be exported to other organisations. We'll also learn how our organisation has evolved to be resilient over the last 15 years of operating systems at hyper-scale.
Ramón is a Staff Site Reliability Engineer at Google, where he works on the Identity team. He started back in 2011 as an intern and has since become team Technical Lead (TL) and Engineering Manager, and recently moved into an über-TL role for all the Privacy, Safety and Security teams. The team's role is to store, manage and safeguard user accounts, from account creation down to credential management, including account security features such as hijacking and phishing protection. It employs hundreds of microservices across its stack, offering a variety of protocols and APIs to users and customers. They run on thousands of machines in tens of data centers across the globe and must be as reliable as possible, as not only other Google consumer products depend on them, but also the people and enterprises worldwide that use Google Workspace and Google Cloud Platform for their businesses.
Prior to Google, Ramón worked at CERN as part of the Physics Department and the ATLAS Collaboration, where he developed the ROOT framework for data analysis and then the functional testing framework used to validate and ensure the reliability of the distributed computing facilities that enabled the Higgs boson discovery in 2012.
He holds an MSc in Computer Engineering, and for the last decade he has been researching part-time on autonomic computing and the management of computer fleets in data centers and enterprises to optimise and reduce their power usage. He hopes to deliver this research this summer!
Currently when we think of Infrastructure as Code (IaC), one tool seems to stand out and has become a de-facto standard: Terraform.
With Terraform you can easily build, change and version your whole infrastructure using Terraform's built-in providers or custom ones.
But sometimes there is no provider for the infrastructure you intend to use, not even a lone no-star repository in a lost corner of the internet, only a custom REST API. What can you do? Go back to manual operations? Create your own scripts?
In this talk Horacio and Aurélie will show you, step by step, how to go from an infrastructure API to a fully functional yet lightweight Terraform provider. Taking a REST API as the base, they will explain the basics of provider creation, give some pointers on a simple yet efficient provider architecture, and show you the code and the provider in action.
Will they succeed in this new mission?
I'm a DevRel at OVHcloud in Toulouse, France. I have been working as a developer and in ops for over 16 years. I'm a cloud enthusiast and an advocate of DevOps/Cloud/Golang best practices.
I'm one of the leaders of Duchess France, an association that promotes women developers and women working in IT.
Conference and meetup organizer since 2016. Technical writer (dev.to/aurelievache), book author & reviewer, sketchnoter, and speaker at international conferences.
I created a new visual way for people to learn and understand Cloud technologies: "Understanding Kubernetes/Istio/Docker in a visual way" in sketchnotes and videos.
Spaniard lost in Brittany, coder, speaker, dreamer and all-around geek.
Horacio works as Director of DevRel at OVHcloud. He is also the co-founder and leader of the @FinistDevs and @RdvSpeakers communities.
Horacio loves web development in general and everything around Web Components and web standards in particular, but he also loves to discuss Kubernetes, AI and the cloud in general. He is a Google Developer Expert (GDE) in Web Technologies and Flutter.
Copying and pasting Terraform configuration reduces its reusability, maintainability, and scalability. In an effort not to repeat ourselves, we might start moving our configuration into modules, only to run into new scaling and collaboration challenges!
In this talk, I will describe some of the challenges and lessons learned in building, scaling, and maintaining the public Terraform modules for AWS components and how to apply them to your modules.
I defined an initial goal for those modules: to provide a powerful and flexible way to manage infrastructure on AWS. But with more than a couple of thousand issues and pull requests opened since the beginning, the goal had to change. What started as an initial set of basic Terraform AWS modules for common architectural patterns on AWS soon became a base for many customers using Terraform and required radical changes and improvements.
I will describe some of the challenges along the way and the lessons learned from building entirely open-source Terraform modules used by thousands of developers. Some problems are technical, such as versioning, quality assurance, documentation, compatibility promises, and upgrading. Other problems are around collaboration and software design principles, such as how to reason about feature requests or how small a module should be. I will also examine the testing strategy for terraform-aws-modules and discuss the reasoning for not having tests!
I will provide a list of dos and don'ts for Terraform modules that highlights the critical features (e.g., documentation, rich functionality, sane defaults, and examples) which make terraform-aws-modules scalable in collaboration and use.
By the end of the talk, attendees will understand proven practices for building useful, maintainable, and scalable modules, learn about the public modules available to them, and see how they can participate in making those open-source projects better.
Anton is an AWS Community Hero who helps companies around the globe build solutions using AWS, specializing in infrastructure as code, DevOps, and reusable infrastructure components.
He spends a large amount of his time as an open-source contributor to various Terraform & AWS projects, such as the Terraform AWS modules (downloaded more than 100 million times), the Terraform best practices ebook (www.terraform-best-practices.com), serverless.tf, weekly.tf, and https://bit.ly/terraform-youtube.
Choosing the right open-source project to use can be quite challenging: not knowing whether it's going to be the right fit, how it will behave, or whether you'll end up wasting time trying to make it all work. We've all been there.
But what if I told you there’s a practical way to have a clear understanding of how to incorporate an OSS project in your environment?
In this talk, I’m going to speak about the DevOps perspective on open-source and the challenges Infra-focused engineers have with choosing the right project for their environment.
As a DevOps engineer, I've seen a lot of things and stumbled upon a lot of decisions made without a solid basis, so I will present practical advice on how to choose an OSS project for your dev/prod environment and talk about the business mindset you should adopt to evaluate the key indicators based on your needs and specific pain points.
Hila Fish is a Senior DevOps Engineer at Wix, with 15 years of experience in the tech industry.
She’s also a public speaker who believes the DevOps culture is what drives a company to perform at its best, and talks about that and other DevOps/Infrastructure topics at conferences.
She carries the vision of enhancing and driving business success by taking care of the infrastructure behind it.
In her spare time, Hila is the lead singer of a cover band. She gives back to the community by co-organizing DevOps-related conferences (including "DevOpsDays TLV" and the "StatsCraft" monitoring conference), mentoring, and leading a program in "Baot" (an Israeli community of women in tech), and she shares her passion and knowledge with various tech communities and initiatives, on social media, and with her colleagues.
Tracing and telemetry are popular topics right now, but development is moving so quickly that it can also be confusing:
- Starting with OpenTracing, then W3C Trace Context, and now OpenTelemetry, there are plenty of standards, but what do and don't they cover?
- With tracing being stable, how are the metric and log efforts going in OpenTelemetry?
- Where is OpenTelemetry headed as a project, and how can users run it in combination with their vendor of choice?
This talk gives an overview of standards, projects, and how they all tie together.
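For readers new to the project, a minimal sketch of instrumenting code with the OpenTelemetry Python SDK (the span and scope names are made up; a real setup would export to a collector or a vendor backend instead of the console):

```python
# Minimal OpenTelemetry tracing sketch: spans are printed to the console.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # instrumentation scope name (made up)

with tracer.start_as_current_span("process-order"):
    with tracer.start_as_current_span("charge-card"):
        pass  # business logic would go here
```

This same API surface is what vendors plug into, which is why OpenTelemetry lets you keep your instrumentation while swapping backends.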
Philipp lives to demo interesting technology. Having worked as a web, infrastructure, and database engineer for over ten years, Philipp is now a developer advocate and EMEA team lead at Elastic — the company behind the Elastic Stack consisting of Elasticsearch, Kibana, Beats, and Logstash.
Based in Vienna, Austria, he is constantly traveling Europe and beyond to speak and discuss open source software, search, databases, infrastructure, and security.
Thirteen years ago we had the idea to organise a conference in Ghent to bridge the gap between developers and the people running their code. It was the start of a new global movement. We never predicted that #devops would be where #devops is today. The word devops has evolved, and the community has evolved.
Docker has solved all of our problems, and the ones left over were solved by Kubernetes. Everybody and their neighbour is Scrum certified now, and we are all happily sipping cocktails on the beach. Or are we? Why, after almost 10 years of pushing culture change, teaching about Infrastructure as Code, teaching about Monitoring and Metrics, and helping people share both their pain and their learnings, are most organisations still struggling with software delivery?
Over the years the word devops lost its meaning, or at least its original meaning. The real challenge for the next decade will be to see how we can revive those original values and ideas, if at all... Can we fix Devoops? This talk will give you some ideas about that.
Kris Buytaert is a long-time Linux and Open Source consultant. He's one of the instigators of the devops movement, currently working for Inuits and O11y.
People curse when they run into DNS problems and realise he is right.
In 2009 he was one of the people who started the original #devopsdays in Ghent.
He runs SaaS Platforms, helps people to adopt devops practices, automates all the things... while monitoring them...
His blog titled "Everything is a Freaking DNS Problem" can be found at https://www.krisbuytaert.be/blog/.
Container technologies, although not new, have grown in popularity in the past few years, with container orchestrators allowing companies around the world to adopt these technologies to help them ship and scale microservices with precision and velocity.
Kubernetes is currently the most popular container orchestration platform, and the one chosen by Datadog to run its infrastructure. We run dozens of clusters with thousands of nodes, and we run them on different public clouds. How are our 1,000+ engineers able to use this infrastructure platform successfully?
Join me in this talk for our story on what we learned while we scaled our Kubernetes clusters, the contributions to Kubernetes we made along the way, how we are building a development platform around it, and how you can apply those learnings when growing your Kubernetes clusters from a handful to hundreds or thousands of nodes.
Ara Pulido is a Senior Developer Advocate at Datadog. Prior to that she worked as an Engineering Manager at Bitnami and Canonical, the company behind Ubuntu. She has more than 10 years of experience working at open-source infrastructure companies. She is a Certified Kubernetes Administrator.
She lives in Malaga, Spain, where she enjoys three of her favorite free-time activities: hiking during spring and autumn, snorkeling during summer, and going for tapas with friends all year round.
In this session we will address best practices to effectively implement Zero Trust architectures within our organization. This will allow us to reduce risk in all environments by establishing strong identity verification, validating device compliance before granting access, and ensuring least-privilege access only to explicitly authorized resources.
We will discuss how, through the services Azure offers us, we can use Zero Trust guardrails and blueprints to implement it, focusing primarily on identities, devices, applications, data, infrastructure and networks; increasing the speed with which cloud-based initiatives achieve authorization is a critical part of modernization.
I developed with .NET and C# for many years, then took on a technical consultant role working with Azure. Along the way I discovered the world of the cloud, and it became my main professional focus.
I've worked for some years in different countries and finally moved to Barcelona, always following the inner need to keep exploring and sharing.
Passionate about new technologies, especially .NET Core, Microsoft Azure, Xamarin, AI, bots and ASP.NET. Also passionate about team management.
I contribute to the developer community by writing articles on my personal blog and giving talks at numerous events. I am also one of the Netcoreconf leads.
The Hotels Network decided to move from a batch-processing analytics platform to a real-time analytics platform based on a services architecture.
To accomplish this, we decided to look for Kafka solutions (finally deciding to go with Redpanda) and found we needed to add ksqlDB to our architecture.
We will share what drove us here, the key decisions, the mistakes we made, the goals we achieved... It might not be exactly your case, but you can surely benefit from some of our findings.
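As a flavour of what such a stack involves (a hedged sketch, not taken from THN's actual setup): ksqlDB exposes a REST endpoint that accepts SQL-like statements defining continuous queries over Kafka-compatible topics, so from Python it can look roughly like this (the host, stream and topic names are made up):

```python
# Rough sketch: submitting a ksqlDB statement over its REST API with `requests`.
# Host, stream and topic names are made-up placeholders.
import requests

statement = """
    CREATE STREAM pageviews (user_id VARCHAR, url VARCHAR)
    WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    "http://localhost:8088/ksql",
    json={"ksql": statement, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())
```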
Able to handle multiple fields of work and environments, I consider myself a fast learner.
With my university studies focused on Automatics and Control Systems, I am especially interested in data, robotics, software, artificial intelligence and machine learning.
I work as a Database Reliability Engineer, with more than 9 years of experience with databases.
I'm currently pushing towards improving the streaming capabilities of THN as a company. I have had a key role in the transition from batch to real-time since the beginning.
Over the past few years, CDNs have evolved into sophisticated edge cloud platforms with a focus on flexibility, programmability, functionality, and security. But it's been a journey easier said than done.
In this session, we'll discuss this transformation, how the core building blocks remain vitally important, and how properly built edge cloud platforms offer brand new ways for scaling applications well beyond just content distribution.
Hooman Beheshti is VP of Technology at Fastly, where he develops web performance services.
A pioneer in the application acceleration space, Hooman helped design one of the original load balancers while at Radware and has held senior technology positions with Strangeloop Networks and Crescendo Networks.
He’s been developing the core technologies that make the Internet work faster for nearly 20 years and is an expert and frequent speaker on the subjects of load balancing, application performance, and content delivery networks.
This talk will show how important it is to onboard network engineers into the DevOps culture as one of the pillars of the DevOps transformation.
Traditionally, network operations have been seen as closed silos where everything moved slowly, with little (or no) visibility from the application or systems engineering side. It's also true that their adoption of DevOps principles has taken longer than on the systems administration side, but the good news is that in 2022 most of these teams have closed the gap and are now key players in delivering IT services, adding a lot of value along the way.
We can't forget that, in a lot of cases, we sustain our applications on hybrid deployments running on complex network architectures with multiple layers of abstraction, starting from the "real" IP network, passing over several overlays (does K8S sound familiar?) and, finally, the service mesh. Including network engineering knowledge in this endeavour will help us understand and optimise these architectures, enforce consistent IP address management, and consolidate more metrics to enrich your observability system.
In this session you will understand how to make this transformation possible and identify the relevant skills and tooling that a network engineer has to adopt to jump on board.
Christian is currently working as a Principal Architect in Network Automation at Network to Code, a pioneer in the network automation industry.
He has been working on improving network manageability and resiliency for more than 15 years, serving in different roles as a network reliability engineer, DevOps engineer and network automation engineer.
He loves developing software to improve network operations and build new network services, and also contributing back to the community via open source projects and sharing knowledge.
A presentation on how GitHub builds and deploys software and the pillars upon which those development and delivery practices are built.
This talk will provide some insights into the challenges and complexities of having a globally distributed workforce: enabling developers to collaborate, innovate and experiment safely, ensuring compliance and security, and shipping changes reliably to production systems.
Peter Murray is a Principal Technical Architect at GitHub where he works in the Field Services team supporting customers to adopt and scale GitHub to meet their needs. He has spent over 10 years in the UK financial sector supporting companies adopting DevOps practices and performing migrations to the Cloud before joining GitHub.
Peter is passionate about automation and solutions that drive developer productivity, enabling people to deliver their best with the least amount of friction. He has knowledge of a variety of development languages and associated toolsets, CI/CD solutions, and IaC tools like Ansible and Terraform.
Zero trust security is predicated on securing everything based on trusted identities. Machine authentication and authorization, machine-to-machine access, human authentication and authorization, and human-to-machine access are the four foundational categories for identity-driven controls and zero trust security. The transition from traditional on-premises datacenters and environments to dynamic, cloud infrastructure is complex and introduces new challenges for enterprise security.
This shift requires a different approach to security, a different trust model. One that trusts nothing and authenticates and authorizes everything. Because of the highly dynamic environment, organizations talk about a "zero trust" approach to cloud security. What does “zero trust” actually mean and what’s required for you to make it successful?
Attend this session and you'll learn from Armon Dadgar, HashiCorp co-founder and CTO, how your organization can enable scalable, dynamic security across clouds.
Armon Dadgar is one of the Co-founders and CTO of HashiCorp. He has a passion for security and distributed systems and applies those to the world of DevOps tooling and cloud infrastructure.
As a former engineer, he has worked on the design and implementation of many of the core HashiCorp products.
He has been named to the Forbes and Inc 30-under-30 lists for transforming enterprise technology. He studied computer science at the University of Washington, where he met his Co-founder Mitchell Hashimoto.