Get ready for three days full of mind-blowing talks: a single track so you won't miss anything, the best speakers you could ever imagine, and the most exciting content!
In a perfect world we would run monolithic systems on machines with unlimited CPU, memory, disk, and network I/O, and with 100% reliability.
The reality is that no such machine exists. To meet the performance and availability our users demand from our systems, we have to be creative in partitioning workloads across multiple machines, geographic regions, and failure domains. This approach introduces complexity to our systems: we expect failure and design accordingly.
In this talk, Mishra and Nic will walk through the areas of complexity in such a system, then look at the patterns you can employ to ensure performance and availability even in a failing world.
This demo-driven talk showcases the patterns that can help you adopt a distributed architecture for your applications, especially when moving from a monolithic architecture to a distributed one.
Machine-learning systems have become increasingly prevalent in commodity software systems. They are used through cloud-based APIs or embedded through software libraries. However, even though ML systems may look like just another data pipeline, they make systems sensitive and can put system health at risk without the proper controls.
Through discussions with engineers engaged in deploying and operating ML systems, we arrived at a set of principles and best practices. These range from input-data validation and fairness/quality checks on training data, through contextual alerting and deployment and rollback policies, to privacy and ethics. We discuss how these practices fit in with established SRE practices, and how ML requires novel approaches in some cases. We look at a few specific cases where ML-based systems did not behave as traditional systems do, and examine the outcomes in light of our recommended best practices.
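As a flavour of the first practice mentioned, input-data validation, here is a minimal sketch: records that fall outside an expected schema are rejected before they reach training or serving. Field names and bounds are illustrative assumptions, not taken from the talk.

```python
# Hypothetical schema: field name -> (min, max) bounds. In a real system
# this would be generated from historical data or a feature store.
SCHEMA = {
    "age": (0, 120),
    "income": (0, 10_000_000),
}

def validate(record):
    """Return a list of violations; an empty list means the record is valid."""
    violations = []
    for field, (lo, hi) in SCHEMA.items():
        if field not in record:
            violations.append(f"missing field: {field}")
        elif not (lo <= record[field] <= hi):
            violations.append(f"{field} out of range: {record[field]}")
    return violations
```

Wiring a check like this into the ingestion pipeline, with alerting on the violation rate, is one way the abstract's "contextual alerting" and "input-data validation" practices can reinforce each other.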
DevOps and Agile are great at speeding up development and providing quick results; however, most organisations struggle to transition and adapt their team and leadership structures, which is a recipe for failure.
In this talk I'll go through some of the experiences I've had and interesting ways to solve these problems, while keeping it light so you don't have to run for coffee.
With a 4.1bn investment in technology, I want to tell you about the awesome innovations Nationwide Building Society UK has made and the challenges we have overcome as a financial services institution, applying DevOps the right way and building a cloud platform. But wait! Let's make it raw: I'll tell you about the pain, the politics, and being huge consumers of K8s & ChatOps, plus find out how we made a trading card game for the office that ended up driving some phenomenal behaviours! And why do almost all of our platform engineers write code?
Nationwide Building Society UK has invested 4.1bn in the future of technology for the society. Cloud is a sensitive word when you're a financial services organisation, and it's even more delicate when you're regulated by the Financial Services Authority and the Prudential Regulation Authority.
I would love to tell you the story of how one of the most risk-averse financial services institutions navigated cloud adoption, and the technologies we have used and how, including Kubernetes. Everyone has heard of the Spotify model; if one thing is clear, you shouldn't adopt it. Let me tell you how we took agile to the next level and what was right for us at our scale and maturity.
What does a trading card game have to do with agile or operating a cloud platform at scale? I don't want to give everything away just yet, but you should listen to the talk and think about giving it a shot yourself.
ChatOps has been a huge technology enabler for us. I'll talk you through some of the things we've built into our bot, such as automatic postmortems and diary management, dashboards at a glance, how we automated our ingress kill mechanism, and how we used Shamir's secret sharing algorithm to distribute the key.
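To make the last point concrete, here is a minimal, self-contained sketch of Shamir's secret sharing over a prime field: split a key into n shares so that any k of them recover it, while fewer reveal nothing. This is a toy illustration of the algorithm named in the abstract, not Nationwide's implementation, and the parameters are assumptions.

```python
import random

# Large Mersenne prime; big enough for a 16-byte key encoded as an integer.
PRIME = 2**127 - 1

def split_secret(secret, shares, threshold):
    """Split an integer secret into `shares` points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    def f(x):
        return sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, f(x)) for x in range(1, shares + 1)]

def recover_secret(points):
    """Lagrange interpolation at x=0 yields the polynomial's constant term: the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        # Modular inverse via Fermat's little theorem (PRIME is prime).
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret
```

With a 3-of-5 split, any three share-holders (say, three on-call engineers) can jointly reconstruct the ingress kill key, so no single person holds it.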
How do we organise our squads and demand flow, and ensure we're delivering? Why Scrum didn't work for us and why Kanban makes much more sense. How we built and chat-op'ed the hell out of the on-call rota using Twilio and Slack — demo included! Observability: nice as a concept, but how did we drive adoption as a platform team? DevOps Dojos! Why we encourage our engineers to take a couple of days out a month to teach!
We have a lot to cover, but I'm sure you will find this insightful and interesting. Financial services organisations usually move like frozen snails, and we're dancing like Neo in The Matrix after taking the red pill.
ChatOps demos included — they are pretty quick, which makes them excellent demos and seeds of inspiration.
In this talk, we show the audience how we combined our experience as an enterprise SaaS provider with our experience as an OpenShift IaaS provider to create an easy-to-use, easy-to-update, and easy-to-roll-back `Infrastructure as Code` toolchain/monitoring solution.
This solution tackles updating, misconfiguration, and diversification issues by using Docker images in combination with OpenShift.
We further discuss our self-service system, which enables us to decrease maintenance and avoid manual steps through automated deployment of large numbers of instances.
We will then discuss the idea of saving all configuration inside version control, while totally ignoring the users' UI changes.
Transitioning a monolith to distributed microservices in the cloud is an excruciating process.
A stateless API Gateway can help you preserve your existing API contract while developers chop the monolith into separate microservices and publish the new specification transparently.
KrakenD API Gateway is well-tested open source software with no external dependencies or moving pieces. A KrakenD cluster will ease your journey to the cloud and microservices without any supporting databases or single points of failure.
The API Gateway integration with Prometheus, Jaeger, Zipkin and other logging, metrics, and tracing systems will keep the system behavior observable at all times.
During this presentation, we are going to discover the architecture behind this design, how to create the endpoints in a declarative way (no coding), and HORROR STORIES!
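To give an idea of the declarative, no-coding style mentioned above, a KrakenD endpoint is described entirely in the gateway's JSON configuration. The hostnames and paths below are illustrative assumptions, not from the talk:

```json
{
  "version": 3,
  "endpoints": [
    {
      "endpoint": "/v1/users/{id}",
      "method": "GET",
      "backend": [
        {
          "host": ["http://users-service:8080"],
          "url_pattern": "/users/{id}"
        }
      ]
    }
  ]
}
```

The public endpoint stays stable while the `backend` section can be repointed from the monolith to a new microservice, which is how the existing API contract survives the chop.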
Kubernetes has become the standard tool to deploy applications both in the cloud and in on-premises datacenters.
This has been possible thanks to its powerful API and its model that allows us to describe the lifecycle of an application.
In this talk we are going to explain how Kubernetes works internally, showing what Kubernetes controllers are and how they work. Once we understand that, we will learn how we can extend the features that Kubernetes already offers to tailor it to our needs, and how big companies are investing in projects and tools based on this mechanism.
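The controller pattern at the heart of this can be sketched in a few lines: a reconcile loop compares desired state against actual state and computes the actions that close the gap. This is a toy illustration under assumed in-memory state; real controllers watch the Kubernetes API server instead of dictionaries.

```python
# Toy reconcile loop: drive `actual` toward `desired`.
def reconcile(desired, actual):
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name, None))
    return actions

def apply_actions(actions, actual):
    """Apply the computed actions to the actual state and return it."""
    for op, name, spec in actions:
        if op == "delete":
            del actual[name]
        else:
            actual[name] = spec
    return actual
```

Custom controllers and operators extend Kubernetes by running exactly this kind of loop over custom resources, which is the extension mechanism the talk refers to.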
Tired of wrapping pickled models in server logic? Me too! The biggest bottleneck in delivering machine learning services is the handover from data science to engineering.
Providing scientists with a predictable and safe way to deploy their own models decouples the engineering and data science efforts, allowing each department to focus on delivering value instead of completing repetitive tasks.
We achieved this by exposing a pub/sub interface where models just have to register as services, and they are automatically exposed to the rest of the organisation.
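The registration idea can be sketched roughly like this: a model registers a predict function under a topic, and callers address the topic without knowing which implementation serves it. All names here are hypothetical illustrations of the pattern, not the speaker's actual system.

```python
# Hypothetical in-process stand-in for the pub/sub broker described in the talk.
class ModelBus:
    def __init__(self):
        self._services = {}

    def register(self, topic, predict_fn):
        """A model registers itself as the service behind a topic."""
        self._services[topic] = predict_fn

    def request(self, topic, payload):
        """The rest of the organisation calls models only by topic."""
        if topic not in self._services:
            raise KeyError(f"no model registered for {topic!r}")
        return self._services[topic](payload)

bus = ModelBus()
# A data scientist publishes a (toy) model without touching server logic.
bus.register("fraud-score", lambda tx: 0.9 if tx["amount"] > 1000 else 0.1)
```

Because callers only see topics, a scientist can swap in a retrained model behind `fraud-score` without any engineering handover.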
What do you do when you must monitor the whole infrastructure of the biggest European hosting and cloud provider? How do you choose a tool when the most-used ones fail to scale to your needs? How do you build a metrics platform to unify, reconcile, and replace years of fragmented, partial legacy solutions?
In this talk we will relate our experience building and maintaining OVH Metrics, the platform used to monitor all of OVH's infrastructure. We needed to go where most monitoring solutions hadn't gone before: operating at the scale of the biggest European hosting and cloud provider — 27 data centers, more than 300k servers (bare metal!), and hundreds of products — to fulfill our mission of hosting 1.3 million customers.
You will hear about time series, about open source solutions pushed to the limit, about HBase clusters operated at the extreme, and about how a small team leveraged the power of a handful of open source solutions and lots of glue code to build one of the most performant monitoring solutions ever.
In this talk I will explain how to integrate JMeter into your continuous integration system.
I will demo how to create a stack with Grafana, InfluxDB, and JMeter in Docker to show results in real time, and how to create more legible tests in a YAML format thanks to BlazeMeter Taurus.
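For a sense of how much more legible the YAML form is than raw JMeter XML, a Taurus scenario looks roughly like this (the URLs, load figures, and scenario name are illustrative, not from the talk):

```yaml
execution:
- concurrency: 50     # virtual users
  ramp-up: 1m
  hold-for: 5m
  scenario: checkout

scenarios:
  checkout:
    requests:
    - http://example.com/
    - http://example.com/checkout
```

Taurus runs this with `bzt`, driving JMeter under the hood, which makes the file easy to keep in version control and run from a CI job.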
You might be asking yourself why you would migrate in the first place. I will answer that in a bit; first, let's talk briefly about the cloud.
Since the advent of the "cloud" and the rise of multiple players in this space (Azure, Google Cloud, AWS), tools have made their appearance too. Tooling is indispensable for automation, and automation is fundamental for scaling. Tooling also allows you to do something else: avoid vendor lock-in in favor of a more flexible concept, the cost of migration. You can apply this same concept when you are not as advanced, or when working with a minor or mid-sized player in the cloud provider arena, giving you the possibility to move to another provider that better suits your needs.
Why would you migrate? There could be several reasons: cost savings, service level, and, in our case, lack of tooling.
In this talk I'll cover a one-and-a-half-year journey at Packlink, where we migrated between cloud providers, letting you know about the problems we found, the solutions we implemented, and the tradeoffs we made.
Mobile app releases are a manual, tedious and error-prone process when done at scale.
It is not possible to follow a continuous delivery (CD) approach because binaries need to be submitted to the stores, and once published they can’t be rolled back. Moreover, mobile releases can take several days to complete.
In contrast, we deploy the web version of Shopify around 50 times a day. We identified this gap between web and mobile releases as a problem.
In this talk, I’ll explain how my team built a continuous integration (CI) system for our mobile apps (Android and iOS), and how we used this infrastructure as a foundation to build a Ruby on Rails application and other tools to automate and orchestrate releases to Google Play and the App Store, reducing the need for human interaction as much as possible.
We’ll see the challenges of standardizing the release process and the coordination with third-party services like GitHub and the store APIs, and how we made it easier for developers to test the apps.