Let's Talk DevOps

Real-World DevOps, Real Solutions

Category: Devops

  • 7 SRE tools to know today

    As an SRE or platform engineer, you’re likely always looking for ways to streamline your workflow and make your day-to-day tasks more efficient. One of the best ways to do this is by using popular SRE and DevOps tools. In this post, we’ll look at 7 of the most widely used tools in the industry today and explain how each one can make you more efficient in your day-to-day work.

    1. Prometheus: Prometheus is a popular open-source monitoring and alerting system widely used for monitoring distributed systems. It scrapes metrics from your services, stores them as time series, and evaluates alerting rules on those metrics, with Alertmanager routing the resulting notifications. Prometheus is known for its multi-dimensional data model (metric names plus key/value labels), its query language (PromQL), and its flexible alerting. With Prometheus, you can spot issues within your systems and be alerted to them before they become a serious problem.
    2. Grafana: Grafana is a popular open-source visualization tool for building interactive dashboards and charts from the metrics collected by Prometheus and many other data sources. It lets you see the health of your systems at a glance, identify trends, and spot outliers, which can help you optimize your systems and improve their performance.
    3. Kubernetes: Kubernetes is an open-source container orchestration system that automates the deployment, scaling, and management of containerized applications. It lets you define, deploy, and manage your applications at scale and helps you achieve high availability and fault tolerance. With Kubernetes, many routine deployment and management tasks are automated, which frees up time for other important work.
    4. Ansible: Ansible is an open-source, agentless automation tool for provisioning, configuring, and deploying your infrastructure. It is known for its simple, human-readable YAML playbooks and its ability to manage complex, multi-step tasks. Automating provisioning and configuration with Ansible saves time and reduces the risk of manual errors.
    5. Terraform: Terraform is a popular tool for provisioning and managing infrastructure as code. It lets you describe your infrastructure in a simple, declarative language (HCL) and provision and manage resources across multiple providers. Terraform was long open source; in 2023 HashiCorp moved it to the Business Source License, and OpenTofu is the open-source fork. With Terraform, you can automate provisioning and changes to your infrastructure, saving time and reducing the risk of errors.
    6. Jenkins: Jenkins is an open-source automation server for automating the building, testing, and deployment of your software. Its plugin system lets you integrate with other tools, such as Git, Ansible, and Kubernetes. With Jenkins, many routine build, test, and deploy tasks run automatically, freeing up time for other important work.
    7. GitLab: GitLab is a web-based Git repository manager that provides source code management (SCM), continuous integration, and more. It is a full-featured platform covering the entire software development life cycle: you can manage your code, collaborate with your team, and automate your pipeline in one place. With GitLab, you can streamline your process from code management to deployment, saving time and reducing the risk of errors.

    These are just a few examples of the many popular SRE and DevOps tools that are widely used in the industry today.

  • Here’s to devops…a poem

    In devops, we're constantly on call 
    Our work is never done, no matter how small 
    We're always ready to troubleshoot and fix 
    Our skills are diverse, our knowledge is mixed
    We're agile and flexible, always adapting 
    We're proactive, we're never static 
    We're experts in automation and efficiency 
    We're the bridge between development and IT
    We're passionate about our craft 
    We strive for continuous improvement, it's what we're after 
    We're the glue that holds everything together 
    We're the unsung heroes, working in all kinds of weather
    So here's to devops, the backbone of technology 
    We may not always get the recognition, but we do it proudly 
    We're a vital part of the team, and we know our worth 
    We're the devops engineers, bringing stability to this earth

  • AWS EC2 Spot – Best Practices

    Amazon’s EC2 has several purchasing options for running instances. On-Demand instances are what most people use. Reserved instances suit those who can predict their usage to some degree. Another option, which can be a real cost saver, is Spot instances: AWS advertises savings of up to 90% compared with On-Demand prices.

    AWS operates like a utility company, so it has spare capacity at any given time. That spare capacity can be purchased through Spot instances. There’s a catch, though: AWS can take back that “spare capacity” with a two-minute warning, so using Spot instances needs careful planning. When used correctly, Spot instances can be a real cost saver.

    When to use Spot instances

    There is a fairly broad set of use cases for Spot instances. The common advice is to stick to containerized, stateless workloads, but in reality many more workloads fit.

    • Distributed databases – think MongoDB, Cassandra, or even Elasticsearch. As long as data is replicated across nodes, losing one instance does not lose data; simply start another one
    • Machine Learning – training jobs are a good fit; losing an instance only pauses training until another one starts, especially if the job checkpoints its progress. ML lends itself well to the Spot instance paradigm
    • CI/CD operations – build and test jobs are short-lived and easy to retry, which makes them a great fit for Spot instances
    • Big Data operations – Amazon EMR and Spark are also great use cases for Spot instances
    • Stateful workloads – even though these applications need IP address and data persistence, some (maybe even all) of them may be candidates for Spot instances, especially if they are automated properly.

    Be prepared for disruption

    The primary practice for working in AWS in general, and with Spot instances in particular, is to be prepared. Spot instances will be interrupted at some point, usually when it’s least expected, so it is critical to design your workload to handle failure. Take advantage of EC2 instance rebalance recommendations and Spot instance interruption notices.

    The EC2 instance rebalance recommendation signals an elevated risk of Spot instance interruption and can arrive before the two-minute interruption notice. Enabling the Capacity Rebalancing feature in Auto Scaling groups and Spot Fleet lets you act proactively by launching a replacement before the at-risk instance is reclaimed. Take a look at Capacity Rebalancing for more detail.
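
    On the instance itself, both signals are exposed through the instance metadata service (IMDS): the rebalance recommendation at meta-data/events/recommendations/rebalance and the interruption notice at meta-data/spot/instance-action, each returning HTTP 404 until a notice exists. The following is a minimal polling sketch, assuming IMDSv2 and only the Python standard library; what you do when a notice appears (drain, checkpoint, deregister) is up to your workload.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def imds_token(ttl_seconds: int = 21600) -> str:
    # IMDSv2: obtain a session token with a PUT request before reading metadata
    req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl_seconds)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()


def imds_get(path: str, token: str) -> str | None:
    # IMDS answers 404 while no notice is pending, so treat that as "nothing yet"
    req = urllib.request.Request(f"{IMDS}/meta-data/{path}",
                                 headers={"X-aws-ec2-metadata-token": token})
    try:
        with urllib.request.urlopen(req, timeout=2) as resp:
            return resp.read().decode()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise


def parse_notice(body: str | None) -> dict | None:
    # Both notices are small JSON documents, e.g. {"action": "terminate", "time": "..."}
    return json.loads(body) if body else None


def check_spot_signals(token: str) -> dict:
    return {
        "rebalance": parse_notice(imds_get("events/recommendations/rebalance", token)),
        "interruption": parse_notice(imds_get("spot/instance-action", token)),
    }
```

    A small loop that calls check_spot_signals every few seconds (refreshing the token before its TTL expires) gives the workload a chance to wind down on the rebalance recommendation rather than waiting for the two-minute notice.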

    If the workloads are “time flexible”, configure the Spot instances to stop or hibernate instead of terminating when an interruption occurs. When capacity becomes available again, EC2 restarts (or resumes) the instance.
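
    With boto3, this behavior is set through the InstanceMarketOptions parameter of EC2 run_instances. The sketch below assumes an EBS-backed AMI and uses "stop"; stop and hibernate are only allowed on persistent Spot requests, and hibernate has extra prerequisites covered in the EC2 documentation. The helper names and the AMI ID in the usage note are hypothetical.

```python
from __future__ import annotations


def spot_market_options(interruption_behavior: str = "stop") -> dict:
    """Build InstanceMarketOptions for boto3's EC2 run_instances call."""
    if interruption_behavior not in ("stop", "hibernate", "terminate"):
        raise ValueError(f"unsupported interruption behavior: {interruption_behavior}")
    # stop and hibernate require a persistent Spot request
    request_type = "one-time" if interruption_behavior == "terminate" else "persistent"
    return {
        "MarketType": "spot",
        "SpotOptions": {
            "SpotInstanceType": request_type,
            "InstanceInterruptionBehavior": interruption_behavior,
        },
    }


def launch_time_flexible_instance(ec2, image_id: str, instance_type: str) -> str:
    # ec2 is a boto3 EC2 client, e.g. boto3.client("ec2")
    resp = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        InstanceMarketOptions=spot_market_options("stop"),
    )
    return resp["Instances"][0]["InstanceId"]
```

    For example, launch_time_flexible_instance(boto3.client("ec2"), "ami-0123456789abcdef0", "m5.large") requests a Spot instance that EC2 stops, rather than terminates, when the capacity is reclaimed.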

    Use the Spot instance interruption notice and the rebalance recommendation to your advantage by creating Amazon EventBridge rules that handle an interruption gracefully. One such example is outlined next.
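
    EC2 publishes these signals to EventBridge with source "aws.ec2" and detail types "EC2 Spot Instance Interruption Warning" and "EC2 Instance Rebalance Recommendation". A minimal sketch, assuming a boto3 EventBridge client and an existing handler (for example a Lambda function) whose ARN you pass in; the rule names are hypothetical. A Lambda target also needs a resource-based permission allowing events.amazonaws.com to invoke it, which is not shown here.

```python
from __future__ import annotations

import json

# Hypothetical rule names mapped to EventBridge event patterns for the two Spot signals
SPOT_EVENT_PATTERNS = {
    "spot-interruption-warning": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Spot Instance Interruption Warning"],
    },
    "spot-rebalance-recommendation": {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance Rebalance Recommendation"],
    },
}


def create_spot_rules(events, target_arn: str) -> list[str]:
    """Create one EventBridge rule per Spot signal and route each to target_arn.

    events is a boto3 client, e.g. boto3.client("events").
    """
    rule_arns = []
    for name, pattern in SPOT_EVENT_PATTERNS.items():
        resp = events.put_rule(Name=name, EventPattern=json.dumps(pattern), State="ENABLED")
        events.put_targets(Rule=name, Targets=[{"Id": "handler", "Arn": target_arn}])
        rule_arns.append(resp["RuleArn"])
    return rule_arns
```

    Keeping the two signals in separate rules lets you route the early rebalance recommendation to a gentler handler (for example, start a replacement) than the hard two-minute interruption warning.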

    Using Spot instances with ELB

    In many cases an Elastic Load Balancer (ELB) is used. Instances are registered with and deregistered from the ELB based on health check status. The problem with Spot instances is that an instance is not deregistered before it is reclaimed, so some requests may fail if the situation is not handled properly.

    The proper approach is to use the interruption notice as a trigger to deregister the instance from the ELB. By deregistering the Spot instance programmatically before termination, the load balancer stops routing new traffic to it and, provided the deregistration delay is shorter than the two-minute warning, in-flight requests can finish, so little or no traffic is lost.

    The easiest way is a Lambda function triggered by the Spot instance interruption warning event (delivered by CloudWatch Events, now Amazon EventBridge). The function simply reads the instance ID from the event and deregisters the instance from the ELB. As usual, AWS Solutions Architects showed how to do this on the AWS Compute Blog.
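
    A minimal handler along those lines, assuming an Application or Network Load Balancer target group (boto3's elbv2 client and its deregister_targets call). TARGET_GROUP_ARN is a hypothetical environment variable you would set on the function; for a Classic Load Balancer the equivalent call is deregister_instances_from_load_balancer on the elb client. The optional elbv2 parameter exists only so the function can be exercised without AWS.

```python
from __future__ import annotations

import os


def handler(event: dict, context=None, elbv2=None) -> dict:
    """Deregister a Spot instance from a target group when its interruption warning arrives."""
    if event.get("detail-type") != "EC2 Spot Instance Interruption Warning":
        return {"deregistered": None}
    # The interruption warning carries the instance ID in its detail section
    instance_id = event["detail"]["instance-id"]
    if elbv2 is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        elbv2 = boto3.client("elbv2")
    # Hypothetical environment variable holding the target group ARN
    target_group_arn = os.environ["TARGET_GROUP_ARN"]
    elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=[{"Id": instance_id}])
    return {"deregistered": instance_id}
```

    Once the target is deregistered, the load balancer stops sending it new requests and drains existing connections for the target group's deregistration delay, which is why that delay should be shorter than the two-minute warning.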

    Keep your options open

    A Spot capacity pool is a set of unused EC2 instances with the same instance type (t3.micro, m4.large, etc.) and Availability Zone (for example, us-west-1a). Avoid being too specific about instance types and the zones they run in. For instance, don’t request only c4.large if the workload would run just as well on the m5, c5, or m4 families. Keep specific needs in mind: vertically scaled workloads need larger instances, while horizontally scaled workloads may find more availability in older-generation types, which are in less demand.

    AWS recommends being flexible across at least 10 instance types, and there is generally no need to limit Availability Zones. Make sure your VPC has a subnet in every AZ, and include all of those subnets, so your instances can launch wherever capacity is available.
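
    One way to build that flexibility into an Auto Scaling group is to generate the launch template overrides from a set of families and sizes, and to pass one subnet per AZ. This is a sketch under assumptions: the family/size combinations and subnet IDs are hypothetical examples, and you should only include types whose vCPU and memory suit your workload.

```python
from __future__ import annotations

from itertools import product


def diversified_overrides(families: list[str], sizes: list[str], minimum: int = 10) -> list[dict]:
    """Build Auto Scaling LaunchTemplate Overrides spanning several families and sizes."""
    types = [f"{family}.{size}" for family, size in product(families, sizes)]
    if len(types) < minimum:
        raise ValueError(f"only {len(types)} instance types; aim for at least {minimum}")
    return [{"InstanceType": t} for t in types]


def subnets_csv(subnet_ids_by_az: dict[str, str]) -> str:
    # VPCZoneIdentifier takes a comma-separated list of subnet IDs (one per AZ here)
    return ",".join(subnet_ids_by_az[az] for az in sorted(subnet_ids_by_az))
```

    For example, diversified_overrides(["c4", "c5", "m4", "m5", "m6i"], ["large", "xlarge"]) yields 10 overrides, starting with c4.large and c4.xlarge, which gives the Spot allocation strategy ten capacity pools per AZ to choose from.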

    Price and capacity optimized strategy

    Take advantage of Auto Scaling groups, whose allocation strategies provision capacity automatically. The price-capacity-optimized strategy (available in Auto Scaling groups, EC2 Fleet, and Spot Fleet) requests Spot capacity from the pools with the most available capacity and, among those, chooses the lowest-priced. This reduces the chance of your Spot instances being reclaimed. Dig into the Spot Instances section of the Amazon EC2 Auto Scaling User Guide for more detail. Also take a look at the section that describes workloads with a high cost of interruption.
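
    In boto3 terms, the strategy is set in the MixedInstancesPolicy passed to the Auto Scaling create_auto_scaling_group call. A minimal sketch, assuming an existing launch template; the launch template name, group name, and subnet IDs are hypothetical, and CapacityRebalance=True turns on the Capacity Rebalancing behavior described earlier.

```python
from __future__ import annotations


def spot_asg_policy(launch_template_name: str, instance_types: list[str],
                    on_demand_base: int = 0) -> dict:
    """Build a MixedInstancesPolicy that fills Spot capacity with price-capacity-optimized."""
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            "OnDemandBaseCapacity": on_demand_base,
            # Everything above the On-Demand base is launched as Spot
            "OnDemandPercentageAboveBaseCapacity": 0,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


def create_spot_asg(autoscaling, name: str, policy: dict, subnet_ids: list[str],
                    min_size: int = 1, max_size: int = 10) -> None:
    # autoscaling is a boto3 client, e.g. boto3.client("autoscaling")
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName=name,
        MixedInstancesPolicy=policy,
        MinSize=min_size,
        MaxSize=max_size,
        VPCZoneIdentifier=",".join(subnet_ids),
        CapacityRebalance=True,
    )
```

    Setting on_demand_base above zero keeps a small On-Demand floor for workloads with a high cost of interruption while the rest of the group stays on Spot.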

    Think aggregate capacity

    Instead of looking at individual instances, Spot lets you take a more holistic view of capacity in units such as vCPUs or memory, or in custom weights you assign to each instance type. Spot Fleet and Auto Scaling groups support this through the concept of “target capacity”. Automating requests for more resources to maintain a workload’s target capacity provides considerable flexibility.
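
    The fleet does this bookkeeping for you, but the arithmetic behind target capacity is simple enough to show. This sketch assumes each instance type's weight equals its vCPU count (m5.large has 2, m5.xlarge 4, m5.2xlarge 8); the target and the launched mix in the usage note are hypothetical.

```python
from __future__ import annotations

import math

# vCPU counts used as capacity weights (weights are an assumption of this sketch)
VCPU_WEIGHTS = {"m5.large": 2, "m5.xlarge": 4, "m5.2xlarge": 8}


def fulfilled_capacity(launched: dict[str, int], weights: dict[str, int]) -> int:
    """Total capacity units provided by a mix of launched instances."""
    return sum(weights[itype] * count for itype, count in launched.items())


def remaining_instances(target: int, launched: dict[str, int],
                        weights: dict[str, int], itype: str) -> int:
    """How many more instances of itype are needed to reach the target capacity."""
    shortfall = max(0, target - fulfilled_capacity(launched, weights))
    # Round up: a fleet may overshoot slightly when weights don't divide evenly
    return math.ceil(shortfall / weights[itype])
```

    With a target of 64 vCPUs, a mix of 4 m5.2xlarge and 8 m5.xlarge instances fulfills exactly 64. If two of the m5.2xlarge instances are reclaimed, capacity drops to 48, and remaining_instances(64, {"m5.2xlarge": 2, "m5.xlarge": 8}, VCPU_WEIGHTS, "m5.large") returns 8: the fleet can refill the gap from whichever pool has capacity instead of waiting for the exact type it lost.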

    Other options to consider

    Amazon has a considerable number of services that can be integrated with Spot instances to manage compute costs. Used effectively, these services allow for more flexibility and automation, eliminating the need to manage individual instances or fleets. Take a look at the EC2 Spot Workshops for some ideas and examples.

  • Devops Toolkit for Automation

    In the DevOps methodology, automation is likely the most important concept. Make “automate everything” your daily mantra.

    Image by Michal Jarmoluk from Pixabay

    As an “operator” working in a DevOps role, good tools are a necessity. Tools that let you automate almost everything are crucial to keeping up with the sheer volume of changes and updates an Agile development environment produces.

    Using the same tools as your counterparts on the team will speed up the learning process. In many cases developers use an IDE (Integrated Development Environment) of some sort. Visual Studio Code comes to the forefront, but some ‘hardcore’ or ‘old school’ developers still use Emacs or even Vim as their development tool of choice. There are many out there, and each has its pros and cons. Along with an IDE, you’ll need extensions to make things simpler. Let’s outline a few, with Visual Studio Code as the tool of choice.

    Visual Studio Code is available for most commonly used platforms. It has a ton of extensions, but as a “DevOps Engineer” you’ll need only a few to make your life easier. First and foremost, you’ll want extensions that make working with your favorite cloud provider easier. There are extensions for AWS, Google Cloud (including GKE), and Azure (including AKS), as well as for YAML, Kubernetes, and GitHub.

    Another extension necessary for container development is the Remote Development extension pack. It includes the Dev Containers extension, which lets you open files and folders inside a container, and the Remote - SSH extension, which simplifies access to remote machines. The Dev Containers extension will typically point you to Docker Desktop, but a better alternative is Rancher Desktop.

    Rancher Desktop is another superb tool for several reasons.

    • 100% open source
    • Includes K3s as the Kubernetes distribution
    • Works with either dockerd (moby) or containerd
    • Basic dashboard
    • Easy to use

    To get started, download Rancher Desktop and install it on your favorite platform. Follow the installation instructions, and once it is installed, go to the Preferences page and select “dockerd (moby)” as the container engine, as shown below.

    Rancher Desktop Kubernetes Settings

    Now that you have Rancher Desktop installed, along with Visual Studio Code and its extensions, take some time to get familiar with them. It’s best to start with your GitHub account: create or fork a repository to work with inside Visual Studio Code. Reading through the various getting-started docs yields hours of things to try and learn from.

    To get started with your Rancher Desktop cluster, simply click the Rancher Desktop icon. In most windowed environments there’s an icon in the “task bar” or system tray.

    Click the Dashboard link to view the K3s cluster that was installed when Rancher Desktop started.

    Another way to access the cluster is kubectl. Rancher Desktop installs a number of utilities, including kubectl, to ~/.rd/bin, so make sure that directory is on your PATH. Use kubectl get nodes to view the node(s) in your cluster, or kubectl get pods -A to view all of the pods in the cluster.
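
    Those same commands are easy to script, which is a first small step toward automating against the cluster. A minimal sketch, assuming kubectl is on your PATH (or that you pass its full path under ~/.rd/bin): it runs kubectl get nodes -o json and reports whether each node's Ready condition is True. The helper names are hypothetical.

```python
from __future__ import annotations

import json
import subprocess


def node_readiness(nodes_json: str) -> dict[str, bool]:
    """Map node name to whether its Ready condition is True, from kubectl's JSON output."""
    result = {}
    for node in json.loads(nodes_json)["items"]:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(c.get("type") == "Ready" and c.get("status") == "True" for c in conditions)
        result[node["metadata"]["name"]] = ready
    return result


def get_nodes_json(kubectl: str = "kubectl") -> str:
    # On Rancher Desktop kubectl lives in ~/.rd/bin; pass the full path if it isn't on PATH
    completed = subprocess.run([kubectl, "get", "nodes", "-o", "json"],
                               check=True, capture_output=True, text=True)
    return completed.stdout
```

    Running node_readiness(get_nodes_json()) against a freshly started Rancher Desktop should report its single K3s node as ready; the same pattern works for kubectl get pods -A -o json.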

    Many utilities exist to view/manage Kubernetes clusters. Great learning experiences come from experimentation.

    A lot was accomplished in this post. From a bit of reading to manipulating a Kubernetes cluster, there is a lot of information to absorb. Visual Studio Code will be the foundation for much of the work done in the DevOps world, and containers and Kubernetes will be the foundation for running that work. This post provided the building blocks for combining the Dev and the Ops with what’s needed to automate the process.

    Next up…building a simple CI/CD pipeline.

  • Getting started in DevOps

    Getting started in DevOps doesn’t have to be hard.

    Image by Dirk Wouters from Pixabay

    How do we get started? Let’s begin with some assumptions.

    1. You understand how to install and manage a Kubernetes cluster.
    2. You understand how to ‘git’ around. (heh…like the pun?)
    3. You know how CI/CD pipelines work.
    4. You understand some development, or at least know your way around tools like VS Code.

    There’s plenty of knowledge to be found here so let’s get started.

    In most cases, companies that need people who understand DevOps best practices are either starting or already executing a Digital Transformation journey. These journeys are just that, a journey, so grab a seat, buckle up, and enjoy the ride. This particular ride involves a lot of buzzword bingo. There will be plenty of opportunity to play that game later.

    Getting started on the journey

    The first part of every journey is preparing for it. It helps to learn a bit about the destination before embarking on the journey, so watch for the buzzwords. The first thing to note is that many enterprises carry a lot of technical debt. Suffice to say, there will be far more work than there are developers to do it. From ancient Microsoft .NET code to crufty Java, there’s plenty of history in those binaries. One of the goals may be to modernize these applications. The fabulous book “The Phoenix Project” describes how Bill Palmer takes an over-budget, behind-schedule modernization project through to deployment using effective collaboration and communication, crowdsourcing, and the “Three Ways”.

    Hopefully “The Phoenix Project” helps to frame what is in store on the adventure into DevOps. The next step is to put some of the constructs it outlines into practice. One of the key tenets of the book is to keep the “pipeline” free of obstructions, because a single slowdown slows the entire line of work. That slowdown creates a bottleneck, which in turn ripples through the entire process. In a cloud native development world, these “pipelines” are the CI/CD (continuous integration, continuous delivery/deployment) pipelines.
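
    The bottleneck idea can be made concrete with a little arithmetic. Treat a pipeline as stages in series where every change must pass each stage; in steady state, throughput can be no higher than the slowest stage. This sketch uses hypothetical stage names and rates (changes per day); it illustrates the principle rather than modeling any real pipeline.

```python
from __future__ import annotations


def pipeline_throughput(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return the bottleneck stage and the pipeline's steady-state throughput.

    stage_rates maps each stage in a serial pipeline to how many changes per day it can process.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]
```

    With rates of {"build": 40, "test": 12, "deploy": 30}, the result is ("test", 12). Doubling build capacity to 80 leaves throughput at 12 changes per day; only speeding up the test stage moves the whole line, which is exactly why a single obstruction matters so much.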

    Other takeaways

    Gene Kim outlined a few other takeaways in “The Phoenix Project” worth noting. The first comes from the need to work in smaller groups. Jeff Bezos is credited with creating the “Two Pizza Team”, a team small enough to be fed by two pizzas. Much of Amazon’s innovation is credited to these small, competitive teams that communicated very well. The small team concept leads to another concept: “microservices”.

    Instead of a monolith where everything runs together, microservices break an application into functional units, each focused on the smallest practical unit of work. Smaller units of work mean smaller changes, which can be committed and tested faster and, in most cases, locally. Microservices will be a key concept on this digital transformation journey. They create the need for cross-functional teams where communication and collaboration are key. This is where DevOps comes into play: employing DevOps methodologies is crucial to the success of a transformative project.

    Enterprises around the world have gone through a sea change since the Covid-19 pandemic started. Companies were already beginning to embrace digital transformation, and Covid-19 forced them to accelerate it if they wanted to survive. Embracing remote work was key to survival.

    This post was simply an introduction. With the key concepts outlined, subsequent posts will focus more on a technical guide to the technology underneath a DevOps methodology. Many DevOps tools exist to help with its many facets, including building a culture within an organization that can embrace ongoing transformation and adapt to even the slightest change in the organization’s market.

    Next up…the introduction of a Podman setup to start down the path of using, managing, and orchestrating containers.