Exploring Life as a Senior Software Engineer in Distributed Systems — Insights from Workday Dublin

Workday Life
Jul 15, 2024

Workday branded blog cover, image contains the Workday logo and an image of Workmate Caroline. Text reads: Exploring Life as a Senior Software Engineer in Distributed Systems — Insights from Dublin, Workday Life Blog.

I was inspired to work in tech by watching too many movies and TV shows about a computer genius saving the day with a keyboard. While my career has been nothing like the movies, I am really happy where I ended up. I love solving problems and that is what I do every day as a Senior Software Engineer at Workday in Dublin. I joined the company as a graduate nearly seven years ago, and I was lucky enough to work with two different teams before finding my home on the monitoring team, where I work today.

As a DevOps engineer, I maintain and support all our Workday applications in production, finding deep satisfaction in investigating, identifying, and solving complex issues — from new feature development to bug fixes and production debugging. Actually, sometimes I feel like I’m part Software Engineer and part detective, so maybe my life is a little bit like the movies after all!

A Day in the Life of My Team

My team provides a platform for Workday service teams to easily collect, visualise and create alerts on metrics produced by their service, in order to gain insights into their service’s health.

This means we are constantly developing agents that are deployed alongside every Workday service, to allow teams to easily configure metric collection. We also deploy and manage the Time Series Database (TSDB) which stores the metrics data, and the visualisation layer used to view and alert on the metrics data.
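To make the idea of metric collection concrete, here is a minimal sketch of what a service might expose for an agent to collect. It is only an illustration, not how our agents work internally: the `CounterRegistry` class, the `http_requests_total` metric name and the label values are all hypothetical, and the output follows the Prometheus text exposition format.

```python
from collections import defaultdict


class CounterRegistry:
    """Tiny in-memory registry of counters (hypothetical, for illustration only)."""

    def __init__(self):
        # Each series is identified by its metric name plus a sorted tuple of labels.
        self._values = defaultdict(float)

    def inc(self, name, labels=None, amount=1.0):
        """Increase the counter for the given metric name and label set."""
        key = (name, tuple(sorted((labels or {}).items())))
        self._values[key] += amount

    def render(self):
        """Render every counter in the Prometheus text exposition format."""
        lines = []
        seen = set()
        for (name, labels), value in sorted(self._values.items()):
            if name not in seen:
                # One TYPE line per metric name, emitted before its first series.
                lines.append(f"# TYPE {name} counter")
                seen.add(name)
            label_text = ",".join(f'{k}="{v}"' for k, v in labels)
            series = f"{name}{{{label_text}}}" if label_text else name
            lines.append(f"{series} {value}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    registry = CounterRegistry()
    # Hypothetical metric and labels; a real service would choose its own.
    registry.inc("http_requests_total", {"service": "ems", "code": "200"})
    registry.inc("http_requests_total", {"service": "ems", "code": "500"})
    print(registry.render(), end="")
```

In a real deployment a client library would handle this bookkeeping, and the agent would scrape the rendered text over HTTP and forward it to the TSDB.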

It’s hugely important that we work effectively not only within our own team, but across the different teams all over the business. When we are not busy creating features or improvements to the collector agents, or optimising the performance of the TSDB that serves millions of write requests and hundreds of read requests per second, we are helping our service teams effectively leverage our platform.

Apart from being experts on our own application, we must also have a good working knowledge of the deployment platforms, such as Chef, Kubernetes and OpenStack, leveraged by all of Workday’s service teams. An example of a service is the Environment Manager Service (EMS), which manages the allocation of each of our customers’ tenants to the servers in each data center. The agents we develop, such as the Grafana Agent and Prometheus Server, must be available to serve all of these teams regardless of how or where they deploy in the Workday ecosystem. We partner with them to integrate seamlessly into the deployment platforms, and provide a monitoring system that works almost ‘out of the box’ for everyone.

Why My Role Is Crucial to Workday

If the Workday monitoring platform is down, we don’t have visibility into the health of customers’ tenants, meaning we can’t identify and fix issues. That’s why observability and constant effective monitoring are crucial to providing our best-in-class service. This includes actionable alerts that highlight issues as they arise, and dashboards visualising the service health metrics which help to quickly identify the root cause of the issue.

Our platform is patched in a zero-downtime fashion, meaning any upgrades to our service are transparent to our customers. Our architecture is also fault-tolerant, meaning we must be able to survive major hardware failures without degrading the service we provide to our customers.

Scaling Up into the Future

Our biggest challenge is scale. Our monitoring stack is built on open source technologies: Grafana Mimir as the Time Series Database, and the Grafana Agent to send metrics to Mimir. These work incredibly well, but as we operate at enormous scale (much bigger than most of the community who use the same technologies), we are really pushing the tools to their limits. We must serve millions of write and read requests, and quickly! As such, we spend a lot of our time tuning these open source projects to perform under tremendous load, and developing in-house applications that integrate with them and improve their performance.

One of these projects was our recent migration from a third-party vendor for our TSDB and visualisation needs to our own in-house solution. We developed this by leveraging open source solutions and the public cloud. It’s probably one of the most exciting things I’ve been involved with so far, as it has opened up a lot of opportunities to improve the platform that were not possible when using a vendor. Since the migration we have better data governance and role-based access control. Our current solution also has a better ownership model, whereby each team has its own namespace in which to create alerts and dashboards. We have also been able to build governance standards into the platform: alerts, dashboards and metrics must now meet a minimum standard, which ensures better monitoring for all Workday services.
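As a rough illustration of what building governance standards into a platform can look like, here is a minimal sketch of a check that an alert must pass before it is accepted. Everything in it is an assumption made for the example: the `AlertRule` fields, the required labels and annotations, and the rule that a team may only create alerts in its own namespace are hypothetical stand-ins, not our actual standards.

```python
from dataclasses import dataclass, field

# Assumed minimum standard, for illustration only.
REQUIRED_LABELS = {"severity", "team"}
REQUIRED_ANNOTATIONS = {"summary", "runbook"}


@dataclass
class AlertRule:
    """Hypothetical alert definition submitted by a service team."""
    name: str
    namespace: str
    expr: str
    labels: dict = field(default_factory=dict)
    annotations: dict = field(default_factory=dict)


def validate(rule, caller_team):
    """Return a list of problems; an empty list means the rule is accepted."""
    problems = []
    # Ownership model: a team may only create alerts in its own namespace.
    if rule.namespace != caller_team:
        problems.append(f"namespace '{rule.namespace}' is not owned by team '{caller_team}'")
    # Minimum standard: every alert carries the required labels and annotations.
    for missing in sorted(REQUIRED_LABELS - set(rule.labels)):
        problems.append(f"missing label '{missing}'")
    for missing in sorted(REQUIRED_ANNOTATIONS - set(rule.annotations)):
        problems.append(f"missing annotation '{missing}'")
    if not rule.expr.strip():
        problems.append("alert expression is empty")
    return problems


if __name__ == "__main__":
    rule = AlertRule(
        name="HighErrorRate",
        namespace="ems",
        expr="rate(http_requests_total[5m]) > 100",
        labels={"severity": "page"},
        annotations={"summary": "Error rate is high"},
    )
    for problem in validate(rule, caller_team="ems"):
        print(problem)
```

Running a check like this at submission time, rather than reviewing alerts by hand afterwards, is one way a platform can enforce a standard consistently across many teams.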

A Career of Continuous Learning

I love learning and using new technologies, and luckily for me, we’re always evolving our systems and capabilities at Workday. In fact, the list of technologies I work with these days is almost entirely different to what I used when I joined the team seven years ago, really demonstrating the constant innovation and evolution of the Workday platform.

If you’re interested in getting into distributed systems, I’d recommend learning the concepts of containers and the management of containerised applications at scale. Kubernetes is a great orchestration tool for building distributed systems. Setting yourself up in a sandbox and deploying Kubernetes on a public or private cloud is a good way to practise building a highly available architecture, and it will teach you the fundamentals of load balancing, storage orchestration and more. Tools like OpenStack can be used to create a fleet of virtual machines to run a Kubernetes cluster; alternatively, public cloud vendors such as AWS, GCP and Azure provide scalable infrastructure to learn on.

I am excited to continue improving our new stack to provide our customers with a monitoring solution they love to use. Who knows what our next software innovation will be? I can’t wait to be part of it!

Workday branded blog graphic containing an image of Workmate Caroline the author of the blog. Text reads: Meet Caroline, Tenure: 7 years Location: Dublin, Ireland Title: Senior Software Engineer (Distributed Systems), Caroline, a Dublin-based Senior Software Engineer specialising in Distributed Systems, joined Workday through our graduate program. Interestingly, her inspiration to pursue a tech career came from watching movies and TV shows.

A brighter work day is just around the corner. ☀️ Explore career opportunities here. For more #WDAYLife content, follow us on Instagram, Facebook, Twitter, and LinkedIn.
