UKCloud Ltd

Role: Automation and Service Reliability Engineer → DevOps Engineer · Pastoral Manager · Agile Lead

January 2020 – February 2023 · Farnborough, Hampshire · Hybrid

The Role

I spent three formative years at UKCloud, the UK’s premier sovereign cloud provider for government, public sector, defence, and regulated industries. My time there spanned two distinct phases: first as an infrastructure automation specialist delivering reliability across highly secure government environments, then as an emerging leader taking on people management and project leadership while continuing my technical work.

UKCloud: Context & Mission

UKCloud was built around a clear strategic mandate: deliver cloud services to organisations requiring data sovereignty, regulatory compliance, and security assurance. All infrastructure was hosted in UK datacentres under UK legal jurisdiction—deliberately designed for government and regulated sectors that cannot tolerate data crossing political or legal boundaries.

The company operated across specialised divisions:

G-Cloud: Public sector cloud services, on the government procurement framework since 2012
UKCloud Health: For healthcare organisations with compliance and data sensitivity requirements
UKCloudX: “First public cloud for high classification systems in the UK”—serving defence and national security, operating completely air-gapped infrastructure

My team—the Automation Service Reliability (ASR) team, later renamed Core Delivery Platform—worked across all of these environments, automating and maintaining mission-critical systems across the full spectrum of UKCloud’s security tiers.

UKCloud’s Environment Architecture

The environment architecture is worth explaining, because it shaped almost everything about the work.

UKCloud operated four distinct environment tiers, each with different connectivity and security profiles:

Environment	Connectivity	Purpose
Dev	Equivalent to Assured	Development and testing; matched Assured architecture so code behaved identically in live
Assured	Semi-connected (extremely restricted)	Government workloads requiring some connectivity; data-restricted access
Elevated	No internet; data feeds from Assured only	More sensitive government workloads requiring isolation from internet
UKCloudX	Completely air-gapped, separate data centre	Defence and national security; high-classification systems

UKCloudX deserves particular mention. It operated from a completely separate data centre, with strict control over all information flow:

All materials and data entered the environment via a sheep-dip process (security scanning and sanitisation of all incoming content)
Anything leaving required a multi-person authorisation and exfiltration process (extremely restricted)
This is sovereign cloud at its most rigorous — a security posture few engineers ever work inside

The ASR team worked across all of these tiers. This meant building automation that could operate in fundamentally different paradigms: from environments with restricted-but-present internet feeds, through to completely offline systems where every piece of tooling and data had to be physically controlled.
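As a rough illustration of what "fundamentally different paradigms" means for automation, the tiers can be modelled as data that tooling branches on. This is a minimal sketch, not UKCloud's implementation: the class names, the `tooling_route` function, and the route labels are all hypothetical, and only the tier names and their connectivity come from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class Connectivity(Enum):
    SEMI_CONNECTED = "semi-connected"
    ISOLATED = "no internet, fed from Assured"
    AIR_GAPPED = "air-gapped"


@dataclass(frozen=True)
class Environment:
    name: str
    connectivity: Connectivity


# Tiers as described in the table above; Dev mirrors Assured.
ENVIRONMENTS = (
    Environment("Dev", Connectivity.SEMI_CONNECTED),
    Environment("Assured", Connectivity.SEMI_CONNECTED),
    Environment("Elevated", Connectivity.ISOLATED),
    Environment("UKCloudX", Connectivity.AIR_GAPPED),
)

# Hypothetical route labels, for illustration only.
TOOLING_ROUTE = {
    Connectivity.SEMI_CONNECTED: "pull from restricted internet feeds",
    Connectivity.ISOLATED: "stage via Assured",
    Connectivity.AIR_GAPPED: "import through sheep-dip",
}


def tooling_route(env: Environment) -> str:
    """Return how tooling is assumed to reach the given environment."""
    return TOOLING_ROUTE[env.connectivity]


for env in ENVIRONMENTS:
    print(f"{env.name}: {tooling_route(env)}")
```

Keeping the tier differences in one lookup like this means the automation logic itself can stay the same across environments, with only the delivery route changing.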

The Technical Challenge

Every service we built faced two non-negotiable constraints:

Availability: 99.90% platform SLA—government and defence sectors cannot tolerate service outages
Compliance: Operating within an ISO27001-compliant organisation with formal change management, documented security procedures, risk management frameworks, and audit trails

Neither was an afterthought; they were the starting point. Every automation decision, every deployment pipeline, every monitoring choice had to answer to reliability and security at the same time.
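To put the availability figure in concrete terms, the short calculation below converts a 99.90% SLA into a downtime budget. It is a sketch only: the 30-day and 365-day windows are assumptions chosen for illustration, since the measurement period is not stated here, and `downtime_budget` is a hypothetical helper name.

```python
from datetime import timedelta


def downtime_budget(sla_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime an availability SLA permits over a period."""
    if not 0.0 <= sla_percent <= 100.0:
        raise ValueError("sla_percent must be between 0 and 100")
    allowed_fraction = (100.0 - sla_percent) / 100.0
    return timedelta(seconds=period.total_seconds() * allowed_fraction)


# Assumed measurement windows; the real SLA window may differ.
monthly = downtime_budget(99.90, timedelta(days=30))
yearly = downtime_budget(99.90, timedelta(days=365))

print(f"30-day budget: {monthly.total_seconds() / 60:.1f} minutes")
print(f"365-day budget: {yearly.total_seconds() / 3600:.2f} hours")
```

Under those assumptions the budget is roughly 43 minutes in a 30-day month, or under nine hours in a year, which leaves little room for unplanned change failures.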

Phase 1: Infrastructure Automation Specialist (Jan 2020 – Jul 2022)

Core Responsibilities

Infrastructure Automation & Delivery:

Designed and implemented Ansible-based automation for deploying services across government and defence infrastructure
Delivered automation jobs via Rundeck (automation delivery platform), consuming data from Device42 CMDB—much of our automation was grounded in, or informed by, Device42’s asset and configuration data
Built CI/CD pipelines enabling continuous, reliable delivery across Dev, Assured, Elevated, and UKCloudX environments
Extended infrastructure-as-code practices with Terraform, supporting both semi-connected and air-gapped systems
Integrated Test-Driven Development framework (Molecule) into existing Ansible workflows—elevated code quality and team confidence in infrastructure changes

Highly-Available Systems:

Architected and maintained highly-available service configurations using cluster deployments, HAProxy, and Keepalived
Designed for zero-downtime operations: every component redundant, every failure anticipated
Supported virtualisation (VMware: ESX, vSphere, vCloud Director) and containerisation (Docker, Docker Swarm) platforms

Multi-Environment Deployment:

Deployed solutions across the full UKCloud environment spectrum—from semi-connected Assured to completely air-gapped UKCloudX
Different deployment strategies per environment: standard connectivity vs. air-gapped where all tooling and data must be physically controlled
Used a single automation framework (Ansible) as the consistent foundation across all environments despite their fundamentally different constraints

Monitoring & Observability:

Implemented monitoring using Opsview (Nagios-based) and ScienceLogic
Designed alerting systems balancing rapid incident response with compliance requirements
Built and maintained monitoring dashboards providing visibility across complex, multi-tier infrastructure

Incident Ownership:

Took full ownership of service incidents from first alert through to resolution—regardless of complexity or hour
In-hours and out-of-hours support for mission-critical systems, with proactive and reactive response
Culture of accountability: no hand-off mid-incident; you own it through to conclusion

Change Management & Compliance:

Enacted all changes using ITIL-compliant change management procedures
Operated within a heavily certified organisation: ISO9001 (Quality Management), ISO20000 (IT Service Management), ISO27001 (Information Security), and ISO27018:2014 (Personal Data Protection in Cloud). UKCloud was the first company to achieve a statement of verification from LRQA for ISO27018:2014
Formal information security management across documented procedures, access controls, risk management, and full audit trails for all infrastructure changes

Cross-Team Collaboration:

Acted as a key bridge between ASR and Software Engineering teams
Actively built stronger inter-team relationships, enabling ASR to be more responsive to Software Engineering’s needs and requirements
Set clear expectations with Software Engineering based on ASR team workload and priorities, improving transparency and reducing friction between teams

Phase 2: Leadership Expansion (Jul 2022 – Feb 2023)

Note: everything from Phase 1 continued. The roles below were taken on in addition to technical responsibilities, not as a replacement.

Pastoral Management

In 2022, I took on formal responsibility for the pastoral wellbeing of four full-time undergraduate staff members. This role was deliberately separated from technical management—creating space for team members to discuss challenges, frustrations, and career aspirations independent of technical performance pressures.

What pastoral management meant in practice:

Regular 1-2-1 meetings to understand each team member’s needs, career goals, and personal circumstances
Collaboratively built Personal Development Plans aligned with individual aspirations and business needs
Coached team members through challenges—for example, helping someone stuck on a technical problem develop strategies and personal boundaries for when to seek help, rather than staying stuck in frustration
Provided a neutral space for workplace concerns and helped navigate organisational dynamics

To support the role properly, I also trained and qualified as a Mental Health First Aider (MHFA England, 2022) — giving me the tools to recognise and respond to wellbeing issues, not just manage tasks.

Crisis Leadership:

In late 2022, UKCloud entered administration. During this period, I shifted focus to protecting my team. I worked to find all four undergraduate staff members new positions with other companies—ensuring they had career continuity despite the organisation’s difficulties.

Putting people first during uncertainty is one of the things I’m proudest of from this period.

Agile Leadership & Project Delivery

DevOps Management Dashboard (Scrum Master):

Led a team of 4 engineers to deliver a unified monitoring and tooling dashboard
Aggregated APIs from multiple tools (Opsview, ticketing, infrastructure management) into a single view
Implemented SSO authentication—users authenticate once, seamless access to all backend tooling
Delivered using two-weekly sprints, managing the bridge between team and product owner throughout
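The aggregation idea behind the dashboard can be sketched in a few lines. This is illustrative only and is not the team's code: the `aggregate` function, the stub fetchers, and their sample data are hypothetical stand-ins for real API clients. The one design point it shows is that a single failing backend is reported in the view rather than taking the whole dashboard down.

```python
from typing import Callable, Dict, List

Fetcher = Callable[[], List[dict]]


def aggregate(sources: Dict[str, Fetcher]) -> Dict[str, dict]:
    """Collect every source into one view; a failing source is reported, not fatal."""
    view: Dict[str, dict] = {}
    for name, fetch in sources.items():
        try:
            view[name] = {"ok": True, "items": fetch()}
        except Exception as exc:  # keep the dashboard up if one backend is down
            view[name] = {"ok": False, "items": [], "error": str(exc)}
    return view


# Stub fetchers standing in for real API clients (hypothetical data).
def fetch_monitoring() -> List[dict]:
    return [{"host": "web01", "state": "OK"}]


def fetch_ticketing() -> List[dict]:
    raise TimeoutError("ticketing API timed out")


dashboard = aggregate({"monitoring": fetch_monitoring, "ticketing": fetch_ticketing})
for name, result in dashboard.items():
    status = "up" if result["ok"] else f"unavailable ({result['error']})"
    print(f"{name}: {status}, {len(result['items'])} item(s)")
```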

This was my first experience leading a technical project from a purely leadership perspective—not contributing code, focused on team coordination, sprint delivery, and unblocking obstacles.

The key insight it crystallised: technical leadership ≠ technical contribution. The value I added was through orchestration, communication, and alignment. This distinction became foundational for how I approached leadership going forward.

Other Project Leadership:

Agile Lead on CI/CD Infrastructure project (engineer and lead role)
Agile Lead on Core Management Platform Portal
Engineer on Runbook Automation framework

Formal Leadership Training

In 2022, I undertook the CMI (Chartered Management Institute) People Leadership Programme, a formal apprenticeship in people management, supported by UKCloud as part of my career development.

When UKCloud entered administration, I chose to continue the programme independently. I completed it with a Distinction in October 2023 (while at my next role at SiXworks). The practical experience from pastoral management, combined with the formal training, was a direct contributor to my next opportunity—where I was given the chance to build and lead a team from scratch.

Key Projects

Patching as a Service

A team-run service delivering reliable, compliant patch management across customer Linux and Windows infrastructure, with full reporting for compliance and audit. I was a supporting engineer on it rather than its owner — contributing chiefly on the Linux side, where patching was automated with Ansible for both deployment and reporting (Windows used Ivanti Security Controls).

The service ran with a flawless operational record: zero customer system outages from patching throughout its life.

Portal Deployment Automation

The Challenge: Deploy the customer-facing portal reliably and repeatedly across Dev, Assured, and Elevated environments.

The Approach: Built an Ansible-based CI/CD pipeline enabling repeatable, automated deployments with consistent configuration across all environments.

The Outcome: Reduced deployment time and error rate; enabled faster feature delivery while maintaining parity across environments.

Platform Migration

The Challenge: Migrate supported services from legacy infrastructure to a new platform without service disruption.

The Approach: Planned and executed the migration using infrastructure-as-code practices, with availability as the primary constraint.

The Outcome: Successful transition of critical systems—availability maintained throughout.

TDD Framework Integration

The Challenge: Improve code quality and reliability in existing Ansible roles.

The Approach: Integrated the Molecule test-driven development framework into the existing codebase, enabling automated testing of infrastructure changes before deployment.

The Outcome: Elevated team confidence in infrastructure changes; reduced regression risk.

Mail Relay Infrastructure

Designed and deployed a replacement for the outbound mail relay infrastructure, ensuring reliable and compliant mail delivery across government services.

DevOps Management Dashboard

As Scrum Master, I led delivery of a unified monitoring dashboard that aggregated multiple tools behind SSO authentication. It was my first major experience of leading a project without contributing code.

Technology Stack

Automation & Infrastructure as Code

Ansible — Primary automation tool for all infrastructure deployment and management
Terraform — Infrastructure-as-code for cloud resources
Rundeck — Automation delivery platform; used to orchestrate and deliver automation jobs at scale
Git & CI/CD pipelines — Version control and continuous deployment
Molecule — Test-driven development framework for Ansible roles

CMDB

Device42 — Configuration Management Database; significant portion of automation work built on or around data consumed from this platform

Virtualisation & Containerisation

VMware (vSphere, vCloud Director, ESX) — Primary virtualisation platform
Docker & Docker Swarm — Container runtimes and orchestration
Cluster deployments — HAProxy, Keepalived for high availability and failover

Infrastructure & Operating Systems

Linux (RHEL) — Primary operating system
Windows Server — Secondary platform support
Highly-available architecture — Cluster design, failover, redundancy

Monitoring & Observability

Opsview (Nagios-based) — Primary monitoring and alerting platform
ScienceLogic — Infrastructure monitoring
Custom dashboards — Aggregated tool visibility

ITSM, Change Management & Compliance

ITIL procedures — Formal change control and governance for all infrastructure changes
ISO9001 — Quality Management
ISO20000 — IT Service Management
ISO27001 — Information Security Management
ISO27018:2014 — Personal Data Protection in Cloud (UKCloud was first to achieve LRQA verification)
Jira — Ticketing and project tracking
Confluence — Documentation and knowledge management
Ivanti Service Manager — Customer-facing ITSM
Ivanti Security Controls — Windows patch management (Patching as a Service)

Methodologies & Practices

Agile/Scrum — Sprint-based delivery cadence (2-weekly)
Test-Driven Development — Quality-focused infrastructure changes (Molecule)
Infrastructure as Code — Reproducible, version-controlled infrastructure
ITIL change management — Formal change control procedures
Incident ownership — End-to-end accountability from detection through to resolution

What I Learned

Sovereign Cloud & Secure Infrastructure

Working across UKCloud’s environment tiers — from semi-connected Assured through to fully air-gapped UKCloudX — gave me a breadth of experience few cloud engineers get near. Each environment demanded a different approach to deployment, tooling, and security posture:

Standard cloud assumes connectivity; air-gapped environments require every tool, patch, and data source to be physically controlled
Compliance isn’t an audit exercise—it’s a design constraint that shapes every architectural decision
Building reliable systems in restricted environments requires more rigour, not less, in automation practices

It’s uncommon experience, and increasingly relevant: sovereign cloud is a growing priority for governments worldwide, and the ability to operate and automate infrastructure at this security tier is in real demand.

Availability-First Thinking

In environments where outages impact government and defence operations, availability is non-negotiable. The 99.90% platform SLA wasn’t a target—it was the floor. This shaped how I approach infrastructure: redundancy by default, failure modes anticipated, and changes enacted under formal control.

End-to-End Incident Ownership

UKCloud’s culture was one of owning problems through to conclusion. You took the alert, you investigated, you resolved—regardless of complexity or time of day. No hand-offs mid-incident. This discipline of seeing things through to resolution is something I carry forward to any technical role.

Leadership Is Additive

Taking on pastoral management and Scrum Master responsibilities while continuing technical work taught me that leadership is additive to technical capability, not a replacement for it. The skills developed in parallel—managing people, driving project delivery, navigating organisational dynamics—made me a more effective engineer, not a less technical one.

People-First in Crisis

When UKCloud entered administration, the instinct to protect my team was immediate and clear. Finding new positions for all four team members during organisational uncertainty wasn’t a task I was asked to do—it was the right thing to do. That instinct, and the ability to act on it, is something I value in leaders and try to embody myself.

Foundation for What Came Next

These three years directly enabled my transition to SiXworks, where I was given the opportunity to build and lead a technical team from scratch. The combination of deep infrastructure expertise in regulated environments and emerging people leadership skills was the foundation for taking on that responsibility.

Why UKCloud Matters

UKCloud was a pivotal period of professional growth. It was here I:

Consolidated infrastructure automation expertise in high-stakes, compliance-heavy government environments
Gained rare experience operating across air-gapped and classified defence systems (UKCloudX)
Took on first formal people management responsibilities
Learned to balance technical depth with people development—in parallel, not in sequence
Understood what it means to lead during organisational crisis
Built the confidence and skills to take on greater leadership responsibility

The company’s mission—sovereign cloud for the UK public sector, defence, and regulated industries—meant every technology decision carried genuine weight. That sharpened how I think about infrastructure: security, compliance, and availability as first-class concerns. It’s a perspective I carry into every role that follows.
