Profile Image

Trevor Reusch

Sr. Site Reliability Engineer

Projects

Enterprise Services Platform Azure migration – LexisNexis Risk

Migrated LexisNexis Risk’s front-door and middleware web servers from on-prem to Azure: roughly 50 applications, over 1,000 web functions, and dozens of hostnames, now hosted on AKS clusters.

Infrastructure: Three AKS clusters: one hot for active traffic, one warm for failover, and one cold for long-term maintenance. Each is a completely segregated, independent stack, from repository to resource group to VNet. All clusters span all three Azure availability zones for redundancy.

Kubernetes: Four node pools for different workloads: one for ingress controllers and traffic routing, one for real-time customer traffic, a third for large internal batch jobs, and a fourth for tooling such as observability and logging.
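One common way to keep workloads pinned to their pool is taints and tolerations; the matching rule can be sketched as below. The taint key/value names are hypothetical, not taken from the actual cluster config.

```python
def tolerates(taint, tolerations):
    """Return True if any toleration matches the taint, roughly following
    the Kubernetes rules for operators Equal and Exists."""
    for t in tolerations:
        if t.get("operator", "Equal") == "Exists":
            # Exists matches any value; an omitted key matches every taint.
            key_ok = "key" not in t or t["key"] == taint["key"]
        else:
            key_ok = t.get("key") == taint["key"] and t.get("value") == taint["value"]
        # An omitted effect matches all effects.
        effect_ok = "effect" not in t or t["effect"] == taint["effect"]
        if key_ok and effect_ok:
            return True
    return False

# Hypothetical taint on the batch node pool
batch_taint = {"key": "workload", "value": "batch", "effect": "NoSchedule"}
```

With this scheme, realtime pods simply never tolerate the batch pool's taint, so the scheduler keeps the pools isolated.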

Monitoring: Each cluster runs Prometheus for metrics collection, Alertmanager for alerting, and Grafana for dashboards. All logs are additionally shipped to a separately hosted ELK stack, which is loaded with a suite of additional dashboards, monitoring, alerting, and synthetic transactions.

Software: All applications have:

  • Minimum replica count of 2; some run higher for burstability.
  • HPAs configured for CPU scaling, or custom-metric “active request” scaling (deprecated).
  • PDBs to prevent application outages during voluntary disruptions.
  • TopologySpreadConstraints to spread pods evenly across Azure zones.
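The guarantee TopologySpreadConstraints provides can be sketched numerically: with a maxSkew of 1, pod counts per zone may differ by at most one. A minimal check of that property, using hypothetical zone names:

```python
from collections import Counter

def zone_skew(pod_zones):
    """Return the skew (max count minus min count) of pods across zones."""
    counts = Counter(pod_zones)
    return max(counts.values()) - min(counts.values())

def satisfies_spread(pod_zones, max_skew=1):
    """Check whether a set of pod placements honors maxSkew, the bound the
    scheduler enforces via topologySpreadConstraints."""
    return zone_skew(pod_zones) <= max_skew

# Hypothetical placements across three availability zones
balanced = ["zone-1", "zone-2", "zone-3", "zone-1", "zone-2", "zone-3"]
lopsided = ["zone-1", "zone-1", "zone-1", "zone-2", "zone-3", "zone-1"]
```

Combined with a replica count of at least 2, this is what keeps a single-zone outage from taking an application fully offline.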

Ansible rollout – Majesco

Over 1500 VMs hosted in Azure.

Inventory: Ansible ships an inventory plugin for Azure, but its documentation is thin, and a single YAML inventory file can only target a single subscription.

To generate inventory definitions for all ~27 subscriptions, I established a template. A small Python script pulls the subscription list via `az account list`, iterates over it, and writes a copy of the template for each subscription into a separate folder, along with some small housekeeping for vars directories. This auto-generated directory can then be used as an inventory source.
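A minimal sketch of that generator. The template body and output naming here are hypothetical stand-ins (the real template has more fields, e.g. auth settings), and the subscription list is passed in as a parameter rather than fetched live from `az account list`:

```python
import os
from string import Template

# Hypothetical, stripped-down per-subscription inventory template.
INVENTORY_TEMPLATE = Template(
    "plugin: azure_rm\n"
    "subscription_id: $sub_id\n"
)

def generate_inventories(subscriptions, out_dir):
    """Write one inventory file per subscription into out_dir.

    subscriptions: list of dicts shaped like `az account list` output
    (only the 'id' and 'name' keys are used here).
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for sub in subscriptions:
        # One file per subscription, named so the azure_rm plugin picks it up.
        safe_name = sub["name"].replace(" ", "_")
        path = os.path.join(out_dir, f"{safe_name}.azure_rm.yml")
        with open(path, "w") as f:
            f.write(INVENTORY_TEMPLATE.substitute(sub_id=sub["id"]))
        written.append(path)
    return written
```

Pointing `ansible-inventory -i out_dir/` at the generated directory then merges all subscriptions into one inventory.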

Setup: A single playbook to create (or rotate) accounts on Windows and Linux and to configure WinRM, driven through the Azure CLI (`az vm run-command`). The playbook generates a unique asymmetric key for each VM, provisions/rotates the user on that VM, and saves the credential to Azure Key Vault.

Connecting: Ansible nodes connect directly to VMs over WinRM (Windows) or SSH (Linux). This is done using customized versions of the winrm and ssh Ansible connection plugins, which take custom variables from inventory, populated via the azure_keyvault_secret lookup plugin, and save the keys to temp files that can then be used immediately to connect to those VMs.
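The temp-file step of those customized connection plugins can be sketched as follows. The Key Vault fetch itself (done via the azure_keyvault_secret lookup) is replaced by a plain string parameter, and the function name is illustrative:

```python
import os
import tempfile

def key_to_temp_file(private_key: str) -> str:
    """Write a private key fetched from Key Vault to a temp file with
    owner-only permissions, so ssh/winrm can use it as a key file."""
    fd, path = tempfile.mkstemp(prefix="ansible-key-")
    try:
        # Owner-only mode: ssh refuses private keys readable by others.
        os.fchmod(fd, 0o600)
        os.write(fd, private_key.encode())
    finally:
        os.close(fd)
    return path
```

The returned path is what the connection plugin hands to the underlying ssh/winrm transport; the real plugins also clean these files up after the play.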

Result: Ansible can connect to VMs using individual service-account keys, downloaded from Azure Key Vault at playbook runtime. We can rotate these credentials by running a single playbook, and they are immediately and automatically available for use by other playbooks.

Ansible rollout – Utilant

About 50 well-maintained, on-prem Windows servers spread across four VPN-connected domains.

Inventory: Pulled via a Python script that consolidates and groups servers from Active Directory across the four domains. Each domain has its own service account, defined in group_vars, with passwords encrypted via ansible-vault.
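A sketch of that consolidation step, with the Active Directory queries replaced by static sample data and hypothetical domain/host names:

```python
def build_inventory(servers_by_domain):
    """Merge per-domain server lists into one Ansible-style inventory dict,
    grouping hosts by domain so group_vars can supply per-domain creds.

    servers_by_domain: {"corp.example.com": ["web01", "web02"], ...}
    """
    inventory = {"all": {"children": {}}}
    for domain, hosts in servers_by_domain.items():
        # Group name is the domain's short name, e.g. "corp".
        group = domain.split(".")[0]
        inventory["all"]["children"][group] = {
            "hosts": {f"{h}.{domain}": {} for h in hosts}
        }
    return inventory
```

Because each domain becomes its own group, a `group_vars/corp.yml` (vaulted) can carry that domain's service-account credentials without touching the others.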

Connecting: Direct connection over WinRM using username/password.

Site copy scripting

Problem:

Configuring and managing several hundred (and growing) sites was tedious and error prone.

Solution:

Leveraged Ansible playbooks to fully automate site setup, cloning, and database “restores” between environments. This covers:

  • Webroot and data directory setup
  • SQL database and SSRS setup
  • User and group creation, and permissioning of users/groups on all of the above
  • Site configuration in the database and config files
  • DNS records for local and WAN
  • Verifying the site is online

Had to account for many different use cases: Are we saving settings, and which ones? Are the source and destination domains the same? Are we copying just the web files, or all files? Are we only restoring the database? Which backup servers can be used for SQL, and which fileservers for transferring web files? Are any shortcuts available (same multitenant server, same domain, etc.)?
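That shortcut logic can be sketched as a small decision function. The field names and strategy labels are hypothetical; the real playbooks branch on many more conditions:

```python
def pick_copy_strategy(src, dst, db_only=False):
    """Choose how to move a site between environments.

    src/dst: dicts like {"server": "web01", "domain": "corp"}.
    Returns one of: "db-restore-only", "local-copy", "direct-copy",
    "staged-copy".
    """
    if db_only:
        # Only restoring the database; no file transfer needed at all.
        return "db-restore-only"
    if src["server"] == dst["server"]:
        # Same multitenant server: copy files locally, no network hop.
        return "local-copy"
    if src["domain"] == dst["domain"]:
        # Same domain: servers can reach each other, copy directly.
        return "direct-copy"
    # Cross-domain: stage files through a fileserver reachable by both sides.
    return "staged-copy"
```

Encoding the special cases this way keeps the playbooks themselves linear: each strategy maps to one block of tasks.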

Result:

No more site setup, copying, or database operations by hand. The process is fully automated, and human error has been removed from a lengthy manual procedure.

Liveswitch on Kubernetes

Liveswitch video streaming server: although its sales page advertises Kubernetes support, there is no documentation or vendor support for running it there. Worked out the deployment myself and rolled it out across 3 AKS clusters.

Internal Site

Problem:

When I arrived, we had several dozen web and SQL servers and over 100 sites. The only record of these was a hand-maintained Excel sheet in Teams. As a result, the IT department was bombarded with questions such as “what is the URL for this customer’s site?” An Excel sheet in Teams also does not make a good database to drive automation, and human error left the data incomplete and incorrect.

Solution:

I started by creating a database to hold this information: IIS sites, URLs, and the databases they correlate to. I then wrote PowerShell scripts to crawl Active Directory for a current list of webservers, enumerate the IIS sites on each, read the database configuration for those sites, and pull a few configuration attributes out of those databases, saving everything to centralized tables.

The script runs every morning, so we always have a database containing current information about all the SaaS sites we host.

Next, I built an ASP.NET website to front this data for users, complete with column sorting, selection, filtering, and a handy search box.

Result:

We now had an accurate, auto-updating inventory of all sites, presentable internally to our company, and I had good data on which to base further automation projects.

Fully automated site deployments

Problem:

We had to log in after hours to deploy our SaaS product updates.

Solution:

Built a system for fully automating deployments using the inventory database (above), Task Scheduler, gMSAs, the build repository, our monitoring system’s API, email notifications, and a collection of PowerShell scripts to tie them all together.

Result:

Can now run a single PowerShell command to schedule a deployment for our SaaS sites, using only 3 parameters (build, sitename, datetime).
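As a sketch of that entry point (the real command is PowerShell; parameter handling here is illustrative, and the returned task fields are hypothetical), the three parameters map onto a scheduled task roughly like this:

```python
from datetime import datetime

def schedule_deployment(build, sitename, when):
    """Validate the three deployment parameters and return the task
    definition that would be handed to the scheduler."""
    # Reject malformed or past-dated schedules up front.
    run_at = datetime.strptime(when, "%Y-%m-%d %H:%M")
    if run_at <= datetime.now():
        raise ValueError("deployment time must be in the future")
    return {
        "task": f"deploy-{sitename}-{build}",
        "build": build,
        "site": sitename,
        "run_at": run_at,
    }
```

Keeping the interface to three validated parameters is what makes the future goal feasible: a web form on the internal site only needs to collect those same three values.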

Future:

Add this to the internal site, so the release manager can schedule deployments without any ITS hands-on involvement.