The retail industry has suffered mightily since the COVID-19 pandemic began, and UK-based John Lewis & Partners is no exception. But behind the scenes, the company, which operates 36 high-end department stores, was going full steam ahead with its digital transformation.
Platform operations manager Simon Skelton was transitioning from a role looking after in-shop systems to online systems when the pandemic began in March 2020. "Suddenly, I was having to work on closing down our shops, which we had never done before other than temporarily," Skelton said.
For the company, that meant stopping the replenishment of stores and managing the furlough or redeployment of employees, who are referred to as partners. The John Lewis Partnership is the largest employee-owned business in the UK.
For Skelton, who is responsible for the digital and legacy platform teams and the end-to-end operation of the corporate website, the pandemic meant doubling down on digital transformation efforts, which started at the end of 2017. Here's how they built on that foundation, transitioning from an on-premises Web commerce system to a microservices-based platform-as-a-service architecture running on the Google Cloud Platform. The result: improved site search capability and increased release velocity.
Cloud or no cloud?
As its transformation got underway, John Lewis Partnership was trying to address several business challenges, but the primary objective was to become more agile from a business point of view, Skelton said. For its e-commerce system, the company was running a Web commerce platform at the time that he described as "a large, monolithic platform born in the late '90s."
IT was mulling whether to upgrade the Web commerce platform, which would have cost several million pounds and taken about two years of work. Another option was to migrate from an on-premises system hosted in its own data centers to the Web commerce platform vendor's cloud. Unfortunately, "the level of complexity of a business our size didn't fit the target market the vendor was going for with its cloud commerce solution," Skelton said. It just wasn't the right fit for a large, complex retail business.
They could have stayed on the Web commerce platform, but the team didn't want to upgrade to simply stay in support. "We wanted to be more agile, and be able to test and learn—rather than having a long cycle of reviewing requirements and then testing and deploying in large, batched releases," Skelton said.
He also knew that the risk of security incidents was far higher when multiple changes were made to the system at once, and that any resulting problems might take days or weeks to fix.
Another issue was that only a certain number of teams could work on the e-commerce package at a time without "tripping over each other." His team's primary goal was to remove the constraints posed by a monolithic system and build out multiple agile DevOps teams.
So instead of upgrading on-premises or moving to the Web commerce vendor's cloud app, in 2018 IT developed a microservices strategy with help from third-party partners. "Bit by bit, we're moving away from that platform and building upon a platform as a service (PaaS) based on Google Cloud Platform," he said.
Starting off with search
The team kicked off the new microservices model by building a team tasked with replacing the old off-the-shelf search product. "We wanted to improve the search results and the flexibility of our navigation," he said.
The team was empowered to release and deploy under their own control. "They ran the service as well as built it—rather than expecting a separate ops team to do that."
But initially, not everyone was on board. There were "lots of conversations with developers who didn't want to be on a call, and that was a challenge," he said. They needed to be convinced that the "you build it, you run it" principle gave them more control over speed of delivery, but only if they fully took on the live support, too, he said.
Previously, if something broke, the developers "didn’t share the pain," and therefore they didn't make it a priority to remove the reason the app failed in the first place.
When you face getting a support call in the middle of the night, "you’re going to put in as much effort as possible to make sure there are no issues," Skelton said. One of the benefits of a cloud platform is that it's much easier to have a development environment that is the same as the live website, which means the issues surface much earlier, he said.
In 2020, about 25 product teams made over 5,000 incremental changes to the website. By contrast, in 2017 six product teams issued about 10 big releases a year.
"Basically, we're doing smaller changes but much more frequently," he said. Splitting an app into microservices "allows each independent team to make changes in their area and greatly reduces the blast radius if any issues are introduced by one service."
So far, multiple services have been moved off the old Web commerce platform and migrated onto the Google Cloud Platform; this transition is targeted for completion later in 2022.
Changing hearts and minds
But going from 10 releases annually to 5,000 created some change management challenges, Skelton said. It meant assessing and managing risk in a different way and, if an incident occurred, being able to see what changed and then recover quickly, he said.
Along the way, IT implemented a new change management approach, which skips the every-two-week change advisory board (CAB) process, he said. Previously, Skelton's centralized online operations team would review and approve every deployment, which then also had to be approved by the CAB, but that would have been too unwieldy with 5,000 changes coming from the product teams.
"It just isn't practical, and they don't have much value anymore," he said. "So over time, we've devolved that to the individual product teams, because they're the best ones to be able to assess and manage risk" for the changes they make.
These days, each change is manually recorded in the company's centralized enterprise system as a "standard change," which the product team approves itself. If there is an issue, the record shows what changed, and the change can be rolled back if necessary.
"We are currently in the process of automating the creation of changes—for every deployment, making service management agile. That's where we're at on the journey," Skelton said.
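To make the idea of automatically creating a change for every deployment more concrete, here is a minimal Python sketch. It is not John Lewis's implementation: the `ChangeRecord` fields, the `deploy` function, and the in-memory `change_log` are all hypothetical stand-ins for a deployment pipeline step and an enterprise service management system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ChangeRecord:
    """Hypothetical shape of a self-approved 'standard change' entry."""
    team: str
    service: str
    version: str
    change_type: str = "standard"
    approved_by: str = ""
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


# Stand-in for the centralized change system (an assumption for this sketch).
change_log: List[ChangeRecord] = []


def deploy(team: str, service: str, version: str) -> ChangeRecord:
    """Record a standard change as part of a deployment.

    The real deployment step is omitted; only the bookkeeping is shown.
    The owning product team approves its own change.
    """
    record = ChangeRecord(
        team=team, service=service, version=version, approved_by=team
    )
    change_log.append(record)
    return record


latest = deploy("search-team", "site-search", "1.4.2")
print(latest.change_type, latest.approved_by, latest.created_at)
```

Because the record is written by the same step that ships the code, the log stays complete without anyone filling in a form, which is what keeps thousands of small changes visible if something needs to be rolled back.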
Challenges with the new techniques
The agile approach hasn’t been without its issues. "It’s been a negotiation," he said. "We had pushback from the change team because they didn't understand the DevOps way of working."
Skelton and other IT leaders have been working closely with the teams to explain that this approach means there will be fewer major issues "because there is a continuous testing environment and deployments are all automated," he said.
"One of the biggest issues before was human error with manual testing, and when you automate testing and deployment, you get a far more predictable outcome. We had to educate them on why manual risk assessment no longer added value."
A number of leading indicators have been put in place to ensure that each team is taking the proper measures to prevent performance issues or incidents from occurring.
For example, it's important for teams to think about resilience. "If another service they are dependent on slows down, what would happen to their service?" Skelton said. The company created service operability assessments to prompt each team to think about key aspects of operability, such as service ownership, resilience, performance, scaling, security, monitoring, support model, incident response, and other facets.
It's a set of self-assessment questions, and when teams have answered them they can say, "We thought about that," he said.
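One common way to answer the resilience question above (what happens when a dependency slows down) is to put a time limit on the call and fall back to a degraded result. The article does not say which technique the John Lewis teams use, so the Python sketch below is only an illustration, and `call_with_fallback`, `slow_recommendations`, and `fast_recommendations` are hypothetical names.

```python
import concurrent.futures
import time


def call_with_fallback(fn, timeout_s, fallback):
    """Run fn in a worker thread; return fallback if it is too slow or fails."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return fallback  # the dependency took longer than we can wait
    except Exception:
        return fallback  # the dependency raised an error
    finally:
        # Do not block the caller while the slow call finishes in the background.
        pool.shutdown(wait=False)


def slow_recommendations():
    time.sleep(0.5)  # simulates a dependency that has slowed down
    return ["personalised-1", "personalised-2"]


def fast_recommendations():
    return ["personalised-1"]


# The slow dependency exceeds its 0.1-second budget, so the page gets an empty list.
result_slow = call_with_fallback(slow_recommendations, 0.1, fallback=[])
# The healthy dependency answers well inside its budget.
result_fast = call_with_fallback(fast_recommendations, 1.0, fallback=[])
print(result_slow, result_fast)
```

The design choice here is that the calling service stays responsive and shows less, rather than hanging along with the dependency.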
Getting a better night's sleep
In addition to the increase in agile releases, website availability has improved to 99.9%, Skelton said. And the average time to repair a service has improved from hours to minutes.
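For a sense of scale, an availability figure can be converted into a downtime budget. The article does not say over what period the 99.9% figure is measured, so the yearly and 30-day windows in this Python sketch are assumptions, and `downtime_budget_hours` is a hypothetical helper.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_budget_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of downtime permitted in a period at a given availability."""
    return (1 - availability_pct / 100) * period_hours


yearly_hours = downtime_budget_hours(99.9)
monthly_minutes = downtime_budget_hours(99.9, period_hours=30 * 24) * 60

print(f"{yearly_hours:.2f} hours per year")
print(f"{monthly_minutes:.1f} minutes per 30-day month")
```

Under those assumptions, 99.9% leaves a little under nine hours of downtime a year, which is why cutting repair time from hours to minutes matters so much.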
Among the challenges that remain are removing "the safety net" of the managed services team, which has 24/7 eyes on the website, and moving toward more automated alerting. IT’s implementation of a real-time alerting system is playing a big role in this, he said.
The company used to have a process in which alerts went to a central operations team, which would consult a run-book, check the support rotation if necessary, and then manually call the person on support. "This process could take tens of minutes and is now automated into just seconds," Skelton said.
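The automated version of that hand-off can be pictured as a lookup from the alerting service to the owning team's current on-call engineer. The article does not name the alerting product or describe how rotations are stored, so this Python sketch is illustrative only: the service names, team names, engineer names, and weekly rotation rule are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Alert:
    service: str
    message: str


# Hypothetical ownership and rotation data; a real system would load these.
SERVICE_OWNERS = {"site-search": "search-team", "checkout": "checkout-team"}
ROTATIONS = {
    "search-team": ["asha", "ben", "carla"],
    "checkout-team": ["dev", "erin"],
}


def on_call_engineer(team, today):
    """Pick the engineer on call this week, rotating by ISO week number."""
    rota = ROTATIONS[team]
    week = today.isocalendar()[1]
    return rota[week % len(rota)]


def route_alert(alert, today):
    """Return who should be paged for an alert, with no manual look-up."""
    team = SERVICE_OWNERS.get(alert.service)
    if team is None:
        # Unowned services fall back to a central team (an assumption).
        return "central-operations"
    return on_call_engineer(team, today)


alert = Alert("site-search", "p95 latency above threshold")
print(route_alert(alert, date(2021, 5, 18)))
```

The steps a person once performed by hand (identify the service, find the rotation, find the name) become a single function call, which is where the drop from tens of minutes to seconds comes from.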
But Skelton is now sleeping better. "I’m not being kept up at night monitoring major incidents. I used to be called out every time there was a website issue—and there were too many of those," he said. "I can be confident we have teams on call that can deal with something if there’s an issue."
Want to know more? Attend Skelton's conference session at DevOps Enterprise Summit Europe - Virtual, where he'll talk more about how John Lewis & Partners built in "operability" from the start to ensure success. The conference runs May 18-21, 2021.