Enterprise operations, or ops, affects the health of security and governance, as well as the ability of business systems to serve, well, the business. But as systems are built and deployed, ops is often an afterthought, slid into place toward the middle or end of the build.
It doesn’t matter if you’re talking about CloudOps, DataOps, SecOps, GovOps, or other operational processes: "Afterthought Ops" are a pervasive problem. As a result, most organizations need to reform, redo, or recast their ops to become more efficient and effective.
Here are 12 foolproof steps to reengineer your enterprise IT operations.
1. Assess the current ‘as-is’ state of all cloud and non-cloud systems
Where are you now, and where do you need to go? An exact understanding of your ops status is the key to making effective improvements. This includes traditional and cloud-based systems.
A few things to look for include:
- The current state of the data, including distribution and heterogeneity
- Databases leveraged
- Hardware leveraged, including CPU and memory configurations
- Platforms under management (e.g., Linux)
- Edge-based systems under management
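Capturing the as-is inventory in a structured form makes it easier to see distribution and heterogeneity at a glance. Below is a minimal sketch that assumes a hypothetical `SystemRecord` format; the field names and sample systems are illustrative only and not taken from any particular inventory tool.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SystemRecord:
    """One row of a hypothetical as-is inventory (field names are illustrative)."""
    name: str
    location: str                 # e.g., "cloud", "on-prem", "edge"
    platform: str                 # e.g., "Linux"
    databases: list[str] = field(default_factory=list)
    cpu_cores: int = 0
    memory_gb: int = 0


def summarize(inventory: list[SystemRecord]) -> dict[str, Counter]:
    """Count systems by location and platform, and database usage across systems."""
    return {
        "by_location": Counter(s.location for s in inventory),
        "by_platform": Counter(s.platform for s in inventory),
        "by_database": Counter(db for s in inventory for db in s.databases),
    }


# Hypothetical sample data: one cloud, one on-prem, and one edge system.
inventory = [
    SystemRecord("billing-api", "cloud", "Linux", ["PostgreSQL"], 8, 32),
    SystemRecord("erp", "on-prem", "Linux", ["Oracle", "PostgreSQL"], 32, 256),
    SystemRecord("store-kiosk", "edge", "Windows", [], 4, 8),
]
summary = summarize(inventory)
print(summary["by_database"])
```

A spreadsheet works just as well; the point is that every system, cloud or not, lands in the same record format so that counts and comparisons are possible.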
2. Assess current ops skill sets
Do a skill set inventory to identify the current mix among existing ops staffers. List their skills in specific categories such as monitoring tools, databases, security systems, and other systems currently in place.
Keep in mind that these categories cover not just operational understanding but also the ability to fix specific things that go wrong at the deep, native levels of most of the technology that operations runs. If staff must call a DBA or a developer for every issue, then that ops team just won’t scale.
3. Assess current ops processes and playbooks
Playbooks are simple guides that illustrate how things are monitored, managed, and fixed. Together they form a master set of guides that standardizes processes and approaches across the ops teams.
You want consistent procedures that make sense and cover every situation they need to address. For example, if an ops process calls for resetting a server or network device, you’ll need to ensure that any data in flight on that device survives the reset, whether the device is remote, in the cloud, or local.
4. Assess the current effectiveness of ops and services to the business users
Ops playbooks often lack guidance on end-user monitoring. The very idea that we can monitor how end users interact with applications and databases only emerged over the last several years. Some of the things that end-user monitoring should include are:
- Performance monitoring, or the time between a user requesting an action and that action occurring
- Time for a result set to return to the user after a request
- The number of user errors that occur; a high count could mean that data validation needs to be improved
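The first two measures above are elapsed times, and the third is a rate. Here is a minimal sketch of how raw end-user events might be rolled up, assuming a hypothetical `RequestEvent` record emitted by whatever monitoring agent you use; percentiles use the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestEvent:
    """A hypothetical end-user request record captured by a monitoring agent."""
    requested_at: float   # seconds since epoch when the user made the request
    completed_at: float   # seconds since epoch when the result reached the user
    user_error: bool      # True if the request was rejected as a user error


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (pct from 0 to 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def end_user_report(events: list[RequestEvent]) -> dict[str, float]:
    """Summarize request-to-result latency and the user error rate."""
    latencies = [e.completed_at - e.requested_at for e in events]
    errors = sum(e.user_error for e in events)
    return {
        "median_latency_s": percentile(latencies, 50),
        "p95_latency_s": percentile(latencies, 95),
        "user_error_rate": errors / len(events),
    }


# Hypothetical sample: four requests, one of which ended in a user error.
events = [
    RequestEvent(0.0, 0.2, False),
    RequestEvent(1.0, 1.5, True),
    RequestEvent(2.0, 2.1, False),
    RequestEvent(3.0, 4.0, False),
]
report = end_user_report(events)
print(report)
```

Reporting a high percentile alongside the median matters because averages hide the slow requests that users actually complain about.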
5. Form a vision for ops that would best serve the business or a changing market
What’s the vision for your final optimized state? You’ll start defining it in this step. Your vision can be many things, but you need answers to these core questions:
- What are the expectations for auto-recovery and self-healing? Instantaneous? Within a few hours? A few days? Each carries a different level of cost.
- What are the expectations for end-user performance monitoring and improvement?
- What are the expectations for monitoring and management of security?
- What are the expectations for the level of automation?
The idea is to come to an agreement on expectations. Be sure everyone realizes that the higher the expectations, the higher the price of ops tools and ops people.
6. Define effective leadership, both executive and line management
Leadership is critical to the success of an ops team. Good leaders set expectations and meet goals that produce high-quality operations, processes, and skill levels.
In this step, define what type of leadership is needed, considering both line managers who are tactically focused and more strategically focused executives. Each layer of leadership will have different missions and skills. For example, the executive will be an expert on budgets and how to obtain resources, while the line leader will focus on day-to-day ops and long-term performance.
7. Define the role and use of DevOps, including forward-looking vision
Ops teams are, as the name implies, typically a part of DevOps teams. They work closely with the developers, and the old habit of tossing code over the wall to the ops teams is coming to an end.
This step should clarify how dev and ops will work and play well together. This includes processes and tools for collaboration, setting expectations for how traditional ops works with the emerging DevOps tool chains, and, finally, special training and skills needed to make this collaboration work.
8. Define the gaps between the as-is and to-be states of the systems, skill sets, and processes
This is the first step in defining a future, or “to-be,” state, based on the work completed in the previous steps. What gets defined largely depends on how well you understand the as-is state, which should be documented in detail by the time you reach this step.
Results needed include:
- Target playbooks, including high-level and low-level ops processes
- Target ops technology functions, such as self-healing, performance management, ops APIs, and tool integration
- Target skills that include an understanding of the gap between the as-is and to-be states
- Target ops model: how the ops team will be structured to succeed
Once you have defined the gaps, you need to explain what needs to change, why, and when.
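One simple way to make the gaps explicit is to score each capability on a common scale for both the as-is and to-be states, then list every shortfall. The sketch below assumes a hypothetical 0 to 3 proficiency scale and made-up capability names; any capability absent from the as-is assessment is treated as level 0.

```python
def find_gaps(as_is: dict[str, int], to_be: dict[str, int]) -> dict[str, int]:
    """Return each capability whose target level exceeds its current level,
    mapped to the size of the shortfall. Missing as-is entries count as 0."""
    gaps = {}
    for capability, target in to_be.items():
        current = as_is.get(capability, 0)
        if target > current:
            gaps[capability] = target - current
    return gaps


# Hypothetical scores on a 0-3 scale (0 = none, 3 = expert).
as_is = {
    "database troubleshooting": 1,
    "monitoring tools": 3,
    "self-healing automation": 0,
}
to_be = {
    "database troubleshooting": 3,
    "monitoring tools": 3,
    "self-healing automation": 2,
    "ops APIs": 2,
}

gaps = find_gaps(as_is, to_be)
# Largest shortfalls first, so they can drive the "what, why, and when" plan.
for capability, shortfall in sorted(gaps.items(), key=lambda kv: -kv[1]):
    print(f"{capability}: raise by {shortfall} level(s)")
```

The same comparison works for playbooks and technology functions, as long as both states are scored the same way.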
9. Define your ops technology stack
This step is often done first. But without the understanding gathered in the previous steps, we have no true idea of which ops tool sets will prove effective.
Today’s wish list of ops technology typically includes:
- Performance management
- AIOps
- Security operations (SecOps)
- Governance operations (GovOps)
- DataOps
- Cloud operations (CloudOps)
10. Define a testing and proof-of-concept approach for new tools and tool improvement
Everything defined in the previous step needs functional testing against both business and technical requirements, as well as compatibility testing with existing applications, networks, and platforms.
The trick is to ensure that no overlooked issue becomes a showstopper. An example would be an ops tool that doesn’t work with a specific database. Moreover, put processes in place to refresh the tooling continuously. Establish a culture where everything, including processes and tools, can be questioned and thus improved.
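Before a proof of concept starts, a quick cross-check of each candidate tool against the components you actually run can surface obvious showstoppers, such as the unsupported database mentioned above. This is a minimal sketch: the tool names and support lists are hypothetical, and in practice they would come from vendor documentation and your own testing.

```python
def find_showstoppers(
    tool_support: dict[str, set[str]], environment: set[str]
) -> dict[str, set[str]]:
    """For each candidate tool, list the environment components it does not support.
    Tools that cover the whole environment are omitted from the result."""
    showstoppers = {}
    for tool, supported in tool_support.items():
        missing = environment - supported
        if missing:
            showstoppers[tool] = missing
    return showstoppers


# Hypothetical environment and vendor support claims.
environment = {"Linux", "Windows", "PostgreSQL", "Oracle"}
tool_support = {
    "monitor-a": {"Linux", "Windows", "PostgreSQL", "Oracle"},
    "monitor-b": {"Linux", "PostgreSQL"},
}

showstoppers = find_showstoppers(tool_support, environment)
for tool, missing in showstoppers.items():
    print(f"{tool} lacks support for: {', '.join(sorted(missing))}")
```

A paper check like this only narrows the field; the compatibility testing itself still has to happen against real applications, networks, and platforms.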
11. Define metrics to determine ongoing success and failure of your ops
Put together models to define how you will measure the effectiveness of ops. While many believe this comes down to a letter grade, the reality is that the metrics you’ll gather over time need to be more fine-grained and telling. Include items such as:
- Number of outages
- Duration of outages
- End-user productivity
- Performance and tuning
- Cost per system under operations
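Several of these items reduce to straightforward arithmetic once outages are logged consistently. The following minimal sketch covers outage count, outage duration, and cost per system; the `Outage` record, the sample outages, and the cost figures are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """A hypothetical outage record: which system, and when it went down and recovered."""
    system: str
    start: datetime
    end: datetime


def ops_metrics(outages: list[Outage], ops_cost: float, systems_under_ops: int) -> dict[str, float]:
    """Compute outage count, total and mean outage minutes, and cost per system."""
    total = sum((o.end - o.start for o in outages), timedelta())
    total_minutes = total.total_seconds() / 60
    return {
        "outage_count": len(outages),
        "total_outage_minutes": total_minutes,
        "mean_outage_minutes": total_minutes / len(outages) if outages else 0.0,
        "cost_per_system": ops_cost / systems_under_ops,
    }


# Hypothetical month: two outages, a 120,000 ops budget, and 40 systems under operations.
outages = [
    Outage("erp", datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 45)),
    Outage("billing-api", datetime(2024, 3, 8, 14, 0), datetime(2024, 3, 8, 14, 15)),
]
metrics = ops_metrics(outages, ops_cost=120_000, systems_under_ops=40)
print(metrics)
```

Tracking these per period, rather than as a single grade, shows whether ops is trending better or worse.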
12. Define a process for continuous improvement of all aspects of ops
This last step is the most difficult. You must define the culture, processes, organizational structures, tooling, skills, and other elements that make up your final ops improvements, which should take your ops to the next level. Continuous improvement should include collaboration, people, processes, and technology.
The things that matter most include:
- How the ops teams, as well as other teams (e.g., dev), will collaborate. While this often means just tossing a ChatOps tool into the process, it is really about developing a culture of open communication.
- A process of continuous improvement. This can exist only if the previous step is done correctly. With continuous improvement, everyone has the ability to question any process, tool, or skill, with the objective of incrementally improving everything.
- The metrics tracking, and how that feedback will get to the ops teams to provide input for improvements. While many ops organizations provide this on a yearly basis, that kind of delay will not help the teams get the feedback needed to improve anything in a timely manner. The metrics should be delivered on a dashboard that everyone can access at all times. Never hide data.
Note to self: Culture change required
Nothing is foolproof. However, if you follow these steps, concepts, and ideas, you’ll improve your ops processes, people, and technology. Keep in mind that the highest degree of success involves cultural changes, and that is often the hardest part of this fix.