Jan 16

Written by:

Here are five of our favorite tips, gathered over the years, to help you get the most from your Business Transaction Monitoring (BTM) solution:

  1. Start out by identifying your key transactions and services.  Be selective.  Too much noise in terms of data and alerts can be detrimental.  Focus on actionable information, not just data for the sake of data.
  2. Identify the key metrics associated with the transactions’ operational characteristics.  For example, do the daily or weekly volumes vary?  How does Service Delivery Performance change in relation to the time of day, week or month?  Are usage spikes normal due to an announcement or external event?
  3. Start small, start simple.  A big bang approach will likely fail because it will be seen as too disruptive, or as providing too much data.  A complex initial approach runs the same risk.
  4. Most importantly, design an action plan for each alert you may receive.
  5. Finally, decide in advance what the key success factors are, and regularly measure your BTM usage against them.

If you’ve any tips or tricks you’d like to share, please leave a comment.

Jul 3

Written by:

A partner of ours recently returned from a meeting where the prospect’s reaction was “I already monitor my systems, why do I need more monitoring?”  Great question.

I get that a lot. It’s a normal reaction. Usually from the IT Operations Director who has spent considerable sums of money on monitoring to date, and can boast an arsenal including:

  • Hardware monitoring
  • Application availability
  • Network monitoring
  • Website monitoring
  • Transaction monitoring
  • Speciality tools for monitoring Oracle, SAP and the like
  • And perhaps some dashboard aggregators that consolidate information from many separate sources into one single dashboard.

So why would an organization need more monitoring?  Well, the single most compelling reason is to cut down on the number of outages and incidents that impact business performance.  To do this, monitoring has to do more than detect when things go wrong, which is all that the products used by most organizations can do.  By then, the damage is done, and something has already gone wrong.

Ideally, monitoring should be smart enough, and powerful enough, to detect a situation that indicates with a high probability that something needs attention before the situation develops into something bigger and more costly.  It’s the equivalent of warning you that your car is about to be towed instead of telling you that your car has just been towed.

My Ferrari (I wish that were true) getting towed!
In other words, rather than coping with IT disasters, what about averting them in the first place?  What is needed is a system that constantly monitors your key business activities and transactions, and that can connect events together in order to detect variances within your business transactions.  Such a system tells you exactly what’s going on in real time and provides timely warnings.

For example, your current monitoring systems for processing orders might provide the following information:

  1. Database server OK, ping round trip 0.112s
  2. Database OK, 32 transactions per second, average transaction 1.232s
  3. Web Server OK, 42 connections

Whereas a system monitoring business events would instead report:

  1. 14 Orders in progress
  2. Average time to process orders is 6.687 seconds
  3. Alert: 13% of orders processed in last 5 minutes were above 9 seconds.  Current trend is that an order will breach the SLA of 12.5 seconds within 40 minutes.
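To make the third line above concrete, here is a minimal sketch of how such an alert could be derived from order events. It is an illustration only, not the product's actual method: the `Order` record, the function names and the choice of a simple least-squares trend line are all assumptions, and only the 9 second and 12.5 second thresholds come from the example.

```python
from dataclasses import dataclass

SLA_SECONDS = 12.5   # SLA taken from the example above
WARN_SECONDS = 9.0   # warning threshold taken from the example above


@dataclass
class Order:
    """Hypothetical record of one completed order."""
    finished_at: float  # seconds since the start of the observation window
    duration: float     # seconds taken to process the order


def slow_fraction(orders, threshold=WARN_SECONDS):
    """Fraction of orders whose processing time exceeded the threshold."""
    if not orders:
        return 0.0
    return sum(o.duration > threshold for o in orders) / len(orders)


def seconds_until_breach(orders, sla=SLA_SECONDS):
    """Fit a least-squares line to duration over time and extrapolate to the SLA.

    Returns None when there are too few points or the trend is flat or falling.
    """
    n = len(orders)
    if n < 2:
        return None
    mean_t = sum(o.finished_at for o in orders) / n
    mean_d = sum(o.duration for o in orders) / n
    var_t = sum((o.finished_at - mean_t) ** 2 for o in orders)
    if var_t == 0:
        return None
    cov = sum((o.finished_at - mean_t) * (o.duration - mean_d) for o in orders)
    slope = cov / var_t
    if slope <= 0:
        return None
    intercept = mean_d - slope * mean_t
    latest = max(o.finished_at for o in orders)
    breach_time = (sla - intercept) / slope
    # Clamp at zero: the trend line may already be above the SLA.
    return max(0.0, breach_time - latest)
```

A monitoring loop would call both functions over the orders completed in the last few minutes, and raise the alert when `slow_fraction` crosses an agreed level or `seconds_until_breach` falls inside the window in which staff can still react.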

So rather than trying to filter millions of seemingly disconnected IT events, most of which carry little or no business significance, overworked IT staff can instead focus on meaningful business objectives and performance indicators.  They can react more quickly to events that impact business performance, and communicate with non-IT staff using the lingua franca of your business.  And most importantly, if you currently rely solely on traditional monitoring approaches, you can expect to reduce the number of outages and incidents further, by anywhere between 20% and 80%!

Feb 28

Written by:

I was recently giving a presentation to a rather large utility provider, and was asked the question “But why do I need real-time?”

Good question.  And a very difficult question to answer correctly in most cases.  There are lots of answers that address the requirement – in theory.  Take your pick from the ones I come across most often; I may even be guilty of uttering one or two of these myself:

  • React to an opportunity or threat
  • Become more proactive
  • Prioritize resources
  • Make smarter decisions quicker

But sometimes the right answer lies in asking a question in return.  “Is there anything you could do if you knew something had just occurred?”  This turns the question around to the customer, and they invariably find that there is something that can be done to improve the situation!

Remember To React 

Within a business context, knowing immediately if there are problems means that you know at least as soon as your customers do.  A well-known analyst firm estimates that more than 60% of problems are reported by customers (in our experience this number is low; we generally see numbers approaching 80%).  But the value of detection is completely lost if there isn’t a plan for reacting.

So one definition of real-time is the window of time within which detection and reaction have maximum benefit.  Whether you need to react within 5 minutes or 5 microseconds, each situation has its own context and definition of real-time.

Ask The Brian

But the real value of detection is predicting situations in advance, by detecting the patterns that indicate a high probability of something about to happen.  This is possible so long as situations exhibit a consistent set of early-warning signs – to the uninitiated it can seem a little bit like high-tech fortune telling.  But the results are definitely worth it.  Our customers have seen drops of more than 80% both in problems detected by customers and in incidents tagged as high priority.  It’s where we definitely see the value of Complex Event Processing and Business Activity Monitoring intersecting.

Nov 10

Written by:

Twitter provide nice APIs that allow users access to their data. There’s a whole bunch of interesting applications built on top of this data, most of them gimmicks while a few have serious potential. One of the ways of accessing the Twitter data is via the Twitter Firehose, which is supposed to be the entire stream of “tweets” (a tweet is a message on Twitter). Unfortunately the firehose is still not open to all users, which is a real pity.

Bill de hÓra has a very interesting posting on this topic. There are 2 interesting aspects to Bill’s post:

1. The story behind the firehose and some of the reactions to the fact that Twitter still haven’t given open access to this data.

2. The speculation that the reason for the delay is not so much a business reason but a technical limitation on Twitter’s ability to scale.

If the reason for the delay is that Twitter is protecting the data so they can exploit the value themselves, I wouldn’t be too surprised. To be fair that would be considered the second most valuable thing they have. The most valuable asset is the fact that they have everyone’s attention.

Bill seems to think the issues are technical. Roy Fielding suggested as much when he blogged about Inverse Economies of Scale with PubSub systems. Roy’s solution to the problem of people demanding too much data/events, and therefore crippling your service, was to charge for the service. That is possibly what Twitter may announce next year.

The technical problems that Roy and Bill discuss are caused by Event and/or Data gluttons. These are users/machines that oversubscribe to events or data services. This is human nature and provides developers with a challenge when building distributed software. Luckily there are a lot of useful patterns and tools available to today’s developers to solve these problems. At WestGlobal we face the challenge of Event Gluttons and we handle it by having a flexible architecture that can be optimised depending on the situation. In general, the approach we take is a distributed Event Processing Network (EPN) with flexible deployment options. As such, Vantify deployments are layered as follows:

1. The first layer of consumers are our Event Processors, who sit closest to the action. These event gluttons are deployed on a network and want to know everything that is happening in your business. They process events as they happen using CEP and other techniques. The good news for the network is that you don’t need too many of these event processors, so oversubscription isn’t a problem. They can be distributed almost anywhere on your network, and if there is a high volume of data then we can limit the chatter by deploying the processors closer to the events or by splitting up the event streams. Although these are very greedy consumers, we can satisfy their appetite.

2. The other place where we need to handle event gluttons is at the top layer in our product where real users want to view reports and dashboards about their business. Here we do need to worry about supporting a large number of users. However most of these users are using a browser and polling our data feeds which will scale very well. There will be a limited number of users who want subscriptions in order to be notified. This is where we need to worry about scale but this is typically a small number.
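As an illustration of what “splitting up the event streams” in the first layer could look like, here is a minimal sketch that partitions events across a fixed number of processors by a correlation key. This is not a description of how Vantify does it: the event shape (a dict with a `key` field), the function names and the use of a CRC32 hash are all assumptions made for the example.

```python
import zlib


def assign_processor(key: str, processor_count: int) -> int:
    """Map a correlation key to a processor index.

    CRC32 is used instead of the built-in hash() because string hashes
    are randomized per Python process, which would break routing
    consistency across restarts or machines.
    """
    return zlib.crc32(key.encode("utf-8")) % processor_count


def split_stream(events, processor_count):
    """Partition events so that every event with the same key
    (for example a transaction id) lands on the same processor."""
    partitions = [[] for _ in range(processor_count)]
    for event in events:
        index = assign_processor(event["key"], processor_count)
        partitions[index].append(event)
    return partitions
```

Keeping all events for one transaction on one processor means that processor can correlate them locally, so only its results, rather than the raw events, need to travel further up the network.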

So I think the world is big enough to accommodate Event Gluttons. Thanks to people like Roy and Bill, developers have a lot of tools and techniques available to them to handle scale. The only good reason for restricting data or events should be a business reason. I certainly hope Twitter open up their data and like many others I’ll be watching with interest when they reveal all.

Oct 13

Written by:

“So what do you actually do?”

That dreaded question! I’ve moved job recently and I’m hearing it a lot.

One of the great things about working at WestGlobal is that now it’s an easier question to answer. I used to work on SOA products. Need I say more?

As with most questions, the response to “So what do you actually do?” depends on the audience. I’ve been thinking about how I should respond to this question if asked by a former technical colleague or by my 7-year-old son. The interesting thing is that I don’t think my response would vary that much. I think the difference would be in the language and metaphors used.

If the colleague asked I’d obviously try to impress them but I’d probably try something like the following:

  1. Using agents/sensors across your business you gather real-time events as they happen.
  2. You process the events from these sensors and produce new events and data using advanced techniques like CEP.
  3. You present your new events and data in a variety of forms like dashboards, reports and alarms to the customer. They get a detailed picture of what is happening in their business including the relationships between different agents and activities.

If my son asked, I could simplify the language and introduce a metaphor:

  1. Using spies across your business you gather information about what is going on. Your spies send you reports whenever anything happens.
  2. Back at the command centre, you decode the messages from all of your spies and try to figure out what it all means.
  3. You’re able to give info to your friends to help them save the day. They know what is happening before anybody else so can react quickly.

Agent Vantify Issue 1

I think I prefer my job through the eyes of my son!

Tune in next week to find out if any agents defected.

Sep 10

Written by:

Many vendors use variations of the concept of enabling an enterprise to “Align IT with the Business”. For example, they offer products that help to manage IT from the perspective of the Business, to do more of what drives the business and less of what doesn’t, or to view your IT as an engine for business value. These are valuable perspectives, and Business Service Management (BSM) is being promoted as the answer. Here at WestGlobal, we believe that BSM is only half the picture.

Sinage with messages

Let’s ask an important question:

Q. What does the business want from aligning IT with the business?

A. The business wants a clear and simple solution that monitors how well IT services are being delivered to support business activities and transactions. The business also wants quantitative and qualitative data in order to understand how well individual services are performing, and would like the IT department to prioritize their Operational activity to maximize business activities and minimize negative business impacts.

In order to deliver this vision, there are two different aspects that need to be addressed.

The first part concerns IT resources. Servers, networks, routers, websites – all of the technology and resources and tools that are used to deliver services. Monitoring solutions are required to check the health and availability of these components. Enterprise monitoring tools are vital in this regard, and they’re readily available and do a good job.

The second part concerns Business Activities. Sales orders, shipping, payments – all of the vital business transactions and processes that rely on IT infrastructure and are the lifeblood of any business.

Traditionally, enterprises are very good at addressing the first part – it is well understood and products are available. On the other hand, very few properly address the second. Without the second part, an enterprise will not be able to align the business and IT departments. Instead of measuring how well sales orders are being processed, the IT department only has lower level tools to measure server uptime or CPU load. Reporting a monthly statistic that the web servers were available within their SLA of 98% does nothing to assure the business that all orders were captured and that every customer had a satisfactory experience. It’s why enterprises that only address the first part still rely on their customers to report problems first.

Addressing the second part means adopting a different approach to gathering data for measuring service delivery. Event processing is an ideal underlying technology to extract relevant and meaningful data from the thousands of events that occur every hour in the enterprise. In terms of Business Activity Monitoring, an event is simply the fact that a process or transaction or activity has progressed. For example, an event may signify that a customer has logged in. A subsequent event may signify that a customer has queried stock availability or placed an item in a basket, and so on until the individual transaction has completed. Because most business activities can be broken up into a start and end, with varying numbers of units of work in between, figuring out the significance of each event is straightforward. By measuring how long it takes for each unit of work, and by tracking events that relate to different activities, the IT department can report to the business in terms that are meaningful.
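As a rough sketch of that idea, the snippet below groups events by transaction and measures how long each unit of work took. The event format (a tuple of transaction id, step name and timestamp in seconds) and the function name are assumptions chosen for illustration; a real deployment would take events from its own sources.

```python
from collections import defaultdict


def step_durations(events):
    """Group (transaction_id, step, timestamp) events and time each step.

    The time attributed to a step is the gap until the next event in the
    same transaction; the final event marks the end of the transaction
    and therefore has no duration of its own.
    """
    by_txn = defaultdict(list)
    for txn_id, step, ts in events:
        by_txn[txn_id].append((ts, step))

    result = {}
    for txn_id, items in by_txn.items():
        items.sort()  # order each transaction's events by timestamp
        steps = [
            (step, next_ts - ts)
            for (ts, step), (next_ts, _) in zip(items, items[1:])
        ]
        result[txn_id] = {"steps": steps, "total": items[-1][0] - items[0][0]}
    return result


# Example: two interleaved customer transactions.
events = [
    ("t1", "login", 0.0),
    ("t2", "login", 1.0),
    ("t1", "query_stock", 2.0),
    ("t2", "order_placed", 4.0),
    ("t1", "add_to_basket", 5.0),
    ("t1", "order_placed", 6.5),
]
report = step_durations(events)
```

Here `report["t1"]["total"]` is the end-to-end time for the first customer's order, and the per-step figures show where that time went, which is the kind of number the business can relate to.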

Enterprise Monitoring Systems with Business Service Management (BSM) do a great job with the first part. Business Activity Monitoring (BAM) that is capable of monitoring Service experience and Customer experience does a great job on the second part. Together they enable IT and Business alignment.


Aug 14

Written by:

Enterprise monitoring has evolved greatly in recent years by focusing on the service being delivered to the Business rather than the health of the underlying IT infrastructure. We have the ITIL and ISO 20000 frameworks and certification to learn how to align services to Business needs, and we also have CMDB and BSM solutions to help organize and manage our IT resources. And while these products have improved how IT delivers services and prioritizes resources, they are only half the picture, which severely limits the ability of IT Operations to detect and react to threats. So while IT has better tools to organize management of IT infrastructure resources, IT Operations is still a stressful place where most problems are still reported by users.

Chill Pill

An analogy that’s often used is monitoring the human body. If you monitor the Key Performance Indicators (KPIs) of the human body, you probably end up with a list such as heart rate, respiration, and temperature. You may even try to develop a more holistic approach by not only monitoring each KPI, but also monitoring the relationships between KPIs. Therefore if you see an increase in the heart rate, you also expect to see a corresponding increase in body temperature and respiration. And if you see such an increase, you might infer that the body is exercising – perhaps riding a bicycle.

To me, this represents a problem. Sure, the body may be biking, but is it going in the right direction?

Wrong Way

Applying this same technique to IT system management is inaccurate and results in many business problems remaining undetected. It is just not possible to qualitatively monitor business activities from infrastructure data.

But a complete 360° view is possible. By monitoring the myriad interactions and units of work that make up the business transactions and activities, the IT organization now provides a business context for the work that the infrastructure is carrying out. We use Service Activity Monitoring (SAM) to qualitatively monitor the units of work, and Business Activity Monitoring (BAM) to qualitatively monitor the desired Business Activity or transaction. By looking qualitatively at Service Delivery from Top Down as well as Bottom Up, an IT organization can control all aspects of Service Delivery with all the benefits such as lower costs, higher revenues, and happier customers.

Note: In my previous musings, Doug McClure kindly pointed me to his excellent blog, and mentioned that he hadn’t heard of “Service Activity Monitoring” (SAM) before. It’s a term I used after I first came across it from a presentation given by CITT online describing an architecture overview in Deutsche Post. Essentially, SAM sits between BAM and infrastructure monitoring and monitors how services from applications are being delivered as units of work within a business process.

Jul 28

Written by:

Occasionally I get asked a simple question by IT Operations managers: “Why do I need another monitoring tool? I’m already monitoring all my IT and network technology – what else could I need?” And then in the next meeting an Executive will ask me, “Why are we still only discovering incidents when the customer calls in a problem? Don’t we monitor this stuff?”

Executives naturally have a world-view oriented around measuring and improving business targets such as customer satisfaction, churn, volume of new customers and so on. They’re generally not interested in megabits per second, memory leaks, or whether the CPU is working at 50% or 90%. Sometimes I hear amusing anecdotes – for example the reaction of a CEO being told that customer satisfaction was down due to high loading on the Mediation server CPU.

IT Operations on the other hand live and breathe CPU Utilization, load-balancing, bandwidth, megabits per second and other dark arts. If the servers are up and the applications are responding, then there is often an implied conclusion that all is good in the world.

There is a real language barrier in most organizations between IT and Business departments, and all too often this results in real execution problems that affect customers and revenues.

A coherent monitoring strategy and implementation will play a critical role in building a bridge between these two valid but orthogonal viewpoints. Specifically the ability to monitor Business Activity in terms of key indicators (e.g. data connection set-up time, number porting delay, online ordering, automated fulfilment) extends the view of IT operations to provide assurance that technology is delivering Business Performance targets and not only technical metrics such as those described above.

Business Activity Monitoring (BAM) provides executives with the ability to access real-time business performance metrics. Service Activity Monitoring (SAM) is the IT department equivalent and provides Operations staff with the ability to access real-time service delivery performance metrics, and to associate the service with underlying infrastructure as well as the corresponding business process, transaction, and customer.

In other words, by using products that combine BAM and SAM capabilities, both Business and IT executives have a common viewpoint and shared language. It’s the beginning of the end for costly “Lost in Translation” situations.
