Feb 28

Written by:

I was recently giving a presentation to a rather large utility provider, and was asked the question “But why do I need real-time?”

Good question.  And a very difficult one to answer correctly in most cases.  There are lots of answers that address the requirement – in theory.  Take your pick from the ones I come across most often (I may even be guilty of uttering one or two of these myself):

  • React to an opportunity or threat
  • Become more proactive
  • Prioritize resources
  • Make smarter decisions quicker

But sometimes the right answer lies in asking a question in return.  “Is there anything you could do if you knew something had just occurred?”  This turns the question around to the customer, and they invariably find that there is something that can be done to improve the situation!

Remember To React 

Within a business context, knowing immediately if there are problems means that you know at least as soon as your customers do.  A well known analyst firm estimates that more than 60% of problems are reported by customers (and in our experience, this number is low; we generally see numbers approaching 80%).  But the value of detection is completely lost if there isn’t a plan for reacting.

So one definition of real-time is the window of time within which detection and reaction have maximum benefit.  Whether you need to react within 5 minutes or 5 microseconds, each situation has its own context and its own definition of real-time.

Ask Brian

Ask The Brian

But the real value of detection lies in predicting situations in advance, by detecting the patterns that indicate a high probability of something about to happen.  This is possible so long as situations exhibit a consistent set of early-warning signs – to the uninitiated it can seem a little like high-tech fortune telling.  But the results are definitely worth it.  We’ve seen results in our customers showing more than an 80% drop in customer detection rates and incidents tagged as high priority.  It’s where we definitely see the value of Complex Event Processing and Business Activity Monitoring intersecting.

Jan 31

Written by:

Sometimes (and most especially, it seems, in Europe), bad weather can bring a country to a standstill.  Not long ago, some snow fell in Madrid (pretty rare) and resulted in what appeared to be a national emergency, including shutting down the airport for a short period.  The following is an email written by one of our guys trapped in Madrid airport.  I post this because it is amusing and well written, and because there is a very tenuous link between event causality and weather prediction.  And context.  This photo from the day will cause several people to question the definition of “Heavy Snow”.  So thanks Nigel!

Heavy Snow at Madrid Airport caused Mayhem


As I enter my 28th hour of captivity I am forming some sort of Stockholm Syndrome affinity with my kidnapper, Madrid Airport. It’s really not so bad, there is food (small dried toasts onto which you can drop olive oil for a treat) and even showers (where what really looks like a fan switch in fact calls the emergency guy).

Yesterday saw the entire airport shut down for around 10 hours due to 100 or so snow flakes. When it opened I was lucky enough to get transferred onto the late flight (which ironically was the one I was originally booked onto before changing to the earlier one). So our late flight became later and later – every 15 minutes the board showed it slipping by another 15 minutes. Now the know-it-alls like me know Munich has a very strict midnight curfew for landing. German rules dictate that even if you are plunging from 30,000 feet due to engine failure, you will need written permission from the mayor before you are allowed to penetrate the runway like a flying dart. In short, we knew there was no chance of the flight leaving, so even though the plane was fueled up and waiting, it was all in vain. Quite pleased I did not get on, as I had already spent 3 hours on the morning plane at the gate before being hoofed off due to the snow flake.

The night was fun. I made a bed by pushing two small chairs together which formed a comfy oval cave. I now walk like a hunchback and if lucky will straighten up by Monday. I noticed also that small tribes are forming – I was naturally drawn to the alpha males holding BA Gold Cards, and minor tribes such as the Silver Card Holders and ‘fresh from the swamp’ occasional flyers are keeping away from us. If they do not bring something to replace the small toasts, we are considering starting to eat some of the Silver card holders.

So the plan today is I am on a wait list for 08:20. In Spanish ‘Wait List’ translates to ‘not a hope’ but it will keep you from being a pain in the butt at the service desk for a couple more hours. As this Wait turns inevitably into disappointment, I am also booked onto a 16:20 which is actually confirmed. So only 10 more hours to go before I might get a flight. I am also considering the train, which is 28 hours via Paris, but it is hard to call – will I get that 16:20 dream flight home or won’t I?  I think I will make the call if I miss the 16:20 and should then make it home sometime on Monday by train.

Otherwise I am having a pleasant time. Small diversions like using the toilet at the other end of the airport can kill nearly 50 minutes and I am looking forward to breakfast at McDonalds at 07:00. It’s also fun watching the people: the airport is full of people who were here all night and many all day yesterday – there are the enraged, the cool, the up-all-night Redbull folks; sadly many of them have kids, which is just awful.

If you are wondering why we did not all go to a nice hotel – the roads were closed or extremely slow due to the snow-flake and we were told we were unlikely to get to a hotel before 2 or even 3am.

Planes Scuttling for Cover at Madrid Airport


The only thing that really worries me apart from forgetting what my kids look like is my socks. Over nearly 48 hours of walking, I swear I could see them gently glowing in the night. They redefine the word ‘funk’ and I feel they should be sent to Jim and Mort as a record of our absolute commitment to the cause.

Just off for a glass of wine for breakfast – of course there is more booze than water in the lounge.

Nov 10

Written by:

Twitter provide nice APIs that allow users access to their data. There’s a whole bunch of interesting applications built on top of this data, most of them gimmicks while a few have serious potential. One of the ways of accessing the Twitter data is via the Twitter Firehose, which is supposed to be the entire stream of “tweets” (a tweet is a message on Twitter). Unfortunately the firehose is still not open to all users, which is a real pity.

Bill de hÓra has a very interesting posting on this topic. There are 2 interesting aspects to Bill’s post:

1. The story behind the firehose and some of the reactions to the fact that Twitter still haven’t given open access to this data.

2. The speculation that the reason for the delay is not so much a business reason but a technical limitation on Twitter’s ability to scale.

If the reason for the delay is that Twitter is protecting the data so they can exploit the value themselves, I wouldn’t be too surprised. To be fair that would be considered the second most valuable thing they have. The most valuable asset is the fact that they have everyone’s attention.

Bill seems to think the issues are technical. Roy Fielding suggested as much when he blogged about Inverse Economies of Scale with PubSub systems. Roy’s solution to the problem of people demanding too much data/events and therefore crippling your service was to charge for the service. That may be what Twitter announces next year.

The technical problems that Roy and Bill discuss are caused by Event and/or Data gluttons. These are users/machines that oversubscribe to events or data services. This is human nature and provides developers with a challenge when building distributed software. Luckily there are a lot of useful patterns and tools available to today’s developers to solve these problems. At WestGlobal we face the challenge of Event Gluttons and we handle it by having a flexible architecture that can be optimised depending on the situation. In general, the approach we take is a distributed Event Processing Network (EPN) with flexible deployment options. As such, Vantify deployments are layered as follows:

1. The first layer of consumers are our Event Processors, which sit closest to the action. These event gluttons are deployed on a network and want to know everything that is happening in your business. They process events as they happen using CEP and other techniques. The good news for the network is that you don’t need too many of these event processors, so oversubscription isn’t a problem. They can be distributed almost anywhere on your network, and if there is a high volume of data we can limit the chatter by deploying the processors closer to the events or splitting up the event streams. Although these are very greedy consumers, we can satisfy their appetite.

2. The other place where we need to handle event gluttons is at the top layer in our product where real users want to view reports and dashboards about their business. Here we do need to worry about supporting a large number of users. However most of these users are using a browser and polling our data feeds which will scale very well. There will be a limited number of users who want subscriptions in order to be notified. This is where we need to worry about scale but this is typically a small number.

So I think the world is big enough to accommodate Event Gluttons. Thanks to people like Roy and Bill, developers have a lot of tools and techniques available to them to handle scale. The only good reason for restricting data or events should be a business reason. I certainly hope Twitter open up their data and like many others I’ll be watching with interest when they reveal all.

Oct 31

Written by:

Having read Opher’s excellent blog posting on describing CEP maturity models, I found that while I agreed with Opher’s descriptions of differences between messaging and events, I disagreed with describing phase 3 of CEP as “towards looking at ‘event clouds’ instead of events one-by-one”.

The glossary note on the definition of an event cloud says that “CEP usually refers to event processing that assumes an event cloud as input, and thereby can make no assumptions about the arrival order of events”. This implies that events “arrive” – just not necessarily in a defined order such as creation time, etc. It also implies that events may arrive one-by-one. It certainly does not preclude one-by-one processing.

The other implication of Opher’s posting is that the cloud may somehow be processed as a whole. Looking at the definition of a cloud, it is made up of many events of differing types where each event may have been created at a different time, and may have a different time-to-live value within the cloud. But in order to make the entire cloud accessible to an event processing agent as a whole, a mechanism must exist that persists the events within the event cloud and manages the cloud events according to their time-to-live values.

(An easy parallel to this view is “ordinary” data processing where sets of persisted data (i.e. events) are made available for queries. Data/Events are stored in tables and keyed by their time-to-live values. Obviously, given a large enough quantity of events, the storage and processing requirement may be considerable.)

But I disagree that this is the only way to define CEP. Indeed there has long been a fiery debate in the CEP community about how, exactly, an event cloud may be practically processed without creating a partially ordered set of events (which may be regarded as a stream of one-by-one events). I would argue that persisting an entire event cloud is fine for ad-hoc processing and analysis, but that the vast majority of CEP involves detection of predefined patterns and is efficiently performed as a form of one-by-one processing.

In Operational Intelligence, applying CEP to the enterprise event cloud is a practical application whereby predefined patterns are detected and acted upon in real-time. It would be practically impossible to persist the entire event cloud, as the volumes of events are considerable and the rate of events would require a lot of expensive equipment to provide the required processing power. Yet Vantify Experience Center uses CEP to process an enterprise’s event cloud, and provides real-time intelligence to operational staff to meet challenges and opportunities for maximum business benefit with relatively inexpensive equipment. The events are processed one-at-a-time rather than as a single cloud for efficiency, and the value to customers has been demonstrated many times. To imply that CEP excludes one-by-one processing is inaccurate – rather, one-by-one processing is a critical subset of CEP.
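To make the one-by-one argument concrete, here is a minimal Python sketch of detecting a predefined pattern without persisting the whole event cloud. It is an illustration only, not how Vantify is implemented: the `Event` fields, the `RepeatedEventDetector` class and the "payment_failed" event type are all hypothetical names invented for this example. Events are handled as they arrive, arrival order is not assumed to match creation order, and only the few timestamps still inside their time-to-live window are kept in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str         # hypothetical event type, e.g. "payment_failed"
    key: str          # correlation key, e.g. a customer id
    timestamp: float  # creation time in seconds (not arrival time)


class RepeatedEventDetector:
    """Matches when `threshold` events of one kind share a key within `window` seconds."""

    def __init__(self, kind: str, threshold: int, window: float):
        self.kind = kind
        self.threshold = threshold
        self.window = window
        # Per key, only the timestamps that are still "alive" are retained.
        self._live = defaultdict(list)

    def on_event(self, event: Event) -> bool:
        """Process one event; return True if the pattern is matched for its key."""
        if event.kind != self.kind:
            return False  # irrelevant events are discarded immediately
        timestamps = self._live[event.key]
        timestamps.append(event.timestamp)
        # Expire entries older than the window, measured from the newest
        # creation time seen so far, so out-of-order arrival is tolerated.
        newest = max(timestamps)
        live = [t for t in timestamps if newest - t <= self.window]
        self._live[event.key] = live
        return len(live) >= self.threshold


detector = RepeatedEventDetector(kind="payment_failed", threshold=3, window=60.0)
arrivals = [
    Event("payment_failed", "customer-1", 10.0),
    Event("login", "customer-1", 12.0),
    Event("payment_failed", "customer-1", 70.0),
    Event("payment_failed", "customer-1", 50.0),  # arrives late, created earlier
]
matches = [detector.on_event(e) for e in arrivals]
```

The memory held per key is bounded by the number of matching events inside one window, which is the practical difference from storing and querying the entire cloud.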

Oct 13

Written by:

“So what do you actually do?”

That dreaded question! I’ve moved job recently and I’m hearing it a lot.

One of the great things about working at WestGlobal is that now it’s an easier question to answer. I used to work on SOA products. Need I say more?

As with most questions, the response to “So what do you actually do?” depends on the audience. I’ve been thinking about how I should respond to this question if asked by a previous technical colleague or by my 7-year-old son. The interesting thing is that I don’t think my response would vary that much. I think the difference would be in the language and metaphors used.

If the colleague asked I’d obviously try to impress them but I’d probably try something like the following:

  1. Using agents/sensors across your business you gather real-time events as they happen.
  2. You process the events from these sensors and produce new events and data using advanced techniques like CEP.
  3. You present your new events and data in a variety of forms like dashboards, reports and alarms to the customer. They get a detailed picture of what is happening in their business including the relationships between different agents and activities.

If my son asked, I could simplify the language and introduce a metaphor:

  1. Using spies across your business you gather information about what is going on. Your spies send you reports whenever anything happens.
  2. Back at the command centre, you decode the messages from all of your spies and try to figure out what it all means.
  3. You’re able to give info to your friends to help them save the day. They know what is happening before anybody else so can react quickly.

Agent Vantify Issue 1

I think I prefer my job through the eyes of my son!

Tune in next week to find out if any agents defected.

Sep 10

Written by:

Many vendors use variations of the concept of enabling an enterprise to “Align IT with the Business”. For example, products to help to manage IT from the perspective of the Business and to do more of what drives the business and less of what doesn’t. Or to view your IT as an engine for business value. These are valuable perspectives, and Business Service Management (BSM) is being promoted as the answer. Here at WestGlobal, we believe that BSM is only half the picture.

Signage with messages

Let’s ask an important question:

Q. What does the business want from aligning IT with the business?

A. The business wants a clear and simple solution that monitors how well IT services are being delivered to support business activities and transactions. The business also wants quantitative and qualitative data in order to understand how well individual services are performing, and would like the IT department to prioritize their Operational activity to maximize business activities and minimize negative business impacts.

In order to deliver this vision, there are two different aspects that need to be addressed.

The first part concerns IT resources. Servers, networks, routers, websites – all of the technology and resources and tools that are used to deliver services. Monitoring solutions are required to check the health and availability of these components. Enterprise monitoring tools are vital in this regard, and they’re readily available and do a good job.

The second part concerns Business Activities. Sales orders, shipping, payments – all of the vital business transactions and processes that rely on IT infrastructure that are the life blood of any business.

Traditionally, enterprises are very good at addressing the first part – it is well understood and products are available. On the other hand, very few properly address the second. Without the second part, an enterprise will not be able to align the business and IT departments. Instead of measuring how well sales orders are being processed, the IT department only has lower level tools to measure server uptime or CPU load. Reporting a monthly statistic that the web servers were available within their SLA of 98% does nothing to assure the business that all orders were captured and that every customer had a satisfactory experience. It’s why enterprises that only address the first part still rely on their customers to report problems first.
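As a quick worked illustration of why an availability figure says so little, the sketch below converts an SLA percentage into the downtime it still permits. The function name and the 30-day month are assumptions made for this example. Note that it only covers outages: it says nothing about orders that fail while the servers are technically up.

```python
def allowed_downtime_minutes(sla_percent, days_in_month=30):
    """Minutes of downtime per month that still satisfy the availability SLA."""
    total_minutes = days_in_month * 24 * 60
    return (100 - sla_percent) * total_minutes / 100


# A 98% availability SLA over an assumed 30-day month.
downtime = allowed_downtime_minutes(98)
```

For 98% this works out to 864 minutes, or 14.4 hours a month, during which the SLA is met and yet no orders can be captured at all.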

Addressing the second part means adopting a different approach to gathering data for measuring service delivery. Event processing is an ideal underlying technology to extract relevant and meaningful data from the thousands of events that occur every hour in the enterprise. In terms of Business Activity Monitoring, an event is simply the fact that a process or transaction or activity has progressed. For example, an event may signify that a customer has logged in. A subsequent event may signify that a customer has queried stock availability or placed an item in a basket, and so on until the individual transaction has completed. Because most business activities can be broken up into a start and end, with varying numbers of units of work in between, figuring out the significance of each event is straightforward. By measuring how long it takes for each unit of work, and by tracking events that relate to different activities, the IT department can report to the business in terms that are meaningful.
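The idea of timing each unit of work between events can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each event is a hypothetical `(transaction_id, step, timestamp_seconds)` tuple, and the step names ("login", "query_stock" and so on) are invented to mirror the example in the text rather than taken from any real product.

```python
from collections import defaultdict


def unit_of_work_durations(events):
    """Group events by transaction and time each step-to-step transition.

    events: iterable of (transaction_id, step, timestamp_seconds) tuples.
    Returns {transaction_id: [(from_step, to_step, seconds), ...]}.
    """
    by_transaction = defaultdict(list)
    for transaction_id, step, timestamp in events:
        by_transaction[transaction_id].append((timestamp, step))

    report = {}
    for transaction_id, steps in by_transaction.items():
        steps.sort(key=lambda item: item[0])  # order by event time
        report[transaction_id] = [
            (prev_step, step, timestamp - prev_timestamp)
            for (prev_timestamp, prev_step), (timestamp, step) in zip(steps, steps[1:])
        ]
    return report


events = [
    ("t1", "login", 0.0),
    ("t1", "query_stock", 4.0),
    ("t2", "login", 1.0),
    ("t1", "add_to_basket", 9.5),
    ("t1", "checkout", 12.0),
]
report = unit_of_work_durations(events)
```

Here transaction `t1` yields three timed units of work, while `t2` has started but not progressed, which is itself something worth reporting to the business.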

Enterprise Monitoring Systems with Business Service Management (BSM) do a great job with the first part. Business Activity Monitoring (BAM) that is capable of monitoring Service experience and Customer experience does a great job on the second part, and together they enable IT and Business alignment.


Aug 14

Written by:

Enterprise monitoring has evolved greatly in recent years by focusing on the service being delivered to the Business rather than the health of the underlying IT infrastructure. We have ITIL and ISO 20000 frameworks and certification to learn how to align the services to Business needs, and we also have CMDB and BSM solutions to help organize and manage our IT resources. And while these products have improved how IT delivers services and prioritizes resources, they are only half the picture, and relying on them alone severely limits the ability of IT Operations to detect and react to threats. So while IT has better tools to organize management of IT infrastructure resources, IT Operations is still a stressful place where most problems are still reported by users.

Chill Pill

An analogy that’s often used is monitoring the human body. If you monitor the Key Performance Indicators (KPIs) of the human body, you probably end up with a list such as heart rate, respiration, and temperature. You may even try to develop a more holistic approach by not only monitoring each KPI, but also monitoring the relationship between each KPI. Therefore if you see an increase in the heart rate, you also expect to see a corresponding increase in body temperature and respiration. And if you see such an increase, you might infer that the body is exercising – perhaps riding a bicycle.

To me, this represents a problem. Sure, the body may be biking, but is it going in the right direction?

Wrong Way

Applying this same technique to IT system management is inaccurate and results in many business problems remaining undetected. It is just not possible to qualitatively monitor business activities from infrastructure data.

But a complete 360° view is possible. By monitoring the myriad interactions and units of work that make up the business transactions and activities, the IT organization now provides a business context for the work that the infrastructure is carrying out. We use Service Activity Monitoring (SAM) to qualitatively monitor the units of work, and Business Activity Monitoring (BAM) to qualitatively monitor the desired Business Activity or transaction. By looking qualitatively at Service Delivery from Top Down as well as Bottom Up, an IT organization can control all aspects of Service Delivery with all the benefits such as lower costs, higher revenues, and happier customers.

Note: In my previous musings, Doug McClure kindly pointed me to his excellent blog, and mentioned that he hadn’t heard of “Service Activity Monitoring” (SAM) before. It’s a term I used after I first came across it from a presentation given by CITT online describing an architecture overview in Deutsche Post. Essentially, SAM sits between BAM and infrastructure monitoring and monitors how services from applications are being delivered as units of work within a business process.

Jul 28

Written by:

Occasionally I get asked a simple question by IT Operations managers: “Why do I need another monitoring tool? I’m already monitoring all my IT and network technology – what else could I need?” And then in the next meeting an Executive will ask me, “Why are we still only discovering incidents when the customer calls in a problem? Don’t we monitor this stuff?”

Executives naturally have a world-view oriented around measuring and improving business targets such as customer satisfaction, churn, volume of new customers, etc. They’re generally not interested in megabits per second, memory leaks, or whether the CPU is working at 50% or 90%. Sometimes I hear amusing anecdotes – for example the reaction of a CEO being told that Customer Sat was down due to high loading on the Mediation server CPU.

IT Operations on the other hand live and breathe CPU Utilization, load-balancing, bandwidth, megabits per second and other dark arts. If the servers are up and the applications are responding, then there is often an implied conclusion that all is good in the world.

There is a real language barrier in most organizations between IT and Business departments, and all too often this results in real execution problems that affect customers and revenues.

A coherent monitoring strategy and implementation will play a critical role in building a bridge between these two valid but orthogonal viewpoints. Specifically the ability to monitor Business Activity in terms of key indicators (e.g. data connection set-up time, number porting delay, online ordering, automated fulfilment) extends the view of IT operations to provide assurance that technology is delivering Business Performance targets and not only technical metrics such as those described above.

Business Activity Monitoring (BAM) provides executives with the ability to access real-time business performance metrics. Service Activity Monitoring (SAM) is the IT department equivalent and provides Operations staff with the ability to access real-time service delivery performance metrics, and to associate the service with underlying infrastructure as well as the corresponding business process, transaction, and customer.

In other words, by using products that combine BAM and SAM capabilities, both Business and IT executives have a common viewpoint and shared language. The beginning of the end for costly “Lost in Translation” situations.
