Brian Connell's page

Brian is the founder and CTO of WestGlobal and focuses on the successful creation and delivery to the market of our products and services. Brian has worked for some of the world's leading software companies, such as IBM, Lotus Development, Ingres and Computer Associates. Brian is a well-known writer and contributor on the subject of Complex Event Processing and is an active member of the Event Processing Technical Society (EPTS).

Oct 7

Written by: Brian Connell

Back from beautiful Trento in Italy where the 5th EPTS Symposium was held.  Marvelous location, and we even managed to find a really beautiful Michelin-star rated restaurant.

While it was great to meet up with some friends and colleagues, and there were some very interesting nuggets at the conference, my overall impression was one of disappointment and frustration.  We are still struggling with basic concepts, arguing about the definition of Event Processing (and there are some very … different … views), and still haven’t managed to produce anything that either identifies the major components you’d find in a reference architecture or finds commonality in any of the use cases.  Actually, working groups aside, that’s kinda what we ended up with last year.  And the year before…

Groundhog Day

And before I offend the very people I wish to praise and give credit to, let me be clear.  It’s most definitely not the fault of the people collaborating on the working groups.  They’re contributing.  Giving up their precious time.  In order to make progress, we really need to get more people involved, and we need to set clearer mandates on the deliverables for these working groups.

A new working group was proposed to promote the EPTS.  This will also involve publishing the public deliverables from the working groups, and encouraging new members to join and contribute.

Here’s hoping that this time next year, I won’t feel another Bill Murray moment coming on!

Jul 3

Written by: Brian Connell

A partner of ours recently returned from a meeting where the prospect’s reaction was “I already monitor my systems, why do I need more monitoring?”  Great question.

I get that a lot. It’s a normal reaction. Usually from the IT Operations Director who has spent considerable sums of money on monitoring to date, and can boast an arsenal including:

  • Hardware monitoring
  • Application availability
  • Network monitoring
  • Website monitoring
  • Transaction monitoring
  • Speciality tools for monitoring Oracle, SAP and the like
  • And perhaps some dashboard aggregators that consolidate information from many separate sources into one single dashboard.

So why would an organization need more monitoring?  Well, the single most compelling reason is to cut down on the number of outages and incidents that impact business performance.  To do this, monitoring has to do more than just detect when things go wrong, which is all that the products in use by most organizations can do.  By then, the damage is done, and something has already gone wrong.

Ideally, monitoring should be smart enough, and powerful enough, to detect a situation that indicates with a high probability that something needs attention before the situation develops into something bigger and more costly.  It’s the equivalent of warning you that your car is about to be towed instead of telling you that your car has just been towed.

My Ferrari (I wish that were true) getting towed!

In other words, rather than coping with IT disasters, what about averting them in the first place?  What you need is a system that constantly monitors your key business activities and transactions, with the ability to connect events together in order to detect variances within your business transactions.  It tells you exactly what’s going on in real-time and provides timely warnings.

For example, your current monitoring systems for processing orders might provide the following information:

  1. Database server OK, ping round trip 0.112s
  2. Database OK, 32 transactions per second, average transaction 1.232s
  3. Web Server OK, 42 connections

Whereas a system monitoring business events would instead report:

  1. 14 Orders in progress
  2. Average time to process orders is 6.687 seconds
  3. Alert: 13% of orders processed in last 5 minutes were above 9 seconds.  Current trend is that an order will breach the SLA of 12.5 seconds within 40 minutes.
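To make the third item concrete, here is a minimal Python sketch of how such an alert might be computed. The `Order` record, the function names and the straight-line trend are my own assumptions for illustration, not a description of any particular product. Only the 9 second threshold, the 5 minute window and the 12.5 second SLA come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Order:
    completed_at: float  # seconds since an arbitrary start time (assumed field)
    duration: float      # seconds taken to process the order (assumed field)


def share_above(orders, now, window=300.0, threshold=9.0):
    """Fraction of orders completed in the last `window` seconds
    whose processing time exceeded `threshold` seconds."""
    recent = [o for o in orders if now - window <= o.completed_at <= now]
    if not recent:
        return 0.0
    return sum(o.duration > threshold for o in recent) / len(recent)


def minutes_to_breach(orders, now, sla=12.5):
    """Fit a least-squares line through (completed_at, duration) and return
    the minutes until that line reaches `sla`. Returns None when there is
    no upward trend or too little data to fit a line."""
    n = len(orders)
    if n < 2:
        return None
    mean_t = sum(o.completed_at for o in orders) / n
    mean_d = sum(o.duration for o in orders) / n
    var_t = sum((o.completed_at - mean_t) ** 2 for o in orders)
    if var_t == 0:
        return None
    slope = sum((o.completed_at - mean_t) * (o.duration - mean_d)
                for o in orders) / var_t
    if slope <= 0:
        return None  # processing times are flat or improving
    current = mean_d + slope * (now - mean_t)  # trend value right now
    if current >= sla:
        return 0.0
    return (sla - current) / slope / 60.0


if __name__ == "__main__":
    sample = [Order(t * 60.0, 6.0 + 0.5 * t) for t in range(5)]
    print(share_above(sample, now=240.0))
    print(minutes_to_breach(sample, now=240.0))
```

A straight line is the crudest possible trend model. It is used here only because it keeps the idea visible: the alert is derived from business measurements (order durations), not from server health.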

So rather than overworked IT staff trying to filter millions of seemingly disconnected IT events, most of which report little or nothing by way of business significance, they can instead focus on meaningful business objectives and performance indicators, and can react quicker to events that impact business performance, as well as communicate with non-IT staff using the lingua franca of your business.  And most importantly, if you’re currently relying solely on traditional monitoring approaches, then you can expect to further reduce the number of outages and incidents by anywhere between 20% and 80%!

Feb 28

Written by: Brian Connell

I was recently giving a presentation to a rather large utility provider, and was asked the question “But why do I need real-time?”

Good question.  And a very difficult question to answer correctly in most cases.  There are lots of answers that address the requirement – in theory.  Take your pick from the ones I come across most often (I may even be guilty of uttering one or two of these myself):

  • React to an opportunity or threat
  • Become more proactive
  • Prioritize resources
  • Make smarter decisions quicker

But sometimes the right answer lies in asking a question in return.  “Is there anything you could do if you knew something had just occurred?”  This turns the question around to the customer, and they always find that there is something that can be done to improve the situation!

Remember To React 

Within a business context, knowing immediately if there are problems means that you know at least as soon as your customers do.  A well-known analyst firm estimates that more than 60% of problems are reported by customers (and in our experience, this number is low; we generally see numbers approaching 80%).  But the value of detection is completely lost if there isn’t a plan for reacting.

So one definition of real-time is the window of time within which detection and reaction have maximum benefit.  So whether you need to react within 5 minutes or 5 microseconds, each situation has its own context and definition of real-time.

Ask Brian

Ask The Brian

But the real value of detection is predicting situations in advance by being able to detect the patterns that indicate a high probability of something about to happen in the future.  This is possible so long as situations exhibit a consistent set of early-warning signs – to the uninitiated it can seem a little bit like high-tech fortune telling.  But the results are definitely worth it.  We’ve seen results in our customers showing more than an 80% drop in customer detection rates and incidents tagged as high priority.  It’s where we definitely see the value of Complex Event Processing and Business Activity Monitoring intersecting.

Jan 31

Written by: Brian Connell

Sometimes (and most especially it seems in Europe), bad weather can bring a country to a standstill.  Not long ago, some snow fell in Madrid (pretty rare) and resulted in what appeared to be a national emergency, including shutting down the airport for a short period.  The following is an email written by one of our guys trapped in Madrid airport.  I post this because it is amusing and well written, and because there is a very tenuous link between event causality and weather prediction.  And context.  This photo from the day will cause several people to question the definition of “Heavy Snow”.  So thanks Nigel!

Heavy Snow at Madrid Airport caused Mayhem

As I enter my 28th hour of captivity I am forming some sort of Stockholm Syndrome affinity with my kidnapper, Madrid Airport. It’s really not so bad, there is food (small dried toasts onto which you can drop olive oil for a treat) and even showers (where what really looks like a fan switch in fact calls the emergency guy).

Yesterday saw the entire airport shut down for around 10 hours due to 100 or so snow flakes. When it opened I was lucky enough to get transferred onto the late flight (which ironically was the one I was originally booked onto before changing to the earlier one). So our late flight became later and later – every 15 minutes the board showed it slipping by another 15 minutes. Now the know-it-alls like me know Munich has a very strict midnight curfew for landing. German rules dictate that even if you are plunging from 30000 feet due to engine failure, you will need written permission from the mayor before you are allowed to penetrate the runway like a flying dart. In short, we knew there was no chance of the flight leaving so even though the plane was fueled up and waiting, it was all in vain. Quite pleased I did not get on as I had already spent 3 hours on the morning plane at the gate before being hoofed off due to the snow flake.

The night was fun. I made a bed by pushing two small chairs together which formed a comfy oval cave. I now walk like a hunchback and if lucky will straighten up by Monday. I noticed also that small tribes are forming – I was naturally drawn to the alpha males holding BA Gold Cards and minor tribes such as the Silver Card Holders and ‘fresh from the swamp’ occasional flyers are keeping away from us. If they do not bring something to replace the small toasts, we are considering starting to eat some of the Silver card holders.

So the plan today is I am on a wait list for 8:20. In Spanish ‘Wait List’ translates to ‘not a hope’ but it will keep you from being a pain in the butt at the service desk for a couple more hours. As this Wait turns inevitably into disappointment, I am also booked onto a 16:20 which is actually confirmed. So only 10 more hours to go before I might get a flight. I am also considering the train which is 28 hours via Paris but it is hard to call – will I get that 16:20 dream flight home or won’t I?  I think I will make the call if I miss the 16:20 and should then make it home sometime on Monday by train.

Otherwise I am having a pleasant time. Small diversions like using the toilet at the other end of the airport can kill nearly 50 minutes and I am looking forward to breakfast at McDonalds at 07:00. It’s also fun watching the people; the airport is full of people who were here all night and many all day yesterday – there are the enraged, the cool, the up-all-night Redbull folks. Sadly, many of them have kids, which is just awful.

If you are wondering why we did not all go to a nice hotel – the roads were closed or extremely slow due to the snow-flake and we were told we were unlikely to get to a hotel before 2 or even 3am.

Planes Scuttling for Cover at Madrid Airport

The only thing that really worries me apart from forgetting what my kids look like is my socks. Over nearly 48 hours of walking, I swear I could see them gently glowing in the night. They redefine the word ‘funk’ and I feel they should be sent to Jim and Mort as a record of our absolute commitment to the cause.

Just off for a glass of wine for breakfast – of course there is more booze than water in the lounge.

Oct 31

Written by: Brian Connell

Having read Opher’s excellent blog posting on describing CEP maturity models, I found that while I agreed with Opher’s descriptions of differences between messaging and events, I disagreed with describing phase 3 of CEP as “towards looking at ‘event clouds’ instead of events one-by-one”.

The glossary notes on the definition of an event cloud say that “CEP usually refers to event processing that assumes an event cloud as input, and thereby can make no assumptions about the arrival order of events”. This implies that events “arrive” – just not necessarily in a defined order such as creation time, etc. It also implies that events may arrive one-by-one. It certainly does not preclude one-by-one processing.

The other implication of Opher’s posting is that the cloud may somehow be processed as a whole. Looking at the definition of a cloud, it is made up of many events of differing types where each event may have been created at a different time, and may have a different time-to-live value within the cloud. But in order to make the entire cloud accessible to an event processing agent as a whole, a mechanism must exist that persists the events within the event cloud and manages the cloud events according to their time-to-live values.

(An easy parallel to this view is “ordinary” data processing where sets of persisted data (i.e. events) are made available for queries. Data/Events are stored in tables and keyed by their time-to-live values. Obviously, given a large enough quantity of events, the storage and processing requirement may be considerable.)

But I disagree that this is the only way to define CEP. Indeed, there has long been a fiery debate in the CEP community about how, exactly, an event cloud may be practically processed without creating a partially ordered set of events (which may be regarded as a stream of one-by-one events). I would argue that persisting an entire event cloud is fine for ad-hoc processing and analysis, but that the vast majority of CEP involves detection of predefined patterns and is efficiently performed as a form of one-by-one processing.

In Operational Intelligence, applying CEP to the enterprise event cloud is a practical application whereby predefined patterns are detected and acted upon in real-time. It would be impractical, if not impossible, to persist the entire event cloud, as the volumes of events are considerable and the rate of events would require a lot of expensive equipment to provide the required processing power. Yet Vantify Experience Center uses CEP to process an enterprise’s event cloud, and provides real-time intelligence to operational staff to meet challenges and opportunities for maximum business benefit with relatively inexpensive equipment. For efficiency, the events are processed one-at-a-time rather than as a single cloud, and the value to customers has been demonstrated many times. To imply that CEP excludes one-by-one processing is inaccurate – rather, one-by-one processing is a critical subset of CEP.
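As a rough illustration of one-by-one processing, the Python sketch below detects a predefined pattern (the same event type repeated a given number of times for the same key inside a time window) while keeping only the events that are still within their time-to-live. The class name, the event fields and the example pattern are hypothetical and chosen purely for illustration; this is not how Vantify Experience Center or any other product is implemented.

```python
import bisect
from collections import defaultdict


class RepeatedEventDetector:
    """Flags `count` occurrences of `event_type` for one key within
    `window` seconds, handling each event as it arrives."""

    def __init__(self, event_type, count, window):
        self.event_type = event_type
        self.count = count
        self.window = window
        # Per-key timestamps, kept sorted so arrival order does not matter.
        self._times = defaultdict(list)

    def on_event(self, event_type, key, timestamp):
        """Process a single event; return True if the pattern now matches."""
        if event_type != self.event_type:
            return False
        times = self._times[key]
        bisect.insort(times, timestamp)
        # Expire anything older than the window, measured from the newest
        # timestamp seen for this key (a simple time-to-live rule).
        cutoff = times[-1] - self.window
        del times[:bisect.bisect_left(times, cutoff)]
        return len(times) >= self.count


if __name__ == "__main__":
    detector = RepeatedEventDetector("payment_failed", count=3, window=60.0)
    stream = [
        ("payment_failed", "customer-1", 0.0),
        ("order_placed", "customer-1", 5.0),
        ("payment_failed", "customer-1", 40.0),
        ("payment_failed", "customer-1", 30.0),  # arrives out of order
    ]
    for event in stream:
        print(event, detector.on_event(*event))
```

The state held at any moment is bounded by the window rather than by the size of the whole cloud, which is the point of the argument above. Note that this sketch keeps reporting a match on every further event while the condition holds; a real deployment would probably want to suppress repeats.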

Sep 10

Written by: Brian Connell

Many vendors use variations of the concept of enabling an enterprise to “Align IT with the Business”. For example, products to help to manage IT from the perspective of the Business and to do more of what drives the business and less of what doesn’t. Or to view your IT as an engine for business value. These are valuable perspectives, and Business Service Management (BSM) is being promoted as the answer. Here at WestGlobal, we believe that BSM is only half the picture.

Signage with messages

Let’s ask an important question:

Q. What does the business want from aligning IT with the business?

A. The business wants a clear and simple solution that monitors how well IT services are being delivered to support business activities and transactions. The business also wants quantitative and qualitative data in order to understand how well individual services are performing, and would like the IT department to prioritize their Operational activity to maximize business activities and minimize negative business impacts.

In order to deliver this vision, there are two different aspects that need to be addressed.

The first part concerns IT resources. Servers, networks, routers, websites – all of the technology and resources and tools that are used to deliver services. Monitoring solutions are required to check the health and availability of these components. Enterprise monitoring tools are vital in this regard, and they’re readily available and do a good job.

The second part concerns Business Activities. Sales orders, shipping, payments – all of the vital business transactions and processes that rely on IT infrastructure that are the life blood of any business.

Traditionally, enterprises are very good at addressing the first part – it is well understood and products are available. On the other hand, very few properly address the second. Without the second part, an enterprise will not be able to align the business and IT departments. Instead of measuring how well sales orders are being processed, the IT department only has lower level tools to measure server uptime or CPU load. Reporting a monthly statistic that the web servers were available within their SLA of 98% does nothing to assure the business that all orders were captured and that every customer had a satisfactory experience. It’s why enterprises that only address the first part still rely on their customers to report problems first.

Addressing the second part means adopting a different approach to gathering data for measuring service delivery. Event processing is an ideal underlying technology to extract relevant and meaningful data from the thousands of events that occur every hour in the enterprise. In terms of Business Activity Monitoring, an event is simply the fact that a process or transaction or activity has progressed. For example, an event may signify that a customer has logged in. A subsequent event may signify that a customer has queried stock availability or placed an item in a basket, and so on until the individual transaction has completed. Because most business activities can be broken up into a start and end, with varying numbers of units of work in between, figuring out the significance of each event is straightforward. By measuring how long it takes for each unit of work, and by tracking events that relate to different activities, the IT department can report to the business in terms that are meaningful.
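The idea of splitting an activity into units of work can be sketched in a few lines of Python. The event shape used here, a tuple of transaction id, step name and timestamp in seconds, is an assumption made for this example, as are the step names.

```python
from collections import defaultdict


def unit_durations(events):
    """Group events by transaction and measure each unit of work.

    `events` is an iterable of (transaction_id, step, timestamp) tuples
    in any arrival order. Returns a dict mapping each transaction id to a
    list of (from_step, to_step, seconds) tuples, one per unit of work.
    """
    by_txn = defaultdict(list)
    for txn, step, ts in events:
        by_txn[txn].append((ts, step))

    report = {}
    for txn, steps in by_txn.items():
        steps.sort()  # order by timestamp within the transaction
        report[txn] = [
            (a_step, b_step, b_ts - a_ts)
            for (a_ts, a_step), (b_ts, b_step) in zip(steps, steps[1:])
        ]
    return report


def total_time(units):
    """Total elapsed seconds for one transaction's units of work."""
    return sum(seconds for _, _, seconds in units)


if __name__ == "__main__":
    events = [
        ("t1", "login", 0.0),
        ("t1", "query_stock", 2.5),
        ("t2", "login", 1.0),
        ("t1", "add_to_basket", 4.0),
        ("t1", "checkout", 9.0),
        ("t2", "checkout", 3.0),
    ]
    for txn, units in unit_durations(events).items():
        print(txn, units, total_time(units))
```

Reporting "checkout took 5 seconds for this order" is the kind of statement the business can act on, in a way that a CPU load figure is not.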

Enterprise Monitoring Systems with Business Service Management (BSM) do a great job with the first part. Business Activity Monitoring (BAM) that is capable of monitoring Service experience and Customer experience does a great job on the second part, and together they enable IT and Business alignment.


Aug 14

Written by: Brian Connell

Enterprise monitoring has evolved greatly in recent years by focusing on the service being delivered to the Business rather than the health of the underlying IT infrastructure. We have ITIL and ISO20000 frameworks and certification to learn how to align the services to Business needs, and we also have CMDB and BSM solutions to help organize and manage our IT resources. And while these products have improved how IT delivers services and prioritizes resources, they are only half the picture, and relying on them alone severely limits the ability of IT Operations to detect and react to threats. So while IT has better tools to organize management of IT infrastructure resources, IT Operations is still a stressful place where most problems are still reported by users.

Chill Pill

An analogy that’s often used is monitoring the human body. If you monitor the Key Performance Indicators (KPIs) of the human body, you probably end up with a list such as heart rate, respiration, and temperature. You may even try to develop a more holistic approach by not only monitoring each KPI, but also monitoring the relationship between each KPI. Therefore if you see an increase in the heart rate, you also expect to see a corresponding increase in body temperature and respiration. And if you see such an increase, you might infer that the body is exercising – perhaps riding a bicycle.

To me, this represents a problem. Sure, the body may be biking, but is it going in the right direction?

Wrong Way

Applying this same technique to IT system management is inaccurate and results in many business problems remaining undetected. It is just not possible to qualitatively monitor business activities from infrastructure data.

But a complete 360° view is possible. By monitoring the myriad interactions and units of work that make up the business transactions and activities, the IT organization now provides a business context for the work that the infrastructure is carrying out. We use Service Activity Monitoring (SAM) to qualitatively monitor the units of work, and Business Activity Monitoring (BAM) to qualitatively monitor the desired Business Activity or transaction. By looking qualitatively at Service Delivery from Top Down as well as Bottom Up, an IT organization can control all aspects of Service Delivery with all the benefits such as lower costs, higher revenues, and happier customers.

Note: In my previous musings, Doug McClure kindly pointed me to his excellent blog, and mentioned that he hadn’t heard of “Service Activity Monitoring” (SAM) before. It’s a term I used after I first came across it from a presentation given by CITT online describing an architecture overview in Deutsche Post. Essentially, SAM sits between BAM and infrastructure monitoring and monitors how services from applications are being delivered as units of work within a business process.

Jul 28

Written by: Brian Connell

Occasionally I get asked a simple question by IT Operations managers: “Why do I need another monitoring tool? I’m already monitoring all my IT and network technology – what else could I need?” And then in the next meeting an Executive will ask me “Why are we still only discovering incidents when the customer calls in a problem? Don’t we monitor this stuff?”

Executives naturally have a world-view oriented around measuring and improving business targets such as customer satisfaction, churn, volume of new customers, etc. They’re generally not interested in megabits per second, memory leaks, or whether the CPU is working at 50% or 90%. Sometimes I hear amusing anecdotes – for example the reaction of a CEO being told that Customer Sat was down due to high loading on the Mediation server CPU.

IT Operations on the other hand live and breathe CPU Utilization, load-balancing, bandwidth, megabits per second and other dark arts. If the servers are up and the applications are responding, then there is often an implied conclusion that all is good in the world.

There is a real language barrier in most organizations between IT and Business departments, and all too often this results in real execution problems that affect customers and revenues.

A coherent monitoring strategy and implementation will play a critical role in building a bridge between these two valid but orthogonal viewpoints. Specifically the ability to monitor Business Activity in terms of key indicators (e.g. data connection set-up time, number porting delay, online ordering, automated fulfilment) extends the view of IT operations to provide assurance that technology is delivering Business Performance targets and not only technical metrics such as those described above.

Business Activity Monitoring (BAM) provides executives with the ability to access real-time business performance metrics. Service Activity Monitoring (SAM) is the IT department equivalent and provides Operations staff with the ability to access real-time service delivery performance metrics, and to associate the service with underlying infrastructure as well as the corresponding business process, transaction, and customer.

In other words, by using products that combine BAM and SAM capabilities, both Business and IT executives have a common viewpoint and shared language. It is the beginning of the end for costly “Lost in Translation” situations.
