Brian Connell's page

Brian is the founder and CTO of WestGlobal and focuses on the successful creation and delivery to the market of our products and services. Brian has worked for some of the world's leading software companies, such as IBM, Lotus Development, Ingres and Computer Associates. Brian is a well-known writer and contributor on the subject of Complex Event Processing and is an active member of the Event Processing Technical Society (EPTS).

Jan 16

Written by:

Here are five of our favorite tips, gathered over the years, to help you get the most from your Business Transaction Monitoring solution:

  1. Start out by identifying your key transactions and Services.  Be selective.  Too much noise in terms of data and alerts can be detrimental.  Focus on actionable information, not just data for the sake of data.
  2. Identify the key metrics associated with the transactions' operational characteristics.  For example, do the daily or weekly volumes vary?  How does Service Delivery Performance change in relation to the time of day or week or month?  Are usage spikes normal due to an announcement or external event?
  3. Start small, start simple.  A big bang approach will likely fail as it will be seen as too disruptive, or providing too much data.  Similarly, a complex initial approach may be seen as too disruptive.
  4. Most importantly, design an action plan for each alert you may receive.
  5. Finally, decide what the key success factors are in advance, and regularly measure your BTM usage against the success factors.

If you’ve any tips or tricks you’d like to share, please leave a comment.

Apr 19

On a number of occasions I’ve seen alerts placed on a dashboard, assigned a trouble ticket, and/or emailed to all members of the IT operations department.  Inevitably, and predictably, chaos reigns.  The easy alerts are quickly dealt with, and the more difficult situations are never given more than a cursory look.  It’s generally true that if the Operations team can’t identify the problem in less than 30 minutes, it won’t get fixed.

With Business Transaction Monitoring (BTM), it is important to ensure that your IT Operations team understands how to prioritize and respond to an alert.  Unlike alerts from traditional monitoring systems, many of these alerts will require escalation.  And unlike in the more mature field of IT infrastructure management, there are often no failsafe mechanisms or reset buttons available to fix transaction issues.  BTM alerts may concern an individual transaction, but may equally highlight a slowdown along the transaction path that affects large volumes of transactions.

As BTM matures within an IT Operations department, reactions become smoother and shorter, increasing the overall Service Delivery Performance and Customer Experience.  Underpinning this evolution – and making it less painful for all concerned – lies an understanding and acceptance that most IT organizations are really only starting out and learning about transaction paths, transaction bottlenecks, the interdependence of software services, and the close collaboration required between Business and IT to set clear goals and expectations.

Mar 3

I have yet to meet a customer who hasn’t asked me within the first couple of minutes to explain the difference between Vantify and their usual Enterprise Monitoring solutions.

Here’s a simple method.  Simply figure out which of the following your monitoring is providing answers to:

  • Resource Availability.  Any element within your IT infrastructure is simply a resource, available to work on delivering a service.  Many old-school traditional monitoring solutions fall into this category.  They ping a resource, wait for a response, and mark the resource as available.
  • Resource Capability.  Once resources are available, an additional technique is to check up on their current capability to perform the task that they are required to perform.  This might entail running a synthetic transaction – for example, mimicking a user working on a website, or running a script to insert, modify, and delete a record in a database.
  • Service Delivery.  This technique evaluates the quality of the actual work being performed by the resource.  For example, analyzing the speed at which a website serves each and every page and the number of errors, or monitoring each and every request for a customer balance from a back-end system.
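
To make the Service Delivery idea a little more concrete, here is a minimal Python sketch.  It is only an illustration under assumptions: the `Request` record and the `service_delivery_summary` function are hypothetical names invented for this example, not part of any product.  It measures the real requests that were served (how many, how fast, how many failed) rather than asking whether a resource answers a ping.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """One real request served to a customer (hypothetical record)."""
    duration_seconds: float
    ok: bool  # False if the request ended in an error


def service_delivery_summary(requests):
    """Summarize the quality of the work actually performed."""
    total = len(requests)
    if total == 0:
        # Nothing was served, so there is no delivery quality to report.
        return {"count": 0, "avg_seconds": None, "error_rate": None}
    errors = sum(1 for r in requests if not r.ok)
    avg = sum(r.duration_seconds for r in requests) / total
    return {"count": total, "avg_seconds": avg, "error_rate": errors / total}


if __name__ == "__main__":
    served = [Request(1.0, True), Request(3.0, False)]
    print(service_delivery_summary(served))
```

An availability check would report both of these requests as "server up"; the summary above reports that half of them failed.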

Most Operations Departments “make do” with Availability and Capability.  That’s why they produce reports like “We had 99% uptime last month” – and why the rest of the Business looks on, slowly shakes its collective head and mutters “these guys don’t get it”.

If you’re in Business and rely on IT to deliver key Business Services to your customers, you need to monitor how well you are *actually* delivering your service.

Feb 11

There’s no doubt that enterprises have always known about Business Events, but it’s only in relatively recent times that Business Events are being recognized and processed as vital signals and indicators of Business activity.

Capturing and analyzing events the moment they occur provides valuable insights that can improve the reaction times and the quality of the decisions that every enterprise takes.  Managed correctly, it can transform an enterprise from a knee-jerk reactionary organization into a proactive customer- and service-centric organization.

These events already occur within every organization, but not every organization is geared up to capture, evaluate and analyze this data.  John Rymer at Forrester recently posed some questions about how organizations might be struggling to capture, store, manage and analyze this potentially vast amount of event data, and asked for stats.  The answers should be interesting, not just as growth stats of transaction volumes, but probably more in terms of enterprises waking up to the value of the event data they already produce but know nothing about.  Yet.

Feb 4

It’s easier than you might think to simply not keep up with a blog. Before you know it, it’s been 12 months or more since the last update.

We’ve decided to restart blogging, mainly because people have made nice comments to me on the posts, and also because we’ve always got something to say!

We stopped blogging, not with a bang but with a whisper.  We’re back in the same way we went out. We’ll just quietly get on with it from here.

Oct 7

Back from beautiful Trento in Italy where the 5th EPTS Symposium was held.  Marvelous location, and we even managed to find a really beautiful Michelin-star rated restaurant.

While it was great to meet up with some friends and colleagues, and there were some very interesting nuggets at the conference, my overall impression was one of disappointment and frustration.  We are still struggling with basic concepts, arguing about the definition of Event Processing (and there are some very … different … views), and still haven’t managed to produce anything that either identifies the major components you’d find in a reference architecture or finds commonality in any of the use cases.  Actually, working groups aside, that’s kinda what we ended up with last year.  And the year before…

Groundhog Day

And before I offend the very people I wish to praise and give credit to, let me be clear.  It’s most definitely not the fault of the people collaborating on the working groups.  They’re contributing.  Giving up their precious time.  In order to make progress, we really need to get more people involved, and we need to set clearer mandates on the deliverables for these working groups.

A new working group was proposed to promote the EPTS.  This will also involve publishing the public deliverables from the working groups, and encouraging new members to join and contribute.

Here’s hoping that this time next year, I won’t feel another Bill Murray moment coming on!

Jul 3

A partner of ours recently returned from a meeting where the prospect’s reaction was “I already monitor my systems, why do I need more monitoring?”  Great question.

I get that a lot. It’s a normal reaction. Usually from the IT Operations Director who has spent considerable sums of money on monitoring to date, and can boast an arsenal including:

  • Hardware monitoring
  • Application availability
  • Network monitoring
  • Website monitoring
  • Transaction monitoring
  • Speciality tools for monitoring Oracle, SAP and the like
  • And perhaps some dashboard aggregators that consolidate information from many separate sources into one single dashboard.

So why would an organization need more monitoring?  Well, the single most compelling reason is to cut down on the number of outages and incidents that impact business performance.  To do this, there’s more to monitoring than just detecting when things go wrong, which is all that the products in use by most organizations can do.  By then, the damage is done, and something has already gone wrong.

Ideally, monitoring should be smart enough, and powerful enough, to detect a situation that indicates with a high probability that something needs attention before the situation develops into something bigger and more costly.  It’s the equivalent of warning you that your car is about to be towed instead of telling you that your car has just been towed.

In other words, rather than coping with IT disasters, what about averting them in the first place?  What’s needed is a system that constantly monitors your key business activities and transactions, with the ability to connect events together in order to detect variances within your business transactions.  One that tells you exactly what’s going on in real time and provides timely warnings.

For example, your current monitoring systems for processing orders might provide the following information:

  1. Database server OK, ping round trip 0.112s
  2. Database OK, 32 transactions per second, average transaction 1.232s
  3. Web Server OK, 42 connections

Whereas a system monitoring business events would instead report:

  1. 14 Orders in progress
  2. Average time to process orders is 6.687 seconds
  3. Alert: 13% of orders processed in last 5 minutes were above 9 seconds.  Current trend is that an order will breach the SLA of 12.5 seconds within 40 minutes.
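
As a rough illustration of how an alert like the third item could be derived, here is a minimal Python sketch.  It is not how any particular product works: the function names and the choice of a simple least-squares trend line are assumptions made purely for this example.  One function computes the share of recent orders above a threshold, and the other extrapolates the average processing time to estimate when it would reach the SLA.

```python
from statistics import mean


def share_above(durations, threshold):
    """Fraction of order durations (seconds) strictly above the threshold."""
    if not durations:
        return 0.0
    return sum(1 for d in durations if d > threshold) / len(durations)


def minutes_until_breach(samples, sla_seconds):
    """Estimate minutes from the last sample until the trend reaches the SLA.

    samples: list of (minute, average_duration_seconds), ordered by time.
    Returns None when there is too little data or the trend is flat or falling.
    """
    if len(samples) < 2:
        return None
    xs = [minute for minute, _ in samples]
    ys = [duration for _, duration in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return None
    # Ordinary least-squares slope: seconds of slowdown per minute.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    breach_minute = (sla_seconds - intercept) / slope
    return max(0.0, breach_minute - xs[-1])


if __name__ == "__main__":
    recent_orders = [5.0, 6.0, 10.0, 7.0]
    per_minute_averages = [(0, 6.5), (1, 6.65), (2, 6.8), (3, 6.95), (4, 7.1)]
    print(f"{share_above(recent_orders, 9.0):.0%} of recent orders above 9 seconds")
    print(minutes_until_breach(per_minute_averages, 12.5), "minutes until SLA breach")
```

A straight-line extrapolation is the simplest possible trend model; real transaction volumes with daily or weekly patterns would likely need something more careful.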

So rather than overworked IT staff trying to filter millions of seemingly disconnected IT events, most of which report little or nothing by way of business significance, they can instead focus on meaningful business objectives and performance indicators, and can react quicker to events that impact business performance, as well as communicate with non-IT staff using the lingua franca of your business.  And most importantly, if you’re already relying solely on traditional monitoring approaches, then you can expect to further reduce the number of outages and incidents by anywhere between 20% and 80%!

Feb 28

I was recently giving a presentation to a rather large utility provider, and was asked the question “But why do I need real-time?”

Good question.  And a very difficult question to answer correctly in most cases.  There are lots of answers that address the requirement – in theory.  Take your pick from the ones I come across most often (I may even be guilty of uttering one or two of these myself):

  • React to an opportunity or threat
  • Become more proactive
  • Prioritize resources
  • Make smarter decisions quicker

But sometimes the right answer lies in asking a question in return.  “Is there anything you could do if you knew something had just occurred?”  This turns the question around to the customer, and they always find that there is something that can be done to improve the situation!

Within a business context, knowing immediately if there are problems means that you know at least as soon as your customers do.  A well-known analyst firm estimates that more than 60% of problems are reported by customers (in our experience this number is low; we generally see numbers approaching 80%).  But the value of detection is completely lost if there isn’t a plan for reacting.

So one definition of real-time is the window of time within which detection and reaction have maximum benefit.  Whether you need to react within 5 minutes or 5 microseconds, each situation has its own context and definition of real-time.

But the real value of detection is predicting situations in advance by being able to detect the patterns that indicate a high probability of something about to happen in the future.  This is possible so long as situations exhibit a consistent set of early-warning signs – to the uninitiated it can seem a little bit like high-tech fortune telling.  But the results are definitely worth it.  Our customers have seen drops of more than 80% in customer detection rates and in incidents tagged as high priority.  It’s where we definitely see the value of Complex Event Processing and Business Activity Monitoring intersecting.