Oct 7

Written by: Brian Connell

Back from beautiful Trento in Italy where the 5th EPTS Symposium was held.  Marvelous location, and we even managed to find a really beautiful Michelin-star rated restaurant.

While it was great to meet up with some friends and colleagues, and there were some very interesting nuggets at the conference, my overall impression was one of disappointment and frustration.  We are still struggling with basic concepts, arguing about the definition of Event Processing (and there are some very … different … views), and we still haven’t managed to produce anything that either identifies the major components you’d find in a reference architecture or finds commonality in any of the use cases.  Actually … working groups aside, that’s kinda what we ended up with last year.  And the year before…

Groundhog Day

And before I offend the very people I wish to praise and give credit to, let me be clear.  It’s most definitely not the fault of the people collaborating on the working groups.  They’re contributing.  Giving up their precious time.  In order to make progress, we really need to get more people involved, and we need to set clearer mandates on the deliverables for these working groups.

A new working group was proposed to promote the EPTS.  This will also involve publishing the public deliverables from the working groups, and encouraging new members to join and contribute.

Here’s hoping that this time next year, I won’t feel another Bill Murray moment coming on!

Jul 3

Written by: Brian Connell

A partner of ours returned from a meeting recently where the prospect’s reaction was “I already monitor my systems, why do I need more monitoring?”  Great question.

I get that a lot. It’s a normal reaction. Usually from the IT Operations Director who has spent considerable sums of money on monitoring to date, and can boast an arsenal including:

  • Hardware monitoring
  • Application availability
  • Network monitoring
  • Website monitoring
  • Transaction monitoring
  • Speciality tools for monitoring Oracle, SAP and the like
  • And perhaps some dashboard aggregators that consolidate information from many separate sources into one single dashboard.

So why would an organization need more monitoring?  Well, the single most compelling reason is to cut down on the number of outages and incidents that impact business performance.  Doing this takes more than just detecting when things go wrong, which is all that the products in use by most organizations can do.  By then, the damage is done, and something has already gone wrong.

Ideally, monitoring should be smart enough, and powerful enough, to detect a situation that indicates with a high probability that something needs attention before the situation develops into something bigger and more costly.  It’s the equivalent of warning you that your car is about to be towed instead of telling you that your car has just been towed.

My Ferrari (I wish that were true) getting towed!

In other words, rather than coping with IT disasters, what about averting them in the first place?  What you want is a system that constantly monitors your key business activities and transactions, with the ability to connect events together in order to detect variances within your business transactions; one that tells you exactly what’s going on in real-time and provides timely warnings.

For example, your current monitoring systems for processing orders might provide the following information:

  1. Database server OK, ping round trip 0.112s
  2. Database OK, 32 transactions per second, average transaction 1.232s
  3. Web Server OK, 42 connections

Whereas a system monitoring business events would instead report:

  1. 14 Orders in progress
  2. Average time to process orders is 6.687 seconds
  3. Alert: 13% of orders processed in last 5 minutes were above 9 seconds.  Current trend is that an order will breach the SLA of 12.5 seconds within 40 minutes.
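
As a rough illustration of how figures like these could be derived, here is a minimal sketch in Python. It assumes order durations are already available as plain numbers; the function names, the 9-second "slow" threshold as a constant, and the straight-line trend fit are assumptions for the example, not a description of how any particular product computes its alerts.

```python
from statistics import mean

SLA_SECONDS = 12.5    # SLA taken from the example report above
SLOW_SECONDS = 9.0    # "slow order" threshold taken from the example report above


def summarize(durations):
    """Return (average duration, fraction of orders slower than SLOW_SECONDS)."""
    slow = sum(1 for d in durations if d > SLOW_SECONDS)
    return mean(durations), slow / len(durations)


def minutes_until_breach(samples, sla=SLA_SECONDS):
    """Fit a least-squares line through (minute, duration) samples and return
    how many minutes after the last sample that line reaches the SLA.

    Returns None when there is too little data or the trend is flat or
    improving, and 0.0 when the fitted line is already at or above the SLA.
    """
    xs = [minute for minute, _ in samples]
    ys = [duration for _, duration in samples]
    if len(samples) < 2:
        return None
    x_bar, y_bar = mean(xs), mean(ys)
    spread = sum((x - x_bar) ** 2 for x in xs)
    if spread == 0:
        return None
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / spread
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    breach_minute = (sla - intercept) / slope
    return max(0.0, breach_minute - xs[-1])
```

For instance, samples of 8.0, 8.5 and 9.0 seconds taken at minutes 0, 5 and 10 give a slope of 0.1 seconds per minute, so the fitted line would reach 12.5 seconds about 35 minutes after the last sample. A straight line is the simplest possible trend model; a real system would likely need something more robust.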

So rather than overworked IT staff trying to filter millions of seemingly disconnected IT events, most of which report little or nothing by way of business significance, they can instead focus on meaningful business objectives and performance indicators.  They can react quicker to events that impact business performance, and communicate with non-IT staff using the lingua franca of your business.  And most importantly, if you currently rely solely on traditional monitoring approaches, you can expect to further reduce the number of outages and incidents by anywhere between 20% and 80%!

Mar 5

Written by: Des Carbery

A friend of mine has been tweeting about a very interesting use of Twitter.

Paul Watson responded to the challenge from DIYcity to build “an early warning system for outbreaks of flu, colds and other communicable disease at the city level”.

Paul’s design is to correlate Twitter trends and location information to identify when people start talking about the flu. Then he can compare these stats against expected trends to identify possible spikes. Paul plans to use Twitter to send out warnings, and you can subscribe to these feeds to receive early warning of an outbreak.
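
To make the idea concrete, here is a small sketch of the kind of comparison described above. It is not Paul's implementation: the keyword list, the `(city, text)` message shape, and the three-standard-deviation threshold are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, stdev

FLU_TERMS = ("flu", "influenza", "fever")   # hypothetical keyword list


def mentions_by_city(messages):
    """Count messages per city that mention any flu-related term.

    Each message is a (city, text) pair; matching is on whole lowercase words.
    """
    counts = Counter()
    for city, text in messages:
        words = text.lower().split()
        if any(term in words for term in FLU_TERMS):
            counts[city] += 1
    return counts


def is_spike(today, history, threshold=3.0):
    """Flag today's count when it exceeds the historical mean by more than
    `threshold` standard deviations."""
    if len(history) < 2:
        return False            # not enough history to estimate a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma > threshold
```

With a history of around ten mentions a day, a day with thirty mentions would be flagged while a day with eleven would not. Real text is messier than a word list, so the matching step is where most of the effort would go.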

Note that Google did something similar for the US based on search terms, but without the warning system.

How reliable will the Twitter trends be? Time will tell, but what’s fascinating about the Google data is that you can see their results compare well with data from the U.S. Centers for Disease Control and Prevention (CDC).

I hope we’ll see similar results with the Twitter data.

This is a fascinating application. It’s built using data that is easy to access and with technology that is easy to use and open-source. To think that you would probably have needed the cooperation of the military to do something like this a few years ago.

Feb 28

Written by: Brian Connell

I was recently giving a presentation to a rather large utility provider, and was asked the question “But why do I need real-time?”

Good question.  And a very difficult question to answer correctly in most cases.  There are lots of answers that address the requirement – in theory.  Take your pick from the ones I come across most often, and may even be guilty of uttering one or two of these myself:

  • React to an opportunity or threat
  • Become more proactive
  • Prioritize resources
  • Make smarter decisions quicker

But sometimes the right answer lies in asking a question in return.  “Is there anything you could do if you knew something had just occurred?”  This turns the question around to the customer, and they invariably find that there is something that can be done to improve the situation!

Remember To React 

Within a business context, knowing immediately if there are problems means that you know at least as soon as your customers do.  A well-known analyst firm estimates that more than 60% of problems are reported by customers (and in our experience, this number is low; we generally see numbers approaching 80%).  But the value of detection is completely lost if there isn’t a plan for reacting.

So one definition of real-time is the window of time within which detection and reaction have maximum benefit.  Whether you need to react within 5 minutes or 5 microseconds, each situation has its own context and definition of real-time.

Ask Brian

Ask The Brian

But the real value of detection is predicting situations in advance by being able to detect the patterns that indicate a high probability of something about to happen in the future.  This is possible so long as situations exhibit a consistent set of early-warning signs – to the uninitiated it can seem a little bit like high-tech fortune telling.  But the results are definitely worth it.  We’ve seen results in our customers showing more than an 80% drop in customer detection rates and incidents tagged as high priority.  It’s where we definitely see the value of Complex Event Processing and Business Activity Monitoring intersecting.

Jan 31

Written by: Brian Connell

Sometimes (and most especially it seems in Europe), bad weather can bring a country to a standstill.  Not long ago, some snow fell in Madrid (pretty rare) and resulted in what appeared to be a national emergency, including shutting down the airport for a short period.  The following is an email written by one of our guys trapped in Madrid airport.  I post this because it is amusing and well written, and because there is a very tenuous link between event causality and weather prediction.  And context.  This photo from the day will cause several people to question the definition of “Heavy Snow”.  So thanks Nigel!

Heavy Snow at Madrid Airport caused Mayhem

As I enter my 28th hour of captivity I am forming some sort of Stockholm Syndrome affinity with my kidnapper, Madrid Airport. It’s really not so bad: there is food (small dried toasts onto which you can drop olive oil for a treat) and even showers (where what really looks like a fan switch in fact calls the emergency guy).

Yesterday saw the entire airport shut down for around 10 hours due to 100 or so snow flakes. When it opened I was lucky enough to get transferred onto the late flight (which ironically was the one I was originally booked onto before changing to the earlier one). So our late flight became later and later – every 15 minutes the board showed it slipping by another 15 minutes. Now the know-it-alls like me know Munich has a very strict midnight curfew for landing. German rules dictate that even if you are plunging from 30000 feet due to engine failure, you will need written permission from the mayor before you are allowed to penetrate the runway like a flying dart. In short, we knew there was no chance of the flight leaving so even though the plane was fueled up and waiting, it was all in vain. Quite pleased I did not get on as I had already spent 3 hours on the morning plane at the gate before being hoofed off due to the snow flake.

The night was fun. I made a bed by pushing two small chairs together which formed a comfy oval cave. I now walk like a hunchback and if lucky will straighten up by Monday. I noticed also that small tribes are forming – I was naturally drawn to the alpha males holding BA Gold Cards, and minor tribes such as the Silver Card Holders and ‘fresh from the swamp’ occasional flyers are keeping away from us. If they do not bring something to replace the small toasts, we are considering starting to eat some of the Silver card holders.

So the plan today is I am on a wait list for 8.20. In Spanish ‘Wait List’ translates to ‘not a hope’ but it will keep you from being a pain in the butt at the service desk for a couple more hours. As this Wait turns inevitably into disappointment, I am also booked onto a 16.20 which is actually confirmed. So only 10 more hours to go before I might get a flight. I am also considering the train which is 28 hours via Paris but it is hard to call – will I get that 16:20 dream flight home or won’t I?  I think I will make the call if I miss the 16:20 and should then make it home sometime on Monday by train.

Otherwise I am having a pleasant time. Small diversions like using the toilet at the other end of the airport can kill nearly 50 minutes and I am looking forward to breakfast at McDonalds at 07:00. It’s also fun watching the people; the airport is full of people who were here all night and many all day yesterday – there are the enraged, the cool, the up-all-night Redbull folks. Sadly, many of them have kids, which is just awful.

If you are wondering why we did not all go to a nice hotel – the roads were closed or extremely slow due to the snow-flake and we were told we were unlikely to get to a hotel before 2 or even 3am.

Planes Scuttling for Cover at Madrid Airport

The only thing that really worries me apart from forgetting what my kids look like is my socks. Over nearly 48 hours of walking, I swear I could see them gently glowing in the night. They redefine the word ‘funk’ and I feel they should be sent to Jim and Mort as a record of our absolute commitment to the cause.

Just off for a glass of wine for breakfast - of course there is more booze than water in the lounge.

Nov 10

Written by: Des Carbery

Twitter provide nice APIs that allow users access to their data. There’s a whole bunch of interesting applications built on top of this data, most of them gimmicks while a few have serious potential. One of the ways of accessing the Twitter data is via the Twitter Firehose, which is supposed to be the entire stream of “tweets” (a tweet is a message on Twitter). Unfortunately the firehose is still not open to all users, which is a real pity.

Bill de hÓra has a very interesting posting on this topic. There are 2 interesting aspects to Bill’s post:

1. The story behind the firehose and some of the reactions to the fact that Twitter still haven’t given open access to this data.

2. The speculation that the reason for the delay is not so much a business reason but a technical limitation on Twitter’s ability to scale.

If the reason for the delay is that Twitter is protecting the data so they can exploit the value themselves, I wouldn’t be too surprised. To be fair that would be considered the second most valuable thing they have. The most valuable asset is the fact that they have everyone’s attention.

Bill seems to think the issues are technical. Roy Fielding suggested as much when he blogged about Inverse Economies of Scale with PubSub systems. Roy’s solution to the problem of people demanding too much data/events and therefore crippling your service was to charge for the service. That is possibly what Twitter may announce next year.

The technical problems that Roy and Bill discuss are caused by Event and/or Data gluttons. These are users/machines that oversubscribe to events or data services. This is human nature and provides developers with a challenge when building distributed software. Luckily there are a lot of useful patterns and tools available to today’s developers to solve these problems. At WestGlobal we face the challenge of Event Gluttons and we handle it by having a flexible architecture that can be optimised depending on the situation. In general, the approach we take is a distributed Event Processing Network (EPN) with flexible deployment options. As such, Vantify deployments are layered as follows:

1. The first layer of consumers are our Event Processors, who sit closest to the action. These event gluttons are deployed on a network and want to know everything that is happening in your business. These Event Gluttons process events as they happen using CEP and other techniques. The good news for the network is that you don’t need too many of these event processors, so oversubscription isn’t a problem. They can be distributed almost anywhere on your network, and if there is a high volume of data then we can limit the chatter by deploying the processors closer to the events or splitting up the event streams. Although these are very greedy consumers, we can satisfy their appetite.

2. The other place where we need to handle event gluttons is at the top layer in our product where real users want to view reports and dashboards about their business. Here we do need to worry about supporting a large number of users. However most of these users are using a browser and polling our data feeds which will scale very well. There will be a limited number of users who want subscriptions in order to be notified. This is where we need to worry about scale but this is typically a small number.
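
One common way to split up an event stream is to partition it by a key, so that each processor only sees its share. The sketch below shows that general technique only; it is not a description of how Vantify does it, and the `order_id` field and function names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition_for(key, partitions):
    """Map an event key to a partition index using a stable hash (CRC32),
    so events with the same key always reach the same processor."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def split_stream(events, partitions, key_field="order_id"):
    """Group events into at most `partitions` sub-streams keyed by `key_field`.

    Events keep their original relative order within each sub-stream.
    """
    streams = defaultdict(list)
    for event in events:
        index = partition_for(str(event[key_field]), partitions)
        streams[index].append(event)
    return streams
```

Keying on something like an order identifier keeps all the events for one business transaction together, which matters when a processor needs to correlate them.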

So I think the world is big enough to accommodate Event Gluttons. Thanks to people like Roy and Bill, developers have a lot of tools and techniques available to them to handle scale. The only good reason for restricting data or events should be a business reason. I certainly hope Twitter open up their data and like many others I’ll be watching with interest when they reveal all.

Oct 31

Written by: Brian Connell

Having read Opher’s excellent blog posting on describing CEP maturity models, I found that while I agreed with Opher’s descriptions of differences between messaging and events, I disagreed with describing phase 3 of CEP as “towards looking at ‘event clouds’ instead of events one-by-one”.

The glossary notes on the definition of an event cloud say that “CEP usually refers to event processing that assumes an event cloud as input, and thereby can make no assumptions about the arrival order of events”. This implies that events “arrive” – just not necessarily in a defined order such as creation time, etc. It also implies that events may arrive one-by-one. It certainly does not preclude one-by-one processing.

The other implication of Opher’s posting is that the cloud may somehow be processed as a whole. Looking at the definition of a cloud, it is made up of many events of differing types where each event may have been created at a different time, and may have a different time-to-live value within the cloud. But in order to make the entire cloud accessible to an event processing agent as a whole, a mechanism must exist that persists the events within the event cloud and manages the cloud events according to their time-to-live values.

(An easy parallel to this view is “ordinary” data processing where sets of persisted data (i.e. events) are made available for queries. Data/Events are stored in tables and keyed by their time-to-live values. Obviously, given a large enough quantity of events, the storage and processing requirement may be considerable.)
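
To show what such a mechanism involves, here is a minimal in-memory sketch of a store that holds events until their time-to-live runs out. The class name, method names, and the use of plain numeric timestamps are assumptions for illustration; a real event cloud would need durable storage and far more capacity.

```python
import heapq
import itertools


class EventCloud:
    """Holds events until their time-to-live expires."""

    def __init__(self):
        self._heap = []                  # entries are (expiry_time, seq, event)
        self._seq = itertools.count()    # tie-breaker so events are never compared

    def add(self, event, created_at, ttl):
        """Store an event that stays live until created_at + ttl."""
        heapq.heappush(self._heap, (created_at + ttl, next(self._seq), event))

    def expire(self, now):
        """Drop every event whose expiry time is at or before `now`."""
        while self._heap and self._heap[0][0] <= now:
            heapq.heappop(self._heap)

    def query(self, now, predicate=lambda event: True):
        """Return live events matching `predicate`, in no guaranteed order."""
        self.expire(now)
        return [event for _, _, event in self._heap if predicate(event)]
```

Even in this toy form, every live event stays in memory until it expires, which is the storage cost the parenthetical above points at.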

But I disagree that this is the only way to define CEP. Indeed, there has long been a fiery debate in the CEP community about how, exactly, an event cloud may be practically processed without creating a partially ordered set of events (which may be regarded as a stream of one-by-one events). I would argue that persisting an entire event cloud is fine for ad-hoc processing and analysis, but that the vast majority of CEP involves detection of predefined patterns and is efficiently performed as a form of one-by-one processing.

In Operational Intelligence, applying CEP to the enterprise event cloud is a practical application whereby predefined patterns are detected and acted-upon in real-time. It would be impractical and practically impossible to persist the entire event cloud as the volumes of events are considerable, and the rate of events would require a lot of expensive equipment to provide the required processing power. Yet Vantify Experience Center uses CEP to process an enterprise’s event cloud, and provides real-time intelligence to operational staff to meet challenges and opportunities for maximum business benefit with relatively inexpensive equipment. The events are processed one-at-a-time rather than as a single cloud for efficiency, and the value to customers has been demonstrated many times. To imply that CEP excludes one-by-one processing is inaccurate and wrong – rather it is an important and critical subset of CEP.
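
As a simple illustration of one-by-one detection of a predefined pattern, the sketch below flags a key that produces several failure events within a time window. It keeps only a small amount of state per key rather than the whole cloud. The event fields, class name and the pattern itself are made up for the example, and it assumes timestamps for a given key arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class RepeatedFailureDetector:
    """Detects `count` failure events for the same key within `window` seconds,
    looking at each event once as it arrives."""

    def __init__(self, count=3, window=60.0):
        self.count = count
        self.window = window
        self._recent = defaultdict(deque)    # key -> timestamps of recent failures

    def on_event(self, event):
        """Process one event; return True when the pattern matches on this event."""
        if event["type"] != "failure":
            return False
        times = self._recent[event["key"]]
        times.append(event["time"])
        # Forget failures that have fallen out of the window.
        while times and event["time"] - times[0] > self.window:
            times.popleft()
        return len(times) >= self.count
```

Memory use is bounded by the number of keys and the window size, not by the total number of events seen, which is the efficiency argument made above.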

Oct 13

Written by: Des Carbery

“So what do you actually do?”

That dreaded question! I’ve moved job recently and I’m hearing it a lot.

One of the great things about working at WestGlobal is that now it’s an easier question to answer. I used to work on SOA products. Need I say more?

Like with most questions, the response to “So what do you actually do?” depends on the audience. I’ve been thinking about how I should respond to this question if asked by a previous technical colleague or by my 7 year old son. The interesting thing is that I don’t think my response would vary that much. I think the difference would be in the language and metaphors used.

If the colleague asked I’d obviously try to impress them but I’d probably try something like the following:

  1. Using agents/sensors across your business you gather real-time events as they happen.
  2. You process the events from these sensors and produce new events and data using advanced techniques like CEP.
  3. You present your new events and data in a variety of forms like dashboards, reports and alarms to the customer. They get a detailed picture of what is happening in their business including the relationships between different agents and activities.

If my son asked, I could simplify the language and introduce a metaphor:

  1. Using spies across your business you gather information about what is going on. Your spies send you reports whenever anything happens.
  2. Back at the command centre, you decode the message from all of your spies and try to figure out what it all means.
  3. You’re able to give info to your friends to help them save the day. They know what is happening before anybody else so can react quickly.

Agent Vantify Issue 1

I think I prefer my job through the eyes of my son!

Tune in next week to find out if any agents defected.
