Sunday, July 8, 2007

Decision Table Optimization

The Benchmarking have been implemented and tested, before publishing I'd like to make a few DT optimizations.Here is some ideas for Decision Table Optimization approach(from OpenL Tablets javadoc):

The basic algorithm for decision table evaluation is simple:


  1. For each rule (column) evaluate conditions from the top to the bottom.
  2. If any condition is false, evaluate all the actions in the rule.
  3. If the action is non-empty return action, return the value of the action
  4. If all columns were evaluated return null

The rules for the basic algorithm must hold true for all the optimizations that we are going to implement here, and for all the future implementations.

The assumption that has the most impact on the optimization technique in this algorithm is the read-onlyness of the input data with regard to the decision table execution. In other words, DT actions should not affect the results of condition evaluations. If your decision table does not conform to this assumption, you should not use optimizations. It is also recommended that you take another look at your design and ask yourself: was it really necessary to produce such a twisted logic?

To make optimizations more explicit and simultaneously slightly reduce the number of the keystrokes, the condition expressions that are intended for optimization have one notable distinction: their type is not boolean!!!!

Before you start scratching your head let me explain quickly. For example we have condition


driver.type == type


This condition intended for optimization will look like this:


driver.type



The OpenL Tablets DT Optimizer will automatically recognize this as the equality condition if the condition parameter is String or as contains() if it is an array or some other collection of Strings

We will describe the approach more formally later, but now let's discuss the advantages of this approach.

  • there is less to type and therefore less to read, less places to make typos etc.
  • there is less to compile and parse, the less work for the compiler or optimizer to determine programmer's intentions == better performance and less errors
  • optimization algorithm is easy to turn on or off for any condition, it is easy to predict what kind of optimization will be used for what kind of data, there would not be any "black box magic". Eventually we are going to publish all the optimization algorithms and provide if possible some formula-based estimates for expected performance. In theory this would provide the basis for static compile-time performance analysis.

The table below will describe the optimizations that we are implementing now and some future ones:



















Expr TypeN Params Param Type Condition Optimization
String 1 String x == value Index with constant(HashMap) performance
String 1 String[] list.contains(x) Index with constant(HashMap) performance
int 1 int x == value Index with constant(HashMap) performance
int 1 int[] list.contains(x) Index with constant(HashMap) performance
int 2 int min < x && x < max Index with ln(n) performance


The more detailed analysis of the performance to be continued ...


2 comments:

woolfel said...

those performance optimizations are simple and straight forward. I describe some other techniques, that I think should provide even better performance. I've blogged about these techniques in the past, but here is a link to the most recent attempt at explaining the technique.

http://woolfel.blogspot.com/2007/07/semweb-to-rescue.html

I plan to write a few more entries covering some well known pattern matching optimizations. For the record, most of these techniques are old and date back to 70's and 80's. It's new to open source rule engines, but it's far from being new in computer science world.

snshor said...

After another discussion with Peter Lin I figured out how to efficiently index statements like in CA,MA,NY and not in PA,MI,IN. It happened in his by-invitation-only blog; don't worry the algorithm will be explained in OpenL Tablets doc on performance and optimization.


Discussions with him are getting more and more productive, he never agrees to anything and challenges every statement. I like it

BTW, some very special samples of OpenL Tablets benchmarks on my old Dell laptop show 2M TPS for simple decision tables vs 60K TPS without optimization. Not a bad improvement for a week's work.