Catherine: pyOraGeek: don't need no stinking rules engine

Wednesday, June 24, 2009

don't need no stinking rules engine

There's a whole class of programs called "rules engines". The idea is to remove the details of a process from the program's hard-coded logic, store them externally, and view/modify them easily. The engine then converts the rules, stored in some sort of custom format, back into an executable form at runtime.

In my experience, Python is an effective rules engine. Thanks to Python's readability, you can store business rules as snippets of Python code - in text files, a database table, or wherever you prefer - and business users should be able to read them comfortably. After that, a very lightweight Python program can load the rules and the relevant data and use exec() or eval() to apply the rules to the data.

One of my main projects is an example of this. It's a program that synchronizes data between two Oracle databases. That sounds easy, but business details complicate it enormously:

Table and column naming, structure, and normalization differ
Only some rows are transferred, according to a complex set of business rules
Only some columns are transferred. Column values are combined, split, truncated, have functions applied, etc. Again, governed by a jungle of business rules
The business rules change continually
Rules must be documented. Letting documentation get out of sync with implemented rules is very bad.
Users may demand explanations for each decision made by the program, down to the row and column level

My first take on the problem was a large hard-coded PL/SQL procedure. What a nightmare!

Later, I rewrote the rules as snippets of Python. Each rule is stored in a database table along with the dates it takes effect and expires, the person authorizing the rule, and a justification. This readable, self-documenting set of rules can also answer questions like, "Why did things change since last month?".
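To make the "why did things change since last month?" idea concrete, here is a minimal sketch of such a rules table. Everything about it is an assumption for illustration: it uses an in-memory SQLite database rather than Oracle, the column names are invented, the second rule is made up, and the expiry date is treated as exclusive.

```python
import sqlite3
from datetime import date

# Hypothetical schema; the real table and column names are not given in this post.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE row_rules (
        id            INTEGER PRIMARY KEY,
        start_date    TEXT NOT NULL,  -- ISO date the rule takes effect
        end_date      TEXT,           -- ISO date it expires; NULL means open-ended
        code          TEXT NOT NULL,  -- Python expression for eval()
        authorized_by TEXT,
        reason        TEXT
    )
""")
conn.executemany(
    "INSERT INTO row_rules (start_date, end_date, code, authorized_by, reason) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("2006-01-01", "2009-07-01", 'ours.funded == "Y"', "Bob", "I said so"),
        # Invented successor rule, only here to show a change over time.
        ("2009-07-01", None, 'ours.funded in ("Y", "P")', "Steve", "illustrative only"),
    ],
)

def rules_in_effect(conn, as_of):
    """Return (code, authorized_by, reason) for every rule active on `as_of`."""
    day = as_of.isoformat()
    return conn.execute(
        "SELECT code, authorized_by, reason FROM row_rules "
        "WHERE start_date <= ? AND (end_date IS NULL OR end_date > ?) "
        "ORDER BY id",
        (day, day),
    ).fetchall()

# Comparing two dates shows which rules changed, who authorized them, and why.
june = rules_in_effect(conn, date(2009, 6, 1))
july = rules_in_effect(conn, date(2009, 7, 1))
```

Because the authorizer and justification travel with each rule, the difference between `june` and `july` doubles as the documentation of the change.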

Unfortunately, I didn't know much about object-relational mappers when I wrote it, so the program has clunky data-fetching code. I'm currently working on a third version of the program that uses SQLAlchemy; the resulting program is very short. Broadly, here's what it does:

- Queries the row-level and column-level rules from their respective tables

- Fetches a row from the local database (ours) and the corresponding row from the remote database (theirs)

- The heart of the engine:


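def sync_row(ours, theirs, row_rules, column_rules):
    # Namespace the rules see: the two rows, by name
    data = {'ours': ours, 'theirs': theirs}
    # Row rules are expressions; a false result means this row is skipped
    for row_rule in row_rules:
        if not eval(row_rule, data):
            return
    # Column rules are statements that modify theirs in place
    for column_rule in column_rules:
        exec(column_rule, data)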

Actually, the data dict also includes definitions of a few functions that some of the rules invoke. The function names are chosen to be self-explanatory to business users. For example:


def fiscalYear(inDate):
    # Dates after September belong to the next fiscal year
    if inDate.month > 9:
        result = inDate.year + 1
    else:
        result = inDate.year
    return result

data = {'ours': ours, 'theirs': theirs, 'fiscalYear': fiscalYear}

I suppose it wouldn't be too hard to put the function definitions themselves in among the rules, then include locals() in with data, as long as execution order is controlled (easily done by putting execution_order columns in the rules tables). It hasn't been necessary for my project.
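A rough sketch of that idea, under assumptions of my own: rules arrive as hypothetical (execution_order, code) pairs, and `SimpleNamespace` objects stand in for the mapped rows. Since exec() is given a single dict, a function defined by one rule lands in that dict and is visible to later rules, so in this form no separate locals() merge is needed.

```python
from datetime import date
from types import SimpleNamespace

# Hypothetical rule rows as (execution_order, code), deliberately out of order.
rules = [
    (20, 'theirs.fy = fiscalYear(ours.start_date)'),
    (10, 'def fiscalYear(inDate):\n'
         '    return inDate.year + 1 if inDate.month > 9 else inDate.year'),
]

# Stand-ins for the rows fetched from the two databases.
ours = SimpleNamespace(start_date=date(2008, 11, 15))
theirs = SimpleNamespace(fy=None)
data = {'ours': ours, 'theirs': theirs}

# Sorting by execution_order guarantees the definition runs before its first use.
for _, code in sorted(rules, key=lambda rule: rule[0]):
    exec(code, data)
```

After the loop, `data` contains `fiscalYear` alongside `ours` and `theirs`, and `theirs.fy` holds the fiscal year computed by the second rule.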

- Now row_rules has eval()-able entries like::


  id  start   end     code                      authorized   reason
   1  1/1/06  7/1/09  ours.funded == "Y"        Bob          I said so

and column_rules has exec()-able entries like::


  id  start   end     code                           authorized   reason
   1  2/2/08          if (ours.value > 1000):        Steve        to annoy Bob
                          theirs.value = ours.value
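For a feel of how the two sample rules behave, here is a small self-contained run. It is a sketch, not the real program: `apply_rules` is a hypothetical wrapper around the loop shown earlier, and `SimpleNamespace` objects stand in for the rows that would come from the two databases.

```python
from types import SimpleNamespace

def apply_rules(ours, theirs, row_rules, column_rules):
    """Return True if the row passed every row rule and the column rules ran."""
    data = {'ours': ours, 'theirs': theirs}
    for row_rule in row_rules:
        if not eval(row_rule, data):
            return False  # a row rule rejected this row; leave theirs alone
    for column_rule in column_rules:
        exec(column_rule, data)
    return True

# The code column of the two sample rules, as plain strings.
row_rules = ['ours.funded == "Y"']
column_rules = ['if (ours.value > 1000):\n    theirs.value = ours.value']

# A funded row over the threshold: the value is copied across.
ours = SimpleNamespace(funded="Y", value=2500)
theirs = SimpleNamespace(value=0)
transferred = apply_rules(ours, theirs, row_rules, column_rules)

# An unfunded row: the row rule stops it before any column rule runs.
unfunded = SimpleNamespace(funded="N", value=9999)
untouched = SimpleNamespace(value=0)
skipped = not apply_rules(unfunded, untouched, row_rules, column_rules)
```

The multi-line column rule is stored with its newline and indentation intact, which is all exec() needs to treat it as an ordinary if statement.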

- Add some logging and a "test-run" capacity which reports on the changes without actually performing them. (It uses sqlalchemy.orm.attributes.get_history() for this; make sure to set autoflush=False if you use this, or intermediate flushes might clear the history.)
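The test-run idea can be sketched without any ORM at all. The version below does not use get_history(); it swaps in a plainer technique, applying the column rules to a shallow copy and diffing attributes after each rule. The (rule_id, code) pairs and the `dry_run` name are assumptions for illustration.

```python
import copy
from types import SimpleNamespace

def dry_run(ours, theirs, column_rules):
    """Apply rules to a copy of `theirs`; return (rule_id, attr, old, new) tuples."""
    trial = copy.copy(theirs)  # the real row is never modified
    data = {'ours': ours, 'theirs': trial}
    changes = []
    for rule_id, code in column_rules:
        before = dict(vars(trial))  # snapshot attributes before this rule
        exec(code, data)
        for attr, new in vars(trial).items():
            old = before.get(attr)
            if old != new:
                changes.append((rule_id, attr, old, new))
    return changes

# Stand-ins for the mapped rows, plus the sample column rule with its id.
ours = SimpleNamespace(value=2500)
theirs = SimpleNamespace(value=0)
column_rules = [(1, 'if (ours.value > 1000):\n    theirs.value = ours.value')]

report = dry_run(ours, theirs, column_rules)
```

Each entry in `report` names the rule responsible for a change, which is also the raw material for explaining decisions down to the column level.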

I suppose I could try writing a sample rules-engine implementation in simple, general terms, that people could crib from for their own "rules engine" applications. I wonder if that would be helpful, or if just the general idea is enough guidance.

6 comments:

Unknown said...: To me the general idea is enough guidance. I am working in the Java field and have just developed a scripting engine because 'it came over me'. Unfortunately I had no idea what it might be good for. Now I know. Your approach seems very reasonable to me. I will give it a try.; 8:33 AM
Art Vandalay said...: You have my admiration that your post doesn't contain "DSL" once.; 9:54 AM
Guy Davis said...: Good points. I'm conflicted as to your conclusion though. I'm a big fan of Python, using it in many of my personal projects and at past jobs. However, I'm working on a rule-based data manipulation (profiling, cleaning, merging, etc) tool for my current employer. Our users require a GUI for rule-config and aren't ready for Python scripting.; 2:19 PM
Amr Ellafy said...: I disagree with you. Rules may be a bit complex, and storing them in text files or even the database does not solve the problem. Though I love Python a lot, I find some rules engine syntax a lot more readable when it comes to "Rules"; 4:21 PM
Pranab said...: I am doing something similar using ruby DSL (domain specific language). Rules are defined as snippets of ruby DSL code stored in the database. At run time they are evaluated to generate rules that plug into a real ruby based rule engine.; 7:53 PM
Michele Mauro said...: Rule engines are far more complex than you think. Maybe your project is better served with Python, but you should at least check out JBoss Drools to understand what a rule engine is and does.
I use it in projects where decisions can't be made in code (e.g. BPM projects) and conditions must be part of the configuration, and not of the program. But that's only my preferred use case.

Michele; 11:59 PM
