My struggles in understanding and learning about Object Oriented design, and the tools and knowledge I've taken from them.

Tuesday, February 1, 2011

ID3 Decision Tree in C#

This blog references an executable and source code available for download. To download the referenced executable, Click Here

To download source, Click Here

It seems like just about anything "new and shiny" can distract me from the last "new and shiny" thing that I decide to devote all my energy to learning about and/or building. Case in point: AI (artificial intelligence). There are all kinds of ways to use AI, and there's all kinds of subtechnologies that make up the field of Artificial Intelligence.

Having said that, the first technology that has been easy enough for me to get my head around is "Decision Tree Learning." A decision tree is basically an algorithm that takes a set of collected data, the outcomes of each collection of inputs, and builds a tree to demonstrate the best input variable for a particular output. From my college days, I had lots of statistics classes that were very similar to this concept - particularly, regression analysis.

A decision tree ends up looking a bit like the image below:



An example of a decision tree implementation is, if you want to gather a bunch of information about what inputs affect whether or not it's going to rain, a decision tree can build a graphical representation on whether it will rain or not based on the inputs you provided. Inputs for a decision tree may include cloudiness, temperature, relative humidity, what the weather report says, etc. The decision tree algorithm should calculate the input that best guesses whether or not it's going to rain, and then finds the next best variable, etc, until a tree has been built that demonstrates a decision tree for determining whether or not it's going to rain.

To demonstrate with words: If the weather report says it will rain, and if the relative humidity is high, and it's very cloudy, then the output will be rain.

Well, you get the idea...I don't completely have my head around the various ways to build a decision tree. In fact, I'm quite a novice when it comes to demonstrating decision tree algorithms.

The reason that I'm writing this blog entry at all is because I found a pretty decent implementation of an ID3 decision tree in C# at codeproject.com. But when trying to use it to suit my needs, I wasn't able to make the original code suit my needs. There were a few problems with the code that I felt compelled to fix.

I don't know much about fellow who wrote this particular C# ID3 algorithm except that his screen name is "Roosevelt" and he's from Brazil - and the source code is commented in Portugese. And he wrote it a long time ago.

The funny thing is that this was really the only C# code I could find on the subject, and this code was written 7 1/2 years ago. ID3 is not the most recent technology in Decision trees (evidently, C4.5 is a more recent iteration of decision tree learning). Sure, Java code exists, but I'm not a Java developer, so some of the base libraries referenced made it difficult for me to translate the Java code to C#.

So, I decided to see if I could take Roosevelt's code and make it less rigid (you see, all of the source data and attributes are statically defined in the code. There's no way to configure the data without recompiling, and that just won't do). In my iteration, the decision tree can be built dynamically based on the source data - it does not rely on statically defined concepts within the code, anymore (and output is not in Portugese, either).

I did some other refactoring of the code as well, and made it a bit better - probably still not quite right, but I think it's quite a bit better.

To download the executable I built, Click Here

To download source, Click Here

Now that I've gotten my fill of this "new and shiny" thing, I can get back to the last "new and shiny" thing I was working on, and maybe some day (hopefully sooner than 7 1/2 years from now), someone who is zealous enough to improve my code can do so, and share it with the world. For now, this is my contribution.
Post a Comment

Followers

Search This Blog

Powered by Blogger.