My struggles in understanding and learning about Object Oriented design, and the tools and knowledge I've taken from them.

Tuesday, February 1, 2011

ID3 Decision Tree in C#

This blog post references an executable and source code available for download. To download the referenced executable, Click Here

To download source, Click Here

It seems like just about anything "new and shiny" can distract me from the last "new and shiny" thing that I decided to devote all my energy to learning about and/or building. Case in point: AI (artificial intelligence). There are all kinds of ways to use AI, and there are all kinds of subtechnologies that make up the field of Artificial Intelligence.

Having said that, the first technology that has been easy enough for me to get my head around is "Decision Tree Learning." A decision tree algorithm basically takes a set of collected data, along with the outcome for each collection of inputs, and builds a tree showing which input variables best predict a particular output. From my college days, I had lots of statistics classes that covered very similar concepts - particularly regression analysis.

A decision tree ends up looking a bit like the image below:

[Image: an example decision tree]

For example, if you gather a bunch of information about what inputs affect whether or not it's going to rain, a decision tree can build a graphical representation of whether it will rain based on the inputs you provide. Inputs for a decision tree may include cloudiness, temperature, relative humidity, what the weather report says, etc. The decision tree algorithm finds the input that best predicts whether or not it's going to rain, then finds the next best variable, and so on, until a tree has been built for determining whether or not it's going to rain.

To demonstrate with words: if the weather report says it will rain, the relative humidity is high, and it's very cloudy, then the output will be rain.
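To get a flavor of how the algorithm chooses the "best" input, here's a minimal sketch of the entropy and information gain calculations that ID3 is built on. This is just my illustration of the general idea, not code from the download; the method names and the simplification to a true/false outcome are my own (assumes System, System.Collections.Generic, and System.Linq are in scope):

    // Shannon entropy of a boolean outcome column: 0 when all outcomes agree,
    // 1 when they are split 50/50.
    static double Entropy(List<bool> outcomes)
    {
        if (outcomes.Count == 0) return 0.0;
        double p = outcomes.Count(o => o) / (double)outcomes.Count;
        double entropy = 0.0;
        if (p > 0.0) entropy -= p * Math.Log(p, 2);
        if (p < 1.0) entropy -= (1 - p) * Math.Log(1 - p, 2);
        return entropy;
    }

    // Information gain of splitting on one attribute: the entropy before the
    // split minus the weighted entropy of each branch after the split.
    // attributeValues[i] and outcomes[i] describe the same sample.
    static double InformationGain(List<string> attributeValues, List<bool> outcomes)
    {
        double before = Entropy(outcomes);
        double after = 0.0;
        var groups = attributeValues
            .Select((value, i) => new { Value = value, Outcome = outcomes[i] })
            .GroupBy(x => x.Value);
        foreach (var group in groups)
        {
            var branchOutcomes = group.Select(x => x.Outcome).ToList();
            after += (branchOutcomes.Count / (double)outcomes.Count) * Entropy(branchOutcomes);
        }
        return before - after;
    }

ID3 computes this gain for every remaining attribute, splits on the one with the highest gain, and then repeats the process on each branch.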

Well, you get the idea... I don't completely have my head around the various ways to build a decision tree. In fact, I'm quite a novice when it comes to implementing decision tree algorithms.

The reason that I'm writing this blog entry at all is because I found a pretty decent implementation of an ID3 decision tree in C# at codeproject.com. But when I tried to use it, I wasn't able to make the original code suit my needs. There were a few problems with the code that I felt compelled to fix.

I don't know much about the fellow who wrote this particular C# ID3 algorithm except that his screen name is "Roosevelt" and he's from Brazil - and the source code is commented in Portuguese. And he wrote it a long time ago.

The funny thing is that this was really the only C# code I could find on the subject, and it was written 7 1/2 years ago. ID3 is not the most recent technology in decision trees (evidently, C4.5 is a more recent iteration of decision tree learning). Sure, Java code exists, but I'm not a Java developer, and some of the base libraries referenced made it difficult for me to translate the Java code to C#.

So, I decided to see if I could take Roosevelt's code and make it less rigid (you see, all of the source data and attributes are statically defined in the code; there's no way to configure the data without recompiling, and that just won't do). In my iteration, the decision tree is built dynamically from the source data - it no longer relies on statically defined concepts within the code (and the output is not in Portuguese, either).
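To give a rough idea of what "dynamic" means here - this is only an illustrative sketch, not the actual source from the download; the comma-delimited file layout and the DataTable are assumptions for the example - the attribute names come from the first line of the data file and the samples from the remaining lines, so nothing about the data is compiled into the program:

    // Illustrative sketch only: load training samples from a comma-delimited
    // text file whose first line names the attributes, e.g.
    //   Outlook,Temperature,Humidity,Wind,result
    //   Sunny,Hot,High,Weak,FALSE
    using System.Data;
    using System.IO;

    class SampleLoader
    {
        public static DataTable LoadSamples(string path)
        {
            var table = new DataTable("samples");
            string[] lines = File.ReadAllLines(path);

            // First line: attribute (column) names - none are hard-coded.
            foreach (string name in lines[0].Split(','))
                table.Columns.Add(name);

            // Remaining lines: one training sample per row.
            for (int i = 1; i < lines.Length; i++)
                table.Rows.Add(lines[i].Split(','));

            return table;
        }
    }

From there, the tree builder can treat the last column as the result and everything before it as candidate attributes, instead of assuming it already knows their names.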

I did some other refactoring of the code as well - it's probably still not quite right, but I think it's quite a bit better.

To download the executable I built, Click Here

To download source, Click Here

Now that I've gotten my fill of this "new and shiny" thing, I can get back to the last "new and shiny" thing I was working on, and maybe some day (hopefully sooner than 7 1/2 years from now), someone who is zealous enough to improve my code can do so, and share it with the world. For now, this is my contribution.

10 comments:

Aziz ur Rahman said...

I am unable to download the source. Would you please check the download link?

Unknown said...

Aziz: I hadn't shared the source properly. It should be fixed now.

Swerrrvin said...

Hello. Great article. Unfortunately, I can't download the executable for testing. "Sorry, we are unable to scan the file for viruses"...

Swerrrvin said...

Also, would you mind posting some sample data that you have used for testing? Thanks!

zahid said...

Good work. But the comments are still in a non-English language. If you can translate them, it will be more helpful.
Regards,

Matej Hlatky said...

Hi

I was using your code and I have a few remarks:
1. This implementation does not permit more than two classes.
2. The code has a built-in dependency on the "result" column and its true and false values.
3. Try this sample data:
vote.data and vote.test (structure) - this implementation does not create a complete decision tree, because when I validate against the training data, the success ratio is 94% (it should be 100% when testing with the training set)!

gm said...

I've tried the votes data on this ID3; however, some of the attributes were repeated in the tree. Is this OK? I have the impression that an attribute can only be used once in the tree.

Desh said...

Nice work...
Please give me an example text file to test the project. Thanks...

Unknown said...

I have an issue: it doesn't give the right result on the following example, which is quite popular.
Outlook,Temprature,Humidity,Wind,result
Sunny,Hot,High,Weak,FALSE
Sunny,Hot,High,Strong,FALSE
Overcast,Hot,High,Weak,TRUE
Rain,Mild,High,Weak,TRUE
Rain,Cool,Normal,Weak,TRUE
Rain,Cool,Normal,Strong,TRUE
Overcast,Cool,Normal,Strong,FALSE
Sunny,Mild,High,Weak,FALSE
Sunny,Cool,Normal,Weak,TRUE
Rain,Mild,Normal,Weak,TRUE
Sunny,Mild,Normal,Strong,TRUE
Overcast,Mild,High,Strong,TRUE
Overcast,Hot,Normal,Weak,TRUE
Rain,Mild,High,Strong,FALSE

saddam said...

Can I have any C++ source code for this algorithm?
