My struggles in understanding and learning about Object Oriented design, and the tools and knowledge I've taken from them.

Friday, December 2, 2011

NPOI Wrapper

There's a utility that I find quite useful, and I probably use it as much as any other third-party library. It's called NPOI, which is a .Net version of a Java project called POI. NPOI offers the ability to read and write Excel 97-2003 files. NPOI does not offer Excel 2007 and later manipulation (as far as I know), but that's for a different post.


***Update: NPOI now supports Excel 2007 and later. See this post for more details.

Prior to stumbling onto NPOI, I used to use Interop. I would occasionally use ADO as well; however, both Interop and ADO were clunky and had some pretty major drawbacks, including slowness and awkward implementation. I'm also not a fan of using COM directly, because I'm not very good at it. So, eventually I found NPOI.

The problem with NPOI, (at least, my problem with NPOI), is that even though it is a fantastic tool, I had a tough time finding documentation all in one place. On top of that, I didn't find its external interfaces particularly intuitive.

So, what does a programmer do when he finds a tool useful, yet lacking in documentation and intuitiveness? He (or she) starts building a wrapper! And that's exactly what I did.

You can download the wrapper here.

The wrapper, as it is, covers the bulk of what I need to do with Excel (save for the lack of Excel 2007 support - this is because Microsoft changed the file format of Excel files in 2007 to an XML-based format). However, there are probably things that some people want to do with Excel that this wrapper does not cover. If you're inclined to use this wrapper and add functionality to cover your needs, feel free to comment on this post.

I'll spare you some of the epiphanies and stories about how I had to track down various functionality of NPOI, and just get to the meat of how to use it, in a variety of examples.

Example 1 - Creating a new file
   ExcelWrapper excel = new ExcelWrapper();
   excel.CreateFile(@"c:\ExcelFile.xls");

Example 2 - Opening an existing file, then saving it
   ExcelWrapper excel = new ExcelWrapper(@"c:\ExcelFile.xls");
   excel.SaveFile();

Example 3 - Getting cell values (note, as opposed to Interop, NPOI is 0-based, not 1-based)
   ExcelWrapper excel = new ExcelWrapper(@"c:\ExcelFile.xls");
   string a1 = excel.GetCellValue(0, 0);
   string b1 = excel.GetCellValue(0, 1);
   string b2 = excel.GetCellValue(1, 1);
   string d4 = excel.GetCellValue(3, 3);

Example 4 - Getting a sheet as a System.Data.DataTable
   ExcelWrapper excel = new ExcelWrapper(@"c:\ExcelFile.xls");
   DataTable dt = excel.ToDataTable();

Example 5 - Working with different sheets
   ExcelWrapper excel = new ExcelWrapper(@"c:\ExcelFile.xls");
   excel.SetActiveSheet("Sheet2");
   excel.SetActiveSheet(1); // The sheet collection in a workbook is 0-based
   excel.DeleteSheet("Sheet3");
   excel.CreateSheet("Sheet99");
   excel.SaveFile();

Example 6 - Manipulating Cells
   ExcelWrapper excel = new ExcelWrapper(@"c:\ExcelFile.xls");
   excel.WriteCellValue(0, 0, "This is A1");
   excel.WriteCellValue(1, 1, "This is B2 - bolded and italicized", true, true);
   excel.HighlightCell(1, 1, ExcelColors.Yellow);
   excel.SetCellFontColor(1, 1, ExcelColors.Blue);
   excel.SaveFile();

Example 7 - Converting Excel Cell Names to 0-based Row and Column
   ExcelCell cell = new ExcelCell("B23");
   ExcelWrapper excel = new ExcelWrapper(@"c:\ExcelFile.xls");
   excel.WriteCellValue(cell.Row, cell.Column, "Cell B23 is set!");
   excel.SaveFileAs(@"c:\ExcelFile2.xls");
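For readers curious about what a conversion like this involves, here is a minimal sketch. It is not the wrapper's actual ExcelCell code; `CellNameSketch` and `ToRowColumn` are hypothetical names used only for illustration. Column letters are treated as a base-26 number where A is 1, and both results are shifted to 0-based at the end:

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; NOT the wrapper's ExcelCell class.
public static class CellNameSketch
{
    // Converts an A1-style name such as "B23" into a 0-based row and column.
    public static void ToRowColumn(string cellName, out int row, out int column)
    {
        if (string.IsNullOrEmpty(cellName))
            throw new ArgumentException("A cell name is required.", "cellName");

        string name = cellName.Trim().ToUpperInvariant();

        // Leading letters form a base-26 number: A=1 ... Z=26, AA=27, and so on.
        int i = 0;
        int columnNumber = 0;
        while (i < name.Length && name[i] >= 'A' && name[i] <= 'Z')
        {
            columnNumber = columnNumber * 26 + (name[i] - 'A' + 1);
            i++;
        }

        // Everything after the letters must be a plain 1-based row number.
        int rowNumber;
        bool parsed = int.TryParse(name.Substring(i), NumberStyles.None,
                                   CultureInfo.InvariantCulture, out rowNumber);
        if (i == 0 || !parsed || rowNumber < 1)
            throw new FormatException("Expected a name like B23 but got: " + cellName);

        // Excel names are 1-based; NPOI coordinates are 0-based.
        row = rowNumber - 1;
        column = columnNumber - 1;
    }
}
```

Under these assumptions, "B23" maps to row 22 and column 1, which matches the 0-based coordinates used in the earlier examples.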

Example 8 - Working with formatting and styles
   ExcelWrapper excel = new ExcelWrapper("h:\\wrapperTest.xls");
   ExcelStyle style = new ExcelStyle();
   style.BackColor = ExcelColors.Blue;
   style.ForeColor = ExcelColors.Yellow;
   style.IsBold = true;
   style.IsItalics = false;
   style.BorderType = BorderTypes.Dashed;
   style.BorderTop = true;
   style.BorderBottom = true;
   style.FontFace = CommonFonts.Calibri;
   style.FontSize = 15;
   excel.WriteCellValue(6, 2, "6, 2 coordinates", style);
   excel.SaveFile();

So, there you have it. This is my NPOI wrapper, along with some examples. I'll probably add more to this post as it grows, but for now, this is what I've got.

Thursday, March 17, 2011

The Vendor, The Consumer, and Loose Coupling

I recently had an epiphany that has made me a better programmer. Seeing as the intent of this blog is to share my experiences in my journey with object oriented programming, and considering this discovery relates to object oriented programming, I thought I would share.

We all know that code should be loosely coupled. That is, objects should avoid knowing too much about unrelated objects. If an object shouldn't need another object to exist and function, then why should that object know about it at all?

There's a lot of nuance in this concept, but part of it goes back to programming in layers.

So what happens when this nuance comes into play?

For a few years now, I've had a solution that worked well enough, but violated loose coupling principles.

I've dealt with this lack of elegance for a time, partly because I didn't really have a better solution, and partly because it worked well enough.

But I've got a solution now, and it was something that I knew about all along: Event Handlers.

An Event Handler is the perfect solution for loosening up the coupling of your objects, because it puts the flow in the proper perspective. In code, there are vendors, consumers, and states. The Vendor in code is the code that has methods, attributes, etc. The Consumer is the code that uses another object's methods, attributes, etc. A state is sort of an abstract concept that describes when something happens - for example, when an object is out of work to do, or when a button gets clicked, or when an item is added to a collection.

A Vendor should generally have to know as little as possible about the consumer. This is a principle that my code has often failed to follow religiously. And the effect is tightly coupled code.

Consider a System.Windows.Forms.Button object. The Button object knows very little about other controls on a form, and it certainly doesn't know anything about your custom controls that do all sorts of funky stuff. And yet, you can make it so that a click of a button has an impact on all the other controls on the form.

How can this be?

Well, the short answer is by hooking an event handler up to the Click event of the button.

This is something a programmer learns within about 10 hours of programming in WinForms or WPF or whatever UI tool he (or she) has. But there's a bigger principle at play that a 10 hour old programmer probably doesn't know, and that is this loose coupling principle.

The Button has a Click event that some consumer (in this case, the Form that the Button is on) can subscribe to by passing it a delegate. So when the event fires (the button gets clicked), the consumer's delegate that is hooked up to the vendor's (Button's) event gets called, and any behavior defined in that delegate gets executed.

This model is a beautiful one, and the underlying principle opens up a whole world of more elegant solutions.

Consider (some version of) my real-life example.

In an application I built about 8 months ago, I have an object called ActivityMonitor. The ActivityMonitor object keeps track of what's going on in the application, and exposes that information when a TCP/IP client requests it.

So, the ActivityMonitor looks something like this:

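class ActivityMonitor
{
    // Initialized up front so AddActivity never hits a null collection
    // (assumes ActivityCollection has a parameterless constructor)
    ActivityCollection _activities = new ActivityCollection();

    public void AddActivity(string activity)
    {
        Activity a = new Activity(activity);
        _activities.Add(a);
    }

    public string GetActivityString()
    {
        return _activities.ToString();
    }
}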

So, before considering Event Handlers as a solution for this problem, when non-related objects did something, and I wanted the ActivityMonitor to know about it, I would send the ActivityMonitor to the consumer object as a parameter. This is a tightly coupled pattern, and as I see it now, something I'll avoid in the future.

For example:

public class BigSpecialObject
{
    ActivityMonitor _monitor;

    public BigSpecialObject(ActivityMonitor monitor)
    {
        _monitor = monitor;
    }

    public void DoStuff()
    {
        _monitor.AddActivity("About to do a bunch of stuff");
        // Do a bunch of stuff
        _monitor.AddActivity("Done doing a bunch of stuff");
    }
}

When I look at BigSpecialObject, I can infer a few things:

1. The consumer of BigSpecialObject knows about ActivityMonitor
2. BigSpecialObject is tightly coupled with ActivityMonitor
3. It didn't have to be this way

The issue of state comes into play. Presumably, at instantiation time, BigSpecialObject is not done doing stuff. After BigSpecialObject.DoStuff() is called, at some point, BigSpecialObject is done doing stuff. This is a great example of where an EventHandler is not only appropriate, but it is the optimal solution.

So, with a couple extra lines of code, I can really "bust some heads" (to use a Ghostbusters reference):


public class BigSpecialObject
{
    public event EventHandler WorkRequested;
    public event EventHandler WorkCompleted;

    public BigSpecialObject()
    {
        // Notice that BigSpecialObject
        // knows jack-squat about ActivityMonitor
    }

    public void DoStuff()
    {
        this.alertWorkRequest();
        // Do a bunch of stuff
        this.alertWorkDone();
    }

    private void alertWorkRequest()
    {
        // Copy the delegate first so a subscriber that unsubscribes on
        // another thread can't null it out between the check and the call
        EventHandler handler = this.WorkRequested;
        if (handler == null)
            return;

        handler(this, EventArgs.Empty);
    }

    private void alertWorkDone()
    {
        EventHandler handler = this.WorkCompleted;
        if (handler == null)
            return;

        handler(this, EventArgs.Empty);
    }
}

So, the consumer of BigSpecialObject would take advantage of this loose coupling by doing the following:

public class Consumer
{
    ActivityMonitor _monitor;

    BigSpecialObject _bso;

    public Consumer()
    {
        this.initialize();
    }

    private void initialize()
    {
        _monitor = new ActivityMonitor();
        _bso = new BigSpecialObject();

        _bso.WorkRequested += new EventHandler(onBSOWorkRequested);
        _bso.WorkCompleted += new EventHandler(onBSOWorkCompleted);
    }

    private void onBSOWorkRequested(object sender, EventArgs e)
    {
        _monitor.AddActivity("About to do a bunch of stuff");
    }

    private void onBSOWorkCompleted(object sender, EventArgs e)
    {
        _monitor.AddActivity("Done doing a bunch of stuff");
    }

    public void DoSomeConsumption()
    {
        _bso.DoStuff();
    }
}

So, there you have it. Pattern versus Antipattern. Having the Consumer class alert the ActivityMonitor when the work is done is better, because that means that BigSpecialObject is more reusable. And even though we added more lines of code (believe it or not, I try to find solutions to problems that minimize lines of code), this is a better solution, because of the looser coupling.

In looking at code I've written in the past, it's completely littered with examples of tight coupling. Well, I guess it's time to get to work!

Tuesday, March 1, 2011

Stock Predictor C# Utility

This blog references an application and source code.

To download this application, Click Here

To download the source, Click Here

Evidently, some people buy and sell securities on markets in the attempt to make money. These folks are called traders, and everyone is looking for the next big strategy to estimate what a particular security (also called Stock) is going to do in the future.

I'm not particularly interested in this, but a friend of mine brought up a method many traders use to predict when it is time to buy or sell a stock. The strategy involves looking at the current stock price, and comparing it to its 200-day moving average. When the stock price is below the 200 day moving average, has a history of being above its 200 day moving average, and is on an upward trend, that is a good indication that it might be time to buy.

When some version of the converse is true (stock is above the 200 day average, has a history of being below the 200 day moving average, and is on a downward trend), then it's time to sell.
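As a rough sketch, the rule described above could be expressed in code along these lines. This is not the logic from the download, and `MovingAverageSketch`, `Average`, `Signal`, `window`, and `trendDays` are hypothetical names. The sketch also has to pin down two vague terms, so it assumes that "trend" compares the latest close with the close `trendDays` earlier, and that "history" means any earlier close sat on the other side of its own moving average:

```csharp
using System;
using System.Collections.Generic;

// Illustrative only; the definitions of "trend" and "history" are assumptions.
public static class MovingAverageSketch
{
    // Simple moving average of the `window` closes ending at index `end`.
    public static double Average(IList<double> closes, int end, int window)
    {
        if (window < 1 || end < window - 1 || end >= closes.Count)
            throw new ArgumentOutOfRangeException("end");

        double sum = 0;
        for (int i = end - window + 1; i <= end; i++)
            sum += closes[i];
        return sum / window;
    }

    // Returns "Buy", "Sell", or "Hold" for the most recent close.
    public static string Signal(IList<double> closes, int window, int trendDays)
    {
        if (window < 1 || trendDays < 1 ||
            closes.Count <= window || closes.Count <= trendDays)
            throw new ArgumentException("Not enough price history.");

        int last = closes.Count - 1;
        double price = closes[last];
        double average = Average(closes, last, window);

        // Assumed trend test: compare with the close trendDays earlier.
        bool rising = price > closes[last - trendDays];
        bool falling = price < closes[last - trendDays];

        // Assumed history test: look at every earlier day that has a full window.
        bool wasAbove = false, wasBelow = false;
        for (int day = window - 1; day < last; day++)
        {
            double dayAverage = Average(closes, day, window);
            if (closes[day] > dayAverage) wasAbove = true;
            if (closes[day] < dayAverage) wasBelow = true;
        }

        if (price < average && wasAbove && rising) return "Buy";
        if (price > average && wasBelow && falling) return "Sell";
        return "Hold";
    }
}
```

With real data you would pass 200 for `window`; a much smaller window only makes the behavior easier to check by hand.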

Below is a screenshot of the utility I built



I'm not an active trader, and I know very little about how this whole capitalism thing works, but I know how to build an application that can represent this paradigm. So, that's what I did, in C# WinForms.

I used the ZedGraph library to build a graph for me, and I built some logic around some of the concepts.

A few warnings:
1. I use Yahoo's stock history to grab prices. Usage of this tool may or may not violate Yahoo's terms of use agreement.
2. I make no guarantees about the correctness of the data derived from this tool, and this tool in no way constitutes a recommendation of a security transaction.
3. The code quality of this tool is not as high as it could be. It was sort of a rush job, because I didn't have much time to work on it before I had to start working on my next project.

I hope someone can use this little tool and get some value out of it. Or, if you decide to improve on it, let me know, and I'll post any fixes you make.

Wednesday, February 16, 2011

User Experience

Tim, Hero Extraordinaire
I just completed a massive (well, massive for the company I work for, anyway) process of importing some 6 million records into the CRM software we use. It was an arduous process, but the work of one person (me, of course), over the course of about a week, saved half a dozen people several weeks, or even months, of data entry work. It was a major win for both the company and the department I work in.

The Shocker
Because I'm a bit of a narcissist (or maybe even a diligent worker), I took some time to get a pulse check on how some of the end users were feeling about the software transition - of course, I was expecting to get pats on the back, offers of free beer, first-born children, and a lifetime of gratitude. What I got instead was feedback from several people that it had been a challenging and frustrating experience for them, even some of those people who were saved all that tedious time of data entry!

Wait a minute. What? What are you talking about? I saved you people zillions of keypresses, and that's your response? That this has been a frustrating process? Don't you people remember the last transition like this that you had to deal with (before I joined the company, of course)? Printouts of data entry items had to be put in garbage bags for queuing because you ran out of boxes!

So, the narcissist (or diligent worker...however you wish to label me) in me felt quite inclined to probe into why this was a frustrating experience.

I got responses like:

...The screen layout isn't quite right

...When I press the tab key, it takes me to a different field than I want it to.

...This screen's background color gives me a headache.

If you've ever built any sizeable pieces of software, you know that these issues raised by the end-users are so mundane, that it's hard to give them priority over the bigger pieces of functionality that need to be built. Background color, control placement, tab index...all of these things often end up being afterthoughts. Finding the right decision tree algorithm, or appropriate layering of your application's architecture are much sexier problems to solve.

But to the end user, they make up a huge element of their experience.

Lessons Learned
I don't really have any pearls of wisdom to provide around this. After all, I'm certainly not an artist, and user interface elements are not particularly a strength of mine. But I did take something important from this process.

Firstly, when you take work away from someone who never knew they were going to have to do that work in the first place, the work you take away from them is invisible to them. They don't care, and neither would you if you were in their position. But that's the burden you bear as a person whose role it is to create and optimize processes that benefit end users.

Secondly, pay attention to the user experience.

I didn't build the CRM the company uses. The things the users mentioned about their experience were all configurations of the software, and after paying attention to what the end-users were saying and changing the settings to what was optimal for them, we ended up getting a lot fewer complaints about this process.

Aftermath
Since the CRM transition, I've pushed out a couple new features in an internal application that the marketing department uses, and for those features, before they got rolled out to production, I spent some time looking over the shoulder of an end-user who was testing it for me. The 20 minutes I spent doing that led me to make a few minor tweaks to the user experience, and ended up saving the user several mouse clicks and about 30 seconds every time they used the features I built. These 30 seconds, multiplied by a few hundred times per year, will be paying dividends for years. And the user will enjoy using the software because it doesn't subconsciously make them want to set the building on fire because of all the darn mouse clicks their main software requires.

Conclusion
There are lots of themes in software development. If we're not busy creating solutions that look for problems, we're spending a lot of time solving problems for ourselves, rather than for the consumers of our products. User experience matters more than we think, and minor tweaks to the user interface help us to "put a bow" on our masterpieces. As software developers, it's important to make sure that we pay attention to the user experience, and find opportunities to decrease mouse clicks, make the user interface more aesthetically pleasing, and create a user interface flow that is intuitive.

Tuesday, February 8, 2011

A PHP Data Access Layer that uses Custom DataTable

In my last post about PHP Code that looks like .Net Code, I demonstrated a DataTable object that I built in PHP.

I'm really digging the potential that has bought me, so I created a DataAccessLayer object that has a method that can return query results as a DataTable.

To download the updated "library" I'm building, Click Here.

The method I built is in the class.DataAccessLayer.php file. The contents of the method are below:

///Gets the query results as a datatable
public function QueryResultsAsDataTable($query)
{
    $this->connectToDatabase();

    if( substr($query, 0, 6) == "SELECT" )
    {
        $currentRow = 0;
        // strpos returns false when the quote is not found
        if(strpos($query, "'") === false)
        {
            $query = mysql_real_escape_string($query);
        }

        $this->_dataSet = mysql_query($query, $this->_connection);

        $returnDataTable = new DataTable();

        if($this->_dataSet)
        {
            // Add one column per field in the result set
            if (mysql_num_rows($this->_dataSet) > 0 && $currentRow == 0)
            {
                // Fixed: this previously called AddColumn on an undefined $dt
                for ($i = 0; $i < mysql_num_fields($this->_dataSet); $i++)
                    $returnDataTable->AddColumn(mysql_field_name($this->_dataSet, $i));
            }

            // Copy each result row into a new DataRow
            while($this->row = mysql_fetch_array($this->_dataSet))
            {
                $dr = $returnDataTable->NewRow();

                for( $this->columnCount = 0; $this->columnCount < mysql_num_fields($this->_dataSet); $this->columnCount++ )
                {
                    $columnName = mysql_field_name($this->_dataSet, $this->columnCount);
                    $dr->AddValue($columnName, $this->row[$this->columnCount]);
                }

                $returnDataTable->AddRow($dr);

                $currentRow++;
            }

            return $returnDataTable;
        }

        throw new Exception('Error getting data: ' . mysql_errno($this->_connection) . ': ' . mysql_error($this->_connection));
    }
}


So, I could issue the following code to my data access layer, and I would be able to iterate through the results:


$dt = $_dataAccessObject->QueryResultsAsDataTable("SELECT firstname, lastname FROM people WHERE id=3");

$firstName = "";
$lastName = "";

for($i = 0; $i < $dt->Rows->GetCount(); $i++)
{
    $firstName = $dt->Rows->GetValue("firstname");
    $lastName = $dt->Rows->GetValue("lastname");
    print("FirstName is " . $firstName . " - LastName is " . $lastName);
}


So there you have it, a simple data access layer that returns a (somewhat) sophisticated object in a quick, non-obscure way.

Monday, February 7, 2011

PHP Code that Looks like .Net code

This blog references PHP files available for download here: Click Here

In programming, I think it's important to learn. Learning can involve new aspects of your chosen language, or new languages. I've been thinking about my PHP days lately. I really know a lot more about programming now than I did when I was actively programming in PHP.

When I go back and look at some of the code I was producing in PHP, I cringe a bit at all of the principles I see myself violating. Most of the PHP code I wrote was a big ball of mud.

So, as an academic experiment, as well as to try to re-sharpen some of my PHP skills, I decided to write some base libraries that I may (or may not) eventually use someday.

As with any language (programming or verbal), or with any foreign concept, the first thing to do is to find metaphors that allow us to relate the themes of the thing we're learning to the things we already know. The classic example in programming is the Hello World application.

Well, I'm not too interested in creating a "Hello World" for PHP, because I'm already familiar with a little bit of how PHP works. But what I wanted to do was to build some PHP code that looks (at least a little bit) like .Net code.

So, after thinking a little bit about what libraries I use most in .Net, I centered around System.Data, in particular DataTable (and DataRow, DataColumn, DataRowCollection, and DataColumnCollection). I haven't created DataSet yet, because there's only so much time in the day.

After thinking a bit more about how I would build some of this PHP code that looks like (that is, has a similar interface to) the System.Data .Net library, I concluded that I wanted to have a base Collection object. So I created it, and put it into a file called System.Collections.php. Below is a diagram of the code I created:



As you can see, in my PHP code, a DataTable has a DataRowCollection and a DataColumnCollection, as well as a couple of the external interface behaviors of .Net System.Data.DataTable (NewRow() and AddRow()); however, because of core differences between C# (or I guess .Net languages) and PHP, there are a few changes. Below are the issues that I encountered in this mini-project:

1. I couldn't figure out if I could use an indexer type property to have square brackets ([]) to represent an indexed item in a collection. Therefore, I left that for later.

2. There are some pretty serious differences between C# and PHP in terms of how they type variables. In C#, you generally define a variable as a type on the left side of the variable name, and you initialize it on the right side (e.g. int a = 3; OR Person p = new Person()). You don't have to do that in PHP, so you can initialize a variable as any type, which throws a bit of a wrench into the model of how I see the world.

3. PHP (at least 5.0) supports a lot of classic OOP concepts such as inheritance, interfaces, polymorphism, exception handling, etc. Obviously, the syntax of how to do this is different than it is in C#, so I dealt with some of the pains of the languages' differences.

4. PHP (from my understanding) does not support generics. So I can't use List<T> variables the way I like to in C#. That's why DataRow and DataRowCollection both inherit from my Collection class.

But, I think my finished product gives me a basis to build on for later. So, if I wanted to initialize a DataTable in PHP code, I would include a reference to System.Data.php, and do something like below:
 
include 'System.Data.php';


$dt = new DataTable();
$dt->AddColumn("FirstName");
$dt->AddColumn("LastName");
$dt->AddColumn("Age");


$dr = $dt->NewRow();
$dr->AddValue("FirstName", "Tim");
$dr->AddValue("LastName", "Claason");
$dr->AddValue("Age", 29);


$dt->AddRow($dr);

Notice in the above that, unlike C#, I can't do $dr["FirstName"]. Maybe there's a way to do this in PHP, but in my (short) research time, I didn't find it.

Obviously, this interface isn't exactly the same as .Net's System.Data, but it's close enough to give me a decent metaphor between PHP and C#.

Please note that I haven't tested this code yet, so there are no guarantees as to whether or not it works. If it does work, expect more blog entries on this topic.

To download the PHP Code I've got so far, Click Here

Wednesday, February 2, 2011

Consistency in Coding (And Databases)

I've been thinking about consistency in coding, and coding standards. The thoughts I've been having on this concept have been sparked by my basic human desire to have consistency in my life, along with work I've been doing with the CRM that my company uses.

I've been doing a whole bunch of work on the database of this CRM, basically copying records across a couple dozen tables. I've noticed lots of inconsistencies and design problems in the database, and I thought I would enumerate what those problems were, and try to find the lessons that I can take from those design problems:

1. The database employs the use of natural keys, as opposed to surrogate keys. An example of a natural key is the combination of a first name, middle name, last name, and social security number to identify the uniqueness of a record. An example of a surrogate key is giving a unique number to each record, and targeting that unique number for database selects, updates, etc. I believe that employing natural keys is a poor design decision, but not everyone agrees with me on that; however, I will never ever ever build a database that relies on natural keys.

2. There are a few tables in the database that rely on surrogate keys - an unnecessary inconsistency, because those tables have fields in them that allow them to link to their parent tables via the natural key.

3. There are major inconsistencies in column naming. Some date columns are suffixed with _ISO, some abbreviate Date with "DT" and some spell "DATE" out in the column name. There are other columns that have similar naming inconsistencies, too, such as "Account" and "Acct" or "Code" versus "CD" or "Event" versus "EVT". I am always mixing these up, and all that confusion (which I'm sure occurs internally in the company that makes this CRM, as well) could have been avoided by simply making naming conventions more consistent.

4. Data Duplication. The database has datetime stamps to indicate when the record was created, and it also has an integer representation of the time it was created. I'm sure this integer representation is a carryover from legacy code and technology, but the column is still there, taking up space, and wasting resources.

5. Lack of normalization. The database has a UDF recorded value table that has about 200 columns, and at least 2/3 of those columns are always null. That table should be refactored/pivoted, which would reduce design complexity, and the amount of code required to represent that table -- not to mention memory required to store that data.

6. Too many tables with too many columns. The average table seems to have at least 75 columns (with several having more than 200)...that's a database design smell, in my mind (and nose).

There are other things too, but you get the picture.

So, why do these problems happen? In my experience working with software design, there are a number of reasons:

1. Legacy technical debt. Carryover from older technologies that didn't have some of the bells and whistles we have now is a major reason why inconsistencies occur in the software - it's usually easier to write a wrapper to interact with old technologies than it is to completely re-write them.

2. Too many people. Different people work in different areas of the product (along with no naming conventions/standards/whatever). If there isn't a set standard on how datetime columns, bit columns, and recurring-theme columns are named in the database, then there are going to be inconsistencies.

3. Learning. No one knows everything, and I know less than most people; however, I'm learning all the time, and I am constantly finding things I did months or years ago that I would do differently now. This happens all the time in software, and I see plenty of examples in the database where it appears this happened.

4. Urgency to the market. Let's face it, software is meaningless/worthless if it never gets out of design, construction, or testing phases. Not to mention that companies have payrolls to meet. There are always trade-offs between design excellence and need to get a product to the market. And sometimes those who control the money decide that they'd rather deal with higher support costs 3-6 months from now than higher development costs today.

5. Dysfunctional development process. When the development process doesn't support what the software is trying to do, and the team isn't "optimized" to use developers' strengths in the right way, it can lead to "cowboy programming".

What to do about it?

There are all kinds of things to do. In fact, there are people and companies who make their living off of solutions to these problems. I don't have all the answers, but here's what I do to avoid some of these problems:

1. Define a clear coding standard. You can borrow from Microsoft, online forums, or convene with other developers to make decisions on how members, attributes, methods, interfaces, classes, etc should be named. There are also plenty of online and book resources on style guidelines.

2. Keep learning. Learning things, in the short term, leads to inconsistencies, but helps push your code, database, or whatever, to be as good as it can be.

3. Incremental refactoring. Rome wasn't built in a day, and neither is a 100,000 line application. Baby steps are the way to go. Slowly, but deliberately, refactor to achieve consistency.

4. Choose the right software development method. Waterfall, SCRUM, TDD, XP, some combination of all of these...Team buy-in, and a methodology that suits the skill set of your team (including your business analysts and QA people, along with developers) is important to produce quality products.

5. Pragmatic Programming. I love the book "The Pragmatic Programmer," because it outlines the type of developer we should all strive to be. I wrote a blog on "Writing Code That Writes Code". In it, I demonstrated an application that helps me produce very consistent code.

Like everything else in life, whether it be good health, a good family life, a good career, etc, there's no silver bullet. Practicing fundamentals, self-improvement, humility, and understanding that you will never be perfect is the way forward.

Tuesday, February 1, 2011

ID3 Decision Tree in C#

This blog references an executable and source code available for download. To download the referenced executable, Click Here

To download source, Click Here

It seems like just about anything "new and shiny" can distract me from the last "new and shiny" thing that I decide to devote all my energy to learning about and/or building. Case in point: AI (artificial intelligence). There are all kinds of ways to use AI, and there are all kinds of subtechnologies that make up the field of Artificial Intelligence.

Having said that, the first technology that has been easy enough for me to get my head around is "Decision Tree Learning." A decision tree is basically an algorithm that takes a set of collected data, the outcomes of each collection of inputs, and builds a tree to demonstrate the best input variable for a particular output. From my college days, I had lots of statistics classes that were very similar to this concept - particularly, regression analysis.

A decision tree ends up looking a bit like the image below:



For example, suppose you want to gather a bunch of information about which inputs affect whether or not it's going to rain. A decision tree can build a graphical representation of whether it will rain or not based on the inputs you provided. Inputs for a decision tree may include cloudiness, temperature, relative humidity, what the weather report says, etc. The decision tree algorithm calculates which input best predicts whether or not it's going to rain, then finds the next best variable, and so on, until a tree has been built for determining whether or not it's going to rain.

To demonstrate with words: If the weather report says it will rain, and if the relative humidity is high, and it's very cloudy, then the output will be rain.
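To make the "best input" step a little more concrete: ID3 scores each input by its information gain, which is how much the entropy (the "mixed-up-ness") of the outcome drops once the data is split on that input. Below is a minimal sketch of that calculation. It is not the code from the download; the class and method names (Id3Sketch, Entropy, InformationGain, BestAttribute) are made up for illustration, and it assumes every row stores its inputs and its outcome as strings in a dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not taken from the downloadable source.
public static class Id3Sketch
{
    // Shannon entropy (in bits) of a list of outcome labels.
    public static double Entropy(IEnumerable<string> outcomes)
    {
        List<string> all = outcomes.ToList();
        if (all.Count == 0) return 0.0;

        double entropy = 0.0;
        foreach (var group in all.GroupBy(o => o))
        {
            double p = (double)group.Count() / all.Count;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }

    // Entropy of the whole set minus the weighted entropy of each subset
    // produced by splitting the rows on the given attribute.
    public static double InformationGain(
        List<Dictionary<string, string>> rows, string attribute, string outcome)
    {
        double before = Entropy(rows.Select(r => r[outcome]));
        double after = 0.0;
        foreach (var subset in rows.GroupBy(r => r[attribute]))
        {
            double weight = (double)subset.Count() / rows.Count;
            after += weight * Entropy(subset.Select(r => r[outcome]));
        }
        return before - after;
    }

    // The attribute with the highest gain becomes the next node in the tree.
    public static string BestAttribute(
        List<Dictionary<string, string>> rows, IEnumerable<string> attributes, string outcome)
    {
        return attributes
            .OrderByDescending(a => InformationGain(rows, a, outcome))
            .First();
    }
}
```

ID3 then repeats the same selection on each subset of rows, until every row in a branch has the same outcome or there are no inputs left to split on.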

Well, you get the idea...I don't completely have my head around the various ways to build a decision tree. In fact, I'm quite a novice when it comes to demonstrating decision tree algorithms.

The reason that I'm writing this blog entry at all is because I found a pretty decent implementation of an ID3 decision tree in C# at codeproject.com. But when I tried to use it, I wasn't able to make the original code suit my needs. There were a few problems with the code that I felt compelled to fix.

I don't know much about the fellow who wrote this particular C# ID3 algorithm except that his screen name is "Roosevelt" and he's from Brazil - and the source code is commented in Portuguese. And he wrote it a long time ago.

The funny thing is that this was really the only C# code I could find on the subject, and this code was written 7 1/2 years ago. ID3 is not the most recent technology in Decision trees (evidently, C4.5 is a more recent iteration of decision tree learning). Sure, Java code exists, but I'm not a Java developer, so some of the base libraries referenced made it difficult for me to translate the Java code to C#.

So, I decided to see if I could take Roosevelt's code and make it less rigid. You see, all of the source data and attributes are statically defined in the code. There's no way to configure the data without recompiling, and that just won't do. In my iteration, the decision tree can be built dynamically based on the source data - it does not rely on statically defined concepts within the code anymore (and the output is not in Portuguese, either).

I did some other refactoring of the code as well, and made it a bit better - probably still not quite right, but I think it's quite a bit better.

To download the executable I built, Click Here

To download source, Click Here

Now that I've gotten my fill of this "new and shiny" thing, I can get back to the last "new and shiny" thing I was working on, and maybe some day (hopefully sooner than 7 1/2 years from now), someone who is zealous enough to improve my code can do so, and share it with the world. For now, this is my contribution.

Friday, January 28, 2011

Writing Code That Writes Code

This blog references an executable and source code available for download. To download the referenced executable, Click Here

To download source, Click Here

The Pragmatic Programmer, which is a really nifty book that outlines principles for how a programmer should conduct himself (or herself), tells us to "write code that writes code."

There are several reasons one would want to have code that writes code:

1. We tend to write code in a consistent way.
2. If we don't write code in a consistent way, we should.
3. Because of 1 & 2, reducing the number of lines of code we have to physically write can increase our productivity, because it takes less physical work to create more functionality.

When I'm building a database application in C#, my business logic classes often look very similar. When considering how I would build an application that would write code for me, I started by looking at the consistencies in my code. I found the following:

1. I mostly write database applications, and business classes that represent (more or less) table definitions in my database. For example, for a CRM application that manages marketing campaigns, I might have a table called "Person" that represents people in my database; I may also have a table called "MarketingCampaign" that allows me to keep track of my marketing campaigns. I may have a link table called "MarketingRecipient" that links "Person" to "MarketingCampaign" (in other words, the MarketingRecipient table has a record for each person a marketing campaign targeted). To represent these business concepts in code, I would probably build *at least* 3 classes (probably more because I'm careful to avoid violating the Open-Closed principle). But these classes would probably be: Person, MarketingCampaign, and MarketingHistory (which is a collection of MarketingCampaign objects). I don't use ORM (object-relational mapping), because I don't like to surrender as much as it seems ORM asks us to surrender for the sake of convenience.

2. Most of the business layer classes I build have a Load() method and a Save() method, and they often look similar; however, not similar enough to over-rely on inheritance.

3. I often have a need to represent business layer classes as a collection. These collections often look very similar - again, not similar enough to use inheritance (IMHO).

4. I make an effort to document (comment) every private member, public attribute, constructor, and method.

5. I often have a need to have a corresponding public attribute for every private member I have. Obviously this is not always the case.

Armed with these bits of information, I embarked on building a rinky-dink code generator that suits my needs.

So, this little application looks as follows:



As you can see, this app is quite simple, and the top input control is for "Class Name". This, as you might guess, asks you to enter a class name.

In the below example, I create a class name of "Person". Once I click the "Create" button, the panel below is activated.


At this point, I can start adding attributes to the class. I can manually type in the data type I want, or there is a set of primitive types in the dropdown.



Once I'm done adding attributes to the class, I click the "Generate Code" button in the lower right corner:


And voila - we have code. Granted, there's not a ton of functionality or code smarts this buys me, but it does do the following:

1. It does quite a bit of typing of members and attributes - this can be a big time saver when talking about 10-50 classes being created
2. It comments for me. Maybe not the best comments, but I like to comment all my members, attributes, and methods
3. It saves me from having to rewrite the same things over and over again, which helps me to abide by the "Don't repeat yourself" principle.
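The generation step itself doesn't need to be fancy. Below is a minimal sketch of the idea, assuming the class name and a list of attribute names and types have already been collected from the form. It is not the source of the app above; ClassGeneratorSketch and Generate are hypothetical names, and the sketch assumes attribute names are non-empty and already in PascalCase.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; the real tool's internals are not shown in this post.
public static class ClassGeneratorSketch
{
    // attributes maps a public property name (e.g. "FirstName") to its C# type.
    public static string Generate(string className, IDictionary<string, string> attributes)
    {
        StringBuilder code = new StringBuilder();
        code.AppendLine("/// <summary>");
        code.AppendLine("/// Represents a " + className);
        code.AppendLine("/// </summary>");
        code.AppendLine("public class " + className);
        code.AppendLine("{");

        foreach (KeyValuePair<string, string> attribute in attributes)
        {
            string propertyName = attribute.Key;
            string typeName = attribute.Value;
            // FirstName -> _firstName
            string fieldName = "_" + char.ToLowerInvariant(propertyName[0]) + propertyName.Substring(1);

            code.AppendLine("    /// <summary>" + propertyName + " backing field</summary>");
            code.AppendLine("    private " + typeName + " " + fieldName + ";");
            code.AppendLine();
            code.AppendLine("    /// <summary>Gets or sets " + propertyName + "</summary>");
            code.AppendLine("    public " + typeName + " " + propertyName);
            code.AppendLine("    {");
            code.AppendLine("        get { return " + fieldName + "; }");
            code.AppendLine("        set { " + fieldName + " = value; }");
            code.AppendLine("    }");
            code.AppendLine();
        }

        code.AppendLine("}");
        return code.ToString();
    }
}
```

Calling Generate("Person", ...) with a single "FirstName" attribute of type string produces a commented Person class with a _firstName member and a FirstName property, which is the same repetitive typing the list above describes.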

Granted, I often have to change things that are produced from this little app, but as mentioned above, this little app has turned out to be a big time saver. I'm sure there are better code generators out there, but for now, this works for me.

To download the referenced executable, Click Here

To download source, Click Here

Sunday, January 23, 2011

Another Layers of Programming Post

In the first blog I wrote on the layers of programming, I feel like I really faltered in describing what "Business Logic" is.

Part of the reason for this is because the concrete definition of it wasn't as clear in my mind as it is now, and part of it was that I had more of an affinity towards the data access layer at the time, and I didn't find explaining business logic to be as exciting.

Times have changed, and I find business logic to be much more exciting than I did then. I also find this concept of "n-tier programming" to be one of the most challenging concepts to get right in object oriented programming.

So, I'll do a quick recap of the three main tiers in n-tier programming:

Data Access Layer: The code responsible for getting data from the database and delivering it to the business logic layer.

User Interface Layer: The code responsible for rendering output to the end-user. The user interface layer is mostly composed of user interface controls that render a user interface -- these controls can be the parent form that other controls are attached to, textboxes, comboboxes, radio buttons, and checkboxes for user input, buttons for user submission, list boxes and data grids for outputting collections of data...you get the idea.

Business Logic Layer: The code responsible for solving the business problems that software is being built to solve.

There are all kinds of implications for what the business logic layer is. Because object oriented programming helps us represent reality so well, I usually have my business logic layer define the concepts of the system I'm building.

For instance, suppose we're building a customer relationship management (CRM) application.

Suppose this CRM is supposed to:
1. Keep track of our customers
2. Help us manage our inventory
3. Keep sales history for our customers

Obviously we could add more features to this CRM, but I think the above is adequate for describing what Business Logic does.

There are a lot of ways (methodologies) to build software - usually software is built after requirements have been written.

So, I'll preface these definitions with the fact that I'm building this business logic without having a requirements document.

But in this CRM, we need concepts defined, such as: What is a customer? What is a product? What is a sale? How do these concepts interact with one another?

The answers to the above questions really define what business logic is. For instance, one of the classes in my business logic layer may be Customer. The customer class would look like this:

public class Customer
{
private string _firstName;

private string _lastName;

private string _emailAddress;

private string _address;

public string FirstName
{
get { return _firstName; }
set { _firstName = value; }
}

public string LastName
{
get { return _lastName; }
set { _lastName = value; }
}

public string EmailAddress
{
get { return _emailAddress; }
set { _emailAddress = value; }
}

public string Address
{
get { return _address; }
set { _address = value; }
}
}

The above is a truncated version of what the Customer class would look like, but the above would also be the first lines of code that would come in the business logic layer, because all of the features of the CRM revolve around the concept of "Customer."

The next concept to define in our Business Logic is the concept of an Inventory Product. I would imagine that an inventory product would look a bit like the code below:


public class InventoryProduct
{
private string _partNumber;

private string _productName;

private bool _isActive;

public string PartNumber
{
get { return _partNumber; }
set { _partNumber = value; }
}

public string ProductName
{
get { return _productName; }
set { _productName = value; }
}

public bool IsActive
{
get { return _isActive; }
set { _isActive = value; }
}
}

So, the next question to answer is how does a Customer interact with an InventoryProduct? I would argue that a class called "ItemPurchase" would need to get created. An ItemPurchase would basically be a link between a Customer and an InventoryProduct, with a couple other attributes, such as a PurchaseDate and a Quantity, and maybe a couple other things, depending on what kind of information is getting collected about it.

ItemPurchase would look as follows:

public class ItemPurchase
{
private Customer _customer;
private InventoryProduct _product;
private int _quantity;
private DateTime _date;

public ItemPurchase(Customer customer, InventoryProduct product, int quantity)
{
_customer = customer;
_product = product;
_quantity = quantity;
_date = DateTime.Now;
}

///Other methods to save the purchase
}

This link between Customer and InventoryProduct is great, but we've got another problem: the problem is that the ItemPurchase object only covers one item. Obviously, a customer can purchase multiple things at one time, so we need another collection class. Maybe it could be called "ShoppingCart." I'll exclude the code for this because this is getting to be kind of a long blog entry, but I think you get the idea.
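For anyone who wants something concrete anyway, a ShoppingCart could look roughly like the sketch below. This is only one way to do it: the Customer, InventoryProduct, and ItemPurchase classes here are trimmed stand-ins so the sketch compiles on its own, and the AddItem() and TotalQuantity() methods are assumptions of mine, not part of the design described above.

```csharp
using System;
using System.Collections.Generic;

// Trimmed stand-ins for the classes above so the sketch compiles on its own.
public class Customer { public string LastName { get; set; } }
public class InventoryProduct { public string PartNumber { get; set; } }

public class ItemPurchase
{
    public Customer Customer { get; private set; }
    public InventoryProduct Product { get; private set; }
    public int Quantity { get; private set; }

    public ItemPurchase(Customer customer, InventoryProduct product, int quantity)
    {
        Customer = customer;
        Product = product;
        Quantity = quantity;
    }
}

// Hypothetical ShoppingCart: every ItemPurchase in it belongs to one customer.
public class ShoppingCart : List<ItemPurchase>
{
    private Customer _customer;

    public ShoppingCart(Customer customer)
    {
        _customer = customer;
    }

    public Customer Customer
    {
        get { return _customer; }
    }

    // Convenience method (an assumption) that builds the ItemPurchase link.
    public void AddItem(InventoryProduct product, int quantity)
    {
        this.Add(new ItemPurchase(_customer, product, quantity));
    }

    // Total number of units across every item in the cart.
    public int TotalQuantity()
    {
        int total = 0;
        foreach (ItemPurchase purchase in this)
        {
            total += purchase.Quantity;
        }
        return total;
    }
}
```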

The shopping cart would serve to link an entire purchase to a customer...which leads us to another collection - PurchaseHistory. A PurchaseHistory object would be a list of ShoppingCart objects, and this PurchaseHistory object would get linked to a Customer object, and the Customer class would HAVE-A PurchaseHistory object in it. This relationship in the business logic also lines up very well with how the database would be designed (not covering database design in this post).

So, since I'm creating a PurchaseHistory object, which is essentially a List of ShoppingCart objects, AND since I don't want to violate the Open-Closed principle, I would start by creating a ShoppingCartCollection object, and have PurchaseHistory inherit from that:

public class ShoppingCartCollection : List<ShoppingCart>
{

}

public class PurchaseHistory : ShoppingCartCollection
{
Customer _customer;

public PurchaseHistory(Customer customer)
{
_customer = customer;
this.populate();
}

private void populate()
{
///Go to the data access layer to grab information about this customer's purchase history
}
}

Notice in PurchaseHistory, the only constructor takes a Customer as a parameter (there is no default constructor). This makes sense to me because a PurchaseHistory, as we've defined it so far, does not seem to exist outside the context of a Customer. There's a lot of conversation that can be had around this, and how a PurchaseHistory will exist, but for this example, we'll just assume this will always be the case.

Because a customer will HAVE-A PurchaseHistory, I would modify the Customer class as follows:

public class Customer
{
///All of the code above

private PurchaseHistory _purchases;

public PurchaseHistory Purchases
{
get
{
if(_purchases == null)
{
_purchases = new PurchaseHistory(this);
}
return _purchases;
}
}
}

Notice in the getter in Customer, I check to see if the _purchases member is null - this is a trick called "Lazy initialization," which helps me avoid instantiating a PurchaseHistory object until I need it. This may be the right thing to do or the wrong thing to do, depending on all kinds of other factors (size and optimization of database indexes, how often a Customer's PurchaseHistory will be accessed, how often a Customer's PurchaseHistory will be changed, etc, etc, etc).
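As an aside, .NET 4 ships a Lazy<T> class that wraps up this same null-check pattern. Below is a minimal sketch of how the Purchases property might look with it. The PurchaseHistory here is a stripped-down stand-in (a list of strings with a LoadCount counter I added purely so the demo can show when construction happens), not the class defined above.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the real PurchaseHistory, for illustration only.
public class PurchaseHistory : List<string>
{
    // Counts constructions so a caller can see when loading actually happens.
    public static int LoadCount = 0;

    public PurchaseHistory(Customer customer)
    {
        LoadCount++;
        // A real implementation would call the data access layer here.
    }
}

public class Customer
{
    private readonly Lazy<PurchaseHistory> _purchases;

    public Customer()
    {
        // The factory delegate runs only on the first read of .Value.
        _purchases = new Lazy<PurchaseHistory>(() => new PurchaseHistory(this));
    }

    public PurchaseHistory Purchases
    {
        get { return _purchases.Value; }
    }
}
```

With this version, creating a Customer does not construct a PurchaseHistory; the first read of Purchases does, and every later read returns that same object. Lazy<T> is also thread-safe by default, which the hand-rolled null check is not.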

So, the business logic we've defined will:

1. Get its data from the database (by way of the data access layer)
2. Deliver it to the user interface
3. Receive the data back after it has (presumably) been changed in the user interface
4. Receive the request to save, which will come from the user interface
5. Tell the data access layer to save the changed data to the database

That's business logic as I see it, and this is where a programmer generally earns his (or her) wage. The ability to see the business concepts as relationships, and then represent them in code is a real skill that takes years of tweaking to get right. There are all kinds of pitfalls and obstacles that come with almost any strategy one uses when building this stuff, ranging from scalability issues, extensibility issues, code management issues because of poor design or overly complicated design, performance problems because of use (or lack thereof) of lazy initialization, and a host of other things.

But, this is the world as I see it, and I hope it helps.

Monday, January 3, 2011

The Liskov Substitution Principle

Where the Open-Closed principle advocates inheritance to build good code, the Liskov Substitution principle says "Not so fast!" Inheritance is a great way to create hierarchies in programming, but with great power comes great responsibility. Officially, the Liskov Substitution Principle states: "Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it." There are 2 main risks that one assumes by using inheritance:

1. Inappropriate behavior when casting occurs
2. Overly broad generalizations about child class behavior

Again, Martin relies on shapes to demonstrate how inheritance can create problems, this time using a square and a rectangle as the demonstration points.

In his example, he describes a Rectangle as a parent class, and a Square as a child class of Rectangle. Below is C# code that emulates behavior Martin describes:


public class Rectangle
{
protected double _width;
protected double _height;

/// <summary>
/// Just to keep track of the object
/// </summary>
protected string _objectName;

public Rectangle()
{
_objectName = "Rectangle";
}

public virtual void SetWidth(double w)
{
_width = w;
}

public virtual void SetHeight(double h)
{
_height = h;
}

public double GetWidth()
{
return _width;
}

public double GetHeight()
{
return _height;
}

public void Output()
{
Console.WriteLine(_objectName + " attributes: Height=" + _height + " | Width=" + _width);
}
}

public class Square : Rectangle
{
public Square()
{
_objectName = "Square";
}

public override void SetWidth(double w)
{
_width = w;
_height = w;
}

public override void SetHeight(double h)
{
_height = h;
_width = h;
}
}



The above code looks clean, elegant, etc, but there's a problem. And this problem comes when, later on in the development process, someone builds code that takes the parent class as a parameter, expecting it to behave like a Rectangle:


class SomeImplementation
{
public void DoStuff()
{

Square square = new Square();
manipulateShape(square);

}

private void manipulateShape(Rectangle rectangle)
{
rectangle.SetHeight(2);
rectangle.SetWidth(4);

if (rectangle.GetWidth() * rectangle.GetHeight() == 8)
return;

throw new Exception("Error with rectangle dimensions");
}

}


In the above example, if a Rectangle object were passed to manipulateShape(), everything would be fine; however, in this case, when a Square object (which also IS-A rectangle) is passed to manipulateShape(), it throws an exception, because the Width * Height would equal 16, and not 8. The real problem here is that we over-rely on the parent's behavior, and as the body of code grows, manageability is a major concern, because other users of the code ought to be able to rely on expected behavior, without always having to factor in the interaction that occurs in the inheritance chain.

Ultimately, the developer who wrote the manipulateShape() method is expecting a Rectangle, but not one of its children (which is a reasonable thing to do); however, the real issue in this design is the overreliance on virtual/overridden methods: a square is a rectangle based on its public interface, but not on its behavior.

Consider a modification to the example, where a Rectangle is generated from some outside method:


Rectangle someUnknownShape = getShape();
manipulateShape(someUnknownShape);


If the getShape() method returns a Square (or any other child class that changes how width and height are set), the exception gets thrown, and nothing at the call site warns us about it.

Because the behavior of Square is not consistent with Rectangle, the ultimate solution is to not have Square inherit from Rectangle. The safer alternative, in C#, as I see it, is to create an interface that both Square and Rectangle implement:


public interface IBoxShape
{
void SetWidth(double width);
void SetHeight(double height);
double GetWidth();
double GetHeight();
}



If both Square and Rectangle implemented the IBoxShape interface, and Square no longer inherited from Rectangle, then the above manipulateShape() method could keep its parameter as Rectangle. The application would behave as expected, because the compiler would reject any attempt to send a Square (or any other inappropriate object) to manipulateShape().
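Here is a minimal sketch of what that could look like. It is my own illustration rather than anything from Martin's article; the getters return double in this sketch, Square stores a single _side value (an assumption), and ManipulateShape() returns the area instead of throwing so that the result is easy to inspect.

```csharp
using System;

public interface IBoxShape
{
    void SetWidth(double width);
    void SetHeight(double height);
    double GetWidth();
    double GetHeight();
}

public class Rectangle : IBoxShape
{
    private double _width;
    private double _height;

    public void SetWidth(double w) { _width = w; }
    public void SetHeight(double h) { _height = h; }
    public double GetWidth() { return _width; }
    public double GetHeight() { return _height; }
}

// Square no longer inherits from Rectangle; it only shares the interface.
public class Square : IBoxShape
{
    private double _side;

    public void SetWidth(double w) { _side = w; }
    public void SetHeight(double h) { _side = h; }
    public double GetWidth() { return _side; }
    public double GetHeight() { return _side; }
}

public class SomeImplementation
{
    // Only a Rectangle compiles here; passing a Square is a compile-time error.
    public double ManipulateShape(Rectangle rectangle)
    {
        rectangle.SetHeight(2);
        rectangle.SetWidth(4);
        return rectangle.GetWidth() * rectangle.GetHeight();
    }
}
```

Code that genuinely doesn't care which kind of box it gets can still accept an IBoxShape, but it then has no business assuming that width and height vary independently.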

Going back to the Liskov Substitution Principle: Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it.

The manipulateShape() method in its original state ought to be able to expect that the area of the rectangle, after manipulating it, will be 8 - that behavior ought to be transparent, and changing behavior in child classes by use of virtual and overridden methods should be avoided.

The Open-Closed Principle

The Open-Closed Principle states, quite concisely, that classes should be "Open for extension, but closed for modification." In other words, the attributes and behaviors of a class (i.e. the building blocks of a class) should be thoughtfully placed where they belong in the inheritance chain. I don't think I could really overstate how many times I've violated this principle. In fact, in looking back at code I've written in the past, I see all kinds of examples where I violate it. But this is definitely one of the principles that has strongly impacted how I write code.

The example Uncle Bob used to describe this principle was a Shape example. The implementation of the Open-Closed Principle is via inheritance, and eventually, polymorphism.

Martin starts with an antipattern to describe Open-Closed. Below is an example in C# that is similar to Martin's:

We start with some base citizens in our code base:

 
public class Shape
{
double Area;
///...Other shape code
}

public class Square : Shape
{
///...Square code
}

public class Circle : Shape
{
///....Circle code
}

public class SomeImplementation
{
public void DrawCircle()
{
///...Draw a circle
}

public void DrawSquare()
{
///...Draw a square
}
///...Some code in Some Implementation

public void DrawAllShapes(List<Shape> shapeList)
{
for(int i = 0; i < shapeList.Count; i++)
{
Shape currentShape = shapeList[i];
if(currentShape is Square)
DrawSquare();
else if(currentShape is Circle)
DrawCircle();
}
}
}


The above code violates the Open-Closed Principle because it cannot accommodate new shapes without being modified. Adding a new shape would entail making a change to DrawAllShapes() every time a new shape got added to the program's vernacular. In the original article, Shape, Square, and Circle were Structs, not Classes, which made the antipattern much more ugly-looking. I didn't use a struct in my example, because overusing Structs was never a real problem in my learning how to program.

The better solution, which does not violate the Open-Closed Principle is shown below:

 
public abstract class Shape
{
double Area;

public abstract void Draw();

///...Other shape code
}

public class Square : Shape
{
public override void Draw()
{
///...Do Square drawing
}
}

public class Circle : Shape
{
public override void Draw()
{
///...Do Circle drawing
}

}

public class SomeImplementation
{
///...Some code in Some Implementation

public void DrawAllShapes(List<Shape> shapeList)
{
foreach(Shape currentShape in shapeList)
currentShape.Draw();
}
}



In the above code, the SomeImplementation class uses polymorphism to draw any shape in the passed shape list. So, as more shapes are added to the application, the DrawAllShapes() method does not need to change (the Closed portion of the principle), and the new shape can simply be a new class, inherited from Shape, or one of its children (the Open portion of the principle).
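To see the "Open" half in action, here is a small self-contained sketch in which a new shape is added without touching DrawAllShapes(). The Triangle class is hypothetical (it does not appear in Martin's article), and the Draw() methods simply write the shape's name to the console as a stand-in for real drawing code.

```csharp
using System;
using System.Collections.Generic;

public abstract class Shape
{
    public abstract void Draw();
}

public class Square : Shape
{
    public override void Draw()
    {
        Console.WriteLine("Square"); // placeholder for real drawing code
    }
}

public class Circle : Shape
{
    public override void Draw()
    {
        Console.WriteLine("Circle"); // placeholder for real drawing code
    }
}

// The new shape: a brand new class, with no edits to any existing class.
public class Triangle : Shape
{
    public override void Draw()
    {
        Console.WriteLine("Triangle"); // placeholder for real drawing code
    }
}

public class SomeImplementation
{
    // Unchanged from the version above; it never needs to know about Triangle.
    public void DrawAllShapes(List<Shape> shapeList)
    {
        foreach (Shape currentShape in shapeList)
            currentShape.Draw();
    }
}
```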

As a personal aside, I used to violate this principle all the time, because I would create "Collection classes" that would inherit from a List object, and every time a need for a new type of the same collection occurred, I would add a new constructor to represent that type of collection; the better solution would have been to inherit from the base collection. So, the below code is what I used to do:

 
public class SomeClass
{
}

public class SomeClassCollection : List<SomeClass>
{
Category _category;
Person _person;

public SomeClassCollection()
{
_category = null;
_person = null;
populate();
}

public SomeClassCollection(Category category)
{
_category = category;
_person = null;
populate();
}

public SomeClassCollection(Person person)
{
_person = person;
_category = null;
populate();
}

private void populate()
{
if(_category == null && _person == null)
{
///Do type 1 collection population
}
else if(_category == null && _person != null)
{
///Do type 2 collection population
}
else if(_category != null && _person == null)
{
///Do type 3 collection population
}
}
}


The better way to build the above code to avoid violating the Open-Closed Principle would have been to extend SomeClassCollection:

 
public class SomeClassCollection : List<SomeClass>
{

protected virtual void populate()
{

}
}

public class CategoryCollection : SomeClassCollection
{
public CategoryCollection(Category category)
{

}
protected override void populate()
{
///Populate based on category
}
}

public class PersonCollection : SomeClassCollection
{
public PersonCollection(Person person)
{

}
protected override void populate()
{
///Populate based on person
}
}

SOLID Principles

I recently stumbled onto a collection of articles written by an author named Robert Martin way back in 1996 for a now defunct magazine called "The C++ Report" (The C++ Report was evidently followed by the "Journal of Object-Oriented Programming", which doesn't seem to exist anymore either). These articles address core principles of object oriented design, and seem as true today as they were all those years ago. Of course, a few of the references Martin used in his articles may be a bit outdated, but the spirit and intent behind the articles is clear, and unlike a lot of articles I read that were written much more recently than Martin's articles, these articles are unmuddied by the complexities and interactions between all of the various technologies, subtechnologies, wrappers, APIs, etc that exist these days.

Martin (affectionately known as "Uncle Bob") wasn't really inventing any new concepts with his articles; rather, he was simply synthesizing information created by others into a paradigm that was appropriate for the time and available technology. But Robert Martin's articles were so successful and widely adopted, that they are still at the core of most object oriented design that is in use today, including later concepts, such as design patterns. Evidently, Martin is still at it, owning a company called "Object Mentor" which does company/enterprise-level coaching of these and other object oriented design principles. He's also written a number of books on the topic. Today, Martin is considered a legend in the programming world. He even has a blog, which can be found at http://blog.objectmentor.com/articles/category/uncle-bobs-blatherings

I'm about half way through reading this series of articles Martin wrote, and the information I've gleaned from them has drastically (or at least, significantly) impacted how I approach design and refactoring (in fact, some of the principles demonstrated in these articles have sparked several refactoring sessions). Martin's SOLID principles are made up of: the Single Responsibility Principle, the Open-Closed Principle, the Liskov Substitution Principle, the Interface Segregation Principle, and the Dependency Inversion Principle.

If nothing else, my next few blogs will be a way for me to digest the information I gathered from the core OOP principles Uncle Bob put forward (hereafter referred to as SOLID principles). And perhaps someone out there will read them and be compelled to apply them in their own programming.
