Monday, September 9, 2013

A Very Simple Demonstration of Node.js/Express and SQLite

Obligatory Boring Preamble

At the risk of disappointing all zero of my loyal followers, today we'll take a break from the C#.NET ecosystem and try something totally different.

Server-side JavaScript got pretty popular a couple of years ago (around 2011 - in case you're reading this from the future), and it has matured to the point where new-"ish" tech adopters like me are willing to give it a try - actual tools exist, people are using it for real projects in production, and it's no longer attracting as much attention from "latest thing" fad seekers. (Don't get me wrong. I love new things - but I have learned to love new things that are *useful* for doing *work*.)

So for today's installment, let's create an app that brings these two great tastes together, and as usual, share the goods for you to (hopefully) steal.

The Technology Stack

Like most of the stuff I have the luxury of posting on a public blog, this is going to be pretty basic.

For development, we're going to use:

  • Node.js
  • Eclipse (free multi-language IDE available here)
  • Nodeclipse (free Eclipse plugin for developing Node apps, available here)
  • Node Package Manager (NPM) (free package manager for Node.js modules available here)


And once we've got NPM installed and working, we're going to use it to install a few packages:

  • Express.js ("a minimal and flexible node.js web application framework")
  • Jade (a templating engine for generating our HTML user interface)
  • node-sqlite3 (Asynchronous, non-blocking SQLite3 bindings for Node.js)


The Finished Product

We'll end up with a small Node.js app that allows you to create and view SQLite3 databases, and modify some of their elements (create new tables and columns). The app will also demonstrate a few elements of Express.js - the routing of get and post requests, as well as some basic features of templating with Jade.

If you want to play with it yourself, you can clone the project files from my Git repository after following along with the setup we'll do.

Setup

First, download and install Node.js from http://nodejs.org/.

Next, install Eclipse (and any pre-requisites like Java) from http://eclipse.org/recommenders/download/.

Once Eclipse is installed, install the Nodeclipse plugin:

  1. Go to Eclipse's Help menu and choose Install New Software
  2. In the "Work With" field, paste this URL: http://www.nodeclipse.org/updates/
  3. Click the 'Add' button


Building the dev environment


When you install Node.js, it puts an icon in the main Start menu, as well as a program group.

This one will let you run Node.js server-side JavaScript in a console and see the output directly.


So, typing "Hello World" at the prompt will echo "Hello World". If you've ever used Firebug for trying out JavaScript, this does more or less the same thing.

What you'll probably end up using more often is the node command prompt:


This lets you start/stop/restart Node.js applications as well as use the Node Package Manager (hereafter referred to as npm). For starters, let's just see what versions of everything we've got:
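
Checking the versions is just a matter of:

node -v
npm -v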


So far so good! 

Next let's create a new Node project in Eclipse, so that we can start installing modules and get a skeleton site up.

In Eclipse, choose New > Node > Node Express Project > Next

Give your project a name and click Finish. Look at the output in Eclipse's console window:


Note that the Nodeclipse plugin has already done some work for us: it used npm to install 2 local node modules: Express (the web framework we'll be using to route requests and responses), and Jade (the templating engine we'll use to generate HTML output to the browser).

(Note, if your output looks garbled you might have Eclipse's text file encoding set to the wrong character set. Set it to UTF-8 to get the output shown above. See here for more).

You could have done this yourself by opening a Node.js command prompt, navigating to the root of your project directory, and typing: 

npm install express
npm install jade

Ok. It's time to do our first "smoke test" (fire it up, see if it smokes). While still in the Node.js command prompt, type:

node app.js

Open a browser, and open the URL http://localhost:3000

You should see the following:


And if you look back at your console output, you'll see some activity:



Looking at the project structure in Eclipse, we can see that (for now), app.js is where the action is. It's the entry point into our application, as well as containing the code that sets up the environment and starts the http listener. 
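
For reference, here's roughly what the generated app.js contains. This is a trimmed-down sketch - the exact set of middleware lines will depend on the Express version Nodeclipse pulled in - but the shape is the same:

var express = require('express');
var routes = require('./routes');
var user = require('./routes/user');
var http = require('http');
var path = require('path');

var app = express();

// environment setup (several middleware lines omitted here)
app.set('port', process.env.PORT || 3000);
app.set('views', __dirname + '/views');
app.set('view engine', 'jade');
app.use(app.router);
app.use(express.static(path.join(__dirname, 'public')));

// URL-to-handler mappings
app.get('/', routes.index);
app.get('/users', user.list);

// start the http listener
http.createServer(app).listen(app.get('port'), function(){
  console.log('Express server listening on port ' + app.get('port'));
});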

We also see that the Nodeclipse plugin generated a few other files and directories: 
  • views/layout.jade - the basic page template that other jade views extend
  • views/index.jade - the default landing page for our app
  • routes/index.js - the code that tells Node.js/Express to render the index page
  • Stylesheets, etc.

Adding a simple page


First, create a new express route. In your Eclipse project:
  1. Right click on routes
  2. Choose New > Node Source File
  3. Call it 'test.js' and click Finish
  4. Copy and paste the contents of index.js into test.js
  5. Replace exports.index with exports.get
  6. Replace res.render('index', { title: 'Express' }); with res.render('test', { title: 'Test Page' });
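
After those edits, test.js should look something like this (assuming the stock index.js the project generator created):

/*
 * GET test page.
 */

exports.get = function(req, res){
  res.render('test', { title: 'Test Page' });
};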

Next, create a new jade template:
  1. Right click on views
  2. Choose New > General > File
  3. Call it 'test.jade' and click Finish
  4. Copy and paste the contents of index.jade into test.jade
  5. Replace p Welcome to #{title} with   p This is a #{title}
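
Likewise, test.jade should end up looking something like:

extends layout

block content
  h1= title
  p This is a #{title}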

Next, wire the route up in app.js - require the new module and map a URL to it:
  1. Open app.js
  2. After the line var path = require('path'); add var test = require('./routes/test');
  3. After the line app.get('/users', user.list); add app.get('/test', test.get);
Try it out - stop the Node.js app by typing <CTRL> C, then run node app.js again. This time, navigate to http://localhost:3000/test. If it all went Ok, you should see:




A few notes about what we did. For simplicity, I used the word 'test' for everything - the name of the route, view, variable, etc. But you don't have to! Node.js/Express doesn't use "convention over configuration". The things to know are:
  • Whatever you name the view file needs to match the call to render it. So you could have a file called wombat.jade and call it from test.js.get() with
      res.render('wombat', { title: 'Test Page' });
  • You can call the variable in app.js anything you want, as long as it's pointing to the right file (and calls the same methods). So
      var aardvark = require('./routes/test'); would work, as long as you called the aardvark.get() method. 
  • You can call the method in test.js anything you want, as long as you register it in app.js. So 
      exports.jabiru = function(req, res){...}
    would work as long as you changed the line in app.js to
      app.get('/test', test.jabiru);
  • You can map any URL you want to the handler. So app.get('/Mah/RESTFul/Test', test.get); would work. This is pretty powerful (well, if you grew up with much more awful old-school ways of mapping URLs to code).

    You can also use wildcards, e.g.,
      app.get('/Mah/*/Test', test.get);
    or placeholders for variables, so that 
      app.get('/products/:productName/details', product.details); would pass whatever is in the section of the URL between 'products/' and '/details' as a variable for the product.details method to use.
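
As a quick sketch of that last idea (product.details is hypothetical - it isn't part of this project), the handler would read the placeholder straight off req.params:

// routes/product.js (hypothetical example, not part of this demo)
exports.details = function(req, res){
    // for GET /products/widget/details, req.params.productName === 'widget'
    var productName = req.params.productName;
    res.render('test', { title: 'Details for ' + productName });
};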

Let's try one more tweak to our test view/route before we move on to the good stuff - let's add a post method to accept data from the browser.
  1. Add the following method to test.js:

    exports.post = function(req, res){
     
        var name = req.body.name;
     
        console.log("Received POST data: " + name);
     
        res.render('test', { title: 'Test Page designed *exclusively* for ' 
            + name });
     
    };
    
    
  2. Map the method in app.js. Add:

    app.post('/test', test.post);
  3. Extend the jade template. The complete page should read:
    
    extends layout
    
    block content
      h1= title
      p This is a #{title}
      
      div
        form(method='post')
          p Enter your name:
          input(type='text',name='name',id='name')        
          input(type='submit',name='submit',value='Submit')
    
Note that jade is indent-sensitive! Indented elements are treated as "children" of the less-indented element above them. So the jade file above will result in the following HTML:

<!DOCTYPE html>
<html>
  <head>
    <title>Test Page</title><link rel="stylesheet" href="/stylesheets/style.css">
    </head>
  <body>
    <h1>Test Page</h1>
    <p>This is a Test Page</p>
    <div>
      <form method="post">
        <p>Enter your name:</p>
        <input type="text" name="name" id="name">
        <input type="submit" name="submit" value="Submit">
      </form>
    </div>
  </body>
</html>

Ok. Restart node (<CTRL> C, then node app.js from the command prompt) and refresh the page. Enter your name and click Submit - but first I should warn you. Prepare to be amazed!



Alright. Now for the good stuff. Let's get sqlite cooking!

Adding SQLite to the mix with node-sqlite3

If you don't have them already, download and install SQLite and Python. Now we need a wrapper - that is, an interface between the database and Node.js. Luckily there's node-sqlite3.

Let's use npm to install node-sqlite3:

  1. Open the Node.js command prompt from Start > All Programs > Node.js > Node.js command prompt.
  2. Navigate to your project directory (probably something like c:\users\[username]\workspace\[project]).
  3. Type the following command:

npm install sqlite3

If you would like to run the sqlite3 unit tests provided (always a good idea), you'll need to install mocha as well. (IMPORTANT NOTE! You must first navigate to the Sqlite3 directory inside your project - it should be in c:\users\[username]\workspace\[project]\node_modules\sqlite3 - otherwise the tests won't execute...).

npm install mocha
npm test

If everything went well you should see output similar to the following:




Let's create a SQLite database and see if we can spit some data out to the browser.

Create a new database

Open a plain old Windows command prompt and navigate to the root of your project directory. We're going to create a new database called test.db, add a table called 'foo', and insert a few rows of dummy data for testing purposes. Like so:


C:\Users\bricej\workspace\nodetest>sqlite3 test.db
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table foo(id integer primary key asc, some_text text);
sqlite> insert into foo(some_text) values('howdy');
sqlite> insert into foo(some_text) values('there');
sqlite> insert into foo(some_text) values('folks!');
sqlite>

(I will spare us all the rant about always, ALWAYS having a surrogate key as a table's primary key, other than to say you should ALWAYS ALWAYS use a surrogate key. The fact that there is even a debate about this makes me sad. If you ever use a 'natural' key, you are signing up for some serious and painful data surgery at some point, buddy!)

Ok, now we're going to rinse and repeat the steps we used to create our fabulous test post page above.

First, create a route, called table_view.js:

/**
 * Table view
 */
var sqlite3 = require('sqlite3').verbose();
var db;

exports.get = function(req, res){
    console.log("Serving " + __filename);

    var db_name = req.params.db_resource;
    var table = req.params.table;

    var dbpath = db_name;
    db = new sqlite3.Database(dbpath, function(err){
        if (err) res.render('index', {title: 'Error: ' + err});
    });

    // Note: the table name comes straight from the URL - fine for a local demo,
    // but see "Sanitize and validate inputs!" in the To Do list below.
    db.all("select * from " + table, function(err, rows) {

        // if the table doesn't exist (or the query fails), bail out here
        // rather than touching rows, which would be undefined
        if (err) {
            res.render('index', {title: 'Error: ' + err});
            db.close();
            return;
        }

        var message = rows.length > 0 ? "Viewing "
            + db_name + '/' + table
            : "No data found in table '" + table + "' in " + db_name;

        res.render('table_view', {message: message, rows: rows});
        db.close();
    });
};

Next, create a jade view called table_view.jade:

extends layout

block content
  h1= message

  h3 Data
  table
    - for (var r in rows)
      tr
        td= rows[r].id
        td= rows[r].some_text

  br

Finally, add a reference and a mapping to app.js:
var table_view = require('./routes/table_view');

and
app.get('/database/view/:db_resource/:table', table_view.get);

Now, if you restart the app and go to http://localhost:3000/database/view/test.db/foo you will see:



As in the previous example, the JavaScript file does most of the "heavy lifting" of querying the database and making the output available to the view. The jade view formats the output using a (hopefully) easy to understand templating syntax. The app (and Express framework) handle the routing of the request, so that we can create nice, RESTful URLs to our code.

You can see how easy it is to use these simple components to feed data to a browser - and can imagine extending the form example above to build more sophisticated data querying capabilities.
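
For instance - a hypothetical extension, not something in the repository - you could add a post handler to table_view.js that filters rows on a search term submitted from a form, letting node-sqlite3 bind the value via a '?' placeholder:

// Hypothetical addition to routes/table_view.js (not part of the demo project)
exports.post = function(req, res){
    var db_name = req.params.db_resource;
    var table = req.params.table;
    var term = req.body.search; // from a form field named 'search'

    var db = new sqlite3.Database(db_name);

    // the search term is bound safely by node-sqlite3; the table name, as before,
    // still needs sanitizing (see the To Do list)
    db.all("select * from " + table + " where some_text like ?", '%' + term + '%',
        function(err, rows) {
            var message = err ? "Error: " + err
                : "Rows in " + table + " matching '" + term + "'";
            res.render('table_view', {message: message, rows: rows || []});
            db.close();
        });
};

You'd map it with app.post('/database/view/:db_resource/:table', table_view.post); in app.js and add a small search form to table_view.jade, much like the one in test.jade.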

A few more additions and tweaks

Let's expand what we've got so far to meet the following requirements:

  • Add the ability to create a new database on the fly without having to use the command line
  • Add the ability to list existing databases
  • Add the ability to add tables to any database in the list
  • Add the ability to add columns to any table
While we're at it, let's refactor our code so that:
  • The database specific stuff is in its own subdirectory, so that the root doesn't look so cluttered.
  • Verbose console messages are only printed while in "debug" mode.
When we're done, our file tree should look like this:

Project Root
|   app.js
|   package.json
|   README.md
|   
|       
|---databases
|       test.db
|       
|---node_modules
[Omitted for space]
|                   
|---public
|   |---stylesheets
|           style.css
|           
|---routes
|   |   echo.js
|   |   index.js
|   |   user.js
|   |   
|   |---database
|           db_create.js
|           db_edit.js
|           db_list.js
|           table_edit.js
|           table_view.js
|           
|---utils
|       date_time.js
|       environment.js
|       reflection.js
|       
|---views
    |   echo.jade
    |   index.jade
    |   layout.jade
    |   
    |---database
            db_create.jade
            db_edit.jade
            db_list.jade
            table_edit.jade
            table_view.jade
 
We'll also update the app.js so that pages can handle both get and post requests (where appropriate).

To save you the trouble, I've published a version of the project that does all these things: https://github.com/jasonbrice/nodeling

If you clone/fork/download it, you should be able to import it straight into Eclipse:



Try it out and let me know if it worked in the comments section...

To Do

To make this more like a "real" project, we should do a few things:
  • Sanitize and validate inputs!
  • Add more querying tools.
  • Use reflection so that object properties are not hard-coded.
  • Package utils code as user modules.
  • Write unit tests for each page.
  • Usability: add breadcrumbs, etc.

Other odds & ends

In getting ready to push this to Github, I did a quick inventory of files, sizes, etc. to make sure I wasn't pushing any unnecessary junk up. So I ran:
npm ls

And saw that at the end of the output, there was this:

¦     +-- uglify-to-browserify@1.0.1
+-- sqlite3@2.1.15 extraneous

npm ERR! extraneous: sqlite3@2.1.15 C:\Users\bricej\workspace\nodeling\node_modules\sqlite3
npm ERR! not ok code 0

Huh? Sqlite is TOO a dependency! In fact, if I use npm prune to remove extraneous node_modules, then try and start the app, I get an error. So what gives?
It turns out that some piece in the install mechanism (npm? sqlite3's installer?) forgot to update the package.json. So in order to get node and npm to recognize it as such, I hand-updated the package.json to the following:

{
  "name": "nodeling",
  "version": "0.0.1",
  "private": true,
  "scripts": {
    "start": "node app.js"
  },
  "dependencies": {
    "express": "3.3.7",
    "jade": "*", 
    "sqlite3" : "*"
  }
}

This says that the app is dependent on a version (any version) of sqlite3. (Running npm install sqlite3 --save in the first place should write this entry for you.)

Also, the "right way" to bundle the utility files like date_time.js and reflection.js would be to make them local node modules and add them as dependencies.

Additional Resources

Source code for this demo is here: https://github.com/jasonbrice/nodeling

This site is invaluable in understanding the node-sqlite3 wrapper: https://github.com/developmentseed/node-sqlite3

Eclipse and Nodeclipse help: http://www.nodeclipse.org/community/q-and-a

Everything you need to know about Node Package Manager: http://howtonode.org/introduction-to-npm

A great tutorial focused on Node.js and Express: http://evanhahn.com/understanding-express-js/

Another tutorial, using Expresso, Jade, MongoDB: http://dailyjs.com/2010/11/22/node-tutorial-4/

Another tutorial, using more or less same as above: http://blog.ijasoneverett.com/2013/03/a-sample-app-with-node-js-express-and-mongodb-part-1/

These tutorials are good to check out since they did similar things to this one, but show a few slightly different approaches. It's always helpful (to me) to understand how other people approach similar problems.

Monday, July 22, 2013

Fun with multi threading

This one will be short 'n sweet.

Background

It so happens that I have a project that involves decompiling a certain vendor's binary files into XML, then manipulating the data in that XML, then recompiling the XML back into the proprietary binary format.

The tool for de/re-compiling is provided by the vendor, and for our purposes let's say that it's a "black box" - we don't know how it does what it does, we just know that we give it an XML or binary file as input, and it spits out the opposite (sort of like that machine that could remove stars from the star-bellied Sneetches and vice versa).

In the cartoon version it was one machine. Use your imagination, people.

The Problem

The black box tool works Ok, but takes a second or two to spin up, do its thing, and produce a file of the expected output type. On my (pretty vanilla for 2013) dev laptop, it takes about 75 seconds to process 500 files.

Can we throw more threads at it?

As we all undoubtedly know, multi-threading is both the cause of, and solution to, all software performance problems (apologies to The Simpsons). Multi-threaded processes can be exceedingly difficult to debug, performance can actually be degraded if you do it wrong, and for some things it brings only modest performance gains at best.

So, will it work for my problem?

In theory it should. All of the complexity of what happens inside the black box is encapsulated away from my code and my poor brain. I just need to process lots of files, the more/faster the better.

This is the kind of thing the .NET Threadpool was made for.

The implementation

Well, I wish I could claim that I came up with the whole thing myself. But, standing on the shoulders of giants and all that - consulting the oracle of sublime wisdom that is The Internet, and more importantly resources like StackOverflow - I found someone trying to solve basically the same problem.

But what about this bit?
I only used four threads in my example because that's how many cores I have. It makes little sense to be using 20 threads when only four of them can be processing at any one time. But you're free to increase the MaxThreads number if you like.
Good point. The implementation should make use of as many cores as it can, but any more than that doesn't do any good. So how do you do that? Oh StackOverflow, is there anything you don't know?

Ok. So here's my completed (and heavily blog-post-ified) code, with probably the bare minimum of flair:

using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.IO;
using System.Diagnostics;
 
namespace AcmeCo.BigProject
{
    /*
     * As far as the rest of the app is concerned, XmlCompiler does all the work.
     * You just tell it what XML files to process, and let it worry about the rest.
     */
    public class XmlCompiler
    {
        List<string> Files;
 
        private static int CoreCount = 0;
 
        // No zero arg constructor: this class is pointless without files to process.
        public XmlCompiler(List<string> Files)
        {
            this.Files = Files;
 
            // Stolen from http://stackoverflow.com/questions/1542213/how-to-find-the-number-of-cpu-cores-via-net-c
            foreach (var item in new System.Management.ManagementObjectSearcher("Select * from Win32_Processor").Get())
            {
                CoreCount += int.Parse(item["NumberOfCores"].ToString());
            }
 
        }
 
        public void Process()
        {
            DoTheWork();
        }
 
        private static int MaxThreads = 8;
        private Semaphore _sem = new Semaphore(MaxThreads, MaxThreads);
 
        // Stolen from http://stackoverflow.com/questions/15120818/net-2-0-processing-very-large-lists-using-threadpool
        void DoTheWork()
        {
            MaxThreads = CoreCount;
            int ItemsToProcess = this.Files.Count;
 
            Console.WriteLine("Processing " + ItemsToProcess + " on " + CoreCount + " cores");
 
            int s = 0;
            for (int i = 0; i < ItemsToProcess; ++i)
            {
                Console.WriteLine("Processing " + i + " of " + ItemsToProcess + ": " + Files[i]);
 
                _sem.WaitOne();
 
                XmlCompileTarget target = new XmlCompileTarget(Files[i]);
 
                ThreadPool.QueueUserWorkItem(Process, target);
                ++s;
                if (s >= 19)
                    s = 0;
            }
 
            // All items have been assigned threads.
            // Now, acquire the semaphore "MaxThreads" times.
            // When counter reaches that number, we know all threads are done.
            int semCount = 0;
            while (semCount < MaxThreads)
            {
                _sem.WaitOne();

                ++semCount;
            }

            // All items are processed
            // Clear the semaphore for next time.
            _sem.Release(semCount);
        }

        void Process(object o)
        {
            // do the processing ...
            XmlCompileTarget target = (XmlCompileTarget)o;
            target.Process();

            // release the semaphore
            _sem.Release();
        }
    }

    // A "unit" of work... this class' job is to hand the file to the processing
    // utility and raise an event when it's done.
    public class XmlCompileTarget
    {
        private string file;

        public XmlCompileTarget(string file)
        {
            this.file = file;
        }

        public void Process()
        {
            Compilefile();
        }

        public static event EventHandler<XmlProcessEventArgs> OnProgress = delegate { };

        protected virtual void Progress(XmlProcessEventArgs e)
        {
            OnProgress.Raise(this, e);
        }

        private void Compilefile()
        {
            if (!System.IO.File.Exists(file))
            {
                OnProgress(this, new XmlProcessEventArgs(file, "File not found!"));
                return;
            }

            OnProgress(this, new XmlProcessEventArgs(file,
                XmlUtilities.RunTool(@"Tools\XmlComp.exe", new FileInfo(file), null)));
        }
    }

    // The processing utility runs the vendor's XML compiler and
    // returns any output from that tool as a string.
    public class XmlUtilities
    {
        public static string RunTool(string ExecutablePath, FileInfo FileInfo, string Arguments)
        {
            Process p = new Process();
            ProcessStartInfo info = new ProcessStartInfo();

            if (!File.Exists(ExecutablePath) || !FileInfo.Exists)
            {
                Console.WriteLine("Error: File path not found - \r\n"
                    + ExecutablePath + " exists == " + File.Exists(ExecutablePath) + "\r\n"
                    + FileInfo + " exists == " + FileInfo.Exists);
                return null;
            }

            Console.WriteLine(Arguments);

            info.FileName = ExecutablePath;
            info.Arguments = string.IsNullOrEmpty(Arguments) ? "\"" + FileInfo.FullName + "\"" : Arguments;
            info.UseShellExecute = false;
            info.RedirectStandardOutput = true;
            info.RedirectStandardError = true;
            info.RedirectStandardInput = true;
            info.ErrorDialog = true;
            info.Verb = "runas";
            info.WindowStyle = ProcessWindowStyle.Hidden;

            p.StartInfo = info;
            p.Start();

            string output = p.StandardOutput.ReadToEnd();
            string error = p.StandardError.ReadToEnd();
            p.WaitForExit();

            StringBuilder sb = new StringBuilder();
            sb.Append(output);
            return sb.ToString();
        }
    }

    // Not much to see here...
    public class XmlProcessEventArgs : EventArgs
    {
        private string filename;
        private string output;

        public XmlProcessEventArgs(string filename, string output)
        {
            this.filename = filename;
            this.output = output;
        }
    }

    // Ah, why this?
    // Because it is a drag to continually have to add tons
    // of thread-safe invocations on every last UI
    // element that might need updating as a result of the event
    // that was raised.
    // Isn't it better to make the *event* notification thread-safe,
    // and let UI elements be their merry little selves on their own
    // merry little thread?
    // But we digress...
    // Stolen from: http://stackoverflow.com/a/2150359/2124709
    public static class ExtensionMethods
    {
        /// <summary>Raises the event (on the UI thread if available).</summary>
        /// <param name="multicastDelegate">The event to raise.</param>
        /// <param name="sender">The source of the event.</param>
        /// <param name="e">An EventArgs that contains the event data.</param>
        /// <returns>The return value of the event invocation or null if none.</returns>
        /// <remarks>Usage: MyEvent.Raise(this, EventArgs.Empty);</remarks>
        public static object Raise(this MulticastDelegate multicastDelegate, object sender, EventArgs e)
        {
            object retVal = null;

            MulticastDelegate threadSafeMulticastDelegate = multicastDelegate;
            if (threadSafeMulticastDelegate != null)
            {
                foreach (Delegate d in threadSafeMulticastDelegate.GetInvocationList())
                {
                    var synchronizeInvoke = d.Target as System.ComponentModel.ISynchronizeInvoke;
                    if ((synchronizeInvoke != null) && synchronizeInvoke.InvokeRequired)
                    {
                        try
                        {
                            retVal = synchronizeInvoke.EndInvoke(synchronizeInvoke.BeginInvoke(d, new[] { sender, e }));
                        }
                        catch (Exception ex)
                        {
                            Console.WriteLine(ex.Message + Environment.NewLine + ex.StackTrace);
                        }
                    }
                    else
                    {
                        retVal = d.DynamicInvoke(new[] { sender, e });
                    }
                }
            }

            return retVal;
        }
    }
}

Benchmarking


What kind of impact did all of that have? Let's compare:

   Operating System: Windows 7 Professional 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.130318-1533)
          Processor: Intel(R) Core(TM) i7 CPU       Q 720  @ 1.60GHz (8 CPUs), ~1.6GHz
             Memory: 8192MB RAM
Available OS Memory: 8124MB RAM
          Page File: 8020MB used, 8224MB available

 * No multi-threading (serial processing of each XML file), 500 files. Processing time: 75 seconds. 
 * Using code above (multi-threading on 4 cores), 500 files. Processing time: 15 seconds.

Well, we've cut our processing time to 20% of what it was originally. What happens if we add more cores? I happen to have access to a bigger box here somewhere...

   Operating System: Windows 7 Professional 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.110408-1631)
          Processor: Intel(R) Core(TM) i7 CPU       X 990  @ 3.47GHz (12 CPUs), ~3.5GHz
             Memory: 6144MB RAM
Available OS Memory: 6136MB RAM
          Page File: 1932MB used, 12392MB available

 * Using code above (multi-threading on 6 cores), 500 files. Processing time: 5 seconds.

Summary and Conclusions

I started with a bottleneck that required my code to run an external tool to process many thousands of files. After investigating running multiple instances of the external tool in parallel via multi-threading (and using hardware better suited for the job at hand), I was able to decrease the net runtime to roughly 7% of the time it took running sequentially in a single thread.

I can live with that.

Tuesday, July 9, 2013

Jumping into Micro ORM, Ctd

As you may remember from our last installment, I'm going to experiment with cobbling together a small proof of concept project using C# .NET, a Micro ORM (PetaPoco), and a lightweight database (Sqlite).

I've already got a Visual Studio 2010 project (targeted at the .NET Framework 4.0 runtime), so it's time to get PetaPoco.

Getting PetaPoco

At this point it appears that I have two options for getting PetaPoco - making it a project dependency with NuGet, or cloning the source from Git. For now, I'll go with the latter. (For a nice quick 'n easy intro to Git, you can't get much simpler than this.)

After cloning to a local repository, I see a new solution and associated files in the new PetaPoco directory. I open the solution file in Visual Studio 2010, and it builds a .dll on the first try with a couple of (hopefully minor) warnings.

------ Build started: Project: PetaPoco, Configuration: Debug Any CPU ------
  PetaPoco -> C:\Users\bricej\repos\PetaPoco\PetaPoco\bin\Debug\PetaPoco.dll
------ Build started: Project: PetaPoco.Tests, Configuration: Debug Any CPU ------
c:\Windows\Microsoft.NET\Framework\v4.0.30319\Microsoft.Common.targets(1605,5): warning MSB3245: Could not resolve this reference. Could not locate the assembly "Npgsql". Check to make sure the assembly exists on disk. If this reference is required by your code, you may get compilation errors.
  PetaPoco.Tests -> C:\Users\bricej\repos\PetaPoco\PetaPoco.Tests\bin\Debug\PetaPoco.Tests.exe
------ Build started: Project: PetaPoco.DevBed, Configuration: Debug Any CPU ------
  PetaPoco.DevBed -> C:\Users\bricej\repos\PetaPoco\PetaPoco.DevBed\bin\Debug\PetaPoco.DevBed.exe
========== Build: 3 succeeded or up-to-date, 0 failed, 0 skipped ==========

Setting up PetaPoco with Sqlite

At this point I don't have Sqlite downloaded/installed/configured/etc.

PetaPoco doesn't seem to say anything about Sqlite. But, reading from the comments in PetaPoco.Core.ttinclude, we see some examples of setting up DbProviders, and this helpful nugget:
Also, the providers and their dependencies need to be installed to GAC
That's good info, because it doesn't look like Sqlite is intended to live in the GAC  (the Global Assembly Cache, where  the .NET runtime may go looking for referenced dlls) by default:
All the "bundle" packages contain the "System.Data.SQLite.dll" mixed-mode assembly. These packages should only be used in cases where the assembly binary must be deployed to the Global Assembly Cache for some reason (e.g. to support some legacy application on customer machines). 
So let's download the bundle version (If we already knew what we were doing, we might have gotten away with downloading just the binaries and using the gacutil...)

Next, we're going to need a .NET wrapper for Sqlite so that we can set it up as a datasource in Visual Studio. There appear to be multiple options, but let's try ADO.NET 2.0 Provider for SQLite.

After installing the Sqlite wrapper, I can now set up a connection to an existing database, or create a new Sqlite one. Let's create one:





Next let's add a table (using the sqlite command line tool) to see if our database is working:



Getting somewhere...





Running the built-in unit tests for PetaPoco didn't work out so hot: Permission to Rock - DENIED.

Hmm. Lots of invalid cast exceptions. It *seems* like the unit tests are able to connect to the database and are performing the setup/teardown steps. But clearly something fundamental is wrong.

Let's try and rule out the unit tests themselves as the problem. Since we already created a table 'foo' with two columns (int/varchar), let's create a Poco and see if we can query it:

    public class foo
    {
        public int id { get; set; }
        public string name { get; set; }
    }

Testing the query:

    // Create a PetaPoco database object
    var db = new PetaPoco.Database("sqlite");

    try
    {
        // Show all foo
        foreach (var a in db.Query<foo>("SELECT * FROM foo"))
        {
            Console.WriteLine("{0} - {1}", a.id, a.name);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message + Environment.NewLine + ex.StackTrace);
    }

Outputs the following:



So our stack (C# .NET 4.0 / PetaPoco / Sqlite) clearly works, but there's something unsettling about those unit tests failing. I'm not committed to PetaPoco yet. It should be easy to try out another Micro ORM framework since 1) in theory ORM frameworks try to be as implementation agnostic as they can get away with, and 2) we haven't really written any code yet - so why not try something else?

Trying a different Micro ORM

Let's look for something that's actively maintained, and has a bit more friendly support/documentation for Sqlite.

MicroLite ORM? Actively maintained, lots of documentation, Sqlite examples...

Getting MicroLite ORM

Although you can clone the project from Github, the docs say to install from NuGet. Ok then. For those of you new to NuGet, start here. (If you've ever used a package manager on a *nix distro, this will be a pretty familiar concept.)

After you install MicroLite with NuGet (depending on which packages you choose) you should see something like this when you right-click on your solution and choose "Manage NuGet Packages for Solution."



Ok, now let's see if we can reproduce our simple SELECT query.

Our new foo, with MicroLite attributes:

    [Table("foo")]
    public class foo
    {
        [Column("id")]
        [Identifier(IdentifierStrategy.DbGenerated)]
        public int id { get; set; }
 
        [Column("name")]
        public string name { get; set; }
    }

And the code to do a simple query:

    var sessionFactory = Configure
        .Fluently()
        .ForConnection(connectionName: "sqlite",
            sqlDialect: "MicroLite.Dialect.SQLiteDialect")
        .CreateSessionFactory();

    using (var session = sessionFactory.OpenSession())
    {
        var query = new SqlQuery("SELECT * from foo");
        var foos = session.Fetch<foo>(query); // foos will be an IList<foo>
        foreach (foo a in foos)
        {
            Console.WriteLine("{0} - {1}", a.id, a.name);
        }
    }

That's a bit more coding than the simpler PetaPoco version. In fact it's starting to look a lot like Hibernate! (Well, that's unfair, but one of the original design goals was a much simpler code base than a full-ORM would require.)

Let's see what happens if we use MicroLite's built-in conventions (vs. configurations for table/column mappings). Refactoring the above code we get:

    //[Table("foo")]
    public class foo
    {
        //[Column("id")]
        //[Identifier(IdentifierStrategy.DbGenerated)]
        public int Id { get; set; }
 
        //[Column("name")]
        public string name { get; set; }
    }

And :

   #region new_code!
 
    Configure.Extensions() // If used, load any logging extension first.
        .WithConventionBasedMapping(new ConventionMappingSettings
            {
                // default is DbGenerated if not specified.
                IdentifierStrategy = IdentifierStrategy.DbGenerated, 
                // default is true if not specified.
                UsePluralClassNameForTableName = false 
            });
            
    #endregion
 
    var sessionFactory = Configure
      .Fluently()
      .ForConnection(connectionName: "sqlite", 
            sqlDialect: "MicroLite.Dialect.SQLiteDialect")
      .CreateSessionFactory();

    using (var session = sessionFactory.OpenSession())
    {
        var query = new SqlQuery("SELECT * from foo");
        var foos = session.Fetch<foo>(query); // foos will be an IList<foo>
        foreach (foo a in foos)
        {
            Console.WriteLine("{0} - {1}", a.Id, a.name);
        }
    }

There's one small change I made that might be hard to notice: the foo.Id property is now Pascal-cased (it was foo.id previously). Why? Because MicroLite's convention-based mapping strategy requires that
"The class must have a property which is either class name + Id or just Id (e.g. class Customer should have a property called CustomerId or Id)." 
(Side note - I also had to rename the column in the foo table to "Id" - yuck! I assume this is because MicroLite's convention-based mapping is case-sensitive on both ends - code and database objects.)

This post is starting to get long... so I'll snip out the part where I try full CRUD operations using both MicroLite and PetaPoco - but both work fine as expected.

We need a tie-breaker

Both MicroLite and PetaPoco will serve as nice smallish ORM frameworks. MicroLite appears to be more recently maintained and has lots of features, but PetaPoco is super simple to use and requires (at least in simple CRUD examples) a lot less code.

Let's have a race and see what happens. How fast can each of these insert 10,000 rows? (Note: times are in milliseconds)

First, MicroLite:

    DateTime start = System.DateTime.Now;

    using (var session = GetFactory().OpenSession())
    {
        using (var transaction = session.BeginTransaction())
        {
            for (int i = 0; i < 10000; i++)
            {
                var foo = new foo();
                foo.name = "MicroLite Insert Test " + i;
                session.Insert(foo);
            }

            transaction.Commit();
        }
    }

    Console.WriteLine("Elapsed: " + (System.DateTime.Now - start).TotalMilliseconds);

Outputs: Elapsed: 1811.1036

Next up, PetaPoco:
    DateTime start = System.DateTime.Now;

    // Create a PetaPoco database object
    var db = new PetaPoco.Database("sqlite");

    for (int i = 0; i < 10000; i++)
    {
        foo foo = new foo();
        foo.name = "PetaPoco Insert Test " + i;
        db.Insert("foo", "Id", foo);
    }

    Console.WriteLine("Elapsed: " + (System.DateTime.Now - start).TotalMilliseconds);

Outputs: Elapsed: 115555.6095

WOW! PetaPoco took nearly 2 minutes to insert what MicroLite did in under 2 seconds. (NOTE: See updates below regarding performance improvements)

What about read operations? I created similar code that reads the 10,000 rows created in the bulk insert test and adds them to a Dictionary<int, string>.

PetaPoco:
Run #1 Elapsed: 219.0125
Run #2 Elapsed: 125.0072
Run #3 Elapsed: 101.0057
Run #4 Elapsed: 96.0055
Run #5 Elapsed: 98.0056
Run #6 Elapsed: 121.0069
Run #7 Elapsed: 118.0068

MicroLite:
Run #1 Elapsed: 926.0529
Run #2 Elapsed: 355.0203
Run #3 Elapsed: 398.0228
Run #4 Elapsed: 351.0201
Run #5 Elapsed: 483.0276
Run #6 Elapsed: 347.0199
Run #7 Elapsed: 357.0204

Hmm. PetaPoco is actually quite a bit faster than MicroLite. Both appear to be doing some caching since the first run is significantly slower than subsequent runs.

Are these fair tests? Is there a way to make them more Apples-to-Apples? Let me know in the comments...

Summary and Conclusions


I started out looking for a stack that would let me run a .NET Winforms app on top of a Micro ORM for simple CRUD operations on a Sqlite database.

I found at least two that would be reasonably serviceable: PetaPoco (the original choice), and MicroLite ORM.

PetaPoco just "feels" simpler to use and less like the clunky Spring/Hibernate syntax I wanted to avoid. But it doesn't appear to be actively maintained. Then again, there are 500+ hits on Stackoverflow (always a useful measure ;-) for PetaPoco, and 6 for MicroLite.

Which one to use? Originally, I was leaning MicroLite due to PetaPoco's apparent performance issues - until I realized they weren't issues at all. So for now, I'm leaning towards PetaPoco. EDIT: see updates below, including adding Dapper to the mix.

Downloads

If you'd like to steal the tech in this example, a Visual Studio 2010 solution is here.

UPDATE - Now with Dapper

Well, leave it to me to forget the granddaddy of them all, Dapper. If you don't know what Dapper is, it's the Micro ORM that StackOverflow uses. If you don't know what StackOverflow is, welcome to Earth, and please enjoy the show.

I didn't originally include Dapper because although I had heard of it, I didn't think of it as a Micro ORM (not for any particular reason - just my unfamiliarity with it).

I've added Dapper to the solution you can download (see Downloads), and you can judge for yourself how it compares. I, for one, was hoping for the magical unicorn combination of MicroLite speed and PetaPoco simplicity. Alas, not so (caveat: I am sure that I could be "doing it wrong," but as far as out-of-the-box and less than 1 hour of Googling goes... Edit: Yes, I was doing it wrong).

Dapper took over 77 seconds to insert 10,000 rows. Also, compare the syntax for inserting a new record.

Here's PetaPoco:
 foo foo = new foo();
 foo.name = "Created by PetaPoco";
 db.Insert("foo""Id", foo);

Here's Dapper:
 string sqlQuery = "INSERT INTO foo(name) VALUES (@name)";
 conn.Execute(sqlQuery, new { name = "Created by Dapper" });

I don't say this to start a Holy War, but how would you like to be the gal who has to refactor all the strings in the queries that use the 'foo' class when the business requirements come along to separate the 'name' property into 'FirstName' and 'LastName'? Or when the foo class grows to be a few dozen (hundred?) properties long? (Yes, I know it's probably not good design if a class has that many properties, but we've all been there.)

I am sure Dapper has many pluses going for it, but my expectations were (probably too) high.

Maybe in a future project I'll write a PetaPoco-like wrapper for Dapper, and finally be happy :-).

UPDATE #2 - Massive Performance Improvements

I just couldn't believe that StackOverflow's MicroORM was so slow, and was convinced I was doing it wrong. Well, I was. For doing the kind of work I was doing (lots of inserts), it turns out you are supposed to wrap the work in a TransactionScope.

So, adding the System.Transactions library to my project, and wrapping the inserts with the following:
    using (var transactionScope = new TransactionScope())
    {
        // do lots of transactions
        transactionScope.Complete();
    }
Dapper was able to insert 10,000 rows in a stunning 493 milliseconds. Folks, we have a clear winner on performance. Now if only Dapper had cleaner insert/update syntax!

Likewise, PetaPoco has a similar mechanism:
 var db = new PetaPoco.Database("sqlite");
 using (var transaction = db.GetTransaction())
 {
    // do lots of transactions
    transaction.Complete();
  }

Which brings the PetaPoco bulk insert time down to right around 1 second (1087 millis).

So now we're back where we started. PetaPoco's speed is comparable to the others, and it's got very nice, simple syntax. And if I hit a gnarly roadblock, I know I at least have a couple of decent alternatives to fall back on.