Background
It so happens that I have a project that involves decompiling a certain vendor's binary files into XML, manipulating the data in that XML, then recompiling the XML back into the proprietary binary format. The tool for de/re-compiling is provided by the vendor, and for our purposes let's say that it's a "black box" - we don't know how it does what it does, we just know that we give it an XML or binary file as input, and it spits out the opposite (sort of like that machine that could remove stars from the star-bellied Sneetches and vice versa).
In the cartoon version it was one machine. Use your imagination, people.
The Problem
The black box tool works OK, but takes a second or two to spin up, do its thing, and produce a file of the expected output type. On my (pretty vanilla for 2013) dev laptop, it takes about 75 seconds to process 500 files. Can we throw more threads at it?
As we all undoubtedly know, multi-threading is both the cause of, and solution to, all software performance problems (apologies to The Simpsons). Multi-threaded processes can be exceedingly difficult to debug, performance can actually be degraded if you do it wrong, and for some things it brings only modest performance gains at best. So, will it work for my problem?
In theory it should. All of the complexity of what happens inside the black box is encapsulated away from my code and my poor brain. I just need to process lots of files, the more/faster the better.
This is the kind of thing the .NET ThreadPool was made for.
The implementation
Well, I wish I could claim that I came up with the whole thing myself. But, standing on the shoulders of giants and all that - thanks to the oracle of sublime wisdom that is The Internet, and more importantly resources like StackOverflow - I found someone trying to solve basically the same problem. But what about this bit?
"I only used four threads in my example because that's how many cores I have. It makes little sense to be using 20 threads when only four of them can be processing at any one time. But you're free to increase the MaxThreads number if you like."

Good point. The implementation should make use of as many cores as it can, but any more than that doesn't do any good. So how do you do that? Oh StackOverflow, is there anything you don't know?
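As an aside, the framework also offers `Environment.ProcessorCount`, but it reports *logical* processors, so a hyper-threaded CPU counts double - which is why the implementation below queries WMI for physical cores instead. A quick sketch of the difference (console app for illustration only):

```csharp
using System;
using System.Management; // requires a reference to System.Management.dll

class CoreCounter
{
    static void Main()
    {
        // Logical processors: counts each hyper-threaded core twice.
        Console.WriteLine("Logical processors: " + Environment.ProcessorCount);

        // Physical cores, via WMI: the number we actually want to throttle on.
        int cores = 0;
        foreach (var item in new ManagementObjectSearcher(
            "Select NumberOfCores from Win32_Processor").Get())
        {
            cores += int.Parse(item["NumberOfCores"].ToString());
        }
        Console.WriteLine("Physical cores: " + cores);
    }
}
```

On the i7 laptop from the benchmarks below, the first line would report 8 and the second 4.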
OK. So here's my completed (and heavily blog-post-ified) code, with probably the bare minimum of flair:
```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.IO;
using System.Diagnostics;

namespace AcmeCo.BigProject
{
    /*
     * As far as the rest of the app is concerned, XmlCompiler does all the work.
     * You just tell it what XML files to process, and let it worry about the rest.
     */
    public class XmlCompiler
    {
        private List<string> Files;
        private int CoreCount = 0;
        private int MaxThreads;
        private Semaphore _sem;

        // No zero-arg constructor: this class is pointless without files to process.
        public XmlCompiler(List<string> Files)
        {
            this.Files = Files;

            // Stolen from http://stackoverflow.com/questions/1542213/how-to-find-the-number-of-cpu-cores-via-net-c
            // (requires a reference to System.Management.dll)
            foreach (var item in new System.Management.ManagementObjectSearcher(
                "Select * from Win32_Processor").Get())
            {
                CoreCount += int.Parse(item["NumberOfCores"].ToString());
            }

            // One worker per physical core; any more than that does no good.
            MaxThreads = CoreCount;
            _sem = new Semaphore(MaxThreads, MaxThreads);
        }

        public void Process()
        {
            DoTheWork();
        }

        // Stolen from http://stackoverflow.com/questions/15120818/net-2-0-processing-very-large-lists-using-threadpool
        void DoTheWork()
        {
            int ItemsToProcess = this.Files.Count;
            Console.WriteLine("Processing " + ItemsToProcess + " on " + CoreCount + " cores");

            for (int i = 0; i < ItemsToProcess; ++i)
            {
                Console.WriteLine("Processing " + i + " of " + ItemsToProcess + ": " + Files[i]);
                _sem.WaitOne(); // blocks while MaxThreads items are already in flight
                XmlCompileTarget target = new XmlCompileTarget(Files[i]);
                ThreadPool.QueueUserWorkItem(Process, target);
            }

            // All items have been queued. Now, acquire the semaphore "MaxThreads"
            // times. Once we hold every slot, we know all threads are done.
            int semCount = 0;
            while (semCount < MaxThreads)
            {
                _sem.WaitOne();
                ++semCount;
            }

            // All items are processed.
            // Clear the semaphore for next time.
            _sem.Release(semCount);
        }

        void Process(object o)
        {
            // do the processing...
            XmlCompileTarget target = (XmlCompileTarget)o;
            target.Process();

            // release the semaphore
            _sem.Release();
        }
    }

    // A "unit" of work... this class's job is to hand the file to the processing
    // utility and raise an event when it's done.
    public class XmlCompileTarget
    {
        private string file;

        public XmlCompileTarget(string file)
        {
            this.file = file;
        }

        public void Process()
        {
            CompileFile();
        }

        public static event EventHandler<XmlProcessEventArgs> OnProgress = delegate { };

        protected virtual void Progress(XmlProcessEventArgs e)
        {
            // Raise() marshals the event to the UI thread if one is listening.
            OnProgress.Raise(this, e);
        }

        private void CompileFile()
        {
            if (!File.Exists(file))
            {
                Progress(new XmlProcessEventArgs(file, "File not found!"));
                return;
            }
            Progress(new XmlProcessEventArgs(file,
                XmlUtilities.RunTool(@"Tools\XmlComp.exe", new FileInfo(file), null)));
        }
    }

    // The processing utility runs the vendor's XML compiler and
    // returns any output from that tool as a string.
    public class XmlUtilities
    {
        public static string RunTool(string ExecutablePath, FileInfo FileInfo, string Arguments)
        {
            if (!File.Exists(ExecutablePath) || !FileInfo.Exists)
            {
                Console.WriteLine("Error: File path not found - \r\n"
                    + ExecutablePath + " exists == " + File.Exists(ExecutablePath) + "\r\n"
                    + FileInfo + " exists == " + FileInfo.Exists);
                return null;
            }

            Process p = new Process();
            ProcessStartInfo info = new ProcessStartInfo();
            info.FileName = ExecutablePath;
            // Default to passing the (quoted) file path if no arguments were given.
            info.Arguments = string.IsNullOrEmpty(Arguments)
                ? "\"" + FileInfo.FullName + "\""
                : Arguments;
            info.UseShellExecute = false;
            info.RedirectStandardOutput = true;
            info.RedirectStandardError = true;
            info.RedirectStandardInput = true;
            info.WindowStyle = ProcessWindowStyle.Hidden;
            p.StartInfo = info;
            p.Start();

            // Note: reading stdout to the end before stderr can deadlock if the
            // tool fills the stderr buffer; fine for this tool's modest output.
            string output = p.StandardOutput.ReadToEnd();
            string error = p.StandardError.ReadToEnd();
            p.WaitForExit();

            return output;
        }
    }

    // Not much to see here...
    public class XmlProcessEventArgs : EventArgs
    {
        public string Filename { get; private set; }
        public string Output { get; private set; }

        public XmlProcessEventArgs(string filename, string output)
        {
            Filename = filename;
            Output = output;
        }
    }

    // Ah, why this?
    // Because it is a drag to continually have to add tons
    // of thread-safe invocations on every last UI
    // element that might need updating as a result of the event
    // that was raised.
    // Isn't it better to make the *event* notification thread-safe,
    // and let UI elements be their merry little selves on their own
    // merry little thread?
    // But we digress...

    // Stolen from: http://stackoverflow.com/a/2150359/2124709
    public static class ExtensionMethods
    {
        /// <summary>Raises the event (on the UI thread if available).</summary>
        /// <param name="multicastDelegate">The event to raise.</param>
        /// <param name="sender">The source of the event.</param>
        /// <param name="e">An EventArgs that contains the event data.</param>
        /// <returns>The return value of the event invocation or null if none.</returns>
        /// <remarks>Usage: MyEvent.Raise(this, EventArgs.Empty);</remarks>
        public static object Raise(this MulticastDelegate multicastDelegate, object sender, EventArgs e)
        {
            object retVal = null;

            MulticastDelegate threadSafeMulticastDelegate = multicastDelegate;
            if (threadSafeMulticastDelegate != null)
            {
                foreach (Delegate d in threadSafeMulticastDelegate.GetInvocationList())
                {
                    var synchronizeInvoke = d.Target as System.ComponentModel.ISynchronizeInvoke;
                    if ((synchronizeInvoke != null) && synchronizeInvoke.InvokeRequired)
                    {
                        try
                        {
                            retVal = synchronizeInvoke.EndInvoke(
                                synchronizeInvoke.BeginInvoke(d, new[] { sender, e }));
                        }
                        catch (Exception ex)
                        {
                            Console.WriteLine(ex.Message + Environment.NewLine + ex.StackTrace);
                        }
                    }
                    else
                    {
                        retVal = d.DynamicInvoke(new[] { sender, e });
                    }
                }
            }

            return retVal;
        }
    }
}
```
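For completeness, here's roughly what the calling code looks like. This is a sketch, not the real driver from the project - the directory path and the empty event handler are hypothetical stand-ins:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

namespace AcmeCo.BigProject
{
    class Driver
    {
        static void Main()
        {
            // Gather the XML files to compile (path is hypothetical).
            List<string> files = Directory
                .GetFiles(@"C:\Data\Xml", "*.xml")
                .ToList();

            // Progress events arrive from worker threads; Raise() marshals
            // them onto the UI thread if a UI control is subscribed.
            XmlCompileTarget.OnProgress += (sender, e) =>
            {
                // e carries the filename and the tool's output.
            };

            // Blocks until every file has been processed.
            new XmlCompiler(files).Process();
        }
    }
}
```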
Benchmarking
What kind of impact did all of that have? Let's compare:
Operating System: Windows 7 Professional 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.130318-1533)
Processor: Intel(R) Core(TM) i7 CPU Q 720 @ 1.60GHz (8 CPUs), ~1.6GHz
Memory: 8192MB RAM
Available OS Memory: 8124MB RAM
Page File: 8020MB used, 8224MB available
* No multi-threading (serial processing of each XML file), 500 files. Processing time: 75 seconds.
* Using code above (multi-threading on 4 cores), 500 files. Processing time: 15 seconds.
Well, we've cut our processing time to 20% of what it was originally. What happens if we add more cores? I happen to have access to a bigger box here somewhere...
Operating System: Windows 7 Professional 64-bit (6.1, Build 7601) Service Pack 1 (7601.win7sp1_gdr.110408-1631)
Processor: Intel(R) Core(TM) i7 CPU X 990 @ 3.47GHz (12 CPUs), ~3.5GHz
Memory: 6144MB RAM
Available OS Memory: 6136MB RAM
Page File: 1932MB used, 12392MB available
* Using code above (multi-threading on 6 cores), 500 files. Processing time: 5 seconds.
Summary and Conclusions
I started with a bottleneck that required my code to run an external tool to process many thousands of files. After investigating running multiple instances of the external tool in parallel via multi-threading (and using hardware better suited for the job at hand), I was able to decrease the net runtime to ~7% of what it took running sequentially in a single thread. I can live with that.