UPDATE, 2013004: With a slightly modified code, and 3 threads (see basecompl_par2.go below), the numbers are now improved: 52%, resp. 80% speedup, depending on how you count, so, almost doubling the speed!
Note: Use this G+ thread if you want to comment/discuss!
Note II: Of course, in reality, for this kind of basic bioinformatics processing, one would use a pre-existing package, like BioGo instead!
Bioinformaticians process a lot of text based files (although the situation is improving and some better formats are emerging, slowly).
The text formats in question typically need to be parsed line-by-line, for practical reasons and for getting clean, modular and understandeable code. I was thus interested to see how much speedup one could gain for such a string processing task, by using the very easy concurrency / parallelization features of the Go programming language.
As a simple test code, I made a very "dumb" code that just computes the DNA base complement back and fourth 4 times (I only do the 4 common letters, no other logic), in order to emulate a 4-step string processing workflow.
This code does not do any creation of a new string, or string copying, as part of the processing, so it would not be a good imitaion of such processing (which exists, in the form of for example DNA-to-Protein sequence conversion etc). Thus I also created a version of the code that explicitly does string copy. How dumb it might be, I included it in the test.
Finally, I made a parallel version, that spawns go routines for each of the processing steps, which in turn are multiplexed upon 2 threads (set in the program), which turned out to be the optimal for my 1.6GHz Core i5 MacBook air.
The results are as follows:
The data in numbers (click for code):
That is: For the parallel version:
Even though one would of course dream about even better speedups, this is not too bad IMO, given that this is the 1st (or second) time in my life that I implement a parallel program (and I picked up Go just 1-2 weeks ago!) and given that this problem mostly consists of disk access and data shuffling, which is hard to parallellize anyway.