Open Source DCoT Application - Word Counter
I am in the process of performing some analysis on the posts on Daily Cup of Tech. One of the things that I want to do is a word count and frequency analysis on the entire blog. Now, I could go with good ol’ pen and paper and start counting every single word on the blog. But that would take me quite a lot of time, not to mention that I would not learn anything in the process.
So, I decided to export the contents of the MySQL database that runs behind the scenes at DCoT to a text file and then download a word and frequency counter. Do you think I could find a word counter that would count all of the words in the file and then count how many times each word appears? No luck.
But my bad fortune is your lucky day. I decided that since I couldn’t find anything like this, I’d make it myself. So, today I present you with the Daily Cup of Tech Word Counter!
The application is a self-contained program that is fully portable to USB devices. You can download the program and the source code if you are interested. The program is written in AutoIt.
Here is a screenshot of my new baby:

Most of the program is self-explanatory. You can sort the output alphabetically or by how frequently each word appears. You can also sort in ascending or descending order. You can count the words that you type or paste into the edit box, or use a text file.
The delete options may be the only confusing portion. When you are counting words, you need to clean up the rough text a bit: delete some punctuation, get rid of non-printable characters, or scrub out the non-standard English characters. Each delete option handles a different one of these cleanups. Control characters are things like carriage returns and line feeds. Punctuation is the standard punctuation that you will find in most documents. Extended characters are characters that you do not see regularly and that are often used in non-English languages.
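The Use Spaces option will replace all deleted characters with spaces rather than deleting them. This can change your results, so feel free to experiment. For example, with the apostrophe deleted outright, "Bill's" would be counted as "Bills"; with Use Spaces on, it would be counted as "Bill" and "s".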
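If it helps to see the delete options spelled out, here is a rough Python sketch of the idea. This is not the AutoIt source; the function name `clean_text`, its parameters, and the exact character ranges (control characters below code 32 plus 127, ASCII punctuation, extended characters above 127) are all my own assumptions for illustration.

```python
import string


def clean_text(text, delete_control=True, delete_punctuation=True,
               delete_extended=True, use_spaces=False):
    """Hypothetical cleanup pass: drop (or blank out) unwanted characters."""
    # With use_spaces on, every removed character becomes a single space.
    replacement = " " if use_spaces else ""
    cleaned = []
    for ch in text:
        code = ord(ch)
        is_control = code < 32 or code == 127     # assumed control range
        is_punct = ch in string.punctuation       # ASCII punctuation only
        is_extended = code > 127                  # assumed "extended" range
        if ((is_control and delete_control)
                or (is_punct and delete_punctuation)
                or (is_extended and delete_extended)):
            cleaned.append(replacement)
        else:
            cleaned.append(ch)
    return "".join(cleaned)


if __name__ == "__main__":
    sample = "Bill's cat\nsat."
    print(repr(clean_text(sample)))                   # characters removed
    print(repr(clean_text(sample, use_spaces=True)))  # characters blanked
```

Notice that deleting a line break without a replacement glues the last word of one line to the first word of the next, which is one reason the Use Spaces setting can change the counts.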
When you are done counting your words, a complete list of all the words and how often they appeared will be presented in the edit box.
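For the curious, the counting and sorting can be sketched in a few lines of Python. Again, this is only an illustration and not how the AutoIt program is actually written; the function name `count_words`, its parameters, the lowercasing, and splitting on whitespace are assumptions on my part.

```python
from collections import Counter


def count_words(text, by_frequency=True, descending=True):
    """Hypothetical counter: return (word, count) pairs in the chosen order."""
    # Assumption: words are whitespace-separated and case is ignored.
    counts = Counter(text.lower().split())
    if by_frequency:
        # Sort by count, breaking ties by the word itself.
        def sort_key(item):
            return (item[1], item[0])
    else:
        # Sort alphabetically by the word.
        def sort_key(item):
            return item[0]
    return sorted(counts.items(), key=sort_key, reverse=descending)


if __name__ == "__main__":
    for word, count in count_words("the cat saw the dog"):
        print(word, count)
```

Run it on text that has already been cleaned up, otherwise "cat" and "cat." will be counted as two different words.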
Feel free to play around with this and let me know if you find it to be useful.
18 Responses to “Open Source DCoT Application - Word Counter”
-
joe Says:
May 24th, 2008 at 9:28 pm
It’s not a free application, but I scored a free copy of PageFour “software for writers” that has a similar function, except it also records the frequency of phrases. I don’t think that you can export the list, however.
http://www.softwareforwriting.com/pagefour.html
“The unlicensed version is limited to 3 Notebooks with a total of 20 Pages per Notebook.”
Just in case you wanted to check out phrase frequencies as well.
-
joe Says:
May 24th, 2008 at 9:33 pm
Source code link gets 404 error
-
Tim Fehlman Says:
May 24th, 2008 at 10:14 pm
@joe,
Sorry for the 404. Error fixed.
Tim
-
Daniel Fackrell Says:
May 25th, 2008 at 1:57 am
This requires access to a *n?x-style shell and associated tools, but it should do pretty close to what you need:
cat {filename} | sed 's/[^[:alpha:]][^[:alpha:]]*/\
/g' | sort | uniq -c
If you want to ignore case, add a ‘tr’ step before ‘sed’:
cat {filename} | tr '[:upper:]' '[:lower:]' | sed 's/[^[:alpha:]][^[:alpha:]]*/\
/g' | sort | uniq -c
And just in case, that’s two lines each, with the first line ending in a ‘\’ (and quoting the newline that follows it).
-
Tim Fehlman Says:
May 25th, 2008 at 8:50 am
@Daniel Fackrell,
I love the *nix command line tools. I wish more people were familiar with them. They are very powerful and provide you with some very useful abilities. They are even relatively easy to install in Windows.
Unfortunately, as soon as you mention “command line”, many people glaze over.
Tim
-
Jimmy Rogers Says:
June 4th, 2008 at 6:52 pm
I just tried it out (looking for a good word-counter) and my main complaint is the lack of a static location for your counted words. After it finishes analyzing, it would be good to see a field filled in with the word count, possibly at the top of the results page that outputs from the program. Not sure if I posted clearly, but I’m not a programmer so I don’t know much about the lingo.
-
Len Steele Says:
June 5th, 2008 at 8:39 am
Noticed that the app separately counts the letter “s” in possessive words, e.g. “Bill’s”. Similarly, for “‘t”. Also, it would be nice to retain the total word count, seen at the start of individual word counting, at - say - the top of the final list. Also desirable would be the ability to re-sort the list. All that said, I just copied the list into Excel with the ” - ” as delimiter and did whatever sorting and summing I wanted, manually eliminating elements such as the “s” and “t” mentioned above.
-
Brad Isaac Says:
June 9th, 2008 at 7:35 pm
Good job on this software. I like simple software that does one or two things really well - it works pretty fast and is accurate enough for blogging and other more casual writing.
Plus, I hadn’t heard of AutoIt, I’ll check it out.

-
Sabrine Says:
June 14th, 2008 at 4:17 am
I am running this app now on my thesis, which indeed contains quite a lot of non-standard English words. But where does it get nonsense words like 64, te thabo, 3e, etc? Does it have something to do with me importing a regular Word document into the app?
The app has been running now for 10 mins and is still not done. A progress bar would be nice.
-
Daniel Fackrell Says:
June 14th, 2008 at 1:09 pm
@Sabrine
The file formats used by many word processors will result in a lot of garbage when run through tools that expect plain text. I’d recommend copying the full contents of your thesis into a text-only editor like Notepad and saving from there to create a file that the word counter can ingest.
-
xindi Says:
June 18th, 2008 at 9:37 am
wonderful. this was just what i was looking for, thank you!
-
kali Says:
March 8th, 2009 at 8:40 pm
Thank you. I have just downloaded it. I hope this will help me in my research.
-
Ynah Says:
March 15th, 2009 at 9:19 pm
Thank you
-
E. Says:
March 19th, 2009 at 1:44 pm
Thanks a lot for this great program, I have just used it on 480k words, no crash. AWESOME
Have a beer! I have just logged out of Paypal
E.
-
Sara Says:
May 14th, 2009 at 3:30 pm
You just made my day! This is great!
-
Lota Says:
October 9th, 2009 at 9:02 pm
After I figured out how to make Estonian eatable for the program, it worked out fabulously. Beautiful and fantastic in its simplicity and very much missed.
Thank you.
-
Maz Says:
October 13th, 2009 at 8:37 pm
Pretty good…and it’s simple!
Thank You.
-
Johan Swarts Says:
October 15th, 2009 at 9:05 am
The executable has a virus!
And your source doesn’t work on my Windows 7 RC1 or Windows XP2 SP 2 edition in 64 or 32 bit. Please, please remove the virus? I would very much like to use your program.

