I am in the process of performing some analysis on the posts on Daily Cup of Tech. One of the things that I want to do is a word count and frequency analysis on the entire blog. Now, I could go with good ol’ pen and paper and start counting every single word on the blog. But that would take me quite an amount of time, not to mention that I would not learn anything in the process.

So, I decided to export the contents of the MySQL database that runs behind the scenes at DCoT to a text file and then download a word and frequency counter. Do you think I could find a word counter that would count all of the words in the file and then count how many times each word appears? No luck.

But, my bad fortune is your lucky day. I decided that since I couldn’t find anything like this, I’d make it myself. So, today I present you with the Daily Cup of Tech Word Counter!

The application is a self-contained program that is fully portable to USB devices. You can download the program and the source code if you are interested. The program is written in AutoIt.

Here is a screenshot of my new baby:

Most of the program is self-explanatory. You can sort the output alphabetically or by how frequently each word appears. You can also sort in ascending or descending order. You can count the words that you type or paste into the edit box or use a text file.

The delete options may be the only confusing portion. When you are counting words, you need to clean up the rough text a bit: delete some punctuation, get rid of non-printable characters, or scrub out the non-standard English characters. Each of the delete options handles a different one of these cleanups. Control characters are things like carriage returns and line feeds. Punctuation is the standard punctuation that you will find in most documents. Extended characters are characters that you do not see regularly and are often used in some non-English languages.

The Use Spaces option will replace all deleted characters with spaces rather than deleting them. This can modify your outcomes so feel free to experiment.

When you are done counting your words, a complete list of all the words and how often they appeared will be presented in the edit box.
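For readers who want to see the idea in code, here is a minimal Python sketch of the same kind of counting. It is not the AutoIt source of the Word Counter. The function names, the lowercasing of words, and the exact character classes (control characters below code 32, punctuation from Python's `string.punctuation`, extended characters above code 127) are my own assumptions for illustration.

```python
import string
from collections import Counter

def count_words(text, delete_punctuation=True, delete_control=True,
                delete_extended=True, use_spaces=True):
    """Return a Counter of lowercase words after the chosen cleanup passes."""
    # "Use Spaces" swaps each deleted character for a space instead of dropping it.
    replacement = " " if use_spaces else ""
    cleaned = []
    for ch in text:
        code = ord(ch)
        is_control = code < 32 or code == 127     # assumed definition
        is_punct = ch in string.punctuation       # assumed definition
        is_extended = code > 127                  # assumed definition
        if ((delete_control and is_control) or
                (delete_punctuation and is_punct) or
                (delete_extended and is_extended)):
            cleaned.append(replacement)
        else:
            cleaned.append(ch)
    return Counter("".join(cleaned).lower().split())

def sorted_counts(counts, by="frequency", descending=True):
    """Sort (word, count) pairs by frequency or alphabetically."""
    if by == "frequency":
        key = lambda item: (item[1], item[0])
    else:
        key = lambda item: item[0]
    return sorted(counts.items(), key=key, reverse=descending)

sample = "The cat sat.\nThe cat ran, the dog sat."
counts = count_words(sample)
for word, n in sorted_counts(counts):
    print(word, n)
```

Calling `count_words(sample, use_spaces=False)` on the same text glues "sat" and "The" into one word once the period and line break are deleted, which is the kind of changed outcome the Use Spaces option is there to avoid.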

Feel free to play around with this and let me know if you find it to be useful.

I was doing some research the other day on getting yourself out of a difficult situation when you don’t have access to some vital system resources because you are running as a normal user and you lost your local admin password.

I discovered that there is a way to reset your user interface and run interactively as the LOCAL SYSTEM account. This is important because the LOCAL SYSTEM account has a lot of privileges available to it. According to Microsoft:

The system account and the administrator account (Administrators group) have the same file privileges, but they have different functions. The system account is used by the operating system and by services that run under Windows. There are many services and processes within Windows that need the capability to log on internally (for example during a Windows installation). The system account was designed for that purpose; it is an internal account, does not show up in User Manager, cannot be added to any groups, and cannot have user rights assigned to it. On the other hand, the system account does show up on an NTFS volume in File Manager in the Permissions portion of the Security menu. By default, the system account is granted full control to all files on an NTFS volume. Here the system account has the same functional privileges as the administrator account.

A little while back, some enterprising individuals discovered a way to run the LOCAL SYSTEM account interactively. Here are the instructions according to one website:

  1. Start > Run > cmd.exe > type: at 12:03 /interactive "cmd.exe" (replace 12:03 with a time 2 mins from now). > close command prompt
  2. New command prompt will open, when it does > Hit CTRL+ALT+DEL > find explorer.exe and End Process.
  3. At command prompt type: cd.. > type: explorer.exe

This all works fine except that it is a bit confusing for someone who does not understand how all this works. So, I thought I would make it easier for those who do not have my background. I created a little program in AutoIt that completely automates the process. Simply run the program, wait for a couple of minutes, and you’re running as the LOCAL SYSTEM account.

You can download this program and play with it all you want.

WARNING: I have tested this program to the best of my abilities but this does not mean it is perfect. I did not have any problems with it but that does not mean you will not. If something goes wrong, don’t blame me! You’ve been warned.

For those of you who are interested, here is the source code for this little program I wrote. Feel free to hack around and make it do different things:

#include <Date.au3>
If $CmdLine[0] = 0 Then
;No command line options
;First run
$RunTime = _DateTimeFormat(_DateAdd('n', 1, _NowCalc()), 5)
$Command = @ComSpec & " /c AT " & $RunTime & " /interactive """ & @ScriptFullPath & """ 2"
Run($Command)
Else
;Second run
$Command = @ComSpec & " /c taskkill /IM explorer.exe /F & " & @WindowsDir & "\explorer.exe"
Run($Command)
EndIf

Update: Someone asked in the comments how you get back to your normal account. Simply log out and then log back in as yourself. You should be back to normal.

Building forms for your website is a real pain in the backside. There are so many different aspects of the form that you need to know in order to get everything just right.

When a friend from work asked me if I knew of some easier way to create forms, I realized that I didn’t have a quick and easy answer for him. So, a bit of time on Google and some typing later and I have created a nice list of useful form building resources.

  • pForm - a really good online form building tool that lets you create a customized form and then download the form for use in your website. Really easy to use and set up.
  • JotForm - offers a free and a premium account option. If you are looking at using the forms on your own website, there is very little reason to purchase the premium option. Provides a number of very professional tools that give you a maximum of flexibility. It even includes a really nice form building wizard.
  • PageBreeze - while this is not an online tool, it is an HTML editor that will allow you to create forms using a simple drag and drop method. You just need to do some basic HTML changes to connect it into your website.
  • FormMaker - much like our first two entries, FormMaker is also web based. One of the advantages that it has over the other forms is that it allows you to capture data and store it on their servers for later access. There is no charge for the form maker but it does require an account.
  • <STRONG><CONTACT> - An application designed specifically for creating contact forms. It will generate both the PHP and the CSS code required for the contact form to create the look and feel you want.

I know that all of the major IDEs out there have some sort of a form builder in them but if you are looking for a fast and dirty form, this may be a quick alternative.

If you have ever done a trace route on where information goes when it travels over the Internet to your computer, it is actually pretty staggering to see how far it goes. It is pretty funny sometimes to see that an e-mail travels halfway around the world and back just to get to your neighbor!

I got thinking about this and an idea came to mind that could change how we look at file sharing.

How It’s Done Now

Let’s say, for example, I want to share a 600 MB ISO file with my neighbor. There are several different ways of doing this but let’s say that the easiest way to do this would be to transfer the file via a peer to peer program.

[Figure: Copy Via Internet]

This would cost both of us about 600 MB of bandwidth that we would each have to pay our Internet providers for.

WiFi Transfer

Now, my neighbor gets smart when he realizes that both of us have WiFi networks available to our systems. So we connect our two WiFi networks and decide to transfer the file over our WiFi networks.

[Figure: Copy Via WiFi]

We’ve now transferred the file between the two of us and it didn’t use any of our Internet bandwidth. In fact, we were able to transfer that file much faster because we were not limited by the upload speed of our Internet connections (upload speeds are usually lower than download speeds).

WiFi Hops

Now, let’s say that I want to get a file from a second neighbor. Unfortunately, he is outside the range of my WiFi signal. But my first neighbor, who is located between us, is in range of both of our WiFi signals. He connects to both of our WiFi networks and acts as a relay between our two networks.

[Figure: WiFi MultiHop]

So, we have now transferred a file using WiFi between two systems that are outside of each other’s WiFi signal range.

Spreading the Quilt

Theoretically, the more people we connect together with this WiFi quilt, the more access we have to information and data.

[Figure: Metro Coverage]

Each one of these systems would have access to information on each of the other systems, allowing for data to be shared freely without the need or restrictions of their Internet connections.

Stitching Together the Quilt

Just like a quilt is made up of separate pieces of cloth, each metro WiFi network will be limited to the range of its WiFi area. This is where we use the Internet to make data available to each of the metro areas.

[Figure: Internet Gateway and Cache]

Whenever a system needs to go outside of its metro area to access information via the Internet, it can maintain a cache of the information so that it now becomes available to the rest of the metro area.
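A rough Python sketch of that caching idea follows. The class name `MetroCache`, the `fake_fetch` stand-in, and the choice to keep everything in an in-memory dictionary are all assumptions made for illustration. A real gateway would also need to decide how long to keep items and how much space to give them.

```python
class MetroCache:
    """Serve items from a local cache; go out to the Internet only on a miss."""

    def __init__(self, fetch_from_internet):
        self._fetch = fetch_from_internet   # hypothetical download function
        self._store = {}                    # name -> cached contents
        self.internet_fetches = 0           # how often we had to leave the metro area

    def get(self, name):
        if name not in self._store:
            # Cache miss: pay for Internet bandwidth once, then keep the result.
            self._store[name] = self._fetch(name)
            self.internet_fetches += 1
        return self._store[name]

def fake_fetch(name):
    # Stand-in for a real download, so the sketch runs offline.
    return ("contents of " + name).encode()

cache = MetroCache(fake_fetch)
cache.get("ubuntu.iso")   # first request goes out over the Internet
cache.get("ubuntu.iso")   # second request is served from the metro cache
print(cache.internet_fetches)
```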

Concerns and Issues

This idea is not without its problems. For example, I do not think I would give just anyone unfettered access to my home computer or network. Rather, I would probably want to segment my network so that only a certain computer works on the shared network.

Another problem is coverage. In order for this idea to work well, there needs to be a relatively large percentage of area covered to see a benefit. But, this might be one of those things where you do not see a lot of benefit at the beginning of the project but, as time goes on, it becomes more and more feasible and useful.

The other issue with this project is just how people find the information that they want and how the system goes about determining which is the best route to access the information.
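One simple way to think about the routing half of that question is a breadth-first search over a map of who is in WiFi range of whom. The Python sketch below assumes such a map already exists (the `links` dictionary and its node names are made up), and it treats "best route" as "fewest hops", which ignores signal strength and link speed.

```python
from collections import deque

def fewest_hops(links, start, goal):
    """Breadth-first search: return the path with the fewest WiFi hops, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in links.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None   # no chain of WiFi links connects the two systems

# Hypothetical neighbourhood: I can only reach neighbor2 through neighbor1.
links = {
    "me": ["neighbor1"],
    "neighbor1": ["me", "neighbor2"],
    "neighbor2": ["neighbor1"],
}
print(fewest_hops(links, "me", "neighbor2"))
```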

Discussion

Even though there are some definite challenges with this idea, I still think that it would be a really interesting thing to try and set up. It could even include some contributions from things like the OpenWRT project, DIY WISP, and cantennas.

But, I’ve talked enough. Time for you to add your two cents. What do you think of this type of an idea?

I have been putting some thought around creating a distributed file archive system with redundancy lately and I think that I have come up with a viable proof of concept. The entire process is manual at this point but with a bit of work, I think that I could automate it and make it usable.

What Is It?

The whole idea came to me from a comment left on a tumblog post. Essentially, JD asked about whether or not someone could point him in the right direction for something like this. I gave it some thought and I think I have a viable model.

Essentially, the question was asked whether or not we could use all of the unused storage on all of the workstations and laptops in a small enterprise environment as a backup or archive solution. To me, this seemed like a logical use of resources, especially for a small IT shop where the budgets are small or for a home with a now common one computer per person setup.

On the surface, this seemed like a wonderful idea but there were issues.

No Redundancy

The biggest issue that I saw with a solution that uses this concept is the hard drive. Workstations are typically single drive systems. There is rarely any redundancy in place for these drives. If that drive fails, your data is gone.

Now, if this is a simple backup solution, this may be less of an issue because the data is a copy to begin with, so you still have the original. Things get a bit more risky when we are talking about an archive system.

The purpose of an archive is to move data to a storage location for later access. By definition, you do not have a copy back where the original was located. Now what should we do?

Parchive to the Rescue

The answer to this problem is to use parchive files for redundancy. What are parchive files? Here is what the Parchive Project says about parchive files:

The original idea behind this project was to provide a tool to apply the data-recovery capability concepts of RAID-like systems to the posting and recovery of multi-part archives on Usenet. We accomplished that goal. Our new goal with version 2.0 of the specification is to improve. It extends the idea of version 1.0 and takes the recovery process beyond the file-level barrier. This allows for more effective protection with less recovery data, and removes some previous limitations on the number of recoverable parts.

How The System Would Work

Let’s use a common scenario to examine how to use parchive files to create a redundant archive storage grid.

Let’s say, for example, that you have six computers on your network, your computer and five others. Your connections to these computers would look something like this:

[Figure: Network]

Let’s also assume that you have write access to a share on each of these computers.

Now, you want to archive your data by distributing it on each of the systems. For our example, we are going to assume that you have a 697 MB file called ubuntu.iso that you want to archive and each system has 150 MB of free disk space.

You compress the file to save disk space. You now have a file ubuntu.zip that is 681 MB in size.

You now split the ubuntu.zip file into five equally sized files. You are now left with the following files:

  • ubuntu.zip.001
  • ubuntu.zip.002
  • ubuntu.zip.003
  • ubuntu.zip.004
  • ubuntu.zip.005

Each file is 136 MB in size.

You place one file on each computer. So:

  • ubuntu.zip.001 on Computer 1
  • ubuntu.zip.002 on Computer 2
  • ubuntu.zip.003 on Computer 3
  • ubuntu.zip.004 on Computer 4
  • ubuntu.zip.005 on Computer 5

This creates a total of 681 MB of used storage.
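The split-and-place steps above can be sketched in a few lines of Python. This is only a stand-in for what the splitting tool does, not a description of 7-Zip itself. The helper names `split_bytes` and `placement` are hypothetical, and the tiny in-memory byte string takes the place of the real 681 MB file.

```python
def split_bytes(data, parts):
    """Split data into `parts` pieces whose sizes differ by at most one byte."""
    base, extra = divmod(len(data), parts)
    pieces, offset = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)   # spread the remainder over the first pieces
        pieces.append(data[offset:offset + size])
        offset += size
    return pieces

def placement(filename, pieces):
    """Map each numbered piece (name.001, name.002, ...) to one computer."""
    return {"Computer %d" % (i + 1): "%s.%03d" % (filename, i + 1)
            for i in range(len(pieces))}

data = bytes(range(256)) * 4 + b"xyz"   # 1027 bytes standing in for ubuntu.zip
pieces = split_bytes(data, 5)
print([len(p) for p in pieces])
print(placement("ubuntu.zip", pieces))
```

Joining the pieces back together in order with `b"".join(pieces)` gives the original data again, which is what the recombine step relies on.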

Accounting for Hard Drive Failure

This scenario works well as long as nothing goes wrong! But, if you were to lose the hard drive on just one of the workstations, all of the data in ubuntu.iso is gone!

One option would be to put duplicate files on each system. So, you could do the following:

  • ubuntu.zip.001 and ubuntu.zip.002 on Computer 1
  • ubuntu.zip.002 and ubuntu.zip.003 on Computer 2
  • ubuntu.zip.003 and ubuntu.zip.004 on Computer 3
  • ubuntu.zip.004 and ubuntu.zip.005 on Computer 4
  • ubuntu.zip.005 and ubuntu.zip.001 on Computer 5

This would require 1,362 MB of storage to ensure that if one of the systems crashed, you would be able to recover all of your data.

But, if we were to create parchive files, the amount of data that we would have to store would become significantly less. In our example, we would need to create five parchive files with a redundancy of 25%. One parchive volume file and the main par file would accompany each file. The file distribution would look like this:

  • ubuntu.zip.001, ubuntu.zip.vol000+94.PAR2, and ubuntu.zip.par2 on Computer 1
  • ubuntu.zip.002, ubuntu.zip.vol094+94.PAR2, and ubuntu.zip.par2 on Computer 2
  • ubuntu.zip.003, ubuntu.zip.vol188+93.PAR2, and ubuntu.zip.par2 on Computer 3
  • ubuntu.zip.004, ubuntu.zip.vol281+93.PAR2, and ubuntu.zip.par2 on Computer 4
  • ubuntu.zip.005, ubuntu.zip.vol374+93.PAR2, and ubuntu.zip.par2 on Computer 5

The total required amount of disk space would be approximately 854 MB! This is 508 MB less disk storage than the previous solution, a savings of 37.3%!

The More, The Merrier

The nice thing about this solution is that the more workstations that you have, the less redundant overhead that you require. See the table below:

Workstation Count    Redundancy Overhead
2                    100.00%
3                    50.00%
4                    33.33%
5                    25.00%
10                   11.11%
25                   4.17%
50                   2.04%
100                  1.01%
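If you want to work out the overhead for your own workstation count, the figures follow from the redundancy formula r% = d / (a - d) * 100 with one dead location (d = 1). Here is a small Python sketch; the function name `overhead_percent` is my own, and the list of counts in the loop is just a sample.

```python
def overhead_percent(workstations, dead=1):
    """Redundancy overhead r% = d / (a - d) * 100."""
    if not 0 <= dead < workstations:
        raise ValueError("dead must satisfy 0 <= dead < workstations")
    return dead / (workstations - dead) * 100

# Print a two-column table: workstation count, overhead percentage.
for a in (2, 3, 4, 5, 10, 50, 100):
    print("%3d  %6.2f%%" % (a, overhead_percent(a)))
```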

The Math

There are a lot of calculations being made for these configurations. All of these configurations are based on the number of archive locations. For these calculations, let’s assume that the number of archive locations is represented by a and the compressed file size in bytes is represented by z.

The number of files (f) equals the number of archive locations (a). This should be used for both splitting the compressed file and determining the number of parchive files to create.

We also need to plan how redundant we want our system to be. So, the number of locations that can be dead is represented by d. Please note that it is very important that d < a (i.e. the number of archive locations must be greater than the number of dead locations).

Redundancy

The percentage of redundancy (r%) required can be calculated as follows:

r% = d / (a - d) * 100

Total Storage Required

The total storage (s) required for an individual file:

s = z * (r% / 100) + z

Split File Size

Size of each file in bytes (b) when the compressed file is split:

b = z / f
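To check the numbers from the example, here is a short Python sketch that applies the three calculations together. The function name `archive_plan` is hypothetical. Since r% is a percentage, the sketch divides it by 100 before multiplying by z. Sizes are given in MB here rather than bytes purely to match the ubuntu.zip example.

```python
def archive_plan(a, d, z):
    """Apply the formulas: a = archive locations, d = dead locations, z = compressed size."""
    if not 0 <= d < a:
        raise ValueError("d must be smaller than a")
    f = a                              # number of split files
    r_percent = d / (a - d) * 100      # redundancy percentage
    s = z * r_percent / 100 + z        # total storage required
    b = z / f                          # size of each split file
    return {"files": f, "redundancy_percent": r_percent,
            "total_storage": s, "split_size": b}

# The example above: five locations, one allowed failure, 681 MB compressed.
print(archive_plan(a=5, d=1, z=681))
# Same file, but surviving two dead locations costs noticeably more storage.
print(archive_plan(a=5, d=2, z=681))
```

The first call gives 25% redundancy, 851.25 MB of total storage, and about 136.2 MB per split file. That is a little under the approximate 854 MB quoted earlier, which presumably also counts the small main par file stored on each computer.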

Using the Calculations in QuickPar

I use QuickPar to create the parchive files. Here is a screenshot to show you where these calculations come into play in the QuickPar application.

[Figure: Par Calculations]

Perform Your Own Manual Proof of Concept

Here is how you can do your own proof of concept for this type of a system:

Archiving

  1. Download the software that you will work with. I use QuickPar to create parchive files and 7-Zip for file compression and splitting. I use these because they are freely available on the Internet.
  2. Create archive locations. Since this is a proof of concept, they can be locations on remote systems or different folders on your own computer.
  3. Compress your file using 7-Zip.
  4. Determine the size of each file for splitting (b) and split the file using 7-Zip.
  5. Create the parchive files using QuickPar and the calculations provided above.
  6. Move the files to your archive locations as indicated in the example above.
  7. Move (or if you are interested in living dangerously, delete) the original file and the compressed file.

Recovery

  1. Copy all but one location’s worth of the split and parchive files back into the original location. This will simulate the failure of one system.
  2. Open up the main parchive file (this is the smallest file) in QuickPar.
  3. Rebuild the lost/damaged files in QuickPar.
  4. Recombine the files in 7-Zip.
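To get a feel for why a lost piece can be rebuilt at all, here is a toy Python sketch that uses plain XOR parity. This is a deliberately simpler technique than the one PAR2 uses (PAR2 is based on Reed-Solomon codes), it only survives the loss of one piece, and it assumes all pieces are the same length. The function names and the four-byte sample pieces are made up for the demonstration.

```python
def xor_parity(pieces):
    """XOR equal-length pieces together into one parity piece."""
    parity = bytearray(len(pieces[0]))
    for piece in pieces:
        for i, byte in enumerate(piece):
            parity[i] ^= byte
    return bytes(parity)

def rebuild_missing(surviving, parity):
    """Recover the single missing piece from the survivors plus the parity piece."""
    # XOR-ing the parity with every surviving piece cancels them out,
    # leaving only the bytes of the piece that was lost.
    return xor_parity(list(surviving) + [parity])

pieces = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"]   # stand-ins for .001 to .005
parity = xor_parity(pieces)

lost = pieces[2]                          # simulate Computer 3 failing
survivors = pieces[:2] + pieces[3:]
print(rebuild_missing(survivors, parity) == lost)
```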

Conclusion

Once again, this is a proof of concept just to show how a system like this would work. My next step would be to get AutoIt fired up and use command line versions of 7-Zip and QuickPar to automate the entire process.

So, what do you think of this idea? How could you use it in your environment? Let me know in the comments.
