As most of you can tell, I have greatly reduced the advertising noise from Daily Cup of Tech. (Thanks, Kiltak!) I have also tweaked some other ads so that I can monitor them better.
So, I was just checking something today and imagine my surprise when I saw this:
For those of you who are not familiar with the Google Adsense website, the top row beside Adsense for Content is supposed to be the sum of all the rows below it plus some rows that are not shown. In any case, the numbers for page impressions and clicks in that row should be equal to or greater than the sum of the numbers below them.
Unfortunately, the Clicks column shows a sum of 2 clicks while some quick math tells us that the sum of this column should be at least 3 (1+2)!
Now, I know this seems like a really small and insignificant thing, but if they are making this type of mistake with simple sums, could they be making other mistakes as well? Is it possible that they are making mistakes in the check amounts that they are sending out?
What has your experience been with Google Adsense? Have you seen anything like this? Do you think this could be an indicator that there are some bigger issues with the calculations that are being made and how Adsense is dealing with their numbers? Let us know in the comments.
Lately I have been giving some thought to creating a distributed file archive system with redundancy, and I think that I have come up with a viable proof of concept. The entire process is manual at this point, but with a bit of work I think that I could automate it and make it usable.
What Is It?
The whole idea came to me from a comment left on a tumblog post. JD asked whether someone could point him in the right direction for something like this. I gave it some thought and I think I have a viable model.
The question was whether we could use all of the unused storage on the workstations and laptops in a small enterprise environment as a backup or archive solution. To me, this seemed like a logical use of resources, especially for a small IT shop where budgets are tight or for a home with the now-common one-computer-per-person setup.
On the surface, this seemed like a wonderful idea but there were issues.
No Redundancy
The biggest issue that I saw with a solution that uses this concept is the hard drive. Workstations are typically single drive systems. There is rarely any redundancy in place for these drives. If that drive fails, your data is gone.
If this is a simple backup solution, this is less of an issue: the backup is itself a copy, so the original data still exists somewhere else. Things get a bit more risky when we are talking about an archive system.
The purpose of an archive is to move data to a storage location for later access. By definition, you do not have a copy back where the original was located. Now what should we do?
Parchive to the Rescue
The answer to this problem is to use parchive files for redundancy. What are parchive files? Here is what the Parchive Project says about parchive files:
The original idea behind this project was to provide a tool to apply the data-recovery capability concepts of RAID-like systems to the posting and recovery of multi-part archives on Usenet. We accomplished that goal. Our new goal with version 2.0 of the specification is to improve. It extends the idea of version 1.0 and takes the recovery process beyond the file-level barrier. This allows for more effective protection with less recovery data, and removes some previous limitations on the number of recoverable parts.
How The System Would Work
Let’s use a common scenario to examine how to use parchive files to create a redundant archive storage grid.
Let’s say, for example, that you have six computers on your network, your computer and five others. Your connections to these computers would look something like this:
Let’s also assume that you have write access to a share on each of these computers.
Now, you want to archive your data by distributing it on each of the systems. For our example, we are going to assume that you have a 697 MB file called ubuntu.iso that you want to archive and each system has 150 MB of free disk space.
You compress the file to save disk space. You now have a file ubuntu.zip that is 681 MB in size.
You now split the ubuntu.zip file into five equally sized files. You are now left with the following files:
ubuntu.zip.001
ubuntu.zip.002
ubuntu.zip.003
ubuntu.zip.004
ubuntu.zip.005
Each file is about 136 MB in size.
You place one file on each computer. So:
ubuntu.zip.001 on Computer 1
ubuntu.zip.002 on Computer 2
ubuntu.zip.003 on Computer 3
ubuntu.zip.004 on Computer 4
ubuntu.zip.005 on Computer 5
This creates a total of 681 MB of used storage.
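The split step can be sketched in a few lines of Python. This is only an illustration of the idea, not what 7-Zip does internally: the function name split_file is made up for this example, and it reads each piece fully into memory, which is fine for a proof of concept but not for very large files.

```python
from pathlib import Path


def split_file(source: Path, parts: int) -> list[Path]:
    """Split source into `parts` nearly equal pieces named source.001, source.002, ..."""
    size = source.stat().st_size
    # Ceiling division so that `parts` chunks always cover the whole file.
    chunk_size = max(1, -(-size // parts))
    written = []
    with source.open("rb") as src:
        for index in range(1, parts + 1):
            chunk = src.read(chunk_size)
            if not chunk:
                break  # the file had fewer bytes than requested parts
            part = source.with_name(f"{source.name}.{index:03d}")
            part.write_bytes(chunk)
            written.append(part)
    return written
```

Calling split_file(Path("ubuntu.zip"), 5) would produce ubuntu.zip.001 through ubuntu.zip.005 next to the original, with the last piece slightly smaller when the size does not divide evenly.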
Accounting for Hard Drive Failure
This scenario works well as long as nothing goes wrong! But, if you were to lose the hard drive on just one of the workstations, all of the data in ubuntu.iso is gone!
One option would be to put duplicate files on each system. So, you could do the following:
ubuntu.zip.001 and ubuntu.zip.002 on Computer 1
ubuntu.zip.002 and ubuntu.zip.003 on Computer 2
ubuntu.zip.003 and ubuntu.zip.004 on Computer 3
ubuntu.zip.004 and ubuntu.zip.005 on Computer 4
ubuntu.zip.005 and ubuntu.zip.001 on Computer 5
This would require 1,362 MB of storage to ensure that if one of the systems crashed, you would be able to recover all of your data.
But, if we were to create parchive files, the amount of data that we would have to store would become significantly less. In our example, we would need to create five parchive volume files with a total redundancy of 25%. One parchive volume file and a copy of the main par file would accompany each split file. The file distribution would look like this:
ubuntu.zip.001, ubuntu.zip.vol000+94.PAR2, and ubuntu.zip.par2 on Computer 1
ubuntu.zip.002, ubuntu.zip.vol094+94.PAR2, and ubuntu.zip.par2 on Computer 2
ubuntu.zip.003, ubuntu.zip.vol188+93.PAR2, and ubuntu.zip.par2 on Computer 3
ubuntu.zip.004, ubuntu.zip.vol281+93.PAR2, and ubuntu.zip.par2 on Computer 4
ubuntu.zip.005, ubuntu.zip.vol374+93.PAR2, and ubuntu.zip.par2 on Computer 5
The total required amount of disk space would be approximately 854 MB! This is 508 MB less disk storage than the previous solution, a savings of 37.3%!
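The comparison between the two schemes can be checked with a short calculation. This is a rough sketch that ignores the size of the PAR2 files' own bookkeeping, which is presumably why it lands a few MB below the 854 MB figure; the function names are made up for this example.

```python
def duplicate_storage(z: float) -> float:
    """Every split file is stored twice, so the archive takes 2 * z."""
    return 2 * z


def parity_storage(z: float, locations: int, failures: int = 1) -> float:
    """Split files plus enough recovery data to survive `failures` lost locations."""
    redundancy = failures / (locations - failures)
    return z * (1 + redundancy)


z = 681  # size of ubuntu.zip in MB, from the example
dup = duplicate_storage(z)
par = parity_storage(z, locations=5)
saving = (dup - par) / dup * 100
print(f"duplicates: {dup} MB, parity: {par:.2f} MB, saving: {saving:.1f}%")
# prints "duplicates: 1362 MB, parity: 851.25 MB, saving: 37.5%"
```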
The More, The Merrier
The nice thing about this solution is that the more workstations you have, the less redundancy overhead you need in order to survive the loss of one of them. See the table below:
Workstation Count | Redundancy Overhead
2 | 100.00%
3 | 50.00%
4 | 33.33%
5 | 25.00%
10 | 11.11%
25 | 4.17%
50 | 2.04%
100 | 1.01%
The Math
There are a lot of calculations being made for these configurations, and all of them are based on the number of archive locations. For these calculations, let’s assume that the number of archive locations is represented by a and the compressed file size in bytes is represented by z.
The number of files (f) equals the number of archive locations (a). This should be used for both splitting the compressed file and determining the number of parchive files to create.
We also need to plan how redundant we want our system to be. So, the number of locations that can be dead is represented by d. Please note that it is very important that d < a (i.e. the number of archive locations must be greater than the number of dead locations).
Redundancy
The percentage of redundancy (r%) required can be calculated as follows:
r% = d / (a - d) * 100
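The reasoning behind this formula, as I understand it: each location holds 1/a of the data and 1/a of the recovery data, so when d locations die, the recovery data left on the other a - d locations has to be at least as large as the data that was lost. Here is a minimal Python sketch of the formula; the function name is my own.

```python
def redundancy_percent(a: int, d: int = 1) -> float:
    """Return r% = d / (a - d) * 100 for a locations and d tolerated failures."""
    if not 0 <= d < a:
        raise ValueError("d must be at least 0 and smaller than a")
    return d / (a - d) * 100


for a in (2, 3, 4, 5, 10, 50, 100):
    print(f"{a:>3} locations: {redundancy_percent(a):6.2f}%")

# Tolerating two dead locations out of five costs considerably more.
print(f"{redundancy_percent(5, 2):.2f}%")  # prints "66.67%"
```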
Total Storage Required
The total storage (s) required for an individual file:
s = z * (r% / 100) + z
Split File Size
Size of each file in bytes (b) when the compressed file is split:
b = z / f
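Chaining the calculations together for the ubuntu.zip example gives the numbers you would work from. This is a sketch under a couple of assumptions: sizes are rounded up to whole bytes, 1 MB is taken as 1024 * 1024 bytes, and the names ArchivePlan and plan_archive are invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class ArchivePlan:
    files: int             # f, equal to the number of archive locations a
    redundancy_pct: float  # r%
    total_bytes: int       # s
    split_bytes: int       # b


def plan_archive(z: int, a: int, d: int = 1) -> ArchivePlan:
    """Apply the formulas above to a compressed file of z bytes."""
    if not 0 <= d < a:
        raise ValueError("d must be smaller than a")
    r_pct = d / (a - d) * 100
    s = math.ceil(z * r_pct / 100 + z)
    b = math.ceil(z / a)  # round up so that f pieces cover the whole file
    return ArchivePlan(files=a, redundancy_pct=r_pct, total_bytes=s, split_bytes=b)


plan = plan_archive(z=681 * 1024 * 1024, a=5, d=1)
print(plan)
```

For the example file this gives a redundancy of 25%, a total of 892,600,320 bytes and a split size of 142,816,052 bytes.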
Using the Calculations in QuickPar
I use QuickPar to create the parchive files. Here is a screenshot to show you where these calculations come into play in the QuickPar application.
Perform Your Own Manual Proof of Concept
Here is how you can do your own proof of concept for this type of a system:
Archiving
Download the software that you will work with. I use QuickPar to create parchive files and 7-Zip for file compression and splitting. I use these because they are freely available on the Internet.
Create archive locations. Since this is a proof of concept, they can be locations on remote systems or different folders on your own computer.
Compress your file using 7-Zip.
Determine the size of each file for splitting (b) and split the file using 7-Zip.
Create the parchive files using QuickPar and the calculations provided above.
Move the files to your archive locations as indicated in the example above.
Move (or if you are interested in living dangerously, delete) the original file and the compressed file.
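The distribution step is easy to script once the files exist. The sketch below assumes the archive locations are plain folders (or mounted shares) and that there is exactly one split file and one parchive volume per location; the function name distribute is made up, and it copies rather than moves so that nothing is lost if a copy fails halfway.

```python
import shutil
from pathlib import Path


def distribute(parts, volumes, index_file, locations):
    """Copy split file i, recovery volume i and the main .par2 file to location i."""
    if not (len(parts) == len(volumes) == len(locations)):
        raise ValueError("need one split file and one recovery volume per location")
    placement = {}
    for part, volume, location in zip(parts, volumes, locations):
        location.mkdir(parents=True, exist_ok=True)
        for item in (part, volume, index_file):
            shutil.copy2(item, location / item.name)
        placement[location] = [part.name, volume.name, index_file.name]
    return placement
```

The returned dictionary records which files ended up where, which is worth keeping somewhere safe because you will need it at recovery time.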
Recovery
Copy all but one location’s worth of the split and parchive files back into the original location. This will simulate the failure of one system.
Open up the main parchive file (this is the smallest file) in QuickPar.
Rebuild the lost/damaged files in QuickPar.
Recombine the files in 7-Zip.
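If you would rather script the last step, the numbered pieces can simply be concatenated in order, since they are plain consecutive chunks of the compressed file. The sketch below does that and also returns a SHA-256 checksum of the result. Comparing it against a checksum recorded before archiving is my own addition and not part of the steps above, and the function name join_parts is invented for this example.

```python
import hashlib
from pathlib import Path


def join_parts(base: Path, target: Path) -> str:
    """Concatenate base.001, base.002, ... into target and return its SHA-256."""
    parts = []
    index = 1
    while (part := base.with_name(f"{base.name}.{index:03d}")).is_file():
        parts.append(part)
        index += 1
    if not parts:
        raise FileNotFoundError(f"no split files found for {base}")

    digest = hashlib.sha256()
    with target.open("wb") as out:
        for part in parts:
            with part.open("rb") as src:
                # Copy in 1 MiB blocks so large pieces never sit fully in memory.
                while chunk := src.read(1024 * 1024):
                    out.write(chunk)
                    digest.update(chunk)
    return digest.hexdigest()
```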
Conclusion
Once again, this is a proof of concept just to show how a system like this would work. My next step would be to get AutoIt fired up and use command line versions of 7-Zip and QuickPar to automate the entire process.
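To give a feel for what the automation might look like, here is a rough sketch in Python rather than AutoIt. It only builds the command lines and does not run anything. It assumes the 7z command-line program and, in place of QuickPar (which as far as I know is a GUI-only tool), the par2 program from par2cmdline; the function names are made up for this example, and the redundancy is rounded up to a whole percent.

```python
def compress_command(source: str, archive: str) -> list[str]:
    # "a" adds files to an archive; -tzip selects the zip format.
    return ["7z", "a", "-tzip", archive, source]


def par2_create_command(archive: str, parts: list[str],
                        locations: int, failures: int = 1) -> list[str]:
    if not 0 < failures < locations:
        raise ValueError("failures must be at least 1 and smaller than locations")
    # Whole-percent redundancy, rounded up so there is never too little.
    redundancy = -(-100 * failures // (locations - failures))
    # -r sets the redundancy percentage, -n the number of recovery files,
    # and -u asks for recovery files of uniform size.
    return ["par2", "create", f"-r{redundancy}", f"-n{locations}", "-u",
            f"{archive}.par2", *parts]


parts = [f"ubuntu.zip.{i:03d}" for i in range(1, 6)]
print(compress_command("ubuntu.iso", "ubuntu.zip"))
print(par2_create_command("ubuntu.zip", parts, locations=5))
```

Passing either list to subprocess.run would execute it, with the split step happening between the two commands.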
So, what do you think of this idea? How could you use it in your environment? Let me know in the comments.
A number of programs that used to work just fine in Windows XP have difficulty in Vista because of the new security model. When I discover one of these applications that requires me to run it as an administrator, I reconfigure the shortcut so that it will automatically run the application properly.
To do this, follow these steps:
Log in as a user with local administrative privileges.
Right click on the application shortcut and select Properties.
In the Properties window, click on the Advanced… button near the bottom.
Check off the Run as administrator box.
Close the Advanced Properties box by clicking on the OK button.
Close the Properties box by clicking on the OK button.
The next time that you run the application from the modified shortcut, it will run with administrative privileges.