Friday, March 7, 2008

605.462 - Homework #3 Web Notebook - Part 3

Retrieving Occupation Employment Data (Acquire)

The U.S. Department of Labor's Bureau of Labor Statistics publish Occupational Employment and Wage Estimates that are free to download from this web site:

The data comes in the form of an Excel spreadsheet. To find the number of people employed in a particular occupation for a given State, I have downloaded the State Cross-Industry estimates spreadsheet for May 2006, which is the latest report available. The spreadsheet contains a table containing the field I require about the relation of occupational employment by State, which are State name, occupation identifier, total employed in occupation, and average hourly wage.

To get a list of an occupation's employment by state using Excel I use the filter tool to select my target occupation. This is most easily done by selecting a cell in the header row and choosing Auto-filter from the Data menu. Then I choose a value from the OCC_CODE column's filter menu. In the initial test I will select code 15-0000, which identifies the Computer and mathematical occupations.

I can now attempt to show the correlation by using a scatter plot. The scatter plot show how much one variable is affected by another. My question implied that if the number of people employed in an occupation is high the salaries might trend higher. The scatter plot shows the groupings correlate somewhat and the CORREL function actually show a fairly good correlation coefficient of 0.61.

But, what if I chose the percentage of the State's population employed in the occupation instead the mere count? Would that indicate a stronger positive or negative correlation? For the Computer and Math Occupation group an even stronger correlation was achieved, 0.84! I might be on to something here.

The problem you might see in this representation is that one cannot see the State values individually (they are reduced to a point representation on the scatterplot. Is that a big problem? I don't know. Oe can visually see the positive correlation by recognizing the tight grouping of the points in a straight-ish ascending diagonal line. The question is about the trend, but individual values might be useful. How can we do this? I will address this in my next post.

No comments: