After starting out trying to correlate hourly wage earnings to occupational demands, I found that a more interesting question might be one that a high school senior or guidance counselor would ask about occupational prospects: "What occupation is the highest paid in the United States (and territories)?"

A visualization that answers this question might show the average hourly wage earned for each of the 23 major occupations indicated by the Bureau of Labor Statistics and give an indication of how that wage compares with other occupations. It would also show the highest and lowest State averages for each "major occupation" within the U.S.

To provide some answers to this question I took the average hourly compensation provided by the 2006 occupational wage survey and filtered the results into a table that contained the following fields:

1. Occupation title

2. Lowest mean hourly wage amongst all States and territories

3. The State acronym with the lowest mean hourly wage

4. Highest mean hourly wage amongst all States and territories

5. The State acronym with the highest mean hourly wage

6. The average hourly wage for all States and territories

Fields 1 through 5 come from the May 2006 Occupational Employment and Wage Estimates: State Cross-Industry estimates spreadsheet. Field number 6 was matched by the OCC_CODE to the National Cross-Industry estimates for the same year.
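The six fields above could also be assembled in code rather than by hand in a spreadsheet. The following is only a minimal sketch under stated assumptions: the column names `ST`, `OCC_TITLE`, and `H_MEAN`, the state codes, and every wage value are made up for illustration (only `OCC_CODE` is named in this post), and `build_summary` is a hypothetical helper, not the process I actually followed.

```python
from collections import defaultdict

# Toy stand-in for the State Cross-Industry rows. The column names ST,
# OCC_TITLE and H_MEAN, the state codes, and every wage value are assumptions
# made up for illustration; only OCC_CODE is named in the post.
state_rows = [
    {"ST": "AA", "OCC_CODE": "11-0000", "OCC_TITLE": "Management", "H_MEAN": 30.0},
    {"ST": "BB", "OCC_CODE": "11-0000", "OCC_TITLE": "Management", "H_MEAN": 50.0},
    {"ST": "CC", "OCC_CODE": "11-0000", "OCC_TITLE": "Management", "H_MEAN": 40.0},
    {"ST": "AA", "OCC_CODE": "23-0000", "OCC_TITLE": "Legal", "H_MEAN": 25.0},
    {"ST": "BB", "OCC_CODE": "23-0000", "OCC_TITLE": "Legal", "H_MEAN": 60.0},
    {"ST": "CC", "OCC_CODE": "23-0000", "OCC_TITLE": "Legal", "H_MEAN": 35.0},
]

# Toy stand-in for the National Cross-Industry rows (values are made up).
national_rows = [
    {"OCC_CODE": "11-0000", "H_MEAN": 42.0},
    {"OCC_CODE": "23-0000", "H_MEAN": 38.0},
]


def build_summary(state_rows, national_rows):
    """Return one row per occupation holding the six fields listed above."""
    # Field 6 lookup: national mean wage keyed by OCC_CODE.
    national = {row["OCC_CODE"]: row["H_MEAN"] for row in national_rows}

    # Group the state rows by occupation code.
    by_occupation = defaultdict(list)
    for row in state_rows:
        by_occupation[row["OCC_CODE"]].append(row)

    table = []
    for code, rows in by_occupation.items():
        low = min(rows, key=lambda r: r["H_MEAN"])
        high = max(rows, key=lambda r: r["H_MEAN"])
        table.append({
            "title": low["OCC_TITLE"],            # field 1
            "low_wage": low["H_MEAN"],            # field 2
            "low_state": low["ST"],               # field 3
            "high_wage": high["H_MEAN"],          # field 4
            "high_state": high["ST"],             # field 5
            "national_mean": national.get(code),  # field 6, matched by OCC_CODE
        })

    # Highest national mean first; a missing national value sorts last.
    table.sort(key=lambda r: r["national_mean"] or 0.0, reverse=True)
    return table


summary = build_summary(state_rows, national_rows)
for row in summary:
    print(row)
```

With the toy values, the Legal group has the highest single State wage while Management has the higher national mean, which is the kind of contrast the table is meant to expose.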

Displaying this information proved to be a challenge. A scatter plot proved useful but required too much background knowledge.

I found a useful display by looking at Fatemeh Hajiha and Laurie Salmon's paper, "Employment Composition: Variation Across States and Metropolitan Areas" located at http://146.142.4.22/oes/2004/may/emp.pdf. Their graph, "Minimum, average, and maximum employment concentrations among States by major occupational group, May 2004" used a visualization of the minimum, average, and maximum employment concentrations by major occupation. This graph was my inspiration and a lot of credit goes to them for my final outcome.

To display the information in the table above in the manner I found useful, I used the GIMP image editor, which offers functionality similar to Photoshop's image processing (but is free). The following is what I produced. The caption following the image explains what you are viewing.

Caption

This chart depicts the variation in hourly wages for the major occupations among the United States and its territories. For each occupational group, a band depicts the range of mean hourly wages across States. The left edge is the lowest State mean wage (for example, Puerto Rico) and the right edge is the highest State mean wage (for example, Washington, DC). A red dividing line indicates the average amongst all States for the given occupation. The Y-axis lists the occupations in descending order of average hourly wage. The X-axis shows the hourly wage value.
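The layout described in the caption can be approximated in plain text. This is a rough sketch, not how the image was made (that was drawn by hand in GIMP): `render_band` and `render_chart` are hypothetical helpers, the wage values are made up, and a `|` character stands in for the red dividing line.

```python
def render_band(low, high, mean, width=40, dollars_per_cell=2.0):
    """Draw one band: '-' spans the low..high wages, '|' marks the mean."""
    def cell(value):
        # Map a wage to a character column, clamped to the chart width.
        return max(0, min(width - 1, int(value / dollars_per_cell)))

    cells = [" "] * width
    for i in range(cell(low), cell(high) + 1):
        cells[i] = "-"
    cells[cell(mean)] = "|"
    return "".join(cells).rstrip()


def render_chart(rows, width=40, dollars_per_cell=2.0):
    """Stack one band per occupation, highest mean wage at the top."""
    ordered = sorted(rows, key=lambda r: r["mean"], reverse=True)
    label_width = max(len(r["title"]) for r in ordered)
    lines = []
    for r in ordered:
        band = render_band(r["low"], r["high"], r["mean"], width, dollars_per_cell)
        lines.append(r["title"].ljust(label_width) + " " + band)
    return "\n".join(lines)


# Made-up wages for illustration only; these are not the survey figures.
toy_rows = [
    {"title": "Legal", "low": 25.0, "high": 60.0, "mean": 38.0},
    {"title": "Management", "low": 30.0, "high": 50.0, "mean": 42.0},
]
chart = render_chart(toy_rows)
print(chart)
```

Each character column represents a fixed number of dollars (`dollars_per_cell`), so the horizontal position plays the role of the X-axis.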

The visualization shows several things. The highest State hourly wage belongs to the Legal group, but Legal is not the highest paid occupation on average (that is the Management group). Farming and fishing occupations are on average the lowest paid group, but certain States (like Alaska) value them more than the national average indicates.

It becomes pretty apparent that the visualization could use an interactive user interface that tells the user what the average values are by State, but time did not permit me to learn a better visualization toolkit. I hope to show these types of results more clearly in the future.

## Monday, March 10, 2008

## Friday, March 7, 2008

### 605.462 - Homework #3 Web Notebook - Part 3

Retrieving Occupation Employment Data (Acquire)

The U.S. Department of Labor's Bureau of Labor Statistics publishes Occupational Employment and Wage Estimates that are free to download from this web site: http://www.bls.gov/oes/oes_dl.htm.

The data comes in the form of an Excel spreadsheet. To find the number of people employed in a particular occupation for a given State, I downloaded the State Cross-Industry estimates spreadsheet for May 2006, which is the latest report available. The spreadsheet contains a table with the fields I require to relate occupational employment to State: State name, occupation identifier, total employed in occupation, and average hourly wage.

To get a list of an occupation's employment by state using Excel I use the filter tool to select my target occupation. This is most easily done by selecting a cell in the header row and choosing Auto-filter from the Data menu. Then I choose a value from the OCC_CODE column's filter menu. In the initial test I will select code 15-0000, which identifies the Computer and mathematical occupations.
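The same filtering step could be done outside Excel. The sketch below assumes the spreadsheet has been exported to CSV; the column names `ST`, `TOT_EMP`, and `H_MEAN`, the state codes, and all of the numbers are illustrative assumptions (only `OCC_CODE` and the code 15-0000 come from the text above), and `rows_for_occupation` is a hypothetical helper.

```python
import csv
import io

# Stand-in for a CSV export of the spreadsheet. Column names other than
# OCC_CODE, the state codes, and all numbers are made-up assumptions.
SAMPLE_CSV = """ST,OCC_CODE,TOT_EMP,H_MEAN
AA,15-0000,1000,30.00
AA,11-0000,2000,40.00
BB,15-0000,3000,35.00
BB,11-0000,1500,45.00
"""


def rows_for_occupation(csv_text, occ_code):
    """Keep only the rows for one occupation, like choosing a single
    value from the OCC_CODE column's filter menu."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["OCC_CODE"] == occ_code]


computer_math = rows_for_occupation(SAMPLE_CSV, "15-0000")
for row in computer_math:
    # csv yields strings, so convert the numeric fields before using them.
    print(row["ST"], int(row["TOT_EMP"]), float(row["H_MEAN"]))
```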

I can now attempt to show the correlation by using a scatter plot. A scatter plot shows how closely two variables move together (it does not by itself show that one causes the other). My question implied that if the number of people employed in an occupation is high, the salaries might trend higher. The scatter plot shows the groupings correlate somewhat, and the CORREL function actually shows a fairly good correlation coefficient of 0.61.
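Excel's CORREL returns the Pearson correlation coefficient, which can be reproduced in a few lines. This is a minimal sketch with made-up numbers, so it does not reproduce the 0.61 result; `pearson` is a hypothetical helper name.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: employment counts and mean hourly wages for four states.
employment = [1000, 2000, 3000, 4000]
hourly_wage = [20.0, 25.0, 30.0, 35.0]
r = pearson(employment, hourly_wage)
print(round(r, 2))
```

A value near +1 means the points lie close to an ascending line, a value near -1 a descending line, and a value near 0 no linear relationship.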

But what if I chose the percentage of the State's population employed in the occupation instead of the mere count? Would that indicate a stronger positive or negative correlation? For the Computer and Math Occupation group an even stronger correlation was achieved, 0.84! I might be on to something here.
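Switching from a count to a percentage is a single division per State. The sketch below is an illustration only: the `STATE_TOTAL` field, the state codes, and the numbers are assumptions, since this post does not say which State total was used as the denominator.

```python
def employment_share_percent(occupation_employment, state_total):
    """Occupation employment as a percentage of a State total."""
    if state_total <= 0:
        raise ValueError("state_total must be positive")
    return 100.0 * occupation_employment / state_total


# Made-up values; STATE_TOTAL is a hypothetical field for the State's total.
states = [
    {"ST": "AA", "TOT_EMP": 1000, "STATE_TOTAL": 100000},
    {"ST": "BB", "TOT_EMP": 3000, "STATE_TOTAL": 150000},
]
shares = {
    s["ST"]: employment_share_percent(s["TOT_EMP"], s["STATE_TOTAL"])
    for s in states
}
print(shares)
```

Normalizing this way keeps large States from dominating the comparison simply because they employ more people in every occupation.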

The problem you might see in this representation is that one cannot see the State values individually (they are reduced to a point representation on the scatter plot). Is that a big problem? I don't know. One can visually see the positive correlation by recognizing the tight grouping of the points in a straight-ish ascending diagonal line. The question is about the trend, but individual values might be useful. How can we do this? I will address this in my next post.

Labels:
computer science,
data visualization,
grad school

## Thursday, March 6, 2008

### 605.462 - Homework #3 Web Notebook - Part 2

My original question may have been a little immature. I wanted to know what effect the growth of the computer/information science industry had on the annual income of other industry workers. After a little analysis, I found that this might be difficult to represent in my visualization.

I think that I will now try to show if there is a correlation between the number of employees in computer and mathematical occupations and the salaries paid to them.

HYPOTHESIS

My hypothesis is that States with higher numbers of these types of employees pay them higher wages due to greater competition.

A possibly more interesting question is which major categories show a positive correlation (possibly due to competition) and which States show negative correlations (possibly due to oversaturation).

Labels:
computer science,
data visualization,
grad school

## Monday, March 3, 2008

### 605.462 - Homework #3 Web Notebook - Part 1

This is my first "web notebook" entry with regard to homework #3 in Dr. Chlan's Data Visualization class. In this assignment I am supposed to formulate a question that can be expressed in a visualization.

This log is going to show what I had to do to construct the visualization.

First, I have to formulate the question by performing the following steps:

1) download one or more databases from the U.S. Department of Labor's Bureau of Labor Statistics

2) pose a question that should be expressed in that data

The databases we were directed to were a set of Occupational Employment and Wage Estimates. I selected the 2002 - 2006 "State Cross-Industry estimates." These databases contain the total number of persons employed and their average annual salaries by occupation within each state.

The question I would like to ask is "Is there a relationship between the growth of the computer and mathematical science occupation and a higher overall average annual salary?" The "growth of the computer and mathematical science occupation" will be measured as an increase in the total persons employed in the occupation.

Next, I have to do the following:

- determine the format of the dataset
- select the visualization system
- design and perform the transformations required to work with the visualization system

Labels:
computer science,
data visualization,
grad school
