Raw Data


🔗 One TopoJSON Dataset

Basic US State Map - D3 - To draw the U.S. state boundaries, I used the D3 map data published by Michelle Chandra.

I requested the data license through the author's GitHub page.

Data Size
Total rows: NA
Total columns: NA
File size: 21KB

🔗 Two CSV Datasets

To analyze real estate market trends from different perspectives, I collected data from Zillow Research Data.

Data license can be found here.

More specifically, I chose the Home Listings and Sales subject on the website's data page and downloaded the dataset with "Data Type" set to "Median Sale Price - Seasonally Adjusted ($)" and "Geography" set to "Metro & U.S.".

I also chose Rental Listings from the same website as my second analysis target and downloaded the dataset with "Home Type" set to "Median List Price ($) - 2-Bedroom" and "Geography" set to "Metro & U.S.".

Home Listings and Sales - "Zillow provides data on sold homes, including median sale price for various housing types, sale counts (for which there's detailed methodology), and foreclosures provided as a share of all sales in which the home was previously foreclosed upon. There is also current and historical for-sale listings data, ranging from median list prices and inventory counts to share of listings with a price cut, median price cut size, age of inventory, and the days a listing spent on Zillow before the sale was final. Inventory and other housing data also are available for local markets."

Data Size
Total rows: 493
Total columns: 112
File size: 340KB

Rental Listings - "Zillow Rent Index (ZRI): A smoothed measure of the median estimated market rate rent across a given region and housing type. ZRI is a dollar-denominated alternative to repeat-rent indices. Zillow publishes ZRI and other housing data for local markets as well as its ZRI methodology."

Data Size
Total rows: 358
Total columns: 86
File size: 119KB

**According to the descriptions of both datasets, Zillow has already seasonally adjusted the data using the X-13ARIMA-SEATS Seasonal Adjustment Program from the United States Census Bureau.


Raw Data Processing


1. D3 Map with States and Countries

a) I converted the raw data "D3 Map with States and Countries" from .topojson into .geojson format by using this online converter by Ian Johnson.
b) I manually replaced the full names of all U.S. states with their abbreviations using Notepad++. For example, "California" -> "CA". I did this so that the "State" column in the us_states.geojson file can be joined with my other two datasets from Zillow.

(file source: us_states.geojson)
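
The renaming in step b) was done by hand, but the same substitution could be scripted. The following is only a minimal sketch under assumptions: the property key "name", the helper `abbreviate_states`, and the three-entry lookup table are hypothetical and are not taken from us_states.geojson.

```python
import json

# Partial lookup for illustration only; a real run would need every state.
STATE_ABBREVIATIONS = {
    "California": "CA",
    "New York": "NY",
    "Texas": "TX",
}


def abbreviate_states(geojson, name_key="name"):
    """Replace full state names with abbreviations, in place.

    `name_key` is an assumed property name; check the actual file.
    Names missing from the lookup are left unchanged.
    """
    for feature in geojson.get("features", []):
        props = feature.get("properties") or {}
        full_name = props.get(name_key)
        if full_name in STATE_ABBREVIATIONS:
            props[name_key] = STATE_ABBREVIATIONS[full_name]
    return geojson


# Tiny inline FeatureCollection standing in for the real file.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "California"}, "geometry": None},
        {"type": "Feature", "properties": {"name": "Guam"}, "geometry": None},
    ],
}

result = abbreviate_states(sample)
print(json.dumps([f["properties"]["name"] for f in result["features"]]))
```

Leaving unmatched names untouched makes it easy to spot entries that the lookup table does not cover yet.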

2. Home Listings and Sales

a) I split one column with the format [City, State] into two separate columns named "city" and "state" using the Python pandas library. For example, [San Francisco, CA] -> [San Francisco], [CA].
b) To obtain "longitude" and "latitude" columns, I imported the two datasets into Tableau, which generated the "longitude" and "latitude" values.
c) I added a "TenYrSalesDiff" column, which holds the ten-year difference in house sale price for each U.S. city, computed as the difference between the house price from Dec 2011 and the house price from Mar 2009.
d) To capture each city's yearly sale price change, I added 9 more columns holding the year-by-year sale price differences from Dec 2011 to Mar 2019.
e) For the second visualization, I added a "tenYrGrowRate" column, which holds the ten-year growth rate of U.S. sale prices.
How to Calculate Growth Rate
Let:
P(A) = House A's price 10 years ago
g = Growth rate over 10 years, in percent
P'(A) = House A's present price
👉🏼 Price of House A after 10 years: P'(A) = P(A) * (1 + g/100)
👉🏼 Formula - growth rate g of House A after 10 years: g = P'(A)/P(A) * 100 - 100


(file source: SalesDifference.csv)
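
Steps a) and c) could be sketched in pandas roughly as follows. The column labels "RegionName", "2009-03" and "2019-03", the output column "SalesDiff", and the two rows of prices are assumptions for illustration; the real Zillow export may use different labels.

```python
import pandas as pd

# Made-up rows; the column labels are assumptions, not the real file's header.
df = pd.DataFrame(
    {
        "RegionName": ["San Francisco, CA", "Austin, TX"],
        "2009-03": [500_000, 180_000],
        "2019-03": [900_000, 300_000],
    }
)

# Step a) Split "City, State" at the first comma into two new columns.
df[["city", "state"]] = df["RegionName"].str.split(",", n=1, expand=True)
df["city"] = df["city"].str.strip()
df["state"] = df["state"].str.strip()

# Step c) Price difference between two chosen months (hypothetical column name).
df["SalesDiff"] = df["2019-03"] - df["2009-03"]

print(df[["city", "state", "SalesDiff"]])
```

Splitting with `n=1` keeps any later commas inside the state part, and a row without a comma (such as a nationwide total) would simply end up with a missing "state" value.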

3. Rental Listings

Because this dataset shares its column layout with 2. Home Listings and Sales, I applied the same processing steps as above.

(file source: RentalDifference.csv)