Getting Started
This module provides cached loading of open datasets from Faculty. To view the available datasets:
[1]:
from faculty_extras import opendata
[2]:
opendata.ls()
[2]:
['higgs_boson/README.md',
'higgs_boson/higgs.csv',
'higgs_boson/higgs_test.csv',
'higgs_boson/higgs_train.csv',
'higgs_boson/higgs_validate.csv',
'tutorials/supermarkets/.ipynb_checkpoints/modify-checkpoint.ipynb',
'tutorials/supermarkets/README.md',
'tutorials/supermarkets/lidl.csv',
'tutorials/supermarkets/waitrose.csv',
'uk_2011_census/census_by_outputarea.csv',
'uk_2011_census/census_variable_info.csv',
'uk_2011_census/outputarea_localauthority_mapping.csv',
'uk_2011_census/outputarea_lsoa_msoa_mapping.csv',
'uk_2011_census/outputarea_parliamentaryconstituency_mapping.csv',
'uk_2011_census/postcode_outputarea_mapping.csv',
'uk_2011_census/ukpostcodes.csv',
'uk_statistical_boundaries/geojson/local_authorities.json',
'uk_statistical_boundaries/geojson/lower_super_output_areas.json',
'uk_statistical_boundaries/geojson/middle_super_output_areas.json',
'uk_statistical_boundaries/geojson/output_areas.json',
'uk_statistical_boundaries/geojson/parliamentary_constituencies.json',
'uk_statistical_boundaries/topojson/uk_statistical_boundaries.json',
'us_flights/README.md',
'us_flights/us_flights_1987.csv',
'us_flights/us_flights_1988.csv',
'us_flights/us_flights_1989.csv',
'us_flights/us_flights_1990.csv',
'us_flights/us_flights_1991.csv',
'us_flights/us_flights_1992.csv',
'us_flights/us_flights_1993.csv',
'us_flights/us_flights_1994.csv',
'us_flights/us_flights_1995.csv',
'us_flights/us_flights_1996.csv',
'us_flights/us_flights_1997.csv',
'us_flights/us_flights_1998.csv',
'us_flights/us_flights_1999.csv',
'us_flights/us_flights_2000.csv',
'us_flights/us_flights_2001.csv',
'us_flights/us_flights_2002.csv',
'us_flights/us_flights_2003.csv',
'us_flights/us_flights_2004.csv',
'us_flights/us_flights_2005.csv',
'us_flights/us_flights_2006.csv',
'us_flights/us_flights_2007.csv',
'us_flights/us_flights_2008.csv',
'us_flights/us_flights_dtypes.json']
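Because the listing is a plain list of path strings, it can be narrowed down with ordinary Python before loading anything. The sketch below is illustrative only: `select_paths` is a hypothetical helper (not part of `faculty_extras`), and the `listing` variable is a hand-copied subset of the output above standing in for the result of `opendata.ls()`.

```python
from fnmatch import fnmatchcase


def select_paths(paths, pattern):
    """Return the paths matching a shell-style pattern, in sorted order."""
    # fnmatchcase is used so matching is case-sensitive on every platform.
    return sorted(p for p in paths if fnmatchcase(p, pattern))


# A few entries copied from the listing above; in a notebook you would
# pass the result of opendata.ls() instead of this hand-written list.
listing = [
    'us_flights/README.md',
    'us_flights/us_flights_1987.csv',
    'us_flights/us_flights_1988.csv',
    'us_flights/us_flights_dtypes.json',
    'uk_2011_census/ukpostcodes.csv',
]

# Keep only the CSV files of the us_flights dataset.
flight_csvs = select_paths(listing, 'us_flights/*.csv')
print(flight_csvs)
```

Each selected path can then be passed to `opendata.load` one at a time.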
To load one of the datasets into a pandas DataFrame:
[3]:
df = opendata.load('uk_2011_census/census_by_outputarea.csv')
[4]:
df.head()
[4]:
| | OA | Total_Population | Total_Households | Total_Dwellings | Total_Household_Spaces | Total_Population_16_and_over | Total_Population_16_to_74 | Total_Pop_No_NI_Students_16_to_74 | Total_Employment_16_to_74 | Total_Pop_in_Housesholds_16_and_over | ... | u158 | u159 | u160 | u161 | u162 | u163 | u164 | u165 | u166 | u167 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E00000001 | 194 | 99 | 115 | 115 | 173 | 148 | 148 | 102 | 173 | ... | 6 | 18 | 57 | 14 | 9 | 2 | 2 | 0 | 0 | 0 |
| 1 | E00000003 | 250 | 112 | 125 | 125 | 218 | 199 | 199 | 147 | 218 | ... | 10 | 24 | 74 | 32 | 6 | 2 | 1 | 2 | 1 | 5 |
| 2 | E00000005 | 367 | 217 | 241 | 241 | 337 | 304 | 304 | 241 | 337 | ... | 16 | 37 | 117 | 52 | 12 | 7 | 9 | 3 | 0 | 4 |
| 3 | E00000007 | 123 | 83 | 103 | 103 | 113 | 111 | 111 | 86 | 113 | ... | 4 | 18 | 36 | 20 | 9 | 0 | 2 | 0 | 0 | 1 |
| 4 | E00000010 | 102 | 78 | 79 | 79 | 97 | 86 | 86 | 59 | 97 | ... | 12 | 11 | 16 | 16 | 6 | 0 | 5 | 0 | 0 | 5 |
5 rows × 178 columns
That’s it! The data is cached on disk so that it does not need to be downloaded again, and it is also cached in memory for performance. If the file is updated on Faculty, this module ensures you always get the latest version.
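To make the caching behaviour described above more concrete, here is a minimal sketch of a two-level (memory plus disk) cache that re-downloads only when the remote version changes. It is a toy model under stated assumptions, not the `faculty_extras` implementation: the class `TwoLevelCache`, the `fetch` and `get_version` callables, and the in-memory `remote` dictionary are all hypothetical stand-ins for a real download and a real version check.

```python
import os
import tempfile


class TwoLevelCache:
    """Toy memory-plus-disk cache keyed by path and a version tag."""

    def __init__(self, cache_dir, fetch, get_version):
        self.cache_dir = cache_dir      # directory holding cached bytes
        self.fetch = fetch              # hypothetical: path -> bytes ("download")
        self.get_version = get_version  # hypothetical: path -> version string
        self.memory = {}                # path -> (version, data)

    def _disk_path(self, path, version):
        # Flatten the path so every cached file lives directly in cache_dir.
        return os.path.join(self.cache_dir, path.replace('/', '__') + '.' + version)

    def load(self, path):
        version = self.get_version(path)
        # 1. Memory: reuse the data if the cached version is still current.
        cached = self.memory.get(path)
        if cached is not None and cached[0] == version:
            return cached[1]
        # 2. Disk: reuse a file written earlier for this exact version.
        disk_path = self._disk_path(path, version)
        if os.path.exists(disk_path):
            with open(disk_path, 'rb') as f:
                data = f.read()
        else:
            # 3. Download, then store on disk for later sessions.
            data = self.fetch(path)
            with open(disk_path, 'wb') as f:
                f.write(data)
        self.memory[path] = (version, data)
        return data


# Hypothetical "remote" store: path -> (version, contents).
remote = {'demo/data.csv': ('v1', b'a,b\n1,2\n')}
downloads = []  # records every simulated download


def fetch(path):
    downloads.append(path)
    return remote[path][1]


def get_version(path):
    return remote[path][0]


with tempfile.TemporaryDirectory() as cache_dir:
    cache = TwoLevelCache(cache_dir, fetch, get_version)
    first = cache.load('demo/data.csv')    # not cached yet: downloads
    second = cache.load('demo/data.csv')   # served from memory
    new_session = TwoLevelCache(cache_dir, fetch, get_version)
    third = new_session.load('demo/data.csv')  # served from disk
    remote['demo/data.csv'] = ('v2', b'a,b\n3,4\n')
    fourth = cache.load('demo/data.csv')   # version changed: downloads again

print(len(downloads))
```

In this sketch the version tag plays the role of whatever freshness check the real module performs; only a changed version triggers a new download.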