Explore Socrata data with ease.
socratadata provides an easy-to-use, Rust-powered interface for downloading data from Socrata open data portals. It improves upon the existing RSocrata package by adding support for the Socrata Discovery API and all Socrata datatypes. Unlike RSocrata, socratadata does not support uploading new datasets or editing existing ones.
Installation
You can install the development version of socratadata from GitHub with:
# install.packages("pak")
pak::pak("ryanzomorrodi/socratadata")
Example
Search for datasets
Use soc_discover() to explore the datasets available on a Socrata data portal.
library(socratadata)
chi_datasets <- soc_discover(
domains = "https://data.cityofchicago.org",
only = "dataset"
)
print(chi_datasets)
#> # A tibble: 877 × 21
#> id name attribution owner_name provenance description created
#> <chr> <chr> <chr> <chr> <chr> <chr> <dttm>
#> 1 xzkq… Curr… City of Ch… cocadmin official "This data… 2011-09-27 00:00:00
#> 2 ijzp… Crim… Chicago Po… cocadmin official "This data… 2011-09-30 00:00:00
#> 3 ydr8… Buil… City of Ch… cocadmin official "This data… 2011-09-30 00:00:00
#> 4 85ca… Traf… City of Ch… Jonathan … official "Crash dat… 2017-10-19 00:00:00
#> 5 s6ha… Affo… City of Ch… cocadmin official "The renta… 2013-03-14 00:00:00
#> 6 4ijn… Food… City of Ch… cocadmin official "This info… 2011-08-08 00:00:00
#> 7 2ft4… Lobb… City of Ch… cocadmin official "All lobby… 2011-06-07 00:00:00
#> 8 i6bp… Chic… City of Ch… cocadmin official "List of a… 2010-12-22 00:00:00
#> 9 kn9c… Cens… U.S. Censu… Jamyia official "This data… 2012-01-05 00:00:00
#> 10 z8bn… Poli… Chicago Po… cocadmin official "Chicago P… 2010-12-22 00:00:00
#> # ℹ 867 more rows
#> # ℹ 14 more variables: data_last_updated <dttm>, metadata_last_updated <dttm>,
#> # categories <list>, tags <list>, domain_category <chr>, domain_tags <list>,
#> # domain_metadata <list>, column_names <list>, column_labels <list>,
#> # column_datatypes <list>, column_descriptions <list>, permalink <chr>,
#> # link <chr>, license <chr>
Or even search by category across many Socrata data portals.
transportation_datasets <- soc_discover(
categories = "transportation",
only = "dataset"
)
print(transportation_datasets)
Download data
Use soc_read() to read a Socrata dataset into R.
cta_ridership <- soc_read(
"https://data.cityofchicago.org/Transportation/CTA-Ridership-Daily-Boarding-Totals/6iiy-9s97/about_data"
)
print(cta_ridership)
#> # A tibble: 8,766 × 5
#> service_date day_type bus rail_boardings total_rides
#> <dttm> <chr> <dbl> <dbl> <dbl>
#> 1 2001-01-01 00:00:00 U 297192 126455 423647
#> 2 2001-01-02 00:00:00 W 780827 501952 1282779
#> 3 2001-01-03 00:00:00 W 824923 536432 1361355
#> 4 2001-01-04 00:00:00 W 870021 550011 1420032
#> 5 2001-01-05 00:00:00 W 890426 557917 1448343
#> 6 2001-01-06 00:00:00 A 577401 255356 832757
#> 7 2001-01-07 00:00:00 U 375831 169825 545656
#> 8 2001-01-08 00:00:00 W 985221 590706 1575927
#> 9 2001-01-09 00:00:00 W 978377 599905 1578282
#> 10 2001-01-10 00:00:00 W 984884 602052 1586936
#> # ℹ 8,756 more rows
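Because the result is an ordinary tibble, it can be checked and extended with base R. The sketch below does not call the API; it rebuilds the first three rows of the printed output above as a plain data frame (the UTC time zone is an assumption, since the printout does not show one) and confirms that, for those rows, total_rides is the sum of bus and rail_boardings.

```r
# First three rows copied from the printed output above.
# The time zone is an assumption; the printout does not state one.
cta_sample <- data.frame(
  service_date = as.POSIXct(
    c("2001-01-01", "2001-01-02", "2001-01-03"),
    tz = "UTC"
  ),
  day_type = c("U", "W", "W"),
  bus = c(297192, 780827, 824923),
  rail_boardings = c(126455, 501952, 536432),
  total_rides = c(423647, 1282779, 1361355)
)

# Check that total_rides equals bus + rail_boardings on every sampled row.
totals_match <- with(cta_sample, bus + rail_boardings == total_rides)

# Share of each day's rides that were rail boardings.
cta_sample$rail_share <- with(cta_sample, rail_boardings / total_rides)
```

The same expressions should work unchanged on the full cta_ridership tibble, provided its columns keep the names and types shown above.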
Spatial data is read as an sf object.
chi_community_areas <- soc_read(
"https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas/igwz-8jzy/about_data"
)
print(chi_community_areas)
#> Simple feature collection with 77 features and 5 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -87.94011 ymin: 41.64454 xmax: -87.52414 ymax: 42.02304
#> Geodetic CRS: WGS 84
#> # A tibble: 77 × 6
#> the_geom area_numbe community area_num_1 shape_area
#> * <MULTIPOLYGON [°]> <dbl> <chr> <chr> <dbl>
#> 1 (((-87.65456 41.99817, -87.65574 … 1 ROGERS P… 1 51259902.
#> 2 (((-87.68465 42.01948, -87.68464 … 2 WEST RID… 2 98429095.
#> 3 (((-87.64102 41.9548, -87.644 41.… 3 UPTOWN 3 65095643.
#> 4 (((-87.67441 41.9761, -87.6744 41… 4 LINCOLN … 4 71352328.
#> 5 (((-87.67336 41.93234, -87.67342 … 5 NORTH CE… 5 57054168.
#> 6 (((-87.64102 41.9548, -87.64101 4… 6 LAKE VIEW 6 87214799.
#> 7 (((-87.63182 41.93258, -87.63182 … 7 LINCOLN … 7 88316400.
#> 8 (((-87.62446 41.91157, -87.62459 … 8 NEAR NOR… 8 76675896.
#> 9 (((-87.80676 42.00084, -87.80676 … 9 EDISON P… 9 31636314.
#> 10 (((-87.78002 41.99741, -87.78049 … 10 NORWOOD … 10 121959105.
#> # ℹ 67 more rows
#> # ℹ 1 more variable: shape_len <dbl>
You can even perform complex queries using Socrata Query Language (SoQL) via soc_query().
lower_west_side <- soc_read(
"https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-Community-Areas/igwz-8jzy/about_data",
query = soc_query(
where = "community LIKE 'LOWER WEST SIDE'"
)
)
print(lower_west_side)
#> Simple feature collection with 1 feature and 5 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: -87.68807 ymin: 41.8348 xmax: -87.63516 ymax: 41.86002
#> Geodetic CRS: WGS 84
#> # A tibble: 1 × 6
#> the_geom area_numbe community area_num_1 shape_area shape_len
#> * <MULTIPOLYGON [°]> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 (((-87.63516 41.85772, -… 31 LOWER WE… 31 81550724. 43229.
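The where argument is passed as a plain SoQL string, so values that come from user input need their single quotes handled before they are pasted in. The helper below is a hypothetical sketch and not part of socratadata; it assumes the SoQL convention that a single quote inside a text literal is escaped by doubling it. The second value is made up purely to show the escaping.

```r
# Hypothetical helper, not part of socratadata: builds `column = 'value'`
# and doubles any single quote inside the value (assumed SoQL escaping rule).
soql_equals <- function(column, value) {
  escaped <- gsub("'", "''", value, fixed = TRUE)
  sprintf("%s = '%s'", column, escaped)
}

plain_clause <- soql_equals("community", "LOWER WEST SIDE")
quoted_clause <- soql_equals("community", "ST. MARY'S") # made-up value
```

Either string could then be supplied as the where argument of soc_query() in place of a hand-written clause.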
trips_to_lws_by_ca <- soc_read(
"https://data.cityofchicago.org/Transportation/Taxi-Trips-2013-2023-/wrvz-psew/about_data",
query = soc_query(
select = "pickup_community_area, count(*) as n",
where = glue::glue(
"within_polygon(dropoff_centroid_location, '{sf::st_as_text(lower_west_side$the_geom)}')"
),
group_by = "pickup_community_area",
order_by = "n DESC"
)
)
print(trips_to_lws_by_ca)
#> # A tibble: 78 × 2
#> pickup_community_area n
#> <dbl> <dbl>
#> 1 32 127474
#> 2 8 113797
#> 3 28 90983
#> 4 31 39509
#> 5 24 37789
#> 6 33 22793
#> 7 76 18006
#> 8 6 15160
#> 9 56 14142
#> 10 7 12191
#> # ℹ 68 more rows
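The where clause in the taxi query is built by converting the geometry to well-known text (WKT) and interpolating it into a within_polygon() call. The sketch below shows that string-building step in isolation, with no API call. The unit-square polygon is a hypothetical stand-in for lower_west_side$the_geom, and the sf and glue packages are assumed to be installed.

```r
library(sf)

# Hypothetical stand-in for lower_west_side$the_geom: one unit-square polygon.
toy_geom <- st_sfc(
  st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
)

# st_as_text() returns one WKT string per geometry in the column.
toy_wkt <- st_as_text(toy_geom)

# Same pattern as the where clause above, assembled with glue.
toy_where <- glue::glue(
  "within_polygon(dropoff_centroid_location, '{toy_wkt}')"
)
```

Because st_as_text() yields one string per geometry, the interpolated clause has as many elements as the geometry column has features. Filtering to a single community area first, as the query above does, keeps it to one string.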
Extract metadata
Access a dataset’s metadata using soc_metadata().
cta_ridership_meta <- soc_metadata(cta_ridership)
print(cta_ridership_meta)
#> ID: 6iiy-9s97
#> Name: CTA - Ridership - Daily Boarding Totals
#> Attribution: Chicago Transit Authority
#> Owner: CTA
#> Provenance: official
#> Description: This dataset shows systemwide boardings for both bus and rail
#> services provided by CTA, dating back to 2001. Daytypes are as follows: W =
#> Weekday, A = Saturday, U = Sunday/Holiday. See attached readme file for
#> information on how these numbers are calculated.
#> Created: 2011-08-12 15:40:31
#> Data last updated: 2025-04-29 16:34:39
#> Metadata last Updated: 2025-04-29 16:35:04
#> Domain Category: Transportation
#> Domain Tags: cta, public transit, and ridership
#> Domain fields:
#> • Data Owner: Chicago Transit Authority
#> Columns:
#> # A tibble: 5 × 4
#> column_name column_label column_datatype column_description
#> <chr> <chr> <chr> <chr>
#> 1 service_date service_date calendar_date ""
#> 2 day_type day_type text ""
#> 3 bus bus number ""
#> 4 rail_boardings rail_boardings number ""
#> 5 total_rides total_rides number ""
#> Permalink: https://data.cityofchicago.org/d/6iiy-9s97
#> Link:
#> https://data.cityofchicago.org/Transportation/CTA-Ridership-Daily-Boarding-Totals/6iiy-9s97
#> License: See Terms of Use
Or explore a dataset’s metadata using its URL.
cta_ridership_meta <- soc_metadata(
"https://data.cityofchicago.org/Transportation/CTA-Ridership-Daily-Boarding-Totals/6iiy-9s97/about_data"
)
print(cta_ridership_meta)
#> ID: 6iiy-9s97
#> Name: CTA - Ridership - Daily Boarding Totals
#> Attribution: Chicago Transit Authority
#> Owner: CTA
#> Provenance: official
#> Description: This dataset shows systemwide boardings for both bus and rail
#> services provided by CTA, dating back to 2001. Daytypes are as follows: W =
#> Weekday, A = Saturday, U = Sunday/Holiday. See attached readme file for
#> information on how these numbers are calculated.
#> Created: 2011-08-12 15:40:31
#> Data last updated: 2025-04-29 16:34:39
#> Metadata last Updated: 2025-04-29 16:35:04
#> Domain Category: Transportation
#> Domain Tags: cta, public transit, and ridership
#> Domain fields:
#> • Data Owner: Chicago Transit Authority
#> Columns:
#> # A tibble: 5 × 4
#> column_name column_label column_datatype column_description
#> <chr> <chr> <chr> <chr>
#> 1 service_date service_date calendar_date ""
#> 2 day_type day_type text ""
#> 3 bus bus number ""
#> 4 rail_boardings rail_boardings number ""
#> 5 total_rides total_rides number ""
#> Permalink: https://data.cityofchicago.org/d/6iiy-9s97
#> Link:
#> https://data.cityofchicago.org/Transportation/CTA-Ridership-Daily-Boarding-Totals/6iiy-9s97
#> License: See Terms of Use