Sven Koppany, Manager • over 6 years ago
Urban Institute Challenge + PRFAQ
CHALLENGE
The Urban Institute is hoping to create affordable housing roadmaps for cities struggling with displacement and gentrification. Unfortunately, they lack the foundational data on the kinds of buildings in different cities. They need a dataset that will allow research organizations and cities to create data-driven affordable housing plans, monitor neighborhood change, and possibly create early warning systems for gentrification and displacement. We challenge you to create a generalizable methodology that takes satellite imagery, LIDAR, and building footprint data as input and outputs predicted building heights. https://github.com/UI-Research/bldg-heights-aws-hackathon
PRESS RELEASE
**New Machine Learning Methods Unlock Key Data for Affordable Housing Reform**
December 2, 2020—Shirley Green has lived in the same row house in Washington, DC’s Petworth neighborhood for the past 40 years. She and her children attended the same schools, played in the same playgrounds, and ate at the same restaurants. But this year, her rent has skyrocketed, and she may have to move out of her childhood home. Facing the looming threat of displacement, she and her fellow neighbors formed the Petworth Neighborhood Association to advocate for more protections for long-term residents. Using newly released public building data from the Urban Institute, the association advocated for and successfully won more affordable housing production and inclusive zoning policies. The association explained, “While we had anecdotal reports of new condo development and displacement of existing residents, we previously couldn’t quantify this neighborhood change, compare across neighborhoods, and justify our demands to decision makers. Urban’s data gave us the power to effectively advocate for ourselves!”
The national building height dataset was released publicly last month by the Urban Institute. Working with Amazon Web Services hackathon participants, Urban researchers created a novel machine learning approach to generate building height data from input satellite data. Using this methodology, they calculated building heights for all cities in the United States and made the resulting dataset publicly available. The dataset truly democratizes data access and allows anyone to participate in the conversation around planning for housing equity and affordability. Surprisingly, most cities previously didn't have a good sense of what kinds of buildings were in their jurisdictions. And although some of the largest cities, such as New York City, could afford to commission a building height dataset, most other cities and rural jurisdictions simply did not have the resources or data expertise. This prevented cities from developing detailed affordable housing plans and made it difficult for residents to understand how their neighborhoods were changing. These new data change all of that.
Over the past few months, there has been a sharp uptick in the number of cities that have released detailed affordable housing plans using Urban’s new building data. Usually, these reports are a time-intensive and costly undertaking for city planning departments. But according to Rob Velazquez, a city planner for the City of Memphis, “The open-source building height data has changed the game. We now have the foundational data needed to create accurate affordable housing roadmaps. We know now where and how to make investments in housing affordability at a regional scale. What used to be a process of mostly guesswork is now an accurate, efficient, data-driven enterprise!”
More impressively, because the underlying methodology is based on frequently updated satellite data, Urban researchers estimate they can update the building height data once a year and provide these data for free on an ongoing basis. This unlocks the possibility of early warning systems for displacement and gentrification that identify rapidly changing neighborhoods like Petworth. And as Petworth residents have proven, these data can truly change lives for the better.
FAQ
*This seems hard! How exactly can we predict building heights?*
On the simplest level, this is a data fusion task. We provide building footprints and point heights (LIDAR data) throughout the city; you need to come up with a smart merging scheme to combine these two datasets. Note that some buildings may have many overlapping LIDAR points, while others will have only a few. You will have to think critically about whether to take the min/max/average of the overlapping point heights, whether to buffer the building footprints, and so on. If you want, you can go further by building machine learning models or neural nets to predict building heights from the input data, and/or use the satellite imagery in innovative ways, but that is not required for a successful submission.
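As a concrete starting point, here is a minimal sketch of one such merging scheme using geopandas. The file names, column names, and buffer distance are placeholder assumptions, not the actual repo layout:

```python
import geopandas as gpd

# Hypothetical file names; substitute the datasets provided in the repo.
footprints = gpd.read_file("building_footprints.geojson")
lidar = gpd.read_file("lidar_points.geojson")  # assumes a per-point 'height' column

# Optionally buffer footprints slightly so points near building edges still match.
# The buffer distance is in the units of the data's CRS and is purely illustrative.
footprints["geometry"] = footprints.buffer(1.0)

# Spatial join: attach each LIDAR point to the footprint it falls within.
joined = gpd.sjoin(lidar, footprints, how="inner", predicate="within")

# Aggregate the overlapping point heights per building. The median is one
# reasonable choice; min/max/mean are the other obvious candidates to test.
heights = joined.groupby("index_right")["height"].median()
footprints["predicted_height"] = footprints.index.map(heights)
```

Whether the median beats the max or mean depends on how many points come from chimneys, trees, or ground returns, which is exactly the kind of rule worth testing against the held-out heights described below.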
*Are we required to use all three datasets listed above?*
No, in fact we expect that most teams won't. We do expect everyone to use the building footprint data, as those are the buildings we would like you to predict heights for. Exactly how you get the building heights is up to you and is left intentionally open-ended; we've provided the LIDAR and satellite imagery data as good starting points.
*What additional datasets can we bring in to help us with this task?*
Feel free to find and use any additional datasets that you believe will help you with this task. Keep in mind, however, that our goal is to generalize the techniques you develop to create building height datasets for all cities across the US, so keep any auxiliary datasets you bring in as generalizable as possible. It is also fine if you don't bring in any auxiliary datasets and only work with the datasets we provide above.
*How can we check the accuracy of our predicted building heights?*
As mentioned above, the building footprint data we provide in this repo is a modified version of the original dataset from the DC Open Data Portal. The original dataset has the building heights appended, so you can use it as test data to see how accurate your predictions are and to help you fine-tune your approaches.
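For example, a minimal accuracy check might look like the following, assuming you've merged your predictions with the original heights on a shared building ID (the file and column names here are hypothetical):

```python
import numpy as np
import pandas as pd

def evaluate(predicted: pd.DataFrame, original: pd.DataFrame) -> None:
    """Compare predicted heights to the measured heights in the original
    DC Open Data footprints. 'building_id', 'predicted_height', and
    'actual_height' are placeholder column names."""
    merged = predicted.merge(
        original[["building_id", "actual_height"]], on="building_id"
    )
    errors = merged["predicted_height"] - merged["actual_height"]
    rmse = float(np.sqrt((errors ** 2).mean()))
    mae = float(errors.abs().mean())
    print(f"RMSE: {rmse:.2f}  MAE: {mae:.2f}")
```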
*If a building has a chimney that extends above the height of the roof (or another protrusion of that kind) should we predict the height of the chimney or the height of the roof?*
This is a judgment call. We recommend predicting the height of the roof, not the protrusion. However, if the protrusion is very large and covers most of the roof, it might make sense to predict the height of the protrusion instead. This could involve coming up with rules based on the area of the protrusion. You can use the original building height dataset (which has actual building heights) to guide your decisions.
*If a building has multiple roof heights (e.g., part of the building is taller than another part), do you want us to predict the maximum height of the building? The average height? Or the separate height of each part of the building?*
Again, this is a judgment call. We want only one height measurement for every building footprint. We think it's a safe bet to predict the height of the largest section of the roof, but you can test different rules to see which is most accurate.
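One hedged way to implement the "largest roof section" rule is to bin a building's overlapping LIDAR point heights and take the most populated bin, using point density as a proxy for roof area. The bin size here is a tunable assumption:

```python
import numpy as np

def dominant_height(point_heights: np.ndarray, bin_size: float = 1.0) -> float:
    """Return the height of the most common roof level among a building's
    LIDAR points, approximating the height of the largest roof section."""
    bins = np.floor(point_heights / bin_size)
    values, counts = np.unique(bins, return_counts=True)
    # Midpoint of the most populated height bin.
    return float(values[counts.argmax()] * bin_size + bin_size / 2)
```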
*Was every building that was originally captured in 2005 updated in 2010 in the building heights data?*
No, not necessarily. Only buildings that the third-party contractor deemed significantly changed were updated using 2010 imagery. But to make things simpler, you can assume these are accurate building footprints as of 2015.
*Can we use the actual heights from the original footprint data in a training set to predict the heights in a test set sampled from the footprint data?*
Yes, feel free to use supervised learning approaches and use the original footprint data as a training set.
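A minimal supervised sketch, assuming simple geometry-derived features and the heights from the original dataset as labels (the file name and 'height' column are placeholders):

```python
import geopandas as gpd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

buildings = gpd.read_file("original_footprints_with_heights.geojson")

# Simple geometry-derived features; real submissions would likely add
# LIDAR aggregates or satellite-derived features here.
features = buildings.assign(
    area=buildings.geometry.area,
    perimeter=buildings.geometry.length,
)[["area", "perimeter"]]
labels = buildings["height"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))
```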