Real estate price prediction using machine learning
A trainable model for predicting property prices
The client partnered with Yellow to create a machine learning solution for conducting predictive analysis of real estate prices. The task for the ML real estate model was to predict what the price of a real estate property will be in a month.
We were responsible for
- Building a machine learning model from scratch
- Project manager
- Data scientist
The US real estate market in numbers
housing units were for sale in 2021
of buyers find their new home online
is the homeownership rate in the United States
new houses were sold in the USA in 2021
is the average sales price of new homes
is the average number of days houses stay on the market
Factors that influence the real estate price
The home’s size
The home’s age
Renovations and repairs
Real estate price prediction: Professional opinion
Defining the property value for a real estate property is a complex process in the USA. In addition to using the available online tools to determine the value by yourself, there are also professional opinions that should be taken into account.
Comparative Market Analysis
Real estate professionals look for comparable homes in the area and define the value of a property based on how the market behavior of these properties. Comparable homes are chosen based on size, number of rooms, style, and recent sales price. Usually, this information can be found using a Multiple Listing Service (MLS), which is a database with a list of properties for sale.
Broker Price Opinion
A Broker Price Opinion (BPO) is another option for a person to get a professional opinion on a property. This can be an external or internal opinion and is usually issued by a professional broker who is familiar with the local market. This option is common for short sales, foreclosures, and providing buyers and sellers with a listing price.
Artificial intelligence and machine learning in real estate
The real estate industry is implementing artificial intelligence to enhance its predictive ability and improve performance.
Specialists use artificial intelligence and machine learning in commercial real estate to predict what properties will be for sale in 12 months so that realtors can efficiently hunt for new listings to meet buyers’ demands.
Natural language processing
The analysis language used in property descriptions can help in defining real estate prices since the most popular words are different for cheap and expensive properties.
According to McKinsey, nearly 60% of predictive power can be achieved by using nontraditional variables like proximity to luxury hotels or the number of coffee shops within a mile.
More possible machine learning use cases in real estate include:
Automated property management
“Smart home” systems
AI-augmented customer service
Enhanced matching of sellers and buyers
To conduct predictive analysis of real estate prices, we created a machine learning model. Here is the strategy we used to develop it.
In addition to the datasets provided by the client, we also added supplemental data samples to increase the model’s precision.
The features we used for training the model are historical changes of real estate prices, property locations, type of houses, neighbors, presence or absence of a pool, and other nontraditional variables.
We set up and tuned the necessary hyperparameters to validate the model's results, control its behavior, and maximize performance.
During the model training, we assessed and reevaluated the significance of each variable we were using for the model.
After the first version of the model was complete, we repeated the process to achieve maximum efficiency.
An XGBoost for Regression model was chosen for building a solution. As a result, we were able to achieve an accuracy of 91%, matching the client’s expectations.
We used the following technologies to develop machine learning for real estate.
Challenges and Solutions
Poor Initial Data
Unfortunately, the dataset used for this project was not sufficient for training the model. It also contained several mistakes that could influence the final results the model would provide.
Our specialists put all their effort into complementing the dataset with the necessary data. We used several available sources of information related to the real estate market in the US, as well as data related to the country’s economic conditions, in order to achieve a more representative data set. In this way, we improved the quality of the data for more accurate results.
Due to the initial quality of the dataset, we faced an issue with underfitting, namely that the model couldn’t find the underlying trends in the present data and provide accurate results.
In addition to adding more quantitative elements to the dataset, we also improved its quality by adding more relevant features. This enabled us to overcome the underfitting issue.