Real estate price prediction using machine learning

A trainable model for predicting property prices

Time: 3+ months

Type: Machine learning

Industry: Real estate

About

The Product

The client partnered with Yellow to create a machine learning solution for predictive analysis of real estate prices. The model's task was to predict what the price of a property will be one month ahead.

We were responsible for

Building a machine learning model from scratch

Project team

Project manager
Data scientist

The US real estate market in numbers

726,000 housing units were for sale in 2021

50% of buyers find their new home online

65.5% is the homeownership rate in the United States

770,000 new houses were sold in the USA in 2021

$453,700 is the average sales price of new homes

38 is the average number of days houses stay on the market


Factors that influence the real estate price

The home’s size

The home’s age

Location

Market conditions

Economic conditions

Renovations and repairs

Real estate price prediction: Professional opinion

Determining the value of a property in the USA is a complex process. Besides using the available online tools to estimate the value yourself, you should also take professional opinions into account.

Comparative Market Analysis

Real estate professionals look for comparable homes in the area and define the value of a property based on how these properties behave on the market. Comparable homes are chosen based on size, number of rooms, style, and recent sales price. Usually, this information can be found using a Multiple Listing Service (MLS), which is a database of properties listed for sale.
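The comparable-homes idea can be illustrated with a minimal sketch. It assumes each comparable is described only by its size and recent sale price; the `Comparable` class, the `estimate_from_comps` function, and the sample figures are hypothetical, and a real analysis would also adjust for room count, style, and condition.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Comparable:
    """A recently sold home used as a reference point (hypothetical structure)."""
    size_sqft: float
    sale_price: float


def estimate_from_comps(subject_size_sqft: float, comps: list[Comparable]) -> float:
    """Estimate a value as the subject's size times the median price per square foot of the comps."""
    if not comps:
        raise ValueError("at least one comparable home is required")
    price_per_sqft = median(c.sale_price / c.size_sqft for c in comps)
    return subject_size_sqft * price_per_sqft


# Made-up comparables for illustration only.
comps = [
    Comparable(size_sqft=1000, sale_price=200_000),
    Comparable(size_sqft=2000, sale_price=500_000),
    Comparable(size_sqft=1500, sale_price=450_000),
]
estimate = estimate_from_comps(1800, comps)
print(f"Estimated value: ${estimate:,.0f}")
```

The median is used instead of the mean so that a single unusual sale does not dominate the estimate.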

Broker Price Opinion

A Broker Price Opinion (BPO) is another option for a person to get a professional opinion on a property. This can be an external or internal opinion and is usually issued by a professional broker who is familiar with the local market. This option is common for short sales, foreclosures, and providing buyers and sellers with a listing price.

Artificial intelligence and machine learning in real estate

The real estate industry is implementing artificial intelligence to enhance its predictive ability and improve performance.

AI-powered hunting

Specialists use artificial intelligence and machine learning in commercial real estate to predict which properties will be for sale in 12 months so that realtors can efficiently hunt for new listings to meet buyers’ demands.

Natural language processing

Analyzing the language used in property descriptions can help in defining real estate prices, since the most popular words differ between cheap and expensive properties.
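As a rough illustration of this idea, the sketch below counts words in two groups of listing descriptions and reports the words that are more frequent in one group than in the other. The function names and the sample descriptions are hypothetical; a production pipeline would normally remove stop words and work with far more text.

```python
import re
from collections import Counter


def word_counts(descriptions: list[str]) -> Counter:
    """Count lowercase word occurrences across a list of listing descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def distinctive_words(group: list[str], other: list[str], top_n: int = 3) -> list[str]:
    """Return words used more often in `group` than in `other`, largest count gap first."""
    group_counts, other_counts = word_counts(group), word_counts(other)
    gaps = {w: c - other_counts[w] for w, c in group_counts.items() if c > other_counts[w]}
    # Sort by gap (descending), then alphabetically so ties are deterministic.
    return sorted(gaps, key=lambda w: (-gaps[w], w))[:top_n]


# Made-up descriptions for illustration only.
expensive = [
    "Stunning estate with marble floors and a wine cellar",
    "Gated estate with marble kitchen",
]
cheap = [
    "Cozy starter home needs repairs",
    "Starter home with potential, needs paint",
]

expensive_words = distinctive_words(expensive, cheap, top_n=2)
cheap_words = distinctive_words(cheap, expensive, top_n=3)
print(expensive_words, cheap_words)
```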

Nontraditional variables

According to McKinsey, nearly 60% of predictive power can be achieved by using nontraditional variables like proximity to luxury hotels or the number of coffee shops within a mile.
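A variable such as "number of coffee shops within a mile" can be derived from coordinates alone. The sketch below uses the haversine great-circle distance; the function names and the sample coordinates are hypothetical, and a real pipeline would pull points of interest from a mapping data source.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def count_within_radius(home: tuple[float, float],
                        places: list[tuple[float, float]],
                        radius_miles: float = 1.0) -> int:
    """Count the places that lie within `radius_miles` of the home."""
    return sum(
        1 for lat, lon in places
        if haversine_miles(home[0], home[1], lat, lon) <= radius_miles
    )


# Made-up coordinates for illustration only.
home = (40.7500, -73.9900)
coffee_shops = [
    (40.7510, -73.9910),  # a few blocks away
    (40.7600, -73.9900),  # roughly 0.7 miles north
    (40.8000, -73.9900),  # roughly 3.5 miles north
]
coffee_shops_nearby = count_within_radius(home, coffee_shops)
print(coffee_shops_nearby)
```

The resulting count can be added to each property's feature row alongside the traditional variables.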

More possible machine learning use cases in real estate include:

Automated property management
“Smart home” systems
AI-augmented customer service
Enhanced matching of sellers and buyers

Our solution

To conduct predictive analysis of real estate prices, we created a machine learning model. Here is the strategy we used to develop it.

Gathering data

In addition to the datasets provided by the client, we also added supplemental data samples to increase the model’s precision.

Feature engineering

The features we used to train the model include historical changes in real estate prices, property location, house type, neighbors, the presence or absence of a pool, and other nontraditional variables.
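To make the feature list concrete, here is a minimal sketch of how a property's monthly price history could be turned into training rows whose target is the price one month later. The function name, the feature names, and the sample prices are assumptions for illustration, not the project's actual schema.

```python
def build_training_rows(monthly_prices, has_pool, house_type,
                        house_types=("apartment", "house", "townhouse")):
    """Turn one property's monthly price history into (features, next_month_price) pairs."""
    rows = []
    # Start at 1 so a previous month exists; stop early so a next month exists.
    for i in range(1, len(monthly_prices) - 1):
        current, previous = monthly_prices[i], monthly_prices[i - 1]
        features = {
            "price_now": current,
            "pct_change_1m": (current - previous) / previous,  # historical price change
            "has_pool": int(has_pool),
        }
        # One-hot encode the house type so a model can use it as numeric input.
        for t in house_types:
            features[f"type_{t}"] = int(t == house_type)
        rows.append((features, monthly_prices[i + 1]))
    return rows


# Made-up price history for illustration only.
rows = build_training_rows(
    monthly_prices=[300_000, 303_000, 306_000, 312_000],
    has_pool=True,
    house_type="house",
)
for features, target in rows:
    print(features, "->", target)
```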

Hyperparameter tuning

We set up and tuned the necessary hyperparameters to validate the model's results, control its behavior, and maximize performance.
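One common way to tune hyperparameters is a grid search: train the model with every combination of candidate values and keep the combination with the lowest validation error. The sketch below is model-agnostic. To stay self-contained it uses a one-feature ridge regression as a stand-in, not XGBoost, and the data is made up. With XGBoost, the grid might instead hold parameters such as `max_depth`, `learning_rate`, and `n_estimators`; the source does not state which ones were tuned.

```python
from itertools import product


def grid_search(param_grid, fit_and_score):
    """Try every combination in `param_grid`; return (best_params, best_score). Lower is better."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Stand-in model (NOT XGBoost): ridge regression through the origin on made-up data.
train_x, train_y = [1, 2, 3, 4], [2.5, 4.4, 6.9, 8.6]
valid_x, valid_y = [5, 6], [10.0, 12.0]


def fit_and_score(penalty):
    """Fit on the training split and return mean absolute error on the validation split."""
    weight = sum(x * y for x, y in zip(train_x, train_y)) / (sum(x * x for x in train_x) + penalty)
    errors = [abs(weight * x - y) for x, y in zip(valid_x, valid_y)]
    return sum(errors) / len(errors)


best_params, best_score = grid_search({"penalty": [0.0, 3.0, 10.0]}, fit_and_score)
print(best_params, round(best_score, 3))
```

Scoring on a held-out validation split, rather than on the training data, keeps the search from simply rewarding the most flexible setting.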

Studying variables

During the model training, we assessed and reevaluated the significance of each variable we were using for the model.
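The source does not say how variable significance was measured. One widely used, model-agnostic option is permutation importance: shuffle one feature's values and measure how much the prediction error grows. The sketch below assumes that approach; the function names, the toy `predict` function, and the data are hypothetical.

```python
import random


def mean_abs_error(predict, rows, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(predict(row) - target) for row, target in zip(rows, targets)) / len(rows)


def permutation_importance(predict, rows, targets, feature, n_repeats=5, seed=0):
    """Average increase in error after shuffling one feature's values across rows."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    increases = []
    for _ in range(n_repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        shuffled = [{**row, feature: value} for row, value in zip(rows, column)]
        increases.append(mean_abs_error(predict, shuffled, targets) - baseline)
    return sum(increases) / n_repeats


# Toy data: price depends on size only; the door color code is irrelevant.
sizes = [800, 1000, 1200, 1500, 1800, 2000, 2400, 3000]
door_codes = [1, 2, 3, 1, 2, 3, 1, 2]
rows = [{"size_sqft": s, "door_color_code": d} for s, d in zip(sizes, door_codes)]
targets = [200 * s for s in sizes]


def predict(row):
    """Stand-in for a trained model; it ignores the door color code."""
    return 200 * row["size_sqft"]


size_importance = permutation_importance(predict, rows, targets, "size_sqft")
door_importance = permutation_importance(predict, rows, targets, "door_color_code")
print(size_importance, door_importance)
```

A feature whose importance stays near zero, like the door color code here, is a candidate for removal in the next iteration.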

Iterating

After the first version of the model was complete, we repeated the process to achieve maximum efficiency.

We chose an XGBoost regression model to build the solution. As a result, we achieved an accuracy of 91%, which met the client’s expectations.
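For a regression model, "accuracy" needs a definition, and the source does not state which one was used. One common convention, assumed here purely for illustration, is 100% minus the mean absolute percentage error (MAPE). The function name and the sample prices are hypothetical.

```python
def accuracy_from_mape(actual, predicted):
    """Return 1 - MAPE, where MAPE is the mean of |actual - predicted| / |actual|."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")
    mape = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)
    return 1.0 - mape


# Made-up prices for illustration only; these are not the project's results.
actual_prices = [400_000, 250_000, 500_000]
predicted_prices = [360_000, 275_000, 465_000]
accuracy = accuracy_from_mape(actual_prices, predicted_prices)
print(f"{accuracy:.0%}")
```

Under this definition, a score of 91% would mean predictions were off by 9% of the true price on average.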

Technology Stack

We used the following technologies to develop machine learning for real estate.

Challenges and Solutions

Poor Initial Data

Challenge:

Unfortunately, the dataset used for this project was not sufficient for training the model. It also contained several mistakes that could influence the final results the model would provide.

Solution:

Our specialists supplemented the dataset with the necessary data. We drew on several available sources of information about the US real estate market, as well as data on the country’s economic conditions, to build a more representative dataset. In this way, we improved the quality of the data for more accurate results.

Underfitting

Challenge:

Due to the initial quality of the dataset, we faced an issue with underfitting: the model couldn’t find the underlying trends in the available data and so couldn’t provide accurate results.

Solution:

In addition to enlarging the dataset, we improved its quality by adding more relevant features. This enabled us to overcome the underfitting issue.
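A typical sign of underfitting is a high error even on the training data. The sketch below illustrates how adding a relevant feature can reduce it, comparing a model with no features (always predicting the mean price) to a one-feature linear fit. The data and function names are made up, and the simple linear model is a stand-in, not the project's XGBoost model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x


def mean_abs_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up training data for illustration only.
sizes = [1000, 1500, 2000, 2500]
prices = [210_000, 290_000, 410_000, 490_000]

# Model with no features: always predict the mean price.
mean_price = sum(prices) / len(prices)
baseline_error = mean_abs_error(prices, [mean_price] * len(prices))

# Model with one relevant feature: home size.
slope, intercept = fit_line(sizes, prices)
feature_error = mean_abs_error(prices, [slope * s + intercept for s in sizes])

print(f"training error without features: {baseline_error:,.0f}")
print(f"training error with home size:   {feature_error:,.0f}")
```

If the training error stays high after adding features, the model itself may be too simple for the data.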

Results