Buriihenry
3 min read · Mar 21, 2023


Salary Prediction for Software Developers

Photo by Radowan Nakif Rehan on Unsplash

Software developers are in high demand, and their salaries vary with several factors, including experience, location, and skillset. As a result, it can be hard for a developer to know what to expect when negotiating a salary. In this write-up, we present a salary predictor web app built on Stack Overflow’s developer survey dataset, which contains developers’ salaries along with various factors that could influence them.

Data Collection and Preparation:

Stack Overflow conducts an annual developer survey that collects data on various aspects of developers’ work, including demographics, employment status, education, experience, and salary. We will use the 2021 survey dataset, which is available on their website.

Link for this dataset: https://insights.stackoverflow.com/survey

Data Preprocessing:

Before we can build the model, we need to preprocess the data to prepare it for training. We keep the columns that describe developers’ skills, experience, location, and salary, and clean the dataset by dropping rows with missing values and columns that are irrelevant to salary. We also convert categorical variables into numerical values using scikit-learn’s LabelEncoder.
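The cleaning and encoding steps described above might look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the small DataFrame stands in for the survey file, and the column names (Country, YearsCodePro, Salary) and values are assumptions made for the example.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny stand-in for the survey data; column names and values are
# illustrative assumptions, not taken from the real dataset.
df = pd.DataFrame({
    "Country": ["Kenya", "Germany", None, "Kenya", "India"],
    "YearsCodePro": [3, 10, 5, 1, 7],
    "Salary": [20000.0, 85000.0, 60000.0, None, 30000.0],
})

# Drop every row that has a missing value in any column.
df = df.dropna().reset_index(drop=True)

# Map each country name to an integer code (classes are sorted alphabetically).
le_country = LabelEncoder()
df["Country"] = le_country.fit_transform(df["Country"])

print(df)
print(list(le_country.classes_))
```

The fitted encoder is worth keeping around, because the web app has to apply the same mapping to whatever the user selects. Note that the scikit-learn documentation describes LabelEncoder as intended for target labels; OrdinalEncoder and OneHotEncoder are its feature-oriented alternatives.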

Model Building:

We will build the salary predictor with three different algorithms, all implemented with the scikit-learn library:

  1. Linear Regression: We will start with a simple linear regression model, which predicts the target variable as a weighted sum of the input features plus an intercept.
  2. Random Forest: Random Forest is a more complex algorithm that trains many decision trees on random samples of the data and averages their predictions. This averaging usually makes it less prone to overfitting than a single tree.
  3. Decision Trees: Decision Trees are another popular algorithm for regression problems. They are easy to interpret because each prediction follows a readable sequence of if/else splits. The scikit-learn implementation expects numerical inputs, which is why the categorical variables are encoded first.
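The three algorithms listed above can be fitted side by side, as in the following sketch. The training data here is synthetic (an encoded country code and years of experience mapped to a made-up salary), and the hyperparameters are scikit-learn defaults with a fixed random_state; none of this is taken from the actual project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: [country_code, years_of_experience] -> salary.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 3, size=200),   # encoded country (0, 1, or 2)
    rng.uniform(0, 20, size=200),   # years of professional experience
])
y = 30000 + 10000 * X[:, 0] + 2500 * X[:, 1] + rng.normal(0, 2000, size=200)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit every model on the same features and target.
for model in models.values():
    model.fit(X, y)

# Predict the salary of one hypothetical developer: country code 1, 5 years.
sample = np.array([[1, 5.0]])
predictions = {name: float(m.predict(sample)[0]) for name, m in models.items()}
print(predictions)
```

Keeping the models in a dictionary makes it easy to loop over them again when comparing their errors.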

Model Evaluation:

Once we have built our models, we need to evaluate their performance. We will use the mean squared error (MSE) and R-squared values to compare them. The MSE is the average squared difference between the predicted and actual values, so lower is better. The R-squared value is the proportion of the variance in the actual salaries that the model explains, so a value of 1.0 means a perfect fit.
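To make the two metrics concrete, here is a small worked example that computes them by hand in plain Python. The salary figures are invented for illustration; in practice, scikit-learn's mean_squared_error and r2_score functions in sklearn.metrics compute the same quantities.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Invented salaries, purely for illustration.
y_true = [50000, 60000, 70000, 80000]
y_pred = [52000, 58000, 71000, 79000]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(f"MSE: {mse:.0f}, R-squared: {r2:.2f}")
```

Here the errors are 2000, 2000, 1000, and 1000, so the squared errors sum to 10,000,000 and the MSE is 2,500,000. The total sum of squares around the mean of 65,000 is 500,000,000, which gives an R-squared of 1 - 10,000,000 / 500,000,000 = 0.98.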

Deployment:

We will deploy our model using a Streamlit web app. Streamlit is an open-source library that allows us to create interactive web apps with Python. We will create a simple web interface where users can input their skills, experience, and location, and the app will predict their salary range using our trained models. We will also include some data visualizations to help users understand the factors that influence their predicted salary.

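import streamlit as st

# Each page lives in its own module and exposes a single render function.
from predict_page import show_predict_page
from explore_page import show_explore_page

# Sidebar dropdown that lets the user switch between the two pages.
page = st.sidebar.selectbox("Select to Explore Or Predict", ("Predict", "Explore"))

# Render the selected page.
if page == "Predict":
    show_predict_page()
else:
    show_explore_page()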

The complete project is available here: https://github.com/buriihenry/Salary-Prediction-for-Software-Engineers
