Scraping Mortgage Rates Using Python & Metabase

My wife and I have been house hunting for several months, and the mortgage rate market in the United States has been up and down. I’ve found myself checking rates from local lenders daily, navigating to their websites and writing the rates down or making a mental note of where they stand. It’s difficult to see trends and understand whether rates are rising or falling over time because, although there are daily fluctuations, meaningful changes take place over weeks or months.

I like to outline the general steps I need to take before I start building anything. This keeps thinking about the code from getting in the way of the goals and lets me think through the whole process rather than tackling the first step in isolation.

Here’s what I needed to do:

  1. Retrieve & parse data from the lender’s website
  2. Write the data to a database
  3. Schedule the program to run daily
  4. Create a front end to view the rates

Retrieving & Parsing the Rates

My first milestone was to find a way to retrieve the mortgage rates from the public website where my lender publishes updates daily. I needed a way to retrieve and parse HTML!

I was familiar with the Beautiful Soup package for Python, but hadn’t used it aside from a small project when I started learning the language. I got started quickly and was able to fetch HTML from the web, but parsing the values didn’t go as smoothly. I was trying to extract a few values from a table, and the page contained multiple tables, which made getting at a particular cell’s value difficult. My efforts were returning the values I wanted from the first table on the page, but I wasn’t able to extract anything from the second table. I tried different looping approaches and filtering with a variety of keywords in my Beautiful Soup method calls, but I was still only getting the first table’s values.

The approach that worked for me was to capture all the td values using the find_all() method and pull out the values I was after using substrings. This worked in the moment, but if the design of the webpage changes or the format of the table is adjusted, my code may not parse the correct values. I know there is a more sophisticated approach out there, and I might revisit this part of the program as I learn more about Beautiful Soup and the intricacies of parsing HTML (I’ve sketched one sturdier idea after the snippet below).

# Fetch mortgage rates
import requests
from bs4 import BeautifulSoup

URL = "https://www.wingsfinancial.com/rates/mortgages-purchase"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")
# Grab every <td> on the page in document order
table = soup.find_all('td')

# Each td stringifies to something like "<td>6.125</td>", so slicing
# characters 4:9 (or 4:6 for the points) pulls out the numeric text
fr30 = float(str(table[0])[4:9])      # 30-year fixed rate
fr30pt = float(str(table[2])[4:6])    # 30-year fixed points
ARM71 = float(str(table[25])[4:9])    # 7/1 ARM rate
ARM71pt = float(str(table[7])[4:6])   # 7/1 ARM points
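
For what it’s worth, here’s a rough sketch of the sturdier approach I have in mind: select each table element on the page, then walk its rows and cells by position instead of slicing strings. I haven’t swapped this in yet, so treat it as an untested sketch; the printout is just for locating the right table, row, and cell positions on the real page.

# Sketch of a more structural parse (untested against the live page)
import requests
from bs4 import BeautifulSoup

URL = "https://www.wingsfinancial.com/rates/mortgages-purchase"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# One entry per <table> on the page, so the second table is index 1
for i, tbl in enumerate(soup.find_all("table")):
    for row in tbl.find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            print(i, cells)  # inspect output to find the values you need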

Writing to a Database

Now that I had the values, I needed to write them to a database. I have a MySQL database I use for hobby projects, hosted in the cloud, so I decided to use that existing database. Since I had done something similar in a previous project, using a Raspberry Pi and the mysql.connector package to record my office temperature, I was able to refresh my memory and write the code to insert the values into a table.

import datetime

import mysql.connector

import config  # local config.py holding the database credentials

# Create variables
source = "Wings"
ct = datetime.datetime.now()

# Connect to database
db = mysql.connector.connect(
    host = config.host,
    user = config.user,
    password = config.pw,
    database = config.db
)
cursor = db.cursor()

# Write to database
sql = "INSERT INTO `daily_rates` (source, timestamp, 30_year_fixed_rate, 30_year_fixed_points, 71_arm_rate,71_arm_point) VALUES (%s,%s,%s,%s,%s,%s)"
val = (source,ct,fr30,fr30pt,ARM71,ARM71pt)
cursor.execute(sql,val)

db.commit()

# Close database connection
cursor.close()
db.close()

After running a few tests to confirm the program was writing the right values to the database, I committed the project to GitHub and felt pretty accomplished! All that was left was to schedule the program to run daily!
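
If you’re recreating this, the table needs to exist before the insert will work. Here’s a minimal sketch of a schema that matches the INSERT statement above; the column types are my assumption, so adjust them to suit your data.

# One-time table setup (a sketch; the DECIMAL sizes are assumptions)
import mysql.connector

import config

db = mysql.connector.connect(
    host = config.host,
    user = config.user,
    password = config.pw,
    database = config.db
)
cursor = db.cursor()
# Backticks because these column names start with digits
cursor.execute("""
    CREATE TABLE IF NOT EXISTS `daily_rates` (
        `source` VARCHAR(50),
        `timestamp` DATETIME,
        `30_year_fixed_rate` DECIMAL(5,3),
        `30_year_fixed_points` DECIMAL(4,2),
        `71_arm_rate` DECIMAL(5,3),
        `71_arm_point` DECIMAL(4,2)
    )
""")
db.commit()
cursor.close()
db.close()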

Scheduling with cron

On my Raspberry Pi temperature-monitoring project, I used a never-ending loop with a time.sleep() call within the Python code to “schedule” a run, essentially delaying the loop from iterating until a certain amount of time had elapsed.

while True:
    try:
        # Read the sensor and insert a row
        temperature_c = dhtDevice.temperature
        temperature_f = temperature_c * (9 / 5) + 32
        humidity = dhtDevice.humidity
        ct = datetime.datetime.now()
        sql = "INSERT INTO `temperature-sensor` (timestamp, room, temperature_f, humidity) VALUES (%s, %s, %s, %s)"
        val = (ct, room, temperature_f, humidity)
        mycursor.execute(sql, val)
        db.commit()
    except RuntimeError:
        # DHT sensors fail to read fairly often; skip this cycle and retry
        pass
    time.sleep(60)

While this is simple in that everything is written in a single Python program, I realized it wasn’t a good approach since my program never stopped running, using system resources and most likely wearing out my cheap micro-SD card over time.

I didn’t need my code to run 24/7 when it only has to execute once per day. The MySQL database I use is hosted on a Linux server, so I decided to use cron to schedule my Python program to run daily at noon Central time. This was my first time using cron, and once I understood the syntax of the configuration file it went fairly smoothly. The cron Wikipedia page has a diagram that helped me the most in understanding the proper syntax for my job’s entry.

After cloning my git project to the server’s /root directory, I set up the crontab file. I wanted to run the Python program Monday through Friday at noon, so my entry in the crontab file was:

0 12 * * 1-5 /usr/bin/python3 /root/mortgage-rate-scraper/script.py
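
The five fields read the same way the Wikipedia diagram lays them out:

0 12 * * 1-5 <command>
│ │  │ │ └── day of week (1-5 = Monday through Friday)
│ │  │ └──── month (* = every month)
│ │  └────── day of month (* = every day)
│ └───────── hour (12 = noon)
└─────────── minute (0)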

The next day I was excited to see the results of my first run, which I was sure had executed without any problems! I logged in to the MySQL database and queried my table. The records were written to the database… but they were written at 6am instead of noon. Timezones! I hadn’t considered that the Linux server I scheduled the job on keeps UTC time. A small adjustment to the crontab entry to account for the six-hour offset and I was all set.

0 18 * * 1-5 /usr/bin/python3 /root/mortgage-rate-scraper/script.py

I’m going to do some more research on handling time zones and daylight saving time adjustments when using cron so I don’t need to keep adjusting the entry twice per year (one idea is sketched below). Overall, I am really happy with this approach, and it’s vastly superior to my previous never-ending loop method!
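
One idea is to schedule the job at both possible UTC hours and let the script itself decide whether it’s actually noon in Central time. This is just a sketch using the standard library’s zoneinfo module (Python 3.9+); I haven’t wired it into the project yet.

# Hypothetical guard against DST drift. The crontab entry would become
#   0 17,18 * * 1-5 /usr/bin/python3 /root/mortgage-rate-scraper/script.py
# and the script exits quietly unless it is actually noon in Chicago.
import datetime
from zoneinfo import ZoneInfo

now_central = datetime.datetime.now(ZoneInfo("America/Chicago"))
if now_central.hour != 12:
    raise SystemExit(0)  # wrong hour for this run; do nothing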

Viewing the Results

The last step in this project was to create a user-facing dashboard where I could see today’s rates and a simple trend of rates over time. For most of my personal projects, I like to use Metabase for this kind of reporting. It’s simple to configure, lightweight, and open source. Metabase also offers an embedding option that I’m going to try out for the first time in this post!

Here’s a link to the GitHub repository for this project if you want to check out the full Python code!