How We Generated Price Predictions


Reverb helps music makers of all levels connect with sellers all over the world, ranging from small businesses, family-owned shops, and individuals to the world’s largest musical instrument retailers and well-known musicians.

Each day, thousands of new listings enter the Reverb marketplace — making prices across different products fluctuate as inventory changes. For individual sellers, price can be the best predictor of how quickly an item will sell, so having consistent and accurate pricing information is a must. We want to quickly answer questions like, “How should I price my item when I list it to sell it quickly?” and “How much is my gear worth?”

For buyers, obviously, price is the main comparison metric. Our goal is to help users quickly understand listing options and answer questions like, “Is this item well priced?” and “How does this compare to buying brand new?”

With this in mind, we set out to publish dynamic price data that changes as inventory changes and responds to both gradual and sudden price shifts. In this post, we will cover the implementation of this initiative from beginning to end.

Before we begin, let’s define one of our key terms. Canonical products are highly granular “buckets” into which only listings for identical items are placed. For example, we have many Boss DS-1 distortion pedal listings that belong to a singular Boss DS-1 canonical product.

Prototyping

Our first step was to identify all of the relevant features, or properties, that influence the price of a product. Our data science team started by modeling the price distribution across different product conditions (new, mint, excellent, very good, good, fair, poor) for every listing in our top 15,000 canonical products over the last three years. After testing a variety of models and getting the best results with an Extreme Gradient Boosted Trees regressor, we applied out-of-time validation, which closely replicates the prediction environment in which future price shifts are unknown at the time of prediction.
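
A minimal sketch of that setup, using the xgboost package, might look like the following; the feature names, hyperparameters, and cutoff date are illustrative rather than Reverb's actual choices.

```python
import pandas as pd
import xgboost as xgb
from sklearn.metrics import mean_absolute_error

# Illustrative feature set; the real model uses many more product properties.
FEATURES = ["condition_rank", "original_price", "product_age_days", "listing_month"]
TARGET = "sale_price"


def out_of_time_split(df: pd.DataFrame, cutoff: str):
    """Train on everything before the cutoff and validate on everything after,
    mimicking production, where future price shifts are unknown."""
    return df[df["sold_at"] < cutoff], df[df["sold_at"] >= cutoff]


def train_price_model(df: pd.DataFrame, cutoff: str = "2021-01-01") -> xgb.XGBRegressor:
    train, valid = out_of_time_split(df, cutoff)
    model = xgb.XGBRegressor(
        n_estimators=500,
        max_depth=6,
        learning_rate=0.05,
        objective="reg:squarederror",
    )
    model.fit(train[FEATURES], train[TARGET])
    preds = model.predict(valid[FEATURES])
    print("Out-of-time MAE:", mean_absolute_error(valid[TARGET], preds))
    return model
```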

Once we felt confident about the quality of our price predictions, we needed to expand them into reliable price ranges, since a single price point will not accommodate the needs of all sellers and buyers, and different products see different levels of price variation. To do this, we trained a new model on the out-of-sample (i.e., holdout) absolute residuals of the first model and used the output of both models to construct the price distribution for a given canonical product and condition.
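
Roughly, the second model learns how far off the first one tends to be, and the two together yield a range. A sketch under those assumptions, with an illustrative width multiplier:

```python
import numpy as np
import xgboost as xgb


def train_spread_model(price_model, holdout_X, holdout_y) -> xgb.XGBRegressor:
    """Fit a second regressor on the holdout absolute residuals of the price
    model, i.e. how far off its predictions tend to be for each example."""
    abs_residuals = np.abs(holdout_y - price_model.predict(holdout_X))
    spread_model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
    spread_model.fit(holdout_X, abs_residuals)
    return spread_model


def predict_price_range(price_model, spread_model, X, width: float = 1.0):
    """Combine both models into (low, mid, high) prices per row.
    The width multiplier is an illustrative choice, not Reverb's actual one."""
    mid = price_model.predict(X)
    spread = spread_model.predict(X)
    return mid - width * spread, mid, mid + width * spread
```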

Using the predicted price distributions, we can convert all sales into a normalized price and calculate the model accuracy over a given period. We took the model’s output predictions and compared them against the actual prices of live listings, and the actual distribution of prices lined up very well with the expected distribution.
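
One plausible reading of “normalized price” here is the ratio of the actual sale price to the predicted price for that product and condition, which lets very different products be pooled into a single accuracy check; a small sketch under that assumption:

```python
import pandas as pd


def normalized_prices(sales: pd.DataFrame, price_model, features) -> pd.Series:
    """Express each sale as actual price / predicted price, so a value of 1.0
    means the sale landed exactly on the model's prediction."""
    return sales["sale_price"] / price_model.predict(sales[features])


# Example: share of last month's sales landing within +/-20% of the prediction.
# ratios = normalized_prices(last_month_sales, model, FEATURES)
# accuracy = ((ratios > 0.8) & (ratios < 1.2)).mean()
```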

Developing a Scalable ML Training Pipeline

We rely on Airflow to orchestrate our machine learning processes by defining a DAG (a Directed Acyclic Graph — think of it as an open chain where each link represents a specific task) for every new machine learning pipeline.

Most ML pipelines follow a similar sequence of events: extract and prepare training data, train the model, validate it, and deploy the resulting artifact to serve predictions.

We decided this repeatable sequence should be easy to represent in a simple framework that would allow us to quickly define future ML pipelines in Airflow without a lot of new custom code. We created project-agnostic function wrappers around the modeling code so that each step could be executed from the DAG level with minimal boilerplate:
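
The shape of that framework might look something like this; the ml_framework module, its wrapper functions, and the project name are hypothetical stand-ins for the real wrappers.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Hypothetical project-agnostic wrappers: each one looks up the project's own
# config and modeling code by name, so the DAG file stays nearly boilerplate-free.
from ml_framework import deploy_model, extract_training_data, train_model, validate_model

PROJECT = "price_model"

with DAG(
    dag_id="{}_training".format(PROJECT),
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    tasks = [
        PythonOperator(
            task_id=step.__name__,
            python_callable=step,
            op_kwargs={"project": PROJECT},
        )
        for step in (extract_training_data, train_model, validate_model, deploy_model)
    ]
    # Chain the steps in order: extract >> train >> validate >> deploy.
    for upstream, downstream in zip(tasks, tasks[1:]):
        upstream >> downstream
```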

Once we had defined a DAG for each model, we needed to ensure they executed in the correct sequence, since one depends on the output of the other. For this, we defined a third “meta” DAG in charge of executing the DAGs in order:
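
A sketch of what such a meta DAG could look like, assuming a custom trigger callable and completion sensor along the lines of the two snippets further below; the DAG and task IDs are illustrative.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

# Hypothetical custom pieces, sketched in the next two snippets.
from ml_framework.airflow_utils import TaskCompletedSensor, trigger_model_dag

with DAG(
    dag_id="price_prediction_meta",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    previous = None
    # The price model DAG must finish before the price range model DAG starts.
    for model_dag_id in ("price_model_training", "price_range_model_training"):
        trigger = PythonOperator(
            task_id="trigger_{}".format(model_dag_id),
            python_callable=trigger_model_dag,
            op_kwargs={"dag_id": model_dag_id},
        )
        wait = TaskCompletedSensor(
            task_id="wait_for_{}".format(model_dag_id),
            external_dag_id=model_dag_id,
            external_task_id="deploy_model",
            poke_interval=60,
        )
        trigger >> wait
        if previous is not None:
            previous >> trigger
        previous = wait
```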

We are currently still using Airflow 1.10.12, so we had to define a few custom pieces to trigger and watch DAGs:

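For triggering, one approach on Airflow 1.10 is the experimental trigger_dag API; the snippet below is an assumption about how such a custom trigger could work, not Reverb's exact implementation.

```python
from airflow.api.common.experimental.trigger_dag import trigger_dag
from airflow.utils import timezone


def trigger_model_dag(dag_id: str):
    """Kick off a new run of the given model DAG.

    Returning the execution date lets the PythonOperator push it to XCom, so a
    downstream sensor can watch for exactly this run to finish."""
    execution_date = timezone.utcnow()
    trigger_dag(
        dag_id=dag_id,
        run_id="meta__{}".format(execution_date.isoformat()),
        execution_date=execution_date,
    )
    return execution_date.isoformat()
```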

We found that the out-of-the-box implementation of `ExternalTaskSensor` was unreliable, as Airflow sometimes momentarily sets the task instance state to `None` in between state changes, causing the sensor to fail sporadically. To get around this, we wrote our own sensor that pulls the exact execution date of the task from the task instance and doesn’t rely on its status:
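
A rough sketch of that idea, under the assumption that the trigger task above pushed the execution date it used: look up the task instance for that exact run and check that it has actually finished (its end date is set) instead of filtering on its state.

```python
from dateutil import parser as date_parser

from airflow.models import TaskInstance
from airflow.sensors.base_sensor_operator import BaseSensorOperator
from airflow.utils.db import provide_session
from airflow.utils.decorators import apply_defaults


class TaskCompletedSensor(BaseSensorOperator):
    """Waits for a task in another DAG to finish, without depending on the
    transient task-instance states that ExternalTaskSensor checks."""

    @apply_defaults
    def __init__(self, external_dag_id, external_task_id, *args, **kwargs):
        super(TaskCompletedSensor, self).__init__(*args, **kwargs)
        self.external_dag_id = external_dag_id
        self.external_task_id = external_task_id

    @provide_session
    def poke(self, context, session=None):
        # The trigger task returned the execution date it used; pull it from
        # XCom so we watch exactly the run we started.
        execution_date = date_parser.parse(
            context["ti"].xcom_pull(task_ids="trigger_{}".format(self.external_dag_id))
        )
        ti = (
            session.query(TaskInstance)
            .filter(
                TaskInstance.dag_id == self.external_dag_id,
                TaskInstance.task_id == self.external_task_id,
                TaskInstance.execution_date == execution_date,
            )
            .one_or_none()
        )
        # A set end_date means the task finished, regardless of momentary state flips.
        return ti is not None and ti.end_date is not None
```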

Once we complete training the model, we create a new “deployment” — meaning we make the model active for production traffic.

Delivering Quick Predictions

We wanted to make these price trends available for our ~100,000 canonical products while keeping the site snappy for users. Instead of making live requests to the prediction server for each one, we chose to generate predictions for all of them upfront via a batch request to the prediction service and persist them in our Redshift cluster as the newly “active” prediction dataset, refreshing them once a day to account for daily fluctuations.
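
In outline, the nightly batch step might look something like this; the prediction-service client, table name, and load mechanism are illustrative (in practice a bulk load such as COPY from S3 is more typical for Redshift).

```python
import itertools

import pandas as pd

CONDITIONS = ["new", "mint", "excellent", "very good", "good", "fair", "poor"]


def refresh_daily_predictions(prediction_client, canonical_product_ids, redshift_engine):
    """Request predictions for every product/condition pair in one batch call
    and persist them as today's active prediction set."""
    rows = [
        {"canonical_product_id": pid, "condition": cond}
        for pid, cond in itertools.product(canonical_product_ids, CONDITIONS)
    ]
    predictions = prediction_client.predict_batch(rows)  # hypothetical client
    df = pd.DataFrame(predictions)
    df["predicted_on"] = pd.Timestamp.utcnow()
    # Replace yesterday's set so downstream consumers always read the latest one.
    df.to_sql("active_price_predictions", redshift_engine, if_exists="replace", index=False)
```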

We now needed to make these predictions available to the microservice in charge of talking to the rest of the Reverb ecosystem. We leveraged the dblink extension on AWS to easily make thousands of rows available to the service’s Postgres instance:
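
Concretely, dblink lets the service’s Postgres run queries against Redshift and expose the results as a local view; a hedged sketch, with illustrative connection details, table names, and columns.

```python
import psycopg2

# Illustrative connection string, table, and columns; not Reverb's actual values.
DBLINK_SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS dblink;

-- Expose the Redshift predictions table inside the service's Postgres.
CREATE OR REPLACE VIEW price_predictions_dblink AS
SELECT *
FROM dblink(
    'host=redshift.example.internal port=5439 dbname=analytics user=reader password=...',
    'SELECT canonical_product_id, condition, price_low, price_mid, price_high
       FROM active_price_predictions'
) AS t(
    canonical_product_id bigint,
    condition text,
    price_low numeric,
    price_mid numeric,
    price_high numeric
);
"""


def create_dblink_view(dsn: str) -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(DBLINK_SETUP_SQL)
```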

Whenever Airflow finishes loading the most recent batch of predictions to Redshift, a call to the service is fired off, triggering a Celery task that creates a new materialized view on top of the dblink table, effectively pulling in the latest predictions.
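
The Celery task itself can be as simple as dropping and recreating a materialized view over that dblink view; again, a sketch with illustrative names.

```python
import psycopg2
from celery import Celery

app = Celery("price_guide")  # illustrative app name

REFRESH_SQL = """
-- Rebuild the materialized view over the dblink view so the latest batch of
-- predictions is pulled from Redshift exactly once and then served locally.
DROP MATERIALIZED VIEW IF EXISTS price_predictions;
CREATE MATERIALIZED VIEW price_predictions AS
SELECT * FROM price_predictions_dblink;
CREATE UNIQUE INDEX ON price_predictions (canonical_product_id, condition);
"""


@app.task
def refresh_price_predictions(dsn: str) -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(REFRESH_SQL)
```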

Finally, we made all price ranges queryable via Reverb’s GraphQL layer:
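
For instance, a client might fetch a price range with a query along these lines; the endpoint, query shape, and field names are hypothetical, not Reverb’s actual schema.

```python
import requests

# Hypothetical query and field names; the real GraphQL schema differs.
QUERY = """
query PriceGuide($productId: ID!, $condition: String!) {
  canonicalProduct(id: $productId) {
    priceRange(condition: $condition) {
      low { amount currency }
      mid { amount currency }
      high { amount currency }
    }
  }
}
"""


def fetch_price_range(endpoint: str, product_id: str, condition: str) -> dict:
    response = requests.post(
        endpoint,  # e.g. the GraphQL endpoint the client is configured with
        json={"query": QUERY, "variables": {"productId": product_id, "condition": condition}},
    )
    response.raise_for_status()
    return response.json()
```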

What’s Next

Our Seller Outreach team has already integrated these price trends into personalized emails to our sellers, and they will help users estimate the value of their gear in the upcoming Gear Collections feature.

Interested in solving challenges like this one and making the world more musical? We’re hiring! Visit Reverb Careers to see open positions and learn more about our team, values, and benefits.

The trademarks referenced in this post are trademarks of their respective owners. Use of these trademarks does not state or imply that Reverb is affiliated with, endorsed, sponsored, or approved by the trademark owners.
