Time Series Classification with multiple rows per date

Question

I have a time series data set covering the lifecycle of 9000 different B2B sales leads. By lifecycle I mean a dataset with one record per day for each sales lead, with a lead identifier (LEAD_ID), 4 predictive variables (DAYS_SINCE_START, CUSTOMER_INTEREST, MARKET, TYPE_SERVICE) and one response variable (OUTCOME). The response variable OUTCOME can take 2 different values: Won (1) or Lost (0).

A mock example of the data frame would be the following:

As can be seen, some leads “die” before others. This is because we receive the customer's final outcome on that day of the lifecycle (we won or lost the lead), so that lead identifier drops out of the dataset.

My goal is to create one single model that can predict the outcome of a new sales lead entering the data set, with the prediction made 30 days after the current time. (Why 30 days after? After the first 30 days, the company's resources have already been assigned.) How would I model this?

Brian Spiering · Answer

This problem can be modeled as a standard binary classification since you have labeled data. Time can be treated as a feature, encoded relative to the outcome date. Random forest would be a good algorithm to try.
You need to rearrange the data into tidy form, where each LEAD_ID occupies a single row with all features for that id in the same row.
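
As a rough illustration of that rearrangement, here is a minimal sketch using pandas and scikit-learn. The column names come from the question, but everything else is an assumption: the data is randomly generated, CUSTOMER_INTEREST is assumed to be numeric and to change daily, MARKET and TYPE_SERVICE are assumed to be categorical and constant per lead, and the features are taken from the first 30 days of each lead (one possible reading of the 30-day requirement).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical long-format data: one row per lead per day (all values made up).
rng = np.random.default_rng(0)
rows = []
for lead_id in range(200):
    lifetime = int(rng.integers(31, 90))   # days until the outcome is known
    outcome = int(rng.integers(0, 2))      # 1 = Won, 0 = Lost
    market = str(rng.choice(["EU", "US"]))
    service = str(rng.choice(["A", "B"]))
    for day in range(1, lifetime + 1):
        rows.append({
            "LEAD_ID": lead_id,
            "DAYS_SINCE_START": day,
            "CUSTOMER_INTEREST": float(rng.random()),
            "MARKET": market,
            "TYPE_SERVICE": service,
            "OUTCOME": outcome,
        })
long_df = pd.DataFrame(rows)

# Assumption: only the first 30 days of each lead are used as features.
HORIZON = 30
early = long_df[long_df["DAYS_SINCE_START"] <= HORIZON]

# Time-varying feature: one column per day, one row per LEAD_ID.
interest = early.pivot(index="LEAD_ID",
                       columns="DAYS_SINCE_START",
                       values="CUSTOMER_INTEREST")
interest.columns = [f"INTEREST_DAY_{d}" for d in interest.columns]

# Static features and the label: take the first value seen for each lead.
static = early.groupby("LEAD_ID")[["MARKET", "TYPE_SERVICE", "OUTCOME"]].first()
static = pd.get_dummies(static, columns=["MARKET", "TYPE_SERVICE"], dtype=float)

# Tidy (wide) table: each LEAD_ID is a single row with all of its features.
wide = interest.join(static)
X = wide.drop(columns="OUTCOME")
y = wide["OUTCOME"]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
predictions = model.predict(X)
```

In this mock data every lead lasts longer than 30 days, so the pivot has no gaps. With real data, leads that are resolved before day 30 would leave missing values in the day columns and would need to be filled in or excluded. The model should also be evaluated on held-out leads rather than on the rows it was fitted on.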
