Interpreting decision tree results after target encoding

Question

I am not sure how to interpret the results of my decision tree after using target encoding; could someone clarify? The example below doesn't actually need target encoding; it is only there to illustrate my confusion.
For instance, I am trying to classify whether a fruit is rotten given its age and fruit type. I use target encoding for the fruit column:

I then get the following decision tree with default sklearn decision tree classifier parameters:

I believe that after encoding I have lost the information about fruit type, and I can only say that if fruit_target <= 0.841 then the fruit is rotten, else not rotten. But then how do I interpret 0.841; what does it mean?

Sammy · Accepted Answer

I believe that after encoding I have lost the information about fruit type, and I can only say that if fruit_target <= 0.841 then the fruit is rotten, else not rotten. But then how do I interpret 0.841; what does it mean?

Recall what the target encoding actually is in this example: it is the share of rotten fruits per fruit type, e.g. $75\%$ of data points with fruit == pear are estimated to be rotten (I say "estimated" because it depends on the type of target encoding whether this is an exact number or an estimate).
Accordingly, you can infer from the decision tree that a data point will be classified as rotten iff more than $0.841 = 84.1\%$ of the training data points of its fruit type are rotten.
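To make this concrete, here is a minimal sketch in plain Python. The training rows, the fruit names other than pear, and the helper names `target_encode` and `split_by_threshold` are all made up for illustration; they are not the asker's data or any library API. It assumes the simplest, unsmoothed form of target encoding (the per-category mean of the target) and shows how a threshold such as 0.841 can be translated back into a grouping of fruit types.

```python
from collections import defaultdict

# Hypothetical training data: (fruit type, is_rotten). Not the asker's data.
train = (
    [("pear", 1)] * 3 + [("pear", 0)] * 1        # 3 of 4 pears rotten
    + [("apple", 1)] * 1 + [("apple", 0)] * 3    # 1 of 4 apples rotten
    + [("banana", 1)] * 9 + [("banana", 0)] * 1  # 9 of 10 bananas rotten
)

def target_encode(rows):
    """Plain (unsmoothed) target encoding: mean of the target per category."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for fruit, rotten in rows:
        totals[fruit] += rotten
        counts[fruit] += 1
    return {fruit: totals[fruit] / counts[fruit] for fruit in counts}

def split_by_threshold(encoding, threshold):
    """Translate a split `fruit_target <= threshold` back into fruit types."""
    left = sorted(f for f, v in encoding.items() if v <= threshold)
    right = sorted(f for f, v in encoding.items() if v > threshold)
    return left, right

encoding = target_encode(train)
left, right = split_by_threshold(encoding, 0.841)

print(encoding)
print("fruit_target <= 0.841:", left)
print("fruit_target >  0.841:", right)
```

With these made-up numbers, the split separates banana (encoded value 0.9) from pear and apple (0.75 and 0.25). So the fruit type is not lost entirely: the tree can still tell apart groups of fruit types, as long as their encoded values differ. The exact value 0.841 should not be over-interpreted; as far as I know, sklearn places a threshold at the midpoint between two adjacent feature values seen in training, so any cut between those two values would give the same partition of the training data. Which side of the split ends up labelled rotten has to be read off the class shown in the corresponding leaf.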
