BART: Categorical example #663
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.
aloctavodia commented on 2024-05-23T13:55:22Z: Use az.style.use("arviz-darkgrid"). Remove plt.rcParams["figure.dpi"] = 300.
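A minimal sketch of what this suggestion could look like in the notebook's setup cell. It assumes an ArviZ release that still ships the "arviz-darkgrid" style; the plotted data is purely illustrative and not taken from the notebook.

```python
import arviz as az
import matplotlib.pyplot as plt

# Apply the ArviZ plotting style suggested in the review.
az.style.use("arviz-darkgrid")

# Note: no plt.rcParams["figure.dpi"] = 300 override here, so figures
# keep the resolution defined by the style and the matplotlib defaults.

# Illustrative figure (hypothetical data) to confirm the style is active.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("x")
ax.set_ylabel("y")
```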
aloctavodia commented on 2024-05-23T13:55:23Z: The pdp plot, together with the Variable Importance plot, confirms that
aloctavodia commented on 2024-05-23T13:55:23Z: Add to the next section and compare with the PPC plot, or remove it.
aloctavodia commented on 2024-05-23T13:55:24Z: So far we have a very good result concerning the classification of the species based on the 5 covariates. However, if we want to select a subset of covariates to perform future classifications, it is not very clear which of them to select. Maybe something sure is that
Unfortunately, the partial dependence plots show a very wide dispersion, making the results look suspicious. One way to reduce this variability is fitting 3 independent trees; below we will see how to do this and get a more accurate result.
aloctavodia commented on 2024-05-23T13:55:25Z: Fitting independent trees
The option to fit independent trees with pymc-bart is set with the parameter pmb.BART(..., separate_trees=True, ...). As we will see, for this example, using this option doesn't make a big difference in the predictions, but it helps us reduce the variability in the ppc and gives a small improvement in the in-sample comparison. If this option is used with bigger datasets, you have to take into account that the model fits more slowly, so you can obtain a better result at the expense of computational cost. The following code runs the same model and analysis as before, but fitting 3 independent trees. Compare the time to run this model with the previous one.
PabloGGaray commented on 2024-05-23T16:00:54Z: Is the "3" in "but fitting 3 independent trees." OK?
aloctavodia commented on 2024-05-23T16:06:16Z: Well, it is 3 independent "sum of trees". Better to remove the "3".
aloctavodia commented on 2024-05-23T13:55:26Z: Now we are going to reproduce the same analyses as before.
fonnesbeck commented on 2024-05-23T16:12:08Z: Line #7. Hawks = pd.read_csv(pm.get_data("marketing.csv"))[ Is this a copy/paste error? I assume we don't want marketing.csv.
PabloGGaray commented on 2024-05-23T17:10:33Z: Yes, it's a copy/paste error, thanks.
fonnesbeck commented on 2024-05-23T16:12:10Z: Second sentence needs some cleanup/rewording. Maybe something like:
Still, none of the variables have a marked separation among the species distributions such that they can cleanly separate them.
fonnesbeck commented on 2024-05-23T16:12:11Z: First sentence needs some rewording. Perhaps something like:
It may be that some of the input variables are not informative for classifying by species, so in the interest of parsimony and in reducing the computational cost of model estimation, it is useful to quantify the importance of each variable in the dataset.
LGTM. What do you think @fonnesbeck?
I added some minor formatting and doc-related comments. It can be merged as is; I am mostly trying to bring more attention to pymc-bart as a library.
Closes pymc-devs/pymc-bart#100
📚 Documentation preview 📚: https://pymc-examples--663.org.readthedocs.build/en/663/