BART: Categorical example #663


Merged 7 commits into pymc-devs:main on May 30, 2024

Conversation

@PabloGGaray (Contributor) commented May 22, 2024



aloctavodia commented on 2024-05-23T13:55:22Z
----------------------------------------------------------------

Use az.style.use("arviz-darkgrid")

Remove plt.rcParams["figure.dpi"] = 300
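The suggested setup cell would then look something like the sketch below (assuming the notebook already imports matplotlib; the hard-coded DPI line is simply dropped):

```python
import arviz as az
import matplotlib.pyplot as plt

# Apply the ArviZ dark-grid style instead of hard-coding a figure DPI.
az.style.use("arviz-darkgrid")
```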



aloctavodia commented on 2024-05-23T13:55:23Z
----------------------------------------------------------------

The partial dependence plot, together with the Variable Importance plot, confirms that Tail is the covariate with the smallest effect on the predicted variable. In the Variable Importance plot, Tail is the last covariate to be added and does not improve the result; in the partial dependence plot, Tail has the flattest response.



aloctavodia commented on 2024-05-23T13:55:23Z
----------------------------------------------------------------

Add it to the next section and compare it with the PPC plot, or remove it.



aloctavodia commented on 2024-05-23T13:55:24Z
----------------------------------------------------------------

So far we have a very good result concerning the classification of the species based on the 5 covariates. However, if we want to select a subset of covariates for future classifications, it is not clear which of them to select. What does seem safe is that Tail could be eliminated. At the beginning, when we plotted the distribution of each covariate, we said that the most important variables for the classification might be Wing, Weight, and Culmen; nevertheless, after running the model we saw that Hallux, Culmen, and Wing proved to be the most important ones.

Unfortunately, the partial dependence plots show a very wide dispersion, making the results look suspicious. One way to reduce this variability is to fit independent trees; below we will see how to do this and get a more accurate result.



aloctavodia commented on 2024-05-23T13:55:25Z
----------------------------------------------------------------

Fitting independent trees

The option to fit independent trees with pymc-bart is set with the parameter pmb.BART(..., separate_trees=True, ...). As we will see, for this example this option does not make a big difference in the predictions, but it helps us reduce the variability in the PPC and gives a small improvement in the in-sample comparison. If this option is used with bigger datasets, take into account that the model fits more slowly, so you can obtain a better result at the expense of computational cost. The following code runs the same model and analysis as before, but fitting 3 independent trees. Compare the time to run this model with the previous one.


PabloGGaray commented on 2024-05-23T16:00:54Z
----------------------------------------------------------------

Is the "3" in "but fitting 3 independent trees" okay?

aloctavodia commented on 2024-05-23T16:06:16Z
----------------------------------------------------------------

Well, it is 3 independent "sum of trees". Better to remove the "3"


aloctavodia commented on 2024-05-23T13:55:26Z
----------------------------------------------------------------

Now we are going to reproduce the same analyses as before.




fonnesbeck commented on 2024-05-23T16:12:08Z
----------------------------------------------------------------

Line #7.        Hawks = pd.read_csv(pm.get_data("marketing.csv"))[

Is this a copy/paste error? I assume we don't want marketing.csv.


PabloGGaray commented on 2024-05-23T17:10:33Z
----------------------------------------------------------------

Yes, it's a copy/paste error, thanks.


fonnesbeck commented on 2024-05-23T16:12:10Z
----------------------------------------------------------------

Second sentence needs some cleanup/rewording. Maybe something like:

Still, none of the variables have a marked separation among the species distributions such that they can cleanly separate them.



fonnesbeck commented on 2024-05-23T16:12:11Z
----------------------------------------------------------------

First sentence needs some rewording. Perhaps something like:

It may be that some of the input variables are not informative for classifying by species, so in the interest of parsimony and in reducing the computational cost of model estimation, it is useful to quantify the importance of each variable in the dataset.



Member

@aloctavodia left a comment


LGTM. What do you think @fonnesbeck?

Member

@OriolAbril left a comment


Added some minor formatting and doc-related comments. It can be merged as is; I'm mostly trying to bring more attention to pymc-bart as a library.

@aloctavodia merged commit 1f9ec4d into pymc-devs:main on May 30, 2024
2 checks passed
@PabloGGaray deleted the bart-categ branch May 30, 2024 17:34
Successfully merging this pull request may close these issues.

Add categorical example