Description
Environment Details
- Copulas version: 0.12.2
- Python version: 3.11
- Operating System: Linux
Error Description
As first described in #469, it seems that whenever Copulas is asked to print parameters for a fitted GaussianKDE distribution, it just prints out a copy of the data that was fitted.
In the code below, the final column (column z) is fitted to a GaussianKDE distribution.
from copulas.datasets import sample_trivariate_xyz
from copulas.multivariate import GaussianMultivariate
data = sample_trivariate_xyz()
dist = GaussianMultivariate()
dist.fit(data)
parameters = dist.to_dict()
univariates = parameters['univariates']
print(univariates[2])
{'dataset': [0.638689008563623, 1.058121237066397, 0.3725063445214631, 0.687369594994837, -0.8810681732344304, -0.7121672205062004, 5.050261904362624, ...
'type': 'copulas.univariate.gaussian_kde.GaussianKDE'
The 'dataset' entry seems to be just the exact values in column z.
Expected Behavior
It's unexpected that the entire column's data would be reported at this step.
I would expect that when printing out the distribution, it would only show the 'type' of distribution and nothing else.
print(univariates[2])
{'type': 'copulas.univariate.gaussian_kde.GaussianKDE'}
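Until the behavior changes, the expected output can be approximated on the caller's side by dropping the 'dataset' key before printing. This is only a sketch: `without_dataset` is a hypothetical helper, not part of Copulas, and the `example` dict is a shortened stand-in for `univariates[2]`.

```python
def without_dataset(univariate_params):
    """Return a shallow copy of a univariate's dict with the 'dataset' entry removed."""
    return {key: value for key, value in univariate_params.items() if key != 'dataset'}


# Stand-in for univariates[2]; the values are shortened for illustration.
example = {
    'dataset': [0.638689008563623, 1.058121237066397, 0.3725063445214631],
    'type': 'copulas.univariate.gaussian_kde.GaussianKDE',
}

# The original dict is left untouched; only the printed copy is trimmed.
print(without_dataset(example))
```

This only hides the data when displaying it; the dict returned by to_dict() still carries the full column.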
It seems like the "parameters" are set to the fitted data in the _fit method:
Copulas/copulas/univariate/gaussian_kde.py
Lines 166 to 172 in 356be32
    def _fit(self, X):
        if self._sample_size:
            X = gaussian_kde(X, bw_method=self.bw_method, weights=self.weights).resample(
                self._sample_size
            )
        self._params = {'dataset': X.tolist()}
        self._model = self._get_model()
Ideally, the _params assigned to the GaussianKDE should be None, since GaussianKDE is a non-parametric distribution. Whatever information is needed to save the state of the GaussianKDE should be stored under a different name and not exposed as parameters.
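To make the suggestion concrete, here is a minimal sketch of the idea, not the actual Copulas implementation. `KDESketch`, `_fit_data`, and the fixed `bandwidth` argument are all hypothetical names chosen for illustration, and the density is a hand-rolled Gaussian kernel estimate rather than scipy's `gaussian_kde`.

```python
import math


class KDESketch:
    """Hypothetical stand-in for a non-parametric univariate (not the Copulas class)."""

    def __init__(self):
        self._params = None    # non-parametric: nothing to report as parameters
        self._fit_data = None  # hypothetical private attribute holding the fitted state

    def fit(self, X):
        # Keep the data needed to evaluate the density, but not under _params.
        self._fit_data = [float(x) for x in X]

    def pdf(self, x, bandwidth=1.0):
        # Plain Gaussian kernel density estimate over the stored points.
        n = len(self._fit_data)
        norm = n * bandwidth * math.sqrt(2 * math.pi)
        kernels = (math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in self._fit_data)
        return sum(kernels) / norm

    def to_dict(self):
        # Only the type is exposed; the fitted data stays internal.
        return {'type': f'{type(self).__module__}.{type(self).__name__}'}


dist = KDESketch()
dist.fit([0.64, 1.06, 0.37, 0.69, -0.88])
print(dist.to_dict())
```

One trade-off to keep in mind: if to_dict() is also relied on to rebuild a fitted model later, the stored data would need some other serialization route, since it is no longer part of the returned dict.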