sklearn.decomposition.PCA
class sklearn.decomposition.PCA(n_components=None, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)
Principal component analysis (PCA).
Linear dimensionality reduction using Singular Value Decomposition of the
data to project it to a lower dimensional space. The input data is centered
but not scaled for each feature before applying the SVD.
It uses the LAPACK implementation of the full SVD or a randomized truncated
SVD by the method of Halko et al. 2009, depending on the shape of the input
data and the number of components to extract.
It can also use the scipy.sparse.linalg ARPACK implementation of the
truncated SVD.
Notice that this class does not support sparse input. See
TruncatedSVD for an alternative with sparse data.
For a usage example, see
PCA example with Iris Data-set
Read more in the User Guide.
Parameters:
n_components : int, float or ‘mle’, default=None
Number of components to keep.
if n_components is not set all components are kept:
n_components == min(n_samples, n_features)
If n_components == 'mle' and svd_solver == 'full', Minka’s
MLE is used to guess the dimension. Use of n_components == 'mle'
will interpret svd_solver == 'auto' as svd_solver == 'full'.
If 0 < n_components < 1 and svd_solver == 'full', select the
number of components such that the amount of variance that needs to be
explained is greater than the percentage specified by n_components.
If svd_solver == 'arpack', the number of components must be
strictly less than the minimum of n_features and n_samples.
Hence, the None case results in:
n_components == min(n_samples, n_features) - 1
copy : bool, default=True
If False, data passed to fit are overwritten and running
fit(X).transform(X) will not yield the expected results,
use fit_transform(X) instead.
whiten : bool, default=False
When True (False by default) the components_ vectors are multiplied
by the square root of n_samples and then divided by the singular values
to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal
(the relative variance scales of the components) but can sometime
improve the predictive accuracy of the downstream estimators by
making their data respect some hard-wired assumptions.
svd_solver : {‘auto’, ‘full’, ‘arpack’, ‘randomized’}, default=’auto’
If auto:
The solver is selected by a default policy based on X.shape and
n_components: if the input data is larger than 500x500 and the
number of components to extract is lower than 80% of the smallest
dimension of the data, then the more efficient ‘randomized’
method is enabled. Otherwise the exact full SVD is computed and
optionally truncated afterwards.
If full:
Run exact full SVD calling the standard LAPACK solver via
scipy.linalg.svd and select the components by postprocessing.
If arpack:
Run SVD truncated to n_components calling the ARPACK solver via
scipy.sparse.linalg.svds. It requires strictly
0 < n_components < min(X.shape).
If randomized:
Run randomized SVD by the method of Halko et al.
New in version 0.18.0.
tol : float, default=0.0
Tolerance for singular values computed by svd_solver == ‘arpack’.
Must be of range [0.0, infinity).
New in version 0.18.0.
iterated_power : int or ‘auto’, default=’auto’
Number of iterations for the power method computed by
svd_solver == ‘randomized’.
Must be of range [0, infinity).
New in version 0.18.0.
n_oversamples : int, default=10
This parameter is only relevant when svd_solver="randomized".
It corresponds to the additional number of random vectors to sample the
range of X so as to ensure proper conditioning. See
randomized_svd for more details.
New in version 1.1.
power_iteration_normalizer : {‘auto’, ‘QR’, ‘LU’, ‘none’}, default=’auto’
Power iteration normalizer for randomized SVD solver.
Not used by ARPACK. See randomized_svd
for more details.
New in version 1.1.
random_state : int, RandomState instance or None, default=None
Used when the ‘arpack’ or ‘randomized’ solvers are used. Pass an int
for reproducible results across multiple function calls.
See Glossary.
New in version 0.18.0.
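As a quick illustration of the parameter combinations described above, the following sketch (not part of the reference page; the toy array shapes and values are arbitrary assumptions) shows the common ways of choosing n_components and of pinning the solver:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 10)   # 100 samples, 10 features (toy data)

PCA(n_components=3).fit(X)                          # keep exactly 3 components
PCA(n_components=0.9, svd_solver="full").fit(X)     # keep enough components for 90% of the variance
PCA(n_components="mle", svd_solver="full").fit(X)   # Minka's MLE picks the dimension
PCA(n_components=3, svd_solver="randomized", random_state=0).fit(X)  # randomized truncated SVD

# the float form stores the number of components it actually kept
print(PCA(n_components=0.9, svd_solver="full").fit(X).n_components_)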
Attributes:
components_ : ndarray of shape (n_components, n_features)
Principal axes in feature space, representing the directions of
maximum variance in the data. Equivalently, the right singular
vectors of the centered input data, parallel to its eigenvectors.
The components are sorted by decreasing explained_variance_.
explained_variance_ : ndarray of shape (n_components,)
The amount of variance explained by each of the selected components.
The variance estimation uses n_samples - 1 degrees of freedom.
Equal to n_components largest eigenvalues
of the covariance matrix of X.
New in version 0.18.
explained_variance_ratio_ : ndarray of shape (n_components,)
Percentage of variance explained by each of the selected components.
If n_components is not set then all components are stored and the
sum of the ratios is equal to 1.0.
singular_values_ : ndarray of shape (n_components,)
The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the n_components
variables in the lower-dimensional space.
New in version 0.19.
mean_ : ndarray of shape (n_features,)
Per-feature empirical mean, estimated from the training set.
Equal to X.mean(axis=0).
n_components_ : int
The estimated number of components. When n_components is set
to ‘mle’ or a number between 0 and 1 (with svd_solver == ‘full’) this
number is estimated from input data. Otherwise it equals the parameter
n_components, or the lesser value of n_features and n_samples
if n_components is None.
n_samples_ : int
Number of samples in the training data.
noise_variance_ : float
The estimated noise covariance following the Probabilistic PCA model
from Tipping and Bishop 1999. See “Pattern Recognition and
Machine Learning” by C. Bishop, 12.2.1 p. 574 or
http://www.miketipping.com/papers/met-mppca.pdf. It is required to
compute the estimated data covariance and score samples.
Equal to the average of (min(n_features, n_samples) - n_components)
smallest eigenvalues of the covariance matrix of X.
n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X
has feature names that are all strings.
New in version 1.0.
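A short sketch of how the fitted attributes listed above can be inspected after fit (toy data; the printed numbers themselves are not meaningful):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2).fit(X)

print(pca.components_)                # principal axes, shape (2, 2)
print(pca.explained_variance_)        # variance explained by each component
print(pca.explained_variance_ratio_)  # same, as a fraction of the total
print(pca.singular_values_)           # singular values of the centered data
print(pca.mean_)                      # per-feature mean, X.mean(axis=0)
print(pca.n_components_, pca.n_samples_, pca.noise_variance_)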
See also
KernelPCA : Kernel Principal Component Analysis.
SparsePCA : Sparse Principal Component Analysis.
TruncatedSVD : Dimensionality reduction using truncated SVD.
IncrementalPCA : Incremental Principal Component Analysis.
References
For n_components == ‘mle’, this class uses the method from:
Minka, T. P.. “Automatic choice of dimensionality for PCA”.
In NIPS, pp. 598-604
Implements the probabilistic PCA model from:
Tipping, M. E., and Bishop, C. M. (1999). “Probabilistic principal
component analysis”. Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 61(3), 611-622.
via the score and score_samples methods.
For svd_solver == ‘arpack’, refer to scipy.sparse.linalg.svds.
For svd_solver == ‘randomized’, see:
Halko, N., Martinsson, P. G., and Tropp, J. A. (2011).
“Finding structure with randomness: Probabilistic algorithms for
constructing approximate matrix decompositions”.
SIAM review, 53(2), 217-288.
and also
Martinsson, P. G., Rokhlin, V., and Tygert, M. (2011).
“A randomized algorithm for the decomposition of matrices”.
Applied and Computational Harmonic Analysis, 30(1), 47-68.
Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(n_components=2)
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.0075...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]
>>> pca = PCA(n_components=2, svd_solver='full')
>>> pca.fit(X)
PCA(n_components=2, svd_solver='full')
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.00755...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]
>>> pca = PCA(n_components=1, svd_solver='arpack')
>>> pca.fit(X)
PCA(n_components=1, svd_solver='arpack')
>>> print(pca.explained_variance_ratio_)
[0.99244...]
>>> print(pca.singular_values_)
[6.30061...]
Methods
fit(X[, y])
Fit the model with X.
fit_transform(X[, y])
Fit the model with X and apply the dimensionality reduction on X.
get_covariance()
Compute data covariance with the generative model.
get_feature_names_out([input_features])
Get output feature names for transformation.
get_metadata_routing()
Get metadata routing of this object.
get_params([deep])
Get parameters for this estimator.
get_precision()
Compute data precision matrix with the generative model.
inverse_transform(X)
Transform data back to its original space.
score(X[, y])
Return the average log-likelihood of all samples.
score_samples(X)
Return the log-likelihood of each sample.
set_output(*[, transform])
Set output container.
set_params(**params)
Set the parameters of this estimator.
transform(X)
Apply dimensionality reduction to X.
fit(X, y=None)
Fit the model with X.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data, where n_samples is the number of samples
and n_features is the number of features.
y : Ignored
Ignored.
Returns:
self : object
Returns the instance itself.
fit_transform(X, y=None)
Fit the model with X and apply the dimensionality reduction on X.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data, where n_samples is the number of samples
and n_features is the number of features.
y : Ignored
Ignored.
Returns:
X_new : ndarray of shape (n_samples, n_components)
Transformed values.
Notes
This method returns a Fortran-ordered array. To convert it to a
C-ordered array, use ‘np.ascontiguousarray’.
get_covariance()
Compute data covariance with the generative model.
cov = components_.T * S**2 * components_ + sigma2 * eye(n_features)
where S**2 contains the explained variances, and sigma2 contains the
noise variances.
Returns:
cov : array of shape (n_features, n_features)
Estimated covariance of data.
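To make the formula above concrete, the sketch below compares get_covariance() with the empirical covariance of the data. When every component is kept the noise term vanishes and the two should agree closely (toy data; the assumption n_samples > n_features is mine):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))

pca = PCA(n_components=3).fit(X)        # keep every component
cov_model = pca.get_covariance()        # components_.T * S**2 * components_ + noise * I
cov_emp = np.cov(X, rowvar=False)       # empirical covariance (ddof=1)

print(np.allclose(cov_model, cov_emp))  # expected: True, up to numerical error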
get_feature_names_out(input_features=None)
Get output feature names for transformation.
The feature names out will be prefixed by the lowercased class name. For
example, if the transformer outputs 3 features, then the feature names
out are: ["class_name0", "class_name1", "class_name2"].
Parameters:
input_features : array-like of str or None, default=None
Only used to validate feature names with the names seen in fit.
Returns:
feature_names_out : ndarray of str objects
Transformed feature names.
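For PCA the lowercased class name is "pca", so the generated names are simply pca0, pca1, and so on. A minimal sketch:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2).fit(X)

print(pca.get_feature_names_out())   # expected: ['pca0' 'pca1']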
get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing
mechanism works.
Returns:
routing : MetadataRequest
A MetadataRequest encapsulating routing information.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
Returns:
params : dict
Parameter names mapped to their values.
get_precision()
Compute data precision matrix with the generative model.
Equals the inverse of the covariance but computed with
the matrix inversion lemma for efficiency.
Returns:
precision : array of shape (n_features, n_features)
Estimated precision of data.
inverse_transform(X)
Transform data back to its original space.
In other words, return an input X_original whose transform would be X.
Parameters:
X : array-like of shape (n_samples, n_components)
New data, where n_samples is the number of samples
and n_components is the number of components.
Returns:
X_original : array-like of shape (n_samples, n_features)
Original data, where n_samples is the number of samples
and n_features is the number of features.
Notes
If whitening is enabled, inverse_transform will compute the
exact inverse operation, which includes reversing whitening.
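A short sketch of the round trip: when n_components is smaller than n_features the reconstruction is only an approximation, and the gap is the variance carried by the discarded components (toy data):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

pca = PCA(n_components=1).fit(X)
X_reduced = pca.transform(X)               # shape (6, 1)
X_back = pca.inverse_transform(X_reduced)  # shape (6, 2), approximate reconstruction

print(np.mean((X - X_back) ** 2))          # small but non-zero reconstruction error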
score(X, y=None)
Return the average log-likelihood of all samples.
See “Pattern Recognition and Machine Learning”
by C. Bishop, 12.2.1 p. 574
or http://www.miketipping.com/papers/met-mppca.pdf
Parameters:
X : array-like of shape (n_samples, n_features)
The data.
y : Ignored
Ignored.
Returns:
ll : float
Average log-likelihood of the samples under the current model.
score_samples(X)
Return the log-likelihood of each sample.
See “Pattern Recognition and Machine Learning”
by C. Bishop, 12.2.1 p. 574
or http://www.miketipping.com/papers/met-mppca.pdf
Parameters:
X : array-like of shape (n_samples, n_features)
The data.
Returns:
ll : ndarray of shape (n_samples,)
Log-likelihood of each sample under the current model.
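The two likelihood methods are related in the obvious way: score(X) is the average of score_samples(X). A quick sketch on toy Gaussian data:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))

pca = PCA(n_components=2).fit(X)
per_sample = pca.score_samples(X)                    # log-likelihood of each sample
print(np.allclose(per_sample.mean(), pca.score(X)))  # expected: True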
set_output(*, transform=None)
Set output container.
See Introducing the set_output API
for an example on how to use the API.
Parameters:
transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
New in version 1.4: "polars" option was added.
Returns:
self : estimator instance
Estimator instance.
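A minimal sketch of the set_output API with the pandas container (pandas must be installed; the column names follow get_feature_names_out):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

pca = PCA(n_components=2).set_output(transform="pandas")
df = pca.fit_transform(X)        # a pandas DataFrame instead of an ndarray

print(type(df))                  # pandas DataFrame
print(df.columns.tolist())       # ['pca0', 'pca1']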
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
Parameters:
**params : dict
Estimator parameters.
Returns:
self : estimator instance
Estimator instance.
transform(X)
Apply dimensionality reduction to X.
X is projected on the first principal components previously extracted
from a training set.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
New data, where n_samples is the number of samples
and n_features is the number of features.
Returns:
X_new : array-like of shape (n_samples, n_components)
Projection of X in the first principal components, where n_samples
is the number of samples and n_components is the number of components.
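With the default whiten=False, the projection is just centering by mean_ followed by a dot product with components_, which the following sketch verifies numerically (toy data, not part of the reference page):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

pca = PCA(n_components=2).fit(X)
manual = (X - pca.mean_) @ pca.components_.T    # center, then project onto the axes
print(np.allclose(manual, pca.transform(X)))    # expected: True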
Examples using sklearn.decomposition.PCA
Release Highlights for scikit-learn 1.4
A demo of K-Means clustering on the handwritten digits data
Principal Component Regression vs Partial Least Squares Regression
The Iris Dataset
Blind source separation using FastICA
Comparison of LDA and PCA 2D projection of Iris dataset
Faces dataset decompositions
Factor Analysis (with rotation) to visualize patterns
FastICA on 2D point clouds
Incremental PCA
Kernel PCA
Model selection with Probabilistic PCA and Factor Analysis (FA)
PCA example with Iris Data-set
Faces recognition example using eigenfaces and SVMs
Image denoising using kernel PCA
Multi-dimensional scaling
Displaying Pipelines
Explicit feature map approximation for RBF kernels
Multilabel classification
Balance model complexity and cross-validated score
Dimensionality Reduction with Neighborhood Components Analysis
Kernel Density Estimation
Column Transformer with Heterogeneous Data Sources
Concatenating multiple feature extraction methods
Pipelining: chaining a PCA and a logistic regression
Selecting dimensionality reduction with Pipeline and GridSearchCV
Importance of Feature Scaling
PCA (Principal Component Analysis) in sklearn

PCA (Principal Component Analysis) is a very important method in machine learning; its main uses are dimensionality reduction and visualization. Beyond the deep mathematics behind it, the PCA workflow itself is instructive.

1. Prepare the dataset

This article uses the handwritten digits dataset from sklearn.datasets to demonstrate the PCA class in sklearn. Import the data and take a quick look:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
digits = datasets.load_digits()
X = digits.data
y = digits.target
X.shape,y.shape
((1797, 64), (1797,))

Split the data into a training set and a test set:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state = 666)
X_train.shape,X_test.shape
((1347, 64), (450, 64))

2. Use PCA to reduce the dimensionality of the dataset

To see what PCA does, compare the model training time and score before and after dimensionality reduction. First, fit a kNN classifier directly on the unreduced data, timing the fit and checking the score:

%%time
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)
Wall time: 88 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
knn_clf.score(X_test, y_test)
0.9866666666666667

Now use sklearn's PCA to reduce the dataset and check the training time and score after the reduction:

from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
pca.fit(X_train)
X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)
%%time
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_reduction,y_train)
Wall time: 3 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
knn_clf.score(X_test_reduction,y_test)
0.6066666666666667

The training time drops sharply, but the accuracy is no longer acceptable. Check how much variance the two retained components explain via PCA.explained_variance_ratio_:

# explained variance ratio of the 2 retained components:
pca.explained_variance_ratio_
array([0.14566817, 0.13735469])
pca = PCA(n_components = X_train.shape[1])  # compute the explained variance ratio of all dimensions
pca.fit(X_train)
pca.explained_variance_ratio_
array([1.45668166e-01, 1.37354688e-01, 1.17777287e-01, 8.49968861e-02,
5.86018996e-02, 5.11542945e-02, 4.26605279e-02, 3.60119663e-02,
3.41105814e-02, 3.05407804e-02, 2.42337671e-02, 2.28700570e-02,
1.80304649e-02, 1.79346003e-02, 1.45798298e-02, 1.42044841e-02,
1.29961033e-02, 1.26617002e-02, 1.01728635e-02, 9.09314698e-03,
8.85220461e-03, 7.73828332e-03, 7.60516219e-03, 7.11864860e-03,
6.85977267e-03, 5.76411920e-03, 5.71688020e-03, 5.08255707e-03,
4.89020776e-03, 4.34888085e-03, 3.72917505e-03, 3.57755036e-03,
3.26989470e-03, 3.14917937e-03, 3.09269839e-03, 2.87619649e-03,
2.50362666e-03, 2.25417403e-03, 2.20030857e-03, 1.98028746e-03,
1.88195578e-03, 1.52769283e-03, 1.42823692e-03, 1.38003340e-03,
1.17572392e-03, 1.07377463e-03, 9.55152460e-04, 9.00017642e-04,
5.79162563e-04, 3.82793717e-04, 2.38328586e-04, 8.40132221e-05,
5.60545588e-05, 5.48538930e-05, 1.08077650e-05, 4.01354717e-06,
1.23186515e-06, 1.05783059e-06, 6.06659094e-07, 5.86686040e-07,
1.71368535e-33, 7.44075955e-34, 7.44075955e-34, 7.15189459e-34])
# plot the cumulative explained variance ratio against the number of components:
plt.plot([i for i in range(X_train.shape[1])],
[np.sum(pca.explained_variance_ratio_[:i+1]) for i in range(X_train.shape[1])])
plt.show()

sklearn already provides a shortcut that chooses the number of components from the explained variance ratio. Below, keep enough components to explain 95% of the variance:

# sklearn accepts a float in (0, 1) as n_components
pca = PCA(0.95)
pca.fit(X_train)
PCA(copy=True, iterated_power='auto', n_components=0.95, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
# number of components selected
pca.n_components_
28
X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)
# explained variance ratio of each retained component:
pca.explained_variance_ratio_
array([0.14566817, 0.13735469, 0.11777729, 0.08499689, 0.0586019 ,
0.05115429, 0.04266053, 0.03601197, 0.03411058, 0.03054078,
0.02423377, 0.02287006, 0.01803046, 0.0179346 , 0.01457983,
0.01420448, 0.0129961 , 0.0126617 , 0.01017286, 0.00909315,
0.0088522 , 0.00773828, 0.00760516, 0.00711865, 0.00685977,
0.00576412, 0.00571688, 0.00508256])
%%time
# the fit time is now much lower
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_reduction,y_train)
Wall time: 8 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
knn_clf.score(X_test_reduction,y_test)
0.98

After reducing the dimensionality from 64 to 28, the score drops only slightly while the training time falls dramatically.

pca = PCA(n_components = 2)
pca.fit(X)
X_reduction = pca.transform(X)

3. PCA for visualization

Although reducing dimensionality loses some information, projecting the data down to two dimensions for visualization is a very useful technique:

# scatter plot of the first two principal components:
for i in range(10):
plt.scatter(X_reduction[y == i, 0],X_reduction[y==i,1],alpha=0.8)
plt.show()

As the plot shows, high-dimensional data can be projected to a lower dimension and visualized, giving an intuitive picture of a large dataset.
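As a follow-up to the manual comparison above, the same PCA + kNN workflow can be wrapped in a Pipeline so that n_components is chosen by cross-validation instead of by eye. This is a minimal sketch, not part of the original article; the candidate grid and the use of GridSearchCV are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# Chain PCA and kNN so the reduction learned on each training fold
# is applied consistently to the matching validation fold.
pipe = Pipeline([
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])

# Search over a few candidate dimensionalities (illustrative grid).
grid = GridSearchCV(pipe, {"pca__n_components": [10, 20, 28, 40]}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)           # best n_components found by cross-validation
print(grid.score(X_test, y_test))  # test accuracy of the refitted pipeline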
A detailed look at PCA (principal component analysis) in the sklearn library: parameters, attributes, methods
Principal component analysis (PCA)
The idea of principal component analysis (Principal components analysis, PCA for short) is to map n-dimensional features onto k dimensions (k < n), where the k dimensions are brand-new orthogonal features (a new coordinate system). These k features are called principal components; they are k reconstructed features, not simply the original n features with n-k of them removed. The way to realize this idea is dimensionality reduction: representing high-dimensional data with low-dimensional data, that is, replacing the originally large number of variables with just a few. Here we focus on the PCA class in the sklearn library and explain its parameters, attributes and methods.

PCA in sklearn

1. Parameters

sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

1. n_components : int, float, None or str
Meaning: the number of principal components to return, i.e. how many dimensions you want to keep.
n_components=2 returns the first 2 principal components.
0 < n_components < 1 keeps enough components to reach the given cumulative explained-variance ratio; e.g. n_components=0.98 returns the components whose cumulative explained-variance ratio reaches 98%.
n_components=None returns all components.
n_components='mle' chooses the number of components n automatically.

2. copy : bool, False/True, default True
Meaning: whether to copy the original data while the algorithm runs; the data is modified during the reduction. This mainly affects how you obtain the reduced data: with copy=True, fit(X).transform(X) and fit_transform(X) both give the reduced data; with copy=False the data passed to fit is overwritten, so use fit_transform(X) to get the reduced data. (fit_transform() is described below.)

3. whiten : bool, False/True, default False
Meaning: whitening. Whitening is an important preprocessing step whose purpose is to reduce redundancy in the input data, so that the whitened data has the following properties: (i) low correlation between features; (ii) the same variance for all features.

4. svd_solver : str, {'auto', 'full', 'arpack', 'randomized'}
Meaning: which SVD method to use.
svd_solver='auto': the PCA class chooses among the three solvers below automatically.
svd_solver='full': classical full SVD, using the corresponding scipy implementation.
svd_solver='arpack': uses scipy's sparse SVD directly; its use cases are similar to randomized.
svd_solver='randomized': suited to data with many samples and many dimensions where the fraction of requested components is low.

2. Attributes

1. components_: the principal components with maximum variance.
2. explained_variance_: the variance of each principal component after the reduction. The larger the variance, the more important the component.
3. explained_variance_ratio_: the fraction of the total variance explained by each principal component; the larger the ratio, the more important the component (the variance contribution ratio).
4. singular_values_: the singular values of the selected components. Dimensionality reduction can be implemented either via eigendecomposition or via singular value decomposition; the former is more restrictive (it needs a square matrix), while the latter works on any matrix and is cheaper to compute, so PCA is usually implemented with SVD.
5. mean_: the empirical mean of each feature, estimated from the training set.
6. n_features_: the number of features in the training data.
7. n_samples_: the number of samples in the training data.
8. noise_variance_: the noise covariance.

3. Methods

1. fit(self, X, y=None)
Train the model; because PCA is unsupervised learning, y=None (there are no labels). For example:
model = decomposition.PCA(n_components=2)
model.fit(X)
2. fit_transform(self, X, y=None)
Fit the model on X and reduce X, returning the reduced data. For example:
X_new = model.fit_transform(X)
3. get_covariance(self)
Compute the data covariance with the generative model.
4. get_params(self, deep=True)
Return the model's parameters. For example:
print(model.get_params())
Output: {'copy': True, 'iterated_power': 'auto', 'n_components': 3, 'random_state': None, 'svd_solver': 'auto', 'tol': 0.0, 'whiten': False}
5. get_precision(self)
Compute the data precision matrix with the generative model.
6. inverse_transform(self, X)
Map reduced data back to the original space; the result may not be exactly identical to the original data.
7. score(self, X, y=None)
Compute the average log-likelihood of all samples.
8. transform(X)
Reduce X to the lower-dimensional space. Once the model is trained, transform can be used to reduce newly arriving data.

4. Example

import numpy as np
import matplotlib.pyplot as plt  # matplotlib is needed for the scatter plot below
from sklearn import decomposition, datasets

iris = datasets.load_iris()  # load the data
X = iris['data']
model = decomposition.PCA(n_components=2)
model.fit(X)
X_new = model.fit_transform(X)
Maxcomponent = model.components_
ratio = model.explained_variance_ratio_
score = model.score(X)
print('Reduced data:', X_new)
print('Components with maximum variance:', Maxcomponent)
print('Explained variance ratio of the retained components:', ratio)
print('Average log-likelihood of all samples:', score)
print('Singular values:', model.singular_values_)
print('Noise covariance:', model.noise_variance_)

g1 = plt.figure(1, figsize=(8, 6))
plt.scatter(X_new[:, 0], X_new[:, 1], c='r', cmap=plt.cm.Set1, edgecolor='k', s=40)
plt.xlabel('x1')
plt.ylabel('x2')
plt.title('After the dimension reduction')
plt.show()

5. References

Principal components analysis, the maximum-variance interpretation: https://www.cnblogs.com/jerrylead/archive/2011/04/18/2020209.html
Principal components analysis, the least-squared-error interpretation: https://www.cnblogs.com/jerrylead/archive/2011/04/18/2020216.html
Machine learning (7): whitening: https://blog.csdn.net/hjimce/article/details/50864602
scikit-learn source code for the PCA reduction: https://zhuanlan.zhihu.com/p/53268659
PCA in sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
PCA analysis with Python (scikit-learn)

(Header figure: the original image on the left, and reconstructions retaining different amounts of the variance.)

My previous tutorial discussed logistic regression with Python (https://towardsdatascience.com/logistic-regression-using-python-sklearn-numpy-mnist-handwriting-recognition-matplotlib-a6b31e2b166a). One thing we learned there is that you can speed up the fitting of a machine learning algorithm by changing the optimization algorithm. A more common way to speed up a machine learning algorithm is to use Principal Component Analysis (PCA). If your learning algorithm is too slow because the input dimensionality is too high, using PCA to speed it up is a reasonable choice; this is probably PCA's most common application. The other common application of PCA is data visualization.

To understand the value of PCA for data visualization, the first part of this tutorial covers a basic visualization of the IRIS dataset after applying PCA. The second part uses PCA to speed up a machine learning algorithm (logistic regression) on the MNIST dataset. Let's get started! The code used in this tutorial is available here:

PCA for data visualization: https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_Data_Visualization_Iris_Dataset_Blog.ipynb
PCA to speed up machine learning computations: https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_to_Speed-up_Machine_Learning_Algorithms.ipynb

PCA for data visualization

For many machine learning applications it helps to be able to visualize your data. Visualizing 2- or 3-dimensional data is not that difficult. However, even the Iris dataset used in this part of the tutorial is 4-dimensional. You can use PCA to reduce the 4-dimensional data to 2 or 3 dimensions so that you can plot it and understand the data better.

Load the Iris dataset

The Iris dataset is one of the datasets that ship with scikit-learn; it does not require downloading any file from an external website. The code below loads the Iris dataset.

import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# load dataset into Pandas DataFrame
df = pd.read_csv(url, names=['sepal length', 'sepal width', 'petal length', 'petal width', 'target'])

(Figure: the original pandas df, features + target.)

Standardize the data

PCA is affected by scale, so you need to scale the features in your data before applying PCA. Use StandardScaler to standardize the dataset's features onto unit scale (mean = 0 and variance = 1), which is a requirement for the optimal performance of many machine learning algorithms. If you want to see the possible negative effects of not scaling your data, scikit-learn has a section on the effects of not standardizing your data (https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html#sphx-glr-auto-examples-preprocessing-plot-scaling-importance-py).

from sklearn.preprocessing import StandardScaler
features = ['sepal length', 'sepal width', 'petal length', 'petal width']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:, ['target']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)

(Figure: the array x before and after standardization, shown as a pandas dataframe.)

PCA projection to 2D

The original data has 4 columns (sepal length, sepal width, petal length and petal width). In this section, the code projects the original 4-dimensional data into 2 dimensions. Note that after dimensionality reduction there usually is no particular meaning assigned to each principal component; the new components are just the two main dimensions of variation.

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents,
                           columns = ['principal component 1', 'principal component 2'])

(PCA, keeping the first 2 principal components.)

finalDf = pd.concat([principalDf, df[['target']]], axis = 1)

The dataframes are concatenated along axis=1; finalDf is the final dataframe before plotting.

Visualize the 2D projection

This section just plots the 2-dimensional data. Notice on the plot below that the classes seem well separated from each other.

import matplotlib.pyplot as plt  # needed for the plot

fig = plt.figure(figsize = (8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1'],
               finalDf.loc[indicesToKeep, 'principal component 2'],
               c = color, s = 50)
ax.legend(targets)
ax.grid()

(Figure: 2-component PCA plot.)

Explained variance

The explained variance tells you how much information (variance) can be attributed to each principal component. This matters because when you convert 4-dimensional space into 2-dimensional space, you lose some of the variance (information). Using the attribute explained_variance_ratio_, you can see that the first principal component contains 72.77% of the variance and the second principal component contains 23.03% of the variance. Together, the two components contain 95.80% of the information.

pca.explained_variance_ratio_

PCA to speed up machine learning algorithms

One of the most important applications of PCA is speeding up machine learning algorithms. Using the IRIS dataset here would be impractical, since that dataset has only 150 rows and 4 feature columns. The MNIST database of handwritten digits is more suitable: it has 784 feature columns (784 dimensions), a training set of 60,000 examples and a test set of 10,000 examples.

Download and load the data

You can also pass a data_home argument to fetch_openml to change where the data is downloaded.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')

The images you downloaded are contained in mnist.data, and the shape (70000, 784) means there are 70,000 images with 784 dimensions (784 features). The labels (integers 0-9) are contained in mnist.target. The features are 784-dimensional (28 x 28 images) and the labels are simply the digits 0 to 9.

Split the data into training and test sets

Typically the train/test split is 80% training and 20% test. In this example I chose 6/7 of the data for training and 1/7 for the test set.

from sklearn.model_selection import train_test_split
# test_size: what proportion of original data is used for test set
train_img, test_img, train_lbl, test_lbl = train_test_split(
    mnist.data, mnist.target, test_size=1/7.0, random_state=0)

Standardize the data

This paragraph is almost a copy of what was written earlier. PCA is affected by scale, so you need to scale the features in the data before applying PCA. You can transform the data onto unit scale (mean = 0 and variance = 1), which is a requirement for the optimal performance of many machine learning algorithms. StandardScaler helps standardize the dataset's features. Note that you fit on the training set and transform both the training and the test set. If you want to see the possible negative effects of not scaling your data, scikit-learn has a section on the effects of not standardizing your data (https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html#sphx-glr-auto-examples-preprocessing-plot-scaling-importance-py).

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on training set only.
scaler.fit(train_img)
# Apply transform to both the training set and the test set.
train_img = scaler.transform(train_img)
test_img = scaler.transform(test_img)

Import and apply PCA

Notice that the code below uses .95 for the number-of-components parameter. This means scikit-learn chooses the minimum number of principal components such that 95% of the variance is retained.

from sklearn.decomposition import PCA
# Make an instance of the Model
pca = PCA(.95)

Fit PCA on the training set. Note: you fit PCA on the training set only.

pca.fit(train_img)

Note: after fitting the model you can find out how many components PCA has chosen with pca.n_components_. In this case, 95% of the variance amounts to 330 principal components.

Apply the mapping (transform) to both the training set and the test set.

train_img = pca.transform(train_img)
test_img = pca.transform(test_img)

Apply logistic regression to the transformed data

Step 1: Import the model you want to use. In sklearn, all machine learning models are used as Python classes.

from sklearn.linear_model import LogisticRegression

Step 2: Create an instance of the model.

# all parameters not specified are set to their defaults
# the default solver is very slow, which is why it is changed to 'lbfgs'
logisticRegr = LogisticRegression(solver = 'lbfgs')

Step 3: Train the model on the data, storing the information learned from the data. The model learns the relationship between the digit images and their labels.

logisticRegr.fit(train_img, train_lbl)

Step 4: Predict the labels of new data (new images), using the information the model learned during training. The code below predicts a single observation:

# predict for one observation (image)
logisticRegr.predict(test_img[0].reshape(1, -1))

The code below predicts several observations at once:

# predict for multiple observations (images)
logisticRegr.predict(test_img[0:10])

Measuring model performance

Although accuracy is not always the best metric for machine learning algorithms (precision, recall, F1 score, ROC curves and so on would be better, see https://towardsdatascience.com/receiver-operating-characteristic-curves-demystified-in-python-bd531a4364d0), it is used here for simplicity.

logisticRegr.score(test_img, test_lbl)

Time taken to fit logistic regression after PCA

The whole point of this part of the tutorial is to show that you can use PCA to speed up the fitting of machine learning algorithms. The table below shows how long it took to fit logistic regression on my MacBook after using PCA (retaining a different amount of variance each time).

(Table: fitting time of logistic regression after PCA, for different retained variance fractions.)

Image reconstruction from the compressed representation

The earlier parts of this tutorial demonstrated how to use PCA to compress high-dimensional data into lower-dimensional data. I want to mention briefly that PCA can also take the compressed representation (the low-dimensional data) back to an approximation of the original high-dimensional data. If you are interested in the code that produces the figure below, check out my github (https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_Image_Reconstruction_and_such.ipynb).

(Figure: original images (left) and approximations of the original data after PCA (right).)

Closing thoughts

This post could have been much longer, because PCA has many different uses. I hope it was helpful. My next machine learning tutorial walks through understanding decision trees for classification (https://towardsdatascience.com/understanding-decision-trees-for-classification-python-9663d683c952).

Using PCA from sklearn in Python

from sklearn.decomposition import PCA

PCA

Principal Components Analysis (PCA) is a data dimensionality-reduction technique used for data preprocessing. The general steps of PCA are: first zero-center the original data, then compute the covariance matrix, then compute the eigenvectors and eigenvalues of that covariance matrix; these eigenvectors form the new feature space.
sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

Parameters:

n_components:
Meaning: the number n of principal components to keep in the PCA algorithm, i.e. the number of features retained.
Type: int or string; defaults to None, in which case all components are kept.
If an int, e.g. n_components=1, the original data is reduced to one dimension.
If a string, e.g. n_components='mle', the number of features n is chosen automatically so that the requested variance percentage is satisfied.

copy:
Type: bool, True or False; defaults to True.
Meaning: whether to copy the original training data when running the algorithm. If True, the values of the original training data do not change after running PCA, because the computation is done on a copy of the data. If False, the values of the original training data do change, because the reduction is computed in place on the original data.

whiten:
Type: bool; defaults to False.
Meaning: whitening, which gives every feature the same variance.

PCA attributes:

components_: the components with maximum variance.
explained_variance_ratio_: the variance percentage of each of the n retained components.
n_components_: the number n of retained components.
mean_:
noise_variance_:

PCA methods:

1. fit(X, y=None)
fit(X) trains the PCA model with data X.
The function returns the object it was called on; e.g. pca.fit(X) trains the pca object with X.
Note: fit() is the generic method in scikit-learn; every algorithm that needs training has a fit() method, and it is the "training" step of the algorithm. Because PCA is an unsupervised learning algorithm, y is naturally None here.

2. fit_transform(X)
Trains the PCA model with X and returns the reduced data.
newX = pca.fit_transform(X); newX is the reduced data.

3. inverse_transform()
Converts the reduced data back to the original data: X = pca.inverse_transform(newX).

4. transform(X)
Converts the data X into reduced data. Once the model is trained, transform can be used to reduce newly arriving data.

There are also get_covariance(), get_precision(), get_params(deep=True), score(X, y=None) and other methods, to be covered later when they are needed.

Example:

import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
newX = pca.fit_transform(X)          # equivalent to pca.fit(X) followed by pca.transform(X)
invX = pca.inverse_transform(newX)   # convert the reduced data back to the original data

print(X)
[[-1 -1]
 [-2 -1]
 [-3 -2]
 [ 1  1]
 [ 2  1]
 [ 3  2]]

print(newX)
array([[ 1.38340578,  0.2935787 ],
       [ 2.22189802, -0.25133484],
       [ 3.6053038 ,  0.04224385],
       [-1.38340578, -0.2935787 ],
       [-2.22189802,  0.25133484],
       [-3.6053038 , -0.04224385]])

print(invX)
[[-1 -1]
 [-2 -1]
 [-3 -2]
 [ 1  1]
 [ 2  1]
 [ 3  2]]

print(pca.explained_variance_ratio_)
[ 0.99244289  0.00755711]

The pca object we trained has n_components=2, i.e. it keeps 2 features. The first feature accounts for 0.99244289 of the total variance, which means it retains almost all of the information: the first component alone explains 99.24% of the dataset. We can therefore reduce to 1 dimension:

pca = PCA(n_components=1)
newX = pca.fit_transform(X)
print(pca.explained_variance_ratio_)
[ 0.99244289]
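The whiten parameter described above can also be checked numerically: with whiten=True the transformed components come out decorrelated with roughly unit variance. A minimal sketch, not from the original post (the toy data generation is an assumption):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))   # correlated features

Xw = PCA(n_components=3, whiten=True).fit_transform(X)

# Each whitened component has approximately unit variance and the components
# are uncorrelated, so the covariance matrix is close to the identity.
print(np.cov(Xw.T).round(2))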
2.5. Decomposing signals in components (matrix factorization problems)

2.5.1. Principal component analysis (PCA)

2.5.1.1. Exact PCA and probabilistic interpretation

PCA is used to decompose a multivariate dataset in a set of successive orthogonal components that explain a maximum amount of the variance. In scikit-learn, PCA is implemented as a transformer object that learns \(n\) components in its fit method, and can be used on new data to project it on these components.

PCA centers but does not scale the input data for each feature before applying the SVD. The optional parameter whiten=True makes it possible to project the data onto the singular space while scaling each component to unit variance. This is often useful if the models down-stream make strong assumptions on the isotropy of the signal: this is for example the case for Support Vector Machines with the RBF kernel and the K-Means clustering algorithm.

Below is an example of the iris dataset, which is comprised of 4 features, projected on the 2 dimensions that explain most variance:

The PCA object also provides a probabilistic interpretation of the PCA that can give a likelihood of data based on the amount of variance it explains. As such it implements a score method that can be used in cross-validation:

Examples:
PCA example with Iris Data-set
Comparison of LDA and PCA 2D projection of Iris dataset
Model selection with Probabilistic PCA and Factor Analysis (FA)

2.5.1.2. Incremental PCA

The PCA object is very useful, but has certain limitations for large datasets. The biggest limitation is that PCA only supports batch processing, which means all of the data to be processed must fit in main memory. The IncrementalPCA object uses a different form of processing and allows for partial computations which almost exactly match the results of PCA while processing the data in a minibatch fashion. IncrementalPCA makes it possible to implement out-of-core Principal Component Analysis either by:

Using its partial_fit method on chunks of data fetched sequentially from the local hard drive or a network database.
Calling its fit method on a memory mapped file using numpy.memmap.

IncrementalPCA only stores estimates of component and noise variances, in order to update explained_variance_ratio_ incrementally. This is why memory usage depends on the number of samples per batch, rather than the number of samples to be processed in the dataset.

As in PCA, IncrementalPCA centers but does not scale the input data for each feature before applying the SVD.

Examples:
Incremental PCA
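A minimal sketch of the out-of-core pattern described above, using partial_fit on chunks (here the chunks are just slices of an in-memory array; with real out-of-core data they would be read from disk or a memory-mapped file):

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 20))

ipca = IncrementalPCA(n_components=5)
for chunk in np.array_split(X, 10):   # feed the data in 10 mini-batches
    ipca.partial_fit(chunk)

X_reduced = ipca.transform(X)
print(X_reduced.shape)                # (1000, 5)
print(ipca.explained_variance_ratio_.sum())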
2.5.1.3. PCA using randomized SVD

It is often interesting to project data to a lower-dimensional space that preserves most of the variance, by dropping the singular vectors of components associated with lower singular values.

For instance, if we work with 64x64 pixel gray-level pictures for face recognition, the dimensionality of the data is 4096 and it is slow to train an RBF support vector machine on such wide data. Furthermore we know that the intrinsic dimensionality of the data is much lower than 4096 since all pictures of human faces look somewhat alike. The samples lie on a manifold of much lower dimension (say around 200 for instance). The PCA algorithm can be used to linearly transform the data while both reducing the dimensionality and preserving most of the explained variance at the same time.

The class PCA used with the optional parameter svd_solver='randomized' is very useful in that case: since we are going to drop most of the singular vectors it is much more efficient to limit the computation to an approximated estimate of the singular vectors we will keep to actually perform the transform.

For instance, the following shows 16 sample portraits (centered around 0.0) from the Olivetti dataset. On the right hand side are the first 16 singular vectors reshaped as portraits. Since we only require the top 16 singular vectors of a dataset with size \(n_{samples} = 400\) and \(n_{features} = 64 \times 64 = 4096\), the computation time is less than 1s:

If we note \(n_{\max} = \max(n_{\mathrm{samples}}, n_{\mathrm{features}})\) and \(n_{\min} = \min(n_{\mathrm{samples}}, n_{\mathrm{features}})\), the time complexity of the randomized PCA is \(O(n_{\max}^2 \cdot n_{\mathrm{components}})\) instead of \(O(n_{\max}^2 \cdot n_{\min})\) for the exact method implemented in PCA.

The memory footprint of randomized PCA is also proportional to \(2 \cdot n_{\max} \cdot n_{\mathrm{components}}\) instead of \(n_{\max} \cdot n_{\min}\) for the exact method.

Note: the implementation of inverse_transform in PCA with svd_solver='randomized' is not the exact inverse transform of transform even when whiten=False (default).

Examples:
Faces recognition example using eigenfaces and SVMs
Faces dataset decompositions

References:
Algorithm 4.3 in “Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions”, Halko, et al., 2009
“An implementation of a randomized algorithm for principal component analysis”, A. Szlam et al., 2014
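A short sketch of the randomized solver on wide data, along the lines of the face example above (random data stands in for the images; the shapes are illustrative assumptions):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(400, 4096))      # e.g. 400 images of 64x64 pixels, flattened

# Only the top 16 components are requested, so the randomized solver
# avoids computing the full SVD of the 400 x 4096 matrix.
pca = PCA(n_components=16, svd_solver="randomized", random_state=0)
faces_reduced = pca.fit_transform(X)

print(faces_reduced.shape)            # (400, 16)
print(pca.components_.shape)          # (16, 4096), one "eigenface" per row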
2.5.1.4. Sparse principal components analysis (SparsePCA and MiniBatchSparsePCA)

SparsePCA is a variant of PCA, with the goal of extracting the set of sparse components that best reconstruct the data.

Mini-batch sparse PCA (MiniBatchSparsePCA) is a variant of SparsePCA that is faster but less accurate. The increased speed is reached by iterating over small chunks of the set of features, for a given number of iterations.

Principal component analysis (PCA) has the disadvantage that the components extracted by this method have exclusively dense expressions, i.e. they have non-zero coefficients when expressed as linear combinations of the original variables. This can make interpretation difficult. In many cases, the real underlying components can be more naturally imagined as sparse vectors; for example in face recognition, components might naturally map to parts of faces.

Sparse principal components yield a more parsimonious, interpretable representation, clearly emphasizing which of the original features contribute to the differences between samples.

The following example illustrates 16 components extracted using sparse PCA from the Olivetti faces dataset. It can be seen how the regularization term induces many zeros. Furthermore, the natural structure of the data causes the non-zero coefficients to be vertically adjacent. The model does not enforce this mathematically: each component is a vector \(h \in \mathbf{R}^{4096}\), and there is no notion of vertical adjacency except during the human-friendly visualization as 64x64 pixel images. The fact that the components shown below appear local is the effect of the inherent structure of the data, which makes such local patterns minimize reconstruction error. There exist sparsity-inducing norms that take into account adjacency and different kinds of structure; see [Jen09] for a review of such methods. For more details on how to use Sparse PCA, see the Examples section, below.

Note that there are many different formulations for the Sparse PCA problem. The one implemented here is based on [Mrl09]. The optimization problem solved is a PCA problem (dictionary learning) with an \(\ell_1\) penalty on the components:

\[\begin{split}(U^*, V^*) = \underset{U, V}{\operatorname{arg\,min\,}} & \frac{1}{2} ||X-UV||_{\text{Fro}}^2 + \alpha||V||_{1,1} \\ \text{subject to } & ||U_k||_2 <= 1 \text{ for all } 0 \leq k < n_{components}\end{split}\]

\(||.||_{\text{Fro}}\) stands for the Frobenius norm and \(||.||_{1,1}\) stands for the entry-wise matrix norm which is the sum of the absolute values of all the entries in the matrix. The sparsity-inducing \(||.||_{1,1}\) matrix norm also prevents learning components from noise when few training samples are available. The degree of penalization (and thus sparsity) can be adjusted through the hyperparameter alpha. Small values lead to a gently regularized factorization, while larger values shrink many coefficients to zero.

Note
While in the spirit of an online algorithm, the class MiniBatchSparsePCA does not implement partial_fit because the algorithm is online along the features direction, not the samples direction.

Examples:
Faces dataset decompositions

References:
[Mrl09] “Online Dictionary Learning for Sparse Coding”, J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009
[Jen09] “Structured Sparse Principal Component Analysis”, R. Jenatton, G. Obozinski, F. Bach, 2009

2.5.2. Kernel Principal Component Analysis (kPCA)

2.5.2.1. Exact Kernel PCA

KernelPCA is an extension of PCA which achieves non-linear dimensionality reduction through the use of kernels (see Pairwise metrics, Affinities and Kernels) [Scholkopf1997]. It has many applications including denoising, compression and structured prediction (kernel dependency estimation). KernelPCA supports both transform and inverse_transform.

Note
KernelPCA.inverse_transform relies on a kernel ridge to learn the function mapping samples from the PCA basis into the original feature space [Bakir2003]. Thus, the reconstruction obtained with KernelPCA.inverse_transform is an approximation. See the example linked below for more details.

Examples:
Kernel PCA

References:
[Scholkopf1997] Schölkopf, Bernhard, Alexander Smola, and Klaus-Robert Müller. “Kernel principal component analysis.” International conference on artificial neural networks. Springer, Berlin, Heidelberg, 1997.
[Bakir2003] Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf. “Learning to find pre-images.” Advances in neural information processing systems 16 (2003): 449-456.
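A minimal sketch of KernelPCA with an RBF kernel on a toy non-linear dataset, including the approximate inverse_transform mentioned in the note above (the dataset and kernel parameters are illustrative assumptions):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, _ = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10,
                 fit_inverse_transform=True)   # enables the approximate inverse_transform
X_kpca = kpca.fit_transform(X)
X_back = kpca.inverse_transform(X_kpca)        # approximate pre-images in the original space

print(X_kpca.shape)                            # (300, 2)
print(np.mean((X - X_back) ** 2))              # reconstruction error of the approximation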
In these cases, finding all the components with a full kPCA is a waste of computation time, as the data is mostly described by the first few components (e.g. n_components<=100). In other words, the centered Gram matrix that is eigendecomposed in the Kernel PCA fitting process has an effective rank that is much smaller than its size. This is a situation where approximate eigensolvers can provide a speedup with very little loss of precision.
Eigensolvers¶
The optional parameter eigen_solver='randomized' can be used to significantly reduce the computation time when the number of requested n_components is small compared with the number of samples. It relies on randomized decomposition methods to find an approximate solution in a shorter time. The time complexity of the randomized KernelPCA is \(O(n_{\mathrm{samples}}^2 \cdot n_{\mathrm{components}})\) instead of \(O(n_{\mathrm{samples}}^3)\) for the exact method implemented with eigen_solver='dense'. The memory footprint of randomized KernelPCA is also proportional to \(2 \cdot n_{\mathrm{samples}} \cdot n_{\mathrm{components}}\) instead of \(n_{\mathrm{samples}}^2\) for the exact method. Note: this technique is the same as in PCA using randomized SVD. In addition to the above two solvers, eigen_solver='arpack' can be used as an alternative way to get an approximate decomposition. In practice, this method only provides reasonable execution times when the number of components to find is extremely small. It is enabled by default when the desired number of components is less than 10 (strict) and the number of samples is more than 200 (strict). See KernelPCA for details.
References: dense solver: scipy.linalg.eigh documentation; randomized solver: Algorithm 4.3 in “Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions”, Halko, et al. (2009), and “An implementation of a randomized algorithm for principal component analysis”, A. Szlam et al. (2014); arpack solver: scipy.sparse.linalg.eigsh documentation, R. B. Lehoucq, D. C. Sorensen, and C. Yang (1998)
2.5.3. Truncated singular value decomposition and latent semantic analysis¶
TruncatedSVD implements a variant of singular value decomposition (SVD) that only computes the \(k\) largest singular values, where \(k\) is a user-specified parameter. TruncatedSVD is very similar to PCA, but differs in that the matrix \(X\) does not need to be centered. When the columnwise (per-feature) means of \(X\) are subtracted from the feature values, truncated SVD on the resulting matrix is equivalent to PCA.
About truncated SVD and latent semantic analysis (LSA)¶
When truncated SVD is applied to term-document matrices (as returned by CountVectorizer or TfidfVectorizer), this transformation is known as latent semantic analysis (LSA), because it transforms such matrices to a “semantic” space of low dimensionality. In particular, LSA is known to combat the effects of synonymy and polysemy (both of which roughly mean there are multiple meanings per word), which cause term-document matrices to be overly sparse and exhibit poor similarity under measures such as cosine similarity.
Note: LSA is also known as latent semantic indexing, LSI, though strictly that refers to its use in persistent indexes for information retrieval purposes.
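As an illustrative sketch of this LSA pipeline (the toy corpus and the choice of two components below are assumptions made for demonstration, not part of the documentation), a tf–idf matrix can be passed directly to TruncatedSVD:
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from sklearn.decomposition import TruncatedSVD
>>> docs = ["the cat sat on the mat",
...         "the dog sat on the log",
...         "cats and dogs are pets"]
>>> tfidf = TfidfVectorizer(sublinear_tf=True, use_idf=True)
>>> X_tfidf = tfidf.fit_transform(docs)           # sparse term-document matrix
>>> svd = TruncatedSVD(n_components=2, random_state=0)
>>> X_lsa = svd.fit_transform(X_tfidf)            # dense array of shape (n_documents, 2)
>>> X_lsa.shape
(3, 2)
Because TruncatedSVD never centers the input, the sparse tf–idf matrix is used as-is, which is what makes it suitable for this setting.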
Mathematically, truncated SVD applied to training samples \(X\) produces a low-rank approximation of \(X\):
\[X \approx X_k = U_k \Sigma_k V_k^\top\]
After this operation, \(U_k \Sigma_k\) is the transformed training set with \(k\) features (called n_components in the API). To also transform a test set \(X\), we multiply it with \(V_k\):
\[X' = X V_k\]
Note: most treatments of LSA in the natural language processing (NLP) and information retrieval (IR) literature swap the axes of the matrix \(X\) so that it has shape n_features × n_samples. We present LSA in a different way that matches the scikit-learn API better, but the singular values found are the same. While the TruncatedSVD transformer works with any feature matrix, using it on tf–idf matrices is recommended over raw frequency counts in an LSA/document processing setting. In particular, sublinear scaling and inverse document frequency should be turned on (sublinear_tf=True, use_idf=True) to bring the feature values closer to a Gaussian distribution, compensating for LSA’s erroneous assumptions about textual data.
Examples: Clustering text documents using k-means
References: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2008), Introduction to Information Retrieval, Cambridge University Press, chapter 18: Matrix decompositions & latent semantic indexing
2.5.4. Dictionary Learning¶
2.5.4.1. Sparse coding with a precomputed dictionary¶
The SparseCoder object is an estimator that can be used to transform signals into sparse linear combinations of atoms from a fixed, precomputed dictionary such as a discrete wavelet basis. This object therefore does not implement a fit method. The transformation amounts to a sparse coding problem: finding a representation of the data as a linear combination of as few dictionary atoms as possible. All variations of dictionary learning implement the following transform methods, controllable via the transform_method initialization parameter:
Orthogonal matching pursuit (Orthogonal Matching Pursuit (OMP))
Least-angle regression (Least Angle Regression)
Lasso computed by least-angle regression
Lasso using coordinate descent (Lasso)
Thresholding
Thresholding is very fast but it does not yield accurate reconstructions. Thresholded codes have nonetheless been shown to be useful in the literature for classification tasks. For image reconstruction tasks, orthogonal matching pursuit yields the most accurate, unbiased reconstruction. The dictionary learning objects offer, via the split_code parameter, the possibility to separate the positive and negative values in the results of sparse coding. This is useful when dictionary learning is used for extracting features that will be used for supervised learning, because it allows the learning algorithm to assign different weights to the negative loadings of a particular atom than to the corresponding positive loading. The split code for a single sample has length 2 * n_components and is constructed using the following rule: first, the regular code of length n_components is computed. Then, the first n_components entries of the split_code are filled with the positive part of the regular code vector. The second half of the split code is filled with the negative part of the code vector, only with a positive sign. Therefore, the split_code is non-negative.
Examples: Sparse coding with a precomputed dictionary
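As an illustrative sketch of sparse coding with a precomputed dictionary (the random dictionary below stands in for a real basis such as wavelets and is purely an assumption for demonstration):
>>> import numpy as np
>>> from sklearn.decomposition import SparseCoder
>>> rng = np.random.RandomState(0)
>>> D = rng.randn(8, 5)                                # 8 atoms with 5 features each
>>> D /= np.linalg.norm(D, axis=1, keepdims=True)      # unit-norm atoms
>>> X = rng.randn(3, 5)                                # 3 signals to encode
>>> coder = SparseCoder(dictionary=D,
...                     transform_algorithm='omp',     # orthogonal matching pursuit
...                     transform_n_nonzero_coefs=2)   # at most 2 active atoms per signal
>>> code = coder.transform(X)                          # no fit step: the dictionary is fixed
>>> code.shape
(3, 8)
Each row of code contains at most two non-zero coefficients, i.e. each signal is approximated by a combination of at most two atoms.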
2.5.4.2. Generic dictionary learning¶
Dictionary learning (DictionaryLearning) is a matrix factorization problem that amounts to finding a (usually overcomplete) dictionary that will perform well at sparsely encoding the fitted data. Representing data as sparse combinations of atoms from an overcomplete dictionary is suggested to be the way the mammalian primary visual cortex works. Consequently, dictionary learning applied on image patches has been shown to give good results in image processing tasks such as image completion, inpainting and denoising, as well as for supervised recognition tasks. Dictionary learning is an optimization problem solved by alternately updating the sparse code, as a solution to multiple Lasso problems, considering the dictionary fixed, and then updating the dictionary to best fit the sparse code.
\[\begin{split}(U^*, V^*) = \underset{U, V}{\operatorname{arg\,min\,}} & \frac{1}{2} ||X-UV||_{\text{Fro}}^2+\alpha||U||_{1,1} \\ \text{subject to } & ||V_k||_2 \leq 1 \text{ for all } 0 \leq k < n_{\mathrm{atoms}}\end{split}\]
\(||.||_{\text{Fro}}\) stands for the Frobenius norm and \(||.||_{1,1}\) stands for the entry-wise matrix norm which is the sum of the absolute values of all the entries in the matrix. After using such a procedure to fit the dictionary, the transform is simply a sparse coding step that shares the same implementation with all dictionary learning objects (see Sparse coding with a precomputed dictionary). It is also possible to constrain the dictionary and/or code to be positive to match constraints that may be present in the data. Below are the faces with different positivity constraints applied. Red indicates negative values, blue indicates positive values, and white represents zeros. The following image shows what a dictionary learned from 4x4 pixel image patches, extracted from part of the image of a raccoon face, looks like.
Examples: Image denoising using dictionary learning
References: “Online dictionary learning for sparse coding”, J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009
2.5.4.3. Mini-batch dictionary learning¶
MiniBatchDictionaryLearning implements a faster, but less accurate version of the dictionary learning algorithm that is better suited for large datasets. By default, MiniBatchDictionaryLearning divides the data into mini-batches and optimizes in an online manner by cycling over the mini-batches for the specified number of iterations. However, at the moment it does not implement a stopping condition. The estimator also implements partial_fit, which updates the dictionary by iterating only once over a mini-batch. This can be used for online learning when the data is not readily available from the start, or when the data does not fit into memory.
Clustering for dictionary learning: note that when using dictionary learning to extract a representation (e.g. for sparse coding), clustering can be a good proxy to learn the dictionary. For instance the MiniBatchKMeans estimator is computationally efficient and implements on-line learning with a partial_fit method.
Example: Online learning of a dictionary of parts of faces
2.5.5. Factor Analysis¶
In unsupervised learning we only have a dataset \(X = \{x_1, x_2, \dots, x_n \}\). How can this dataset be described mathematically? A very simple continuous latent variable model for \(X\) is
\[x_i = W h_i + \mu + \epsilon\]
The vector \(h_i\) is called “latent” because it is unobserved.
\(\epsilon\) is considered a noise term distributed according to a Gaussian with mean 0 and covariance \(\Psi\) (i.e. \(\epsilon \sim \mathcal{N}(0, \Psi)\)), and \(\mu\) is some arbitrary offset vector. Such a model is called “generative” as it describes how \(x_i\) is generated from \(h_i\). If we use all the \(x_i\)’s as columns to form a matrix \(\mathbf{X}\) and all the \(h_i\)’s as columns of a matrix \(\mathbf{H}\), then we can write (with suitably defined \(\mathbf{M}\) and \(\mathbf{E}\)):
\[\mathbf{X} = W \mathbf{H} + \mathbf{M} + \mathbf{E}\]
In other words, we have decomposed the matrix \(\mathbf{X}\). If \(h_i\) is given, the above equation automatically implies the following probabilistic interpretation:
\[p(x_i|h_i) = \mathcal{N}(Wh_i + \mu, \Psi)\]
For a complete probabilistic model we also need a prior distribution for the latent variable \(h\). The most straightforward assumption (based on the nice properties of the Gaussian distribution) is \(h \sim \mathcal{N}(0, \mathbf{I})\). This yields a Gaussian as the marginal distribution of \(x\):
\[p(x) = \mathcal{N}(\mu, WW^T + \Psi)\]
Now, without any further assumptions the idea of having a latent variable \(h\) would be superfluous – \(x\) can be completely modelled with a mean and a covariance. We need to impose some more specific structure on one of these two parameters. A simple additional assumption regards the structure of the error covariance \(\Psi\):
\(\Psi = \sigma^2 \mathbf{I}\): this assumption leads to the probabilistic model of PCA.
\(\Psi = \mathrm{diag}(\psi_1, \psi_2, \dots, \psi_n)\): this model is called FactorAnalysis, a classical statistical model. The matrix W is sometimes called the “factor loading matrix”.
Both models essentially estimate a Gaussian with a low-rank covariance matrix. Because both models are probabilistic, they can be integrated in more complex models, e.g. Mixture of Factor Analysers. One gets very different models (e.g. FastICA) if non-Gaussian priors on the latent variables are assumed. Factor analysis can produce similar components (the columns of its loading matrix) to PCA. However, one cannot make any general statements about these components (e.g. whether they are orthogonal). The main advantage of Factor Analysis over PCA is that it can model the variance in every direction of the input space independently (heteroscedastic noise). This allows better model selection than probabilistic PCA in the presence of heteroscedastic noise. Factor Analysis is often followed by a rotation of the factors (with the parameter rotation), usually to improve interpretability. For example, Varimax rotation maximizes the sum of the variances of the squared loadings, i.e., it tends to produce sparser factors, which are influenced by only a few features each (the “simple structure”). See e.g., the first example below.
Examples: Factor Analysis (with rotation) to visualize patterns; Model selection with Probabilistic PCA and Factor Analysis (FA)
2.5.6. Independent component analysis (ICA)¶
Independent component analysis separates a multivariate signal into additive subcomponents that are maximally independent. It is implemented in scikit-learn using the Fast ICA algorithm. Typically, ICA is not used for reducing dimensionality but for separating superimposed signals. Since the ICA model does not include a noise term, for the model to be correct, whitening must be applied. This can be done internally using the whiten argument or manually using one of the PCA variants.
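As an illustrative sketch of this blind source separation setup (the toy sources and the mixing matrix below are assumptions for demonstration, not taken from the scikit-learn example gallery):
>>> import numpy as np
>>> from sklearn.decomposition import FastICA
>>> t = np.linspace(0, 8, 2000)
>>> s1 = np.sin(2 * t)                            # sinusoidal source
>>> s2 = np.sign(np.sin(3 * t))                   # square-wave source
>>> S = np.c_[s1, s2]
>>> A = np.array([[1.0, 1.0], [0.5, 2.0]])        # mixing matrix
>>> X = S @ A.T                                   # observed mixtures
>>> ica = FastICA(n_components=2, whiten='unit-variance', random_state=0)
>>> S_estimated = ica.fit_transform(X)            # recovered sources (up to sign, scale and order)
>>> ica.mixing_.shape                             # estimated mixing matrix
(2, 2)
Here whiten='unit-variance' performs the whitening step internally, as mentioned above; alternatively the data could be whitened beforehand with one of the PCA variants and whiten=False passed instead.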
It is classically used to separate mixed signals (a problem known as blind source separation), as in the example below. ICA can also be used as yet another non-linear decomposition that finds components with some sparsity.
Examples: Blind source separation using FastICA; FastICA on 2D point clouds; Faces dataset decompositions
2.5.7. Non-negative matrix factorization (NMF or NNMF)¶
2.5.7.1. NMF with the Frobenius norm¶
NMF [1] is an alternative approach to decomposition that assumes that the data and the components are non-negative. NMF can be plugged in instead of PCA or its variants, in the cases where the data matrix does not contain negative values. It finds a decomposition of samples \(X\) into two matrices \(W\) and \(H\) of non-negative elements, by optimizing the distance \(d\) between \(X\) and the matrix product \(WH\). The most widely used distance function is the squared Frobenius norm, which is an obvious extension of the Euclidean norm to matrices:
\[d_{\mathrm{Fro}}(X, Y) = \frac{1}{2} ||X - Y||_{\mathrm{Fro}}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2\]
Unlike PCA, the representation of a vector is obtained in an additive fashion, by superimposing the components, without subtracting. Such additive models are efficient for representing images and text. It has been observed in [Hoyer, 2004] [2] that, when carefully constrained, NMF can produce a parts-based representation of the dataset, resulting in interpretable models. The following example displays 16 sparse components found by NMF from the images in the Olivetti faces dataset, in comparison with the PCA eigenfaces. The init attribute determines the initialization method applied, which has a great impact on the performance of the method. NMF implements the method Nonnegative Double Singular Value Decomposition. NNDSVD [4] is based on two SVD processes, one approximating the data matrix, the other approximating positive sections of the resulting partial SVD factors utilizing an algebraic property of unit rank matrices. The basic NNDSVD algorithm is better suited to sparse factorization. Its variants NNDSVDa (in which all zeros are set equal to the mean of all elements of the data) and NNDSVDar (in which the zeros are set to random perturbations less than the mean of the data divided by 100) are recommended in the dense case. Note that the Multiplicative Update (‘mu’) solver cannot update zeros present in the initialization, so it leads to poorer results when used jointly with the basic NNDSVD algorithm, which introduces a lot of zeros; in this case, NNDSVDa or NNDSVDar should be preferred. NMF can also be initialized with correctly scaled random non-negative matrices by setting init="random". An integer seed or a RandomState can also be passed to random_state to control reproducibility. In NMF, L1 and L2 priors can be added to the loss function in order to regularize the model. The L2 prior uses the Frobenius norm, while the L1 prior uses an elementwise L1 norm. As in ElasticNet, we control the combination of L1 and L2 with the l1_ratio (\(\rho\)) parameter, and the intensity of the regularization with the alpha_W and alpha_H (\(\alpha_W\) and \(\alpha_H\)) parameters. The priors are scaled by the number of samples (\(n\_samples\)) for H and the number of features (\(n\_features\)) for W, to keep their impact balanced with respect to one another and to the data fit term, and as independent as possible of the size of the training set.
The prior terms are then:
\[(\alpha_W \rho ||W||_1 + \frac{\alpha_W(1-\rho)}{2} ||W||_{\mathrm{Fro}}^2) * n\_features + (\alpha_H \rho ||H||_1 + \frac{\alpha_H(1-\rho)}{2} ||H||_{\mathrm{Fro}}^2) * n\_samples\]
and the regularized objective function is:
\[d_{\mathrm{Fro}}(X, WH) + (\alpha_W \rho ||W||_1 + \frac{\alpha_W(1-\rho)}{2} ||W||_{\mathrm{Fro}}^2) * n\_features + (\alpha_H \rho ||H||_1 + \frac{\alpha_H(1-\rho)}{2} ||H||_{\mathrm{Fro}}^2) * n\_samples\]
2.5.7.2. NMF with a beta-divergence¶
As described previously, the most widely used distance function is the squared Frobenius norm, which is an obvious extension of the Euclidean norm to matrices:
\[d_{\mathrm{Fro}}(X, Y) = \frac{1}{2} ||X - Y||_{\mathrm{Fro}}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2\]
Other distance functions can be used in NMF as, for example, the (generalized) Kullback-Leibler (KL) divergence, also referred to as I-divergence:
\[d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})\]
Or, the Itakura-Saito (IS) divergence:
\[d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)\]
These three distances are special cases of the beta-divergence family, with \(\beta = 2, 1, 0\) respectively [6]. The beta-divergence is defined by:
\[d_{\beta}(X, Y) = \sum_{i,j} \frac{1}{\beta(\beta - 1)}(X_{ij}^\beta + (\beta-1)Y_{ij}^\beta - \beta X_{ij} Y_{ij}^{\beta - 1})\]
Note that this definition is not valid if \(\beta \in (0; 1)\), yet it can be continuously extended to the definitions of \(d_{KL}\) and \(d_{IS}\) respectively.
NMF implemented solvers¶
NMF implements two solvers, using Coordinate Descent (‘cd’) [5] and Multiplicative Update (‘mu’) [6]. The ‘mu’ solver can optimize every beta-divergence, including of course the Frobenius norm (\(\beta=2\)), the (generalized) Kullback-Leibler divergence (\(\beta=1\)) and the Itakura-Saito divergence (\(\beta=0\)). Note that for \(\beta \in (1; 2)\), the ‘mu’ solver is significantly faster than for other values of \(\beta\). Note also that with a negative (or 0, i.e. ‘itakura-saito’) \(\beta\), the input matrix cannot contain zero values. The ‘cd’ solver can only optimize the Frobenius norm. Due to the underlying non-convexity of NMF, the different solvers may converge to different minima, even when optimizing the same distance function. NMF is best used with the fit_transform method, which returns the matrix W. The matrix H is stored in the fitted model in the components_ attribute; the method transform will decompose a new matrix X_new based on these stored components:
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import NMF
>>> model = NMF(n_components=2, init='random', random_state=0)
>>> W = model.fit_transform(X)
>>> H = model.components_
>>> X_new = np.array([[1, 0], [1, 6.1], [1, 0], [1, 4], [3.2, 1], [0, 4]])
>>> W_new = model.transform(X_new)
Examples: Faces dataset decompositions; Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation
2.5.7.3. Mini-batch Non-negative Matrix Factorization¶
MiniBatchNMF [7] implements a faster, but less accurate version of the non-negative matrix factorization (i.e. NMF), better suited for large datasets. By default, MiniBatchNMF divides the data into mini-batches and optimizes the NMF model in an online manner by cycling over the mini-batches for the specified number of iterations. The batch_size parameter controls the size of the batches.
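As an illustrative sketch of this mini-batch variant (the random non-negative data and the parameter values below are assumptions for demonstration):
>>> import numpy as np
>>> from sklearn.decomposition import MiniBatchNMF
>>> rng = np.random.RandomState(0)
>>> X = np.abs(rng.randn(100, 20))                  # toy non-negative data
>>> mbnmf = MiniBatchNMF(n_components=5, batch_size=10,
...                      init='nndsvda', random_state=0)
>>> W = mbnmf.fit_transform(X)                      # shape (100, 5)
>>> H = mbnmf.components_                           # shape (5, 20)
>>> W.shape, H.shape
((100, 5), (5, 20))
The smaller the batch_size, the more mini-batch updates are performed per pass over the data.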
In order to speed up the mini-batch algorithm it is also possible to scale past batches, giving them less importance than newer batches. This is done by introducing a so-called forgetting factor controlled by the forget_factor parameter. The estimator also implements partial_fit, which updates H by iterating only once over a mini-batch. This can be used for online learning when the data is not readily available from the start, or when the data does not fit into memory.
References:
[1] “Learning the parts of objects by non-negative matrix factorization”, D. Lee, S. Seung, 1999
[2] “Non-negative Matrix Factorization with Sparseness Constraints”, P. Hoyer, 2004
[4] “SVD based initialization: A head start for nonnegative matrix factorization”, C. Boutsidis, E. Gallopoulos, 2008
[5] “Fast local algorithms for large scale nonnegative matrix and tensor factorizations”, A. Cichocki, A. Phan, 2009
[6] “Algorithms for nonnegative matrix factorization with the beta-divergence”, C. Fevotte, J. Idier, 2011
[7] “Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence”, A. Lefevre, F. Bach, C. Fevotte, 2011
2.5.8. Latent Dirichlet Allocation (LDA)¶
Latent Dirichlet Allocation is a generative probabilistic model for collections of discrete data such as text corpora. It is also a topic model that is used for discovering abstract topics from a collection of documents. The graphical model of LDA is a three-level generative model:
Note on the notations presented in the graphical model above, which can be found in Hoffman et al. (2013): the corpus is a collection of \(D\) documents; a document is a sequence of \(N\) words; there are \(K\) topics in the corpus; the boxes represent repeated sampling. In the graphical model, each node is a random variable and has a role in the generative process. A shaded node indicates an observed variable and an unshaded node indicates a hidden (latent) variable. In this case, words in the corpus are the only data that we observe. The latent variables determine the random mixture of topics in the corpus and the distribution of words in the documents. The goal of LDA is to use the observed words to infer the hidden topic structure.
Details on modeling text corpora¶
When modeling text corpora, the model assumes the following generative process for a corpus with \(D\) documents and \(K\) topics, with \(K\) corresponding to n_components in the API:
For each topic \(k \in K\), draw \(\beta_k \sim \mathrm{Dirichlet}(\eta)\). This provides a distribution over the words, i.e. the probability of a word appearing in topic \(k\). \(\eta\) corresponds to topic_word_prior.
For each document \(d \in D\), draw the topic proportions \(\theta_d \sim \mathrm{Dirichlet}(\alpha)\). \(\alpha\) corresponds to doc_topic_prior.
For each word \(i\) in document \(d\): draw the topic assignment \(z_{di} \sim \mathrm{Multinomial}(\theta_d)\); draw the observed word \(w_{ij} \sim \mathrm{Multinomial}(\beta_{z_{di}})\).
For parameter estimation, the posterior distribution is:
\[p(z, \theta, \beta |w, \alpha, \eta) = \frac{p(z, \theta, \beta|\alpha, \eta)}{p(w|\alpha, \eta)}\]
Since the posterior is intractable, the variational Bayes method uses a simpler distribution \(q(z,\theta,\beta | \lambda, \phi, \gamma)\) to approximate it, and those variational parameters \(\lambda\), \(\phi\), \(\gamma\) are optimized to maximize the Evidence Lower Bound (ELBO):
\[\log\: P(w | \alpha, \eta) \geq L(w,\phi,\gamma,\lambda) \overset{\triangle}{=} E_{q}[\log\:p(w,z,\theta,\beta|\alpha,\eta)] - E_{q}[\log\:q(z, \theta, \beta)]\]
Maximizing the ELBO is equivalent to minimizing the Kullback-Leibler (KL) divergence between \(q(z,\theta,\beta)\) and the true posterior \(p(z, \theta, \beta |w, \alpha, \eta)\). LatentDirichletAllocation implements the online variational Bayes algorithm and supports both online and batch update methods. While the batch method updates variational variables after each full pass through the data, the online method updates variational variables from mini-batch data points.
Note: although the online method is guaranteed to converge to a local optimum point, the quality of the optimum point and the speed of convergence may depend on the mini-batch size and on attributes related to the learning rate setting.
When LatentDirichletAllocation is applied on a “document-term” matrix, the matrix will be decomposed into a “topic-term” matrix and a “document-topic” matrix. While the “topic-term” matrix is stored as components_ in the model, the “document-topic” matrix can be calculated with the transform method. LatentDirichletAllocation also implements a partial_fit method, which is used when data can be fetched sequentially.
Examples: Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation
References: “Latent Dirichlet Allocation”, D. Blei, A. Ng, M. Jordan, 2003; “Online Learning for Latent Dirichlet Allocation”, M. Hoffman, D. Blei, F. Bach, 2010; “Stochastic Variational Inference”, M. Hoffman, D. Blei, C. Wang, J. Paisley, 2013; “The varimax criterion for analytic rotation in factor analysis”, H. F. Kaiser, 1958
See also Dimensionality reduction for dimensionality reduction with Neighborhood Components Analysis.

Sklearn data visualization: Principal Component Analysis (PCA) - 知乎 (by 吴吃辣, www.qikegu.com)
Principal component analysis (PCA) is a dimensionality reduction method commonly used to reduce the number of dimensions of large datasets, converting a large set of variables into a smaller set that still contains most of the information of the original set. Reducing the number of variables in a dataset naturally comes at the cost of some accuracy; the benefit of dimensionality reduction is trading a little precision for simplicity, because smaller datasets are easier to explore and visualize, and they let machine learning algorithms analyze the data more easily and quickly without having to deal with irrelevant variables. In short, the idea of principal component analysis (PCA) is simple: reduce the number of variables in a dataset while preserving as much information as possible. With scikit-learn, it is easy to run PCA on the data:
# Create a randomized PCA model with two components
randomized_pca = PCA(n_components=2, svd_solver='randomized')
# Fit the data and transform it with the model
reduced_data_rpca = randomized_pca.fit_transform(digits.data)
# Create a regular PCA model
pca = PCA(n_components=2)
# Fit the data and transform it with the model
reduced_data_pca = pca.fit_transform(digits.data)
# Check the shape
reduced_data_pca.shape
# Print the data
print(reduced_data_rpca)
print(reduced_data_pca)
Output:
[[ -1.25946586 21.27488217]
 [ 7.95761214 -20.76870381]
 [ 6.99192224 -9.95598251]
 ...
 [ 10.80128338 -6.96025076]
 [ -4.87209834 12.42395157]
 [ -0.34439091 6.36555458]]
[[ -1.2594653 21.27488157]
 [ 7.95761471 -20.76871125]
 [ 6.99191791 -9.95597343]
 ...
 [ 10.80128002 -6.96024527]
 [ -4.87209081 12.42395739]
 [ -0.34439546 6.36556369]]
The randomized PCA model performs better when the number of dimensions is large. You can compare the results of the regular PCA model with those of the randomized PCA model and see how they differ. Telling the model to keep two components ensures that we have two-dimensional data to plot. Now we can draw a scatter plot to visualize the data:
colors = ['black', 'blue', 'purple', 'yellow', 'white', 'red', 'lime', 'cyan', 'orange', 'gray']
# Draw a scatter plot of the PCA results
for i in range(len(colors)):
    x = reduced_data_rpca[:, 0][digits.target == i]
    y = reduced_data_rpca[:, 1][digits.target == i]
    plt.scatter(x, y, c=colors[i])
# Set the legend: digits 0-9 are shown in different colors
plt.legend(digits.target_names, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
# Set the axis labels
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
# Set the title
plt.title("PCA Scatter Plot")
# Show the figure
plt.show()
The resulting plot is displayed.

Principal component analysis (PCA) - scikit-learn Chinese community (translation provided by CDA数据科学研究院)
Principal component analysis (PCA)¶
These figures help illustrate how a point cloud can be very flat in one direction, which is where PCA comes in to choose a direction that is not flat.
print(__doc__)

# Authors: Gael Varoquaux
#          Jaques Grobler
#          Kevin Hughes
# License: BSD 3 clause

from sklearn.decomposition import PCA

from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# #############################################################################
# Create the data

e = np.exp(1)
np.random.seed(4)


def pdf(x):
    return 0.5 * (stats.norm(scale=0.25 / e).pdf(x) + stats.norm(scale=4 / e).pdf(x))


y = np.random.normal(scale=0.5, size=(30000))
x = np.random.normal(scale=0.5, size=(30000))
z = np.random.normal(scale=0.1, size=len(x))

density = pdf(x) * pdf(y)
pdf_z = pdf(5 * z)
density *= pdf_z

a = x + y
b = 2 * y
c = a - b + z

norm = np.sqrt(a.var() + b.var())
a /= norm
b /= norm


# #############################################################################
# Plot the figures
def plot_figs(fig_num, elev, azim):
    fig = plt.figure(fig_num, figsize=(4, 3))
    plt.clf()
    ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=elev, azim=azim)

    ax.scatter(a[::10], b[::10], c[::10], c=density[::10], marker='+', alpha=.4)
    Y = np.c_[a, b, c]

    # Using SciPy's SVD, this would be:
    # _, pca_score, V = scipy.linalg.svd(Y, full_matrices=False)
    pca = PCA(n_components=3)
    pca.fit(Y)
    pca_score = pca.explained_variance_ratio_
    V = pca.components_

    x_pca_axis, y_pca_axis, z_pca_axis = 3 * V.T

    x_pca_plane = np.r_[x_pca_axis[:2], - x_pca_axis[1::-1]]
    y_pca_plane = np.r_[y_pca_axis[:2], - y_pca_axis[1::-1]]
    z_pca_plane = np.r_[z_pca_axis[:2], - z_pca_axis[1::-1]]
    x_pca_plane.shape = (2, 2)
    y_pca_plane.shape = (2, 2)
    z_pca_plane.shape = (2, 2)
    ax.plot_surface(x_pca_plane, y_pca_plane, z_pca_plane)
    ax.w_xaxis.set_ticklabels([])
    ax.w_yaxis.set_ticklabels([])
    ax.w_zaxis.set_ticklabels([])


elev = -40
azim = -80
plot_figs(1, elev, azim)

elev = 30
azim = 20
plot_figs(2, elev, azim)

plt.show()
Total running time of the script: (0 minutes 0.198 seconds)
Download Python source code: plot_pca_3d.py
Download Jupyter notebook: plot_pca_3d.ipynb

Learning principal component analysis (PCA) with scikit-learn - 刘建平Pinard - 博客园
In the post "A summary of the principles of principal component analysis (PCA)" we summarized the theory of PCA; below we summarize how to use the scikit-learn toolkit to perform PCA dimensionality reduction.
1. An overview of the scikit-learn PCA classes
In scikit-learn, the PCA-related classes all live in the sklearn.decomposition package. The most commonly used PCA class is sklearn.decomposition.PCA, and its usage is what we mainly cover below.
Besides the PCA class, another commonly used PCA-related class is KernelPCA, which we also discussed in the theory post. It is mainly used for dimensionality reduction of non-linear data and relies on the kernel trick, so when using it you need to choose a suitable kernel function and tune its parameters.
Another commonly used class is IncrementalPCA, which mainly addresses the memory limits of a single machine. Sometimes the number of samples may be in the millions and the dimensionality in the thousands, so fitting the data directly may exhaust memory; in that case IncrementalPCA can be used. IncrementalPCA first splits the data into multiple batches and then calls partial_fit on each batch in turn, step by step arriving at the final optimal dimensionality reduction.
There are also SparsePCA and MiniBatchSparsePCA. Their main difference from the PCA class described above is that they use L1 regularization, which drives the contribution of many non-principal components to zero, so that during the reduction we only need to deal with the relatively important components and avoid the influence of noise-like factors. The difference between SparsePCA and MiniBatchSparsePCA is that MiniBatchSparsePCA uses a subset of the features and a given number of iterations to perform the reduction, in order to alleviate the slowness of the decomposition on large samples, at the possible cost of some accuracy. Using SparsePCA and MiniBatchSparsePCA requires tuning the L1 regularization parameter.
2. The parameters of sklearn.decomposition.PCA
Below we explain how to use scikit-learn for PCA, based mainly on sklearn.decomposition.PCA. The PCA class needs essentially no tuning; generally we only need to specify the target number of dimensions, or a threshold on the fraction of the total variance of the original features that the retained principal components should explain.
Here is an introduction to the main parameters of sklearn.decomposition.PCA:
1) n_components: this parameter specifies the number of feature dimensions to keep after PCA. The most common usage is to set the target number of dimensions directly, in which case n_components is an integer greater than or equal to 1. We can also specify a minimum threshold on the fraction of variance the principal components must explain and let the PCA class decide the number of dimensions from the feature variances, in which case n_components is a number in (0, 1]. We can also set the parameter to "mle", in which case the PCA class uses the MLE algorithm to choose a number of principal components based on the variance distribution of the features. Finally, we can use the default, i.e. not pass n_components, in which case n_components = min(n_samples, n_features).
2) whiten: whether to perform whitening. Whitening normalizes each feature of the reduced data so that its variance is 1. For PCA itself whitening is generally unnecessary, but if there is further processing after the PCA step it can be worth considering. The default is False, i.e. no whitening.
3) svd_solver: the method used for the singular value decomposition (SVD). Since the eigendecomposition is a special case of the SVD, PCA libraries are generally implemented on top of an SVD. There are four possible values: {'auto', 'full', 'arpack', 'randomized'}. 'randomized' is generally suited to PCA on data with many samples and many dimensions but a relatively low fraction of principal components; it uses randomized algorithms to speed up the SVD. 'full' is the SVD in the traditional sense, using the corresponding SciPy implementation. 'arpack' has a similar use case to 'randomized'; the difference is that 'randomized' uses scikit-learn's own SVD implementation while 'arpack' directly uses SciPy's sparse SVD implementation. The default is 'auto', which lets the PCA class weigh the three algorithms above and pick a suitable SVD solver. Generally, the default value is good enough.
Besides these input parameters, two attributes of the PCA class are worth attention. The first is explained_variance_, the variance explained by each retained principal component: the larger the variance, the more important the component. The second is explained_variance_ratio_, the fraction of the total variance explained by each retained principal component: the larger the ratio, the more important the component.
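Before the worked example in the next section, here is an illustrative sketch of the IncrementalPCA batching described in section 1 (the toy data and the number of batches are assumptions for demonstration, not from the original post):
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.randn(10000, 50)               # pretend this is too large to decompose in one go

ipca = IncrementalPCA(n_components=5)
for batch in np.array_split(X, 10):    # feed the data in 10 chunks
    ipca.partial_fit(batch)

X_reduced = ipca.transform(X)          # shape (10000, 5)
print(ipca.explained_variance_ratio_)
Each call to partial_fit only needs one batch in memory, which is what makes this class suitable when the full dataset does not fit in RAM.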
3. A PCA example
Let's now learn how to use the PCA class in scikit-learn with an example. To make visualization easy and give an intuitive picture, we use three-dimensional data and reduce its dimensionality. The full code is on my github: https://github.com/ljpzzz/machinelearning/blob/master/classic-machine-learning/pca.ipynb
First we generate random data and visualize it; the code is as follows:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
from sklearn.datasets import make_blobs  # formerly sklearn.datasets.samples_generator
# X holds the sample features, y the cluster labels: 10000 samples, 3 features each, 4 clusters
X, y = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)
fig = plt.figure()
ax = Axes3D(fig, rect=[0, 0, 1, 1], elev=30, azim=20)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')
The distribution of the three-dimensional data looks like this.
First, instead of reducing the dimensionality, we only project the data and look at the variance distribution of the three projected dimensions; the code is as follows:
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
The output is:
[ 0.98318212  0.00850037  0.00831751]
[ 3.78483785  0.03272285  0.03201892]
The variance ratios of the three projected feature dimensions are roughly 98.3% : 0.8% : 0.8%; the first projected feature accounts for the vast majority of the variance.
Now we reduce the dimensionality from three dimensions to two; the code is as follows:
pca = PCA(n_components=2)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
The output is:
[ 0.98318212  0.00850037]
[ 3.78483785  0.03272285]
This result is expected: since the variances of the three projected feature dimensions above are [ 3.78483785  0.03272285  0.03201892], projecting to two dimensions necessarily keeps the first two features and discards the third.
To get an intuitive picture, let's look at the distribution of the transformed data; the code is as follows:
X_new = pca.transform(X)
plt.scatter(X_new[:, 0], X_new[:, 1], marker='o')
plt.show()
The output plot is shown below. You can see that the four clusters from the earlier 3D plot are still clearly visible after the dimensionality reduction.
Now let's see what happens when, instead of specifying the target dimensionality directly, we specify the fraction of variance the retained principal components must explain.
pca = PCA(n_components=0.95)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.n_components_)
We required the principal components to explain at least 95% of the variance; the output is:
[ 0.98318212]
[ 3.78483785]
1
Only the first projected feature is kept. This is easy to understand: our first principal component explains as much as 98% of the variance, so this single feature dimension already satisfies the 95% threshold. Now let's try a 99% threshold; the code is as follows:
pca = PCA(n_components=0.99)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.n_components_)
The output is now:
[ 0.98318212  0.00850037]
[ 3.78483785  0.03272285]
2
This result is also easy to understand: the first principal component explains 98.3% of the variance and the second explains 0.8%, and together they satisfy our threshold.
Finally, let's see what happens when we let the MLE algorithm choose the number of dimensions itself; the code is as follows:
pca = PCA(n_components='mle')
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.n_components_)
The output is:
[ 0.98318212]
[ 3.78483785]
1
Since the first projected feature of our data explains as much as 98.3% of the variance, the MLE algorithm keeps only the first feature.
(Feel free to repost with attribution. Comments and questions are welcome: liujianping-ok@163.com)
posted @ 2017-01-02 20:55 by 刘建平Pinard

PCA example with Iris Data-set — scikit-learn 1.4.1 documentation
PCA example with Iris Data-set¶
Principal Component Analysis applied to the Iris dataset. See here for more information on this dataset.
# Code source: Gaël Varoquaux
# License: BSD 3 clause

import matplotlib.pyplot as plt

# unused but required import for doing 3d projections with matplotlib < 3.2
import mpl_toolkits.mplot3d  # noqa: F401
import numpy as np

from sklearn import datasets, decomposition

np.random.seed(5)

iris = datasets.load_iris()
X = iris.data
y = iris.target

fig = plt.figure(1, figsize=(4, 3))
plt.clf()

ax = fig.add_subplot(111, projection="3d", elev=48, azim=134)
ax.set_position([0, 0, 0.95, 1])

plt.cla()
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

for name, label in [("Setosa", 0), ("Versicolour", 1), ("Virginica", 2)]:
    ax.text3D(
        X[y == label, 0].mean(),
        X[y == label, 1].mean() + 1.5,
        X[y == label, 2].mean(),
        name,
        horizontalalignment="center",
        bbox=dict(alpha=0.5, edgecolor="w", facecolor="w"),
    )

# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(float)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.nipy_spectral, edgecolor="k")

ax.xaxis.set_ticklabels([])
ax.yaxis.set_ticklabels([])
ax.zaxis.set_ticklabels([])

plt.show()
Total running time of the script: (0 minutes 0.095 seconds)
Download Jupyter notebook: plot_pca_iris.ipynb
Download Python source code: plot_pca_iris.py