sklearn.decomposition.PCA
class sklearn.decomposition.PCA(n_components=None, *, copy=True, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', n_oversamples=10, power_iteration_normalizer='auto', random_state=None)
Principal component analysis (PCA).
Linear dimensionality reduction using Singular Value Decomposition of the
data to project it to a lower dimensional space. The input data is centered
but not scaled for each feature before applying the SVD.
It uses the LAPACK implementation of the full SVD or a randomized truncated
SVD by the method of Halko et al. 2009, depending on the shape of the input
data and the number of components to extract.
It can also use the scipy.sparse.linalg ARPACK implementation of the
truncated SVD.
Notice that this class does not support sparse input. See
TruncatedSVD for an alternative with sparse data.
For a usage example, see
PCA example with Iris Data-set
Read more in the User Guide.
Parameters:
n_components : int, float or ‘mle’, default=None
Number of components to keep.
if n_components is not set all components are kept:
n_components == min(n_samples, n_features)
If n_components == 'mle' and svd_solver == 'full', Minka’s
MLE is used to guess the dimension. Use of n_components == 'mle'
will interpret svd_solver == 'auto' as svd_solver == 'full'.
If 0 < n_components < 1 and svd_solver == 'full', select the
number of components such that the amount of variance that needs to be
explained is greater than the percentage specified by n_components.
If svd_solver == 'arpack', the number of components must be
strictly less than the minimum of n_features and n_samples.
Hence, the None case results in:
n_components == min(n_samples, n_features) - 1
copy : bool, default=True
If False, data passed to fit are overwritten and running
fit(X).transform(X) will not yield the expected results,
use fit_transform(X) instead.
whiten : bool, default=False
When True (False by default) the components_ vectors are multiplied
by the square root of n_samples and then divided by the singular values
to ensure uncorrelated outputs with unit component-wise variances.
Whitening will remove some information from the transformed signal
(the relative variance scales of the components) but can sometime
improve the predictive accuracy of the downstream estimators by
making their data respect some hard-wired assumptions.
svd_solver : {‘auto’, ‘full’, ‘arpack’, ‘randomized’}, default=’auto’
If auto:
The solver is selected by a default policy based on X.shape and
n_components: if the input data is larger than 500x500 and the
number of components to extract is lower than 80% of the smallest
dimension of the data, then the more efficient ‘randomized’
method is enabled. Otherwise the exact full SVD is computed and
optionally truncated afterwards.
If full:
Run exact full SVD calling the standard LAPACK solver via
scipy.linalg.svd and select the components by postprocessing.
If arpack:
Run SVD truncated to n_components calling the ARPACK solver via
scipy.sparse.linalg.svds. It requires strictly
0 < n_components < min(X.shape).
If randomized:
Run randomized SVD by the method of Halko et al.
New in version 0.18.0.
tol : float, default=0.0
Tolerance for singular values computed by svd_solver == ‘arpack’.
Must be of range [0.0, infinity).
New in version 0.18.0.
iterated_power : int or ‘auto’, default=’auto’
Number of iterations for the power method computed by
svd_solver == ‘randomized’.
Must be of range [0, infinity).
New in version 0.18.0.
n_oversamples : int, default=10
This parameter is only relevant when svd_solver="randomized".
It corresponds to the additional number of random vectors to sample the
range of X so as to ensure proper conditioning. See
randomized_svd for more details.
New in version 1.1.
power_iteration_normalizer : {‘auto’, ‘QR’, ‘LU’, ‘none’}, default=’auto’
Power iteration normalizer for randomized SVD solver.
Not used by ARPACK. See randomized_svd
for more details.
New in version 1.1.
random_state : int, RandomState instance or None, default=None
Used when the ‘arpack’ or ‘randomized’ solvers are used. Pass an int
for reproducible results across multiple function calls.
See Glossary.
New in version 0.18.0.
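As a quick illustration of the parameter combinations described above, the following sketch (not part of the reference page; the toy array shapes and values are arbitrary assumptions) shows the common ways of choosing n_components and of pinning the solver:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(100, 10)   # 100 samples, 10 features (toy data)

PCA(n_components=3).fit(X)                          # keep exactly 3 components
PCA(n_components=0.9, svd_solver="full").fit(X)     # keep enough components for 90% of the variance
PCA(n_components="mle", svd_solver="full").fit(X)   # Minka's MLE picks the dimension
PCA(n_components=3, svd_solver="randomized", random_state=0).fit(X)  # randomized truncated SVD

# the float form stores the number of components it actually kept
print(PCA(n_components=0.9, svd_solver="full").fit(X).n_components_)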
Attributes:
components_ : ndarray of shape (n_components, n_features)
Principal axes in feature space, representing the directions of
maximum variance in the data. Equivalently, the right singular
vectors of the centered input data, parallel to its eigenvectors.
The components are sorted by decreasing explained_variance_.
explained_variance_ : ndarray of shape (n_components,)
The amount of variance explained by each of the selected components.
The variance estimation uses n_samples - 1 degrees of freedom.
Equal to n_components largest eigenvalues
of the covariance matrix of X.
New in version 0.18.
explained_variance_ratio_ : ndarray of shape (n_components,)
Percentage of variance explained by each of the selected components.
If n_components is not set then all components are stored and the
sum of the ratios is equal to 1.0.
singular_values_ : ndarray of shape (n_components,)
The singular values corresponding to each of the selected components.
The singular values are equal to the 2-norms of the n_components
variables in the lower-dimensional space.
New in version 0.19.
mean_ : ndarray of shape (n_features,)
Per-feature empirical mean, estimated from the training set.
Equal to X.mean(axis=0).
n_components_ : int
The estimated number of components. When n_components is set
to ‘mle’ or a number between 0 and 1 (with svd_solver == ‘full’) this
number is estimated from input data. Otherwise it equals the parameter
n_components, or the lesser value of n_features and n_samples
if n_components is None.
n_samples_ : int
Number of samples in the training data.
noise_variance_ : float
The estimated noise covariance following the Probabilistic PCA model
from Tipping and Bishop 1999. See “Pattern Recognition and
Machine Learning” by C. Bishop, 12.2.1 p. 574 or
http://www.miketipping.com/papers/met-mppca.pdf. It is required to
compute the estimated data covariance and score samples.
Equal to the average of (min(n_features, n_samples) - n_components)
smallest eigenvalues of the covariance matrix of X.
n_features_in_ : int
Number of features seen during fit.
New in version 0.24.
feature_names_in_ : ndarray of shape (n_features_in_,)
Names of features seen during fit. Defined only when X
has feature names that are all strings.
New in version 1.0.
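A short sketch of how the fitted attributes listed above can be inspected after fit (toy data; the printed numbers themselves are not meaningful):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2).fit(X)

print(pca.components_)                # principal axes, shape (2, 2)
print(pca.explained_variance_)        # variance explained by each component
print(pca.explained_variance_ratio_)  # same, as a fraction of the total
print(pca.singular_values_)           # singular values of the centered data
print(pca.mean_)                      # per-feature mean, X.mean(axis=0)
print(pca.n_components_, pca.n_samples_, pca.noise_variance_)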
See also
KernelPCA : Kernel Principal Component Analysis.
SparsePCA : Sparse Principal Component Analysis.
TruncatedSVD : Dimensionality reduction using truncated SVD.
IncrementalPCA : Incremental Principal Component Analysis.
References
For n_components == ‘mle’, this class uses the method from:
Minka, T. P.. “Automatic choice of dimensionality for PCA”.
In NIPS, pp. 598-604
Implements the probabilistic PCA model from:
Tipping, M. E., and Bishop, C. M. (1999). “Probabilistic principal
component analysis”. Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 61(3), 611-622.
via the score and score_samples methods.
For svd_solver == ‘arpack’, refer to scipy.sparse.linalg.svds.
For svd_solver == ‘randomized’, see:
Halko, N., Martinsson, P. G., and Tropp, J. A. (2011).
“Finding structure with randomness: Probabilistic algorithms for
constructing approximate matrix decompositions”.
SIAM review, 53(2), 217-288.
and also
Martinsson, P. G., Rokhlin, V., and Tygert, M. (2011).
“A randomized algorithm for the decomposition of matrices”.
Applied and Computational Harmonic Analysis, 30(1), 47-68.
Examples
>>> import numpy as np
>>> from sklearn.decomposition import PCA
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
>>> pca = PCA(n_components=2)
>>> pca.fit(X)
PCA(n_components=2)
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.0075...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]
>>> pca = PCA(n_components=2, svd_solver='full')
>>> pca.fit(X)
PCA(n_components=2, svd_solver='full')
>>> print(pca.explained_variance_ratio_)
[0.9924... 0.00755...]
>>> print(pca.singular_values_)
[6.30061... 0.54980...]
>>> pca = PCA(n_components=1, svd_solver='arpack')
>>> pca.fit(X)
PCA(n_components=1, svd_solver='arpack')
>>> print(pca.explained_variance_ratio_)
[0.99244...]
>>> print(pca.singular_values_)
[6.30061...]
Methods
fit(X[, y])
Fit the model with X.
fit_transform(X[, y])
Fit the model with X and apply the dimensionality reduction on X.
get_covariance()
Compute data covariance with the generative model.
get_feature_names_out([input_features])
Get output feature names for transformation.
get_metadata_routing()
Get metadata routing of this object.
get_params([deep])
Get parameters for this estimator.
get_precision()
Compute data precision matrix with the generative model.
inverse_transform(X)
Transform data back to its original space.
score(X[, y])
Return the average log-likelihood of all samples.
score_samples(X)
Return the log-likelihood of each sample.
set_output(*[, transform])
Set output container.
set_params(**params)
Set the parameters of this estimator.
transform(X)
Apply dimensionality reduction to X.
fit(X, y=None)
Fit the model with X.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data, where n_samples is the number of samples
and n_features is the number of features.
y : Ignored
Ignored.
Returns:
self : object
Returns the instance itself.
fit_transform(X, y=None)
Fit the model with X and apply the dimensionality reduction on X.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
Training data, where n_samples is the number of samples
and n_features is the number of features.
y : Ignored
Ignored.
Returns:
X_new : ndarray of shape (n_samples, n_components)
Transformed values.
Notes
This method returns a Fortran-ordered array. To convert it to a
C-ordered array, use ‘np.ascontiguousarray’.
get_covariance()
Compute data covariance with the generative model.
cov = components_.T * S**2 * components_ + sigma2 * eye(n_features)
where S**2 contains the explained variances, and sigma2 contains the
noise variances.
Returns:
cov : array of shape (n_features, n_features)
Estimated covariance of data.
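To make the formula above concrete, the sketch below compares get_covariance() with the empirical covariance of the data. When every component is kept the noise term vanishes and the two should agree closely (toy data; the assumption n_samples > n_features is mine):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 3))

pca = PCA(n_components=3).fit(X)        # keep every component
cov_model = pca.get_covariance()        # components_.T * S**2 * components_ + noise * I
cov_emp = np.cov(X, rowvar=False)       # empirical covariance (ddof=1)

print(np.allclose(cov_model, cov_emp))  # expected: True, up to numerical error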
get_feature_names_out(input_features=None)
Get output feature names for transformation.
The feature names out will be prefixed by the lowercased class name. For
example, if the transformer outputs 3 features, then the feature names
out are: ["class_name0", "class_name1", "class_name2"].
Parameters:
input_features : array-like of str or None, default=None
Only used to validate feature names with the names seen in fit.
Returns:
feature_names_out : ndarray of str objects
Transformed feature names.
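For PCA the lowercased class name is "pca", so the generated names are simply pca0, pca1, and so on. A minimal sketch:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2).fit(X)

print(pca.get_feature_names_out())   # expected: ['pca0' 'pca1']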
get_metadata_routing()
Get metadata routing of this object.
Please check the User Guide on how the routing
mechanism works.
Returns:
routing : MetadataRequest
A MetadataRequest encapsulating routing information.
get_params(deep=True)
Get parameters for this estimator.
Parameters:
deep : bool, default=True
If True, will return the parameters for this estimator and
contained subobjects that are estimators.
Returns:
params : dict
Parameter names mapped to their values.
get_precision()
Compute data precision matrix with the generative model.
Equals the inverse of the covariance but computed with
the matrix inversion lemma for efficiency.
Returns:
precision : array of shape (n_features, n_features)
Estimated precision of data.
inverse_transform(X)
Transform data back to its original space.
In other words, return an input X_original whose transform would be X.
Parameters:
X : array-like of shape (n_samples, n_components)
New data, where n_samples is the number of samples
and n_components is the number of components.
Returns:
X_original : array-like of shape (n_samples, n_features)
Original data, where n_samples is the number of samples
and n_features is the number of features.
Notes
If whitening is enabled, inverse_transform will compute the
exact inverse operation, which includes reversing whitening.
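A short sketch of the round trip: when n_components is smaller than n_features the reconstruction is only an approximation, and the gap is the variance carried by the discarded components (toy data):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

pca = PCA(n_components=1).fit(X)
X_reduced = pca.transform(X)               # shape (6, 1)
X_back = pca.inverse_transform(X_reduced)  # shape (6, 2), approximate reconstruction

print(np.mean((X - X_back) ** 2))          # small but non-zero reconstruction error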
score(X, y=None)
Return the average log-likelihood of all samples.
See “Pattern Recognition and Machine Learning”
by C. Bishop, 12.2.1 p. 574
or http://www.miketipping.com/papers/met-mppca.pdf
Parameters:
X : array-like of shape (n_samples, n_features)
The data.
y : Ignored
Ignored.
Returns:
ll : float
Average log-likelihood of the samples under the current model.
score_samples(X)
Return the log-likelihood of each sample.
See “Pattern Recognition and Machine Learning”
by C. Bishop, 12.2.1 p. 574
or http://www.miketipping.com/papers/met-mppca.pdf
Parameters:
X : array-like of shape (n_samples, n_features)
The data.
Returns:
ll : ndarray of shape (n_samples,)
Log-likelihood of each sample under the current model.
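The two likelihood methods are related in the obvious way: score(X) is the average of score_samples(X). A quick sketch on toy Gaussian data:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))

pca = PCA(n_components=2).fit(X)
per_sample = pca.score_samples(X)                    # log-likelihood of each sample
print(np.allclose(per_sample.mean(), pca.score(X)))  # expected: True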
set_output(*, transform=None)
Set output container.
See Introducing the set_output API
for an example on how to use the API.
Parameters:
transform : {“default”, “pandas”, “polars”}, default=None
Configure output of transform and fit_transform.
"default": Default output format of a transformer
"pandas": DataFrame output
"polars": Polars output
None: Transform configuration is unchanged
New in version 1.4: "polars" option was added.
Returns:
self : estimator instance
Estimator instance.
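A minimal sketch of the set_output API with the pandas container (pandas must be installed; the column names follow get_feature_names_out):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

pca = PCA(n_components=2).set_output(transform="pandas")
df = pca.fit_transform(X)        # a pandas DataFrame instead of an ndarray

print(type(df))                  # pandas DataFrame
print(df.columns.tolist())       # ['pca0', 'pca1']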
set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects
(such as Pipeline). The latter have
parameters of the form <component>__<parameter> so that it’s
possible to update each component of a nested object.
Parameters:
**params : dict
Estimator parameters.
Returns:
self : estimator instance
Estimator instance.
transform(X)
Apply dimensionality reduction to X.
X is projected on the first principal components previously extracted
from a training set.
Parameters:
X : {array-like, sparse matrix} of shape (n_samples, n_features)
New data, where n_samples is the number of samples
and n_features is the number of features.
Returns:
X_new : array-like of shape (n_samples, n_components)
Projection of X in the first principal components, where n_samples
is the number of samples and n_components is the number of components.
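With the default whiten=False, the projection is just centering by mean_ followed by a dot product with components_, which the following sketch verifies numerically (toy data, not part of the reference page):

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

pca = PCA(n_components=2).fit(X)
manual = (X - pca.mean_) @ pca.components_.T    # center, then project onto the axes
print(np.allclose(manual, pca.transform(X)))    # expected: True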
Examples using sklearn.decomposition.PCA
Release Highlights for scikit-learn 1.4
A demo of K-Means clustering on the handwritten digits data
Principal Component Regression vs Partial Least Squares Regression
The Iris Dataset
Blind source separation using FastICA
Comparison of LDA and PCA 2D projection of Iris dataset
Faces dataset decompositions
Factor Analysis (with rotation) to visualize patterns
FastICA on 2D point clouds
Incremental PCA
Kernel PCA
Model selection with Probabilistic PCA and Factor Analysis (FA)
PCA example with Iris Data-set
Faces recognition example using eigenfaces and SVMs
Image denoising using kernel PCA
Multi-dimensional scaling
Displaying Pipelines
Explicit feature map approximation for RBF kernels
Multilabel classification
Balance model complexity and cross-validated score
Dimensionality Reduction with Neighborhood Components Analysis
Kernel Density Estimation
Column Transformer with Heterogeneous Data Sources
Concatenating multiple feature extraction methods
Pipelining: chaining a PCA and a logistic regression
Selecting dimensionality reduction with Pipeline and GridSearchCV
Importance of Feature Scaling
PCA (Principal Component Analysis) in sklearn

PCA (Principal Component Analysis) is a very important method in machine learning; its main uses are dimensionality reduction and visualization. Beyond the deep mathematics behind it, the PCA workflow itself is instructive.

1. Prepare the dataset

This article uses the handwritten digits dataset from sklearn.datasets to demonstrate the PCA class in sklearn. Import the data and take a quick look:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
digits = datasets.load_digits()
X = digits.data
y = digits.target
X.shape,y.shape
((1797, 64), (1797,))

Split the data into a training set and a test set:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,random_state = 666)
X_train.shape,X_test.shape
((1347, 64), (450, 64))

2. Use PCA to reduce the dimensionality of the dataset

To see what PCA does, compare the model training time and score before and after dimensionality reduction. First, fit a kNN classifier directly on the unreduced data, timing the fit and checking the score:

%%time
from sklearn.neighbors import KNeighborsClassifier
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train, y_train)
Wall time: 88 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
knn_clf.score(X_test, y_test)
0.9866666666666667

Now use sklearn's PCA to reduce the dataset and check the training time and score after the reduction:

from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
pca.fit(X_train)
X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)
%%time
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_reduction,y_train)
Wall time: 3 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
knn_clf.score(X_test_reduction,y_test)
0.6066666666666667

The training time drops sharply, but the accuracy is no longer acceptable. Check how much variance the two retained components explain via PCA.explained_variance_ratio_:

# explained variance ratio of the 2 retained components:
pca.explained_variance_ratio_
array([0.14566817, 0.13735469])
pca = PCA(n_components = X_train.shape[1])  # compute the explained variance ratio of all dimensions
pca.fit(X_train)
pca.explained_variance_ratio_
array([1.45668166e-01, 1.37354688e-01, 1.17777287e-01, 8.49968861e-02,
5.86018996e-02, 5.11542945e-02, 4.26605279e-02, 3.60119663e-02,
3.41105814e-02, 3.05407804e-02, 2.42337671e-02, 2.28700570e-02,
1.80304649e-02, 1.79346003e-02, 1.45798298e-02, 1.42044841e-02,
1.29961033e-02, 1.26617002e-02, 1.01728635e-02, 9.09314698e-03,
8.85220461e-03, 7.73828332e-03, 7.60516219e-03, 7.11864860e-03,
6.85977267e-03, 5.76411920e-03, 5.71688020e-03, 5.08255707e-03,
4.89020776e-03, 4.34888085e-03, 3.72917505e-03, 3.57755036e-03,
3.26989470e-03, 3.14917937e-03, 3.09269839e-03, 2.87619649e-03,
2.50362666e-03, 2.25417403e-03, 2.20030857e-03, 1.98028746e-03,
1.88195578e-03, 1.52769283e-03, 1.42823692e-03, 1.38003340e-03,
1.17572392e-03, 1.07377463e-03, 9.55152460e-04, 9.00017642e-04,
5.79162563e-04, 3.82793717e-04, 2.38328586e-04, 8.40132221e-05,
5.60545588e-05, 5.48538930e-05, 1.08077650e-05, 4.01354717e-06,
1.23186515e-06, 1.05783059e-06, 6.06659094e-07, 5.86686040e-07,
1.71368535e-33, 7.44075955e-34, 7.44075955e-34, 7.15189459e-34])
# plot the cumulative explained variance ratio against the number of components:
plt.plot([i for i in range(X_train.shape[1])],
[np.sum(pca.explained_variance_ratio_[:i+1]) for i in range(X_train.shape[1])])
plt.show()

sklearn already provides a shortcut that chooses the number of components from the explained variance ratio. Below, keep enough components to explain 95% of the variance:

# sklearn accepts a float in (0, 1) as n_components
pca = PCA(0.95)
pca.fit(X_train)
PCA(copy=True, iterated_power='auto', n_components=0.95, random_state=None,
svd_solver='auto', tol=0.0, whiten=False)
# number of components selected
pca.n_components_
28
X_train_reduction = pca.transform(X_train)
X_test_reduction = pca.transform(X_test)
# explained variance ratio of each retained component:
pca.explained_variance_ratio_
array([0.14566817, 0.13735469, 0.11777729, 0.08499689, 0.0586019 ,
0.05115429, 0.04266053, 0.03601197, 0.03411058, 0.03054078,
0.02423377, 0.02287006, 0.01803046, 0.0179346 , 0.01457983,
0.01420448, 0.0129961 , 0.0126617 , 0.01017286, 0.00909315,
0.0088522 , 0.00773828, 0.00760516, 0.00711865, 0.00685977,
0.00576412, 0.00571688, 0.00508256])
%%time
# the fit time is now much lower
knn_clf = KNeighborsClassifier()
knn_clf.fit(X_train_reduction,y_train)
Wall time: 8 ms
KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
metric_params=None, n_jobs=None, n_neighbors=5, p=2,
weights='uniform')
knn_clf.score(X_test_reduction,y_test)
0.98

After reducing the dimensionality from 64 to 28, the score drops only slightly while the training time falls dramatically.

pca = PCA(n_components = 2)
pca.fit(X)
X_reduction = pca.transform(X)

3. PCA for visualization

Although reducing dimensionality loses some information, projecting the data down to two dimensions for visualization is a very useful technique:

# scatter plot of the first two principal components:
for i in range(10):
plt.scatter(X_reduction[y == i, 0],X_reduction[y==i,1],alpha=0.8)
plt.show()

As the plot shows, high-dimensional data can be projected to a lower dimension and visualized, giving an intuitive picture of a large dataset.
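As a follow-up to the manual comparison above, the same PCA + kNN workflow can be wrapped in a Pipeline so that n_components is chosen by cross-validation instead of by eye. This is a minimal sketch, not part of the original article; the candidate grid and the use of GridSearchCV are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=666)

# Chain PCA and kNN so the reduction learned on each training fold
# is applied consistently to the matching validation fold.
pipe = Pipeline([
    ("pca", PCA()),
    ("knn", KNeighborsClassifier()),
])

# Search over a few candidate dimensionalities (illustrative grid).
grid = GridSearchCV(pipe, {"pca__n_components": [10, 20, 28, 40]}, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)           # best n_components found by cross-validation
print(grid.score(X_test, y_test))  # test accuracy of the refitted pipeline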
A detailed look at PCA (principal component analysis) in the sklearn library: parameters, attributes, methods
Principal component analysis (PCA)
The idea of principal component analysis (Principal components analysis, PCA for short) is to map n-dimensional features onto k dimensions (k < n), where the k dimensions are brand-new orthogonal features (a new coordinate system). These k features are called principal components; they are k reconstructed features, not simply the original n features with n-k of them removed. The way to realize this idea is dimensionality reduction: representing high-dimensional data with low-dimensional data, that is, replacing the originally large number of variables with just a few. Here we focus on the PCA class in the sklearn library and explain its parameters, attributes and methods.

PCA in sklearn

1. Parameters

sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

1. n_components : int, float, None or str
Meaning: the number of principal components to return, i.e. how many dimensions you want to keep.
n_components=2 returns the first 2 principal components.
0 < n_components < 1 keeps enough components to reach the given cumulative explained-variance ratio; e.g. n_components=0.98 returns the components whose cumulative explained-variance ratio reaches 98%.
n_components=None returns all components.
n_components='mle' chooses the number of components n automatically.

2. copy : bool, False/True, default True
Meaning: whether to copy the original data while the algorithm runs; the data is modified during the reduction. This mainly affects how you obtain the reduced data: with copy=True, fit(X).transform(X) and fit_transform(X) both give the reduced data; with copy=False the data passed to fit is overwritten, so use fit_transform(X) to get the reduced data. (fit_transform() is described below.)

3. whiten : bool, False/True, default False
Meaning: whitening. Whitening is an important preprocessing step whose purpose is to reduce redundancy in the input data, so that the whitened data has the following properties: (i) low correlation between features; (ii) the same variance for all features.

4. svd_solver : str, {'auto', 'full', 'arpack', 'randomized'}
Meaning: which SVD method to use.
svd_solver='auto': the PCA class chooses among the three solvers below automatically.
svd_solver='full': classical full SVD, using the corresponding scipy implementation.
svd_solver='arpack': uses scipy's sparse SVD directly; its use cases are similar to randomized.
svd_solver='randomized': suited to data with many samples and many dimensions where the fraction of requested components is low.

2. Attributes

1. components_: the principal components with maximum variance.
2. explained_variance_: the variance of each principal component after the reduction. The larger the variance, the more important the component.
3. explained_variance_ratio_: the fraction of the total variance explained by each principal component; the larger the ratio, the more important the component (the variance contribution ratio).
4. singular_values_: the singular values of the selected components. Dimensionality reduction can be implemented either via eigendecomposition or via singular value decomposition; the former is more restrictive (it needs a square matrix), while the latter works on any matrix and is cheaper to compute, so PCA is usually implemented with SVD.
5. mean_: the empirical mean of each feature, estimated from the training set.
6. n_features_: the number of features in the training data.
7. n_samples_: the number of samples in the training data.
8. noise_variance_: the noise covariance.

3. Methods

1. fit(self, X, y=None)
Train the model; because PCA is unsupervised learning, y=None (there are no labels). For example:
model = decomposition.PCA(n_components=2)
model.fit(X)
2. fit_transform(self, X, y=None)
Fit the model on X and reduce X, returning the reduced data. For example:
X_new = model.fit_transform(X)
3. get_covariance(self)
Compute the data covariance with the generative model.
4. get_params(self, deep=True)
Return the model's parameters. For example:
print(model.get_params())
Output: {'copy': True, 'iterated_power': 'auto', 'n_components': 3, 'random_state': None, 'svd_solver': 'auto', 'tol': 0.0, 'whiten': False}
5. get_precision(self)
Compute the data precision matrix with the generative model.
6. inverse_transform(self, X)
Map reduced data back to the original space; the result may not be exactly identical to the original data.
7. score(self, X, y=None)
Compute the average log-likelihood of all samples.
8. transform(X)
Reduce X to the lower-dimensional space. Once the model is trained, transform can be used to reduce newly arriving data.

4. Example

import numpy as np
import matplotlib.pyplot as plt  # matplotlib is needed for the scatter plot below
from sklearn import decomposition, datasets

iris = datasets.load_iris()  # load the data
X = iris['data']
model = decomposition.PCA(n_components=2)
model.fit(X)
X_new = model.fit_transform(X)
Maxcomponent = model.components_
ratio = model.explained_variance_ratio_
score = model.score(X)
print('Reduced data:', X_new)
print('Components with maximum variance:', Maxcomponent)
print('Explained variance ratio of the retained components:', ratio)
print('Average log-likelihood of all samples:', score)
print('Singular values:', model.singular_values_)
print('Noise covariance:', model.noise_variance_)

g1 = plt.figure(1, figsize=(8, 6))
plt.scatter(X_new[:, 0], X_new[:, 1], c='r', cmap=plt.cm.Set1, edgecolor='k', s=40)
plt.xlabel('x1')
plt.ylabel('x2')
plt.title('After the dimension reduction')
plt.show()

5. References

Principal components analysis, the maximum-variance interpretation: https://www.cnblogs.com/jerrylead/archive/2011/04/18/2020209.html
Principal components analysis, the least-squared-error interpretation: https://www.cnblogs.com/jerrylead/archive/2011/04/18/2020216.html
Machine learning (7): whitening: https://blog.csdn.net/hjimce/article/details/50864602
scikit-learn source code for the PCA reduction: https://zhuanlan.zhihu.com/p/53268659
PCA in sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
PCA analysis with Python (scikit-learn)

(Header figure: the original image on the left, and reconstructions retaining different amounts of the variance.)

My previous tutorial discussed logistic regression with Python (https://towardsdatascience.com/logistic-regression-using-python-sklearn-numpy-mnist-handwriting-recognition-matplotlib-a6b31e2b166a). One thing we learned there is that you can speed up the fitting of a machine learning algorithm by changing the optimization algorithm. A more common way to speed up a machine learning algorithm is to use Principal Component Analysis (PCA). If your learning algorithm is too slow because the input dimensionality is too high, using PCA to speed it up is a reasonable choice; this is probably PCA's most common application. The other common application of PCA is data visualization.

To understand the value of PCA for data visualization, the first part of this tutorial covers a basic visualization of the IRIS dataset after applying PCA. The second part uses PCA to speed up a machine learning algorithm (logistic regression) on the MNIST dataset. Let's get started! The code used in this tutorial is available here:

PCA for data visualization: https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_Data_Visualization_Iris_Dataset_Blog.ipynb
PCA to speed up machine learning computations: https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_to_Speed-up_Machine_Learning_Algorithms.ipynb

PCA for data visualization

For many machine learning applications it helps to be able to visualize your data. Visualizing 2- or 3-dimensional data is not that difficult. However, even the Iris dataset used in this part of the tutorial is 4-dimensional. You can use PCA to reduce the 4-dimensional data to 2 or 3 dimensions so that you can plot it and understand the data better.

Load the Iris dataset

The Iris dataset is one of the datasets that ship with scikit-learn; it does not require downloading any file from an external website. The code below loads the Iris dataset.

import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# load dataset into Pandas DataFrame
df = pd.read_csv(url, names=['sepal length', 'sepal width', 'petal length', 'petal width', 'target'])

(Figure: the original pandas df, features + target.)

Standardize the data

PCA is affected by scale, so you need to scale the features in your data before applying PCA. Use StandardScaler to standardize the dataset's features onto unit scale (mean = 0 and variance = 1), which is a requirement for the optimal performance of many machine learning algorithms. If you want to see the possible negative effects of not scaling your data, scikit-learn has a section on the effects of not standardizing your data (https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html#sphx-glr-auto-examples-preprocessing-plot-scaling-importance-py).

from sklearn.preprocessing import StandardScaler
features = ['sepal length', 'sepal width', 'petal length', 'petal width']
# Separating out the features
x = df.loc[:, features].values
# Separating out the target
y = df.loc[:, ['target']].values
# Standardizing the features
x = StandardScaler().fit_transform(x)

(Figure: the array x before and after standardization, shown as a pandas dataframe.)

PCA projection to 2D

The original data has 4 columns (sepal length, sepal width, petal length and petal width). In this section, the code projects the original 4-dimensional data into 2 dimensions. Note that after dimensionality reduction there usually is no particular meaning assigned to each principal component; the new components are just the two main dimensions of variation.

from sklearn.decomposition import PCA
pca = PCA(n_components=2)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data = principalComponents,
                           columns = ['principal component 1', 'principal component 2'])

(PCA, keeping the first 2 principal components.)

finalDf = pd.concat([principalDf, df[['target']]], axis = 1)

The dataframes are concatenated along axis=1; finalDf is the final dataframe before plotting.

Visualize the 2D projection

This section just plots the 2-dimensional data. Notice on the plot below that the classes seem well separated from each other.

import matplotlib.pyplot as plt  # needed for the plot

fig = plt.figure(figsize = (8, 8))
ax = fig.add_subplot(1, 1, 1)
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('2 component PCA', fontsize = 20)
targets = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = finalDf['target'] == target
    ax.scatter(finalDf.loc[indicesToKeep, 'principal component 1'],
               finalDf.loc[indicesToKeep, 'principal component 2'],
               c = color, s = 50)
ax.legend(targets)
ax.grid()

(Figure: 2-component PCA plot.)

Explained variance

The explained variance tells you how much information (variance) can be attributed to each principal component. This matters because when you convert 4-dimensional space into 2-dimensional space, you lose some of the variance (information). Using the attribute explained_variance_ratio_, you can see that the first principal component contains 72.77% of the variance and the second principal component contains 23.03% of the variance. Together, the two components contain 95.80% of the information.

pca.explained_variance_ratio_

PCA to speed up machine learning algorithms

One of the most important applications of PCA is speeding up machine learning algorithms. Using the IRIS dataset here would be impractical, since that dataset has only 150 rows and 4 feature columns. The MNIST database of handwritten digits is more suitable: it has 784 feature columns (784 dimensions), a training set of 60,000 examples and a test set of 10,000 examples.

Download and load the data

You can also pass a data_home argument to fetch_openml to change where the data is downloaded.
from sklearn.datasets import fetch_openml
mnist = fetch_openml('mnist_784')

The images you downloaded are contained in mnist.data, and the shape (70000, 784) means there are 70,000 images with 784 dimensions (784 features). The labels (integers 0-9) are contained in mnist.target. The features are 784-dimensional (28 x 28 images) and the labels are simply the digits 0 to 9.

Split the data into training and test sets

Typically the train/test split is 80% training and 20% test. In this example I chose 6/7 of the data for training and 1/7 for the test set.

from sklearn.model_selection import train_test_split
# test_size: what proportion of original data is used for test set
train_img, test_img, train_lbl, test_lbl = train_test_split(
    mnist.data, mnist.target, test_size=1/7.0, random_state=0)

Standardize the data

This paragraph is almost a copy of what was written earlier. PCA is affected by scale, so you need to scale the features in the data before applying PCA. You can transform the data onto unit scale (mean = 0 and variance = 1), which is a requirement for the optimal performance of many machine learning algorithms. StandardScaler helps standardize the dataset's features. Note that you fit on the training set and transform both the training and the test set. If you want to see the possible negative effects of not scaling your data, scikit-learn has a section on the effects of not standardizing your data (https://scikit-learn.org/stable/auto_examples/preprocessing/plot_scaling_importance.html#sphx-glr-auto-examples-preprocessing-plot-scaling-importance-py).

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit on training set only.
scaler.fit(train_img)
# Apply transform to both the training set and the test set.
train_img = scaler.transform(train_img)
test_img = scaler.transform(test_img)

Import and apply PCA

Notice that the code below uses .95 for the number-of-components parameter. This means scikit-learn chooses the minimum number of principal components such that 95% of the variance is retained.

from sklearn.decomposition import PCA
# Make an instance of the Model
pca = PCA(.95)

Fit PCA on the training set. Note: you fit PCA on the training set only.

pca.fit(train_img)

Note: after fitting the model you can find out how many components PCA has chosen with pca.n_components_. In this case, 95% of the variance amounts to 330 principal components.

Apply the mapping (transform) to both the training set and the test set.

train_img = pca.transform(train_img)
test_img = pca.transform(test_img)

Apply logistic regression to the transformed data

Step 1: Import the model you want to use. In sklearn, all machine learning models are used as Python classes.

from sklearn.linear_model import LogisticRegression

Step 2: Create an instance of the model.

# all parameters not specified are set to their defaults
# the default solver is very slow, which is why it is changed to 'lbfgs'
logisticRegr = LogisticRegression(solver = 'lbfgs')

Step 3: Train the model on the data, storing the information learned from the data. The model learns the relationship between the digit images and their labels.

logisticRegr.fit(train_img, train_lbl)

Step 4: Predict the labels of new data (new images), using the information the model learned during training. The code below predicts a single observation:

# predict for one observation (image)
logisticRegr.predict(test_img[0].reshape(1, -1))

The code below predicts several observations at once:

# predict for multiple observations (images)
logisticRegr.predict(test_img[0:10])

Measuring model performance

Although accuracy is not always the best metric for machine learning algorithms (precision, recall, F1 score, ROC curves and so on would be better, see https://towardsdatascience.com/receiver-operating-characteristic-curves-demystified-in-python-bd531a4364d0), it is used here for simplicity.

logisticRegr.score(test_img, test_lbl)

Time taken to fit logistic regression after PCA

The whole point of this part of the tutorial is to show that you can use PCA to speed up the fitting of machine learning algorithms. The table below shows how long it took to fit logistic regression on my MacBook after using PCA (retaining a different amount of variance each time).

(Table: fitting time of logistic regression after PCA, for different retained variance fractions.)

Image reconstruction from the compressed representation

The earlier parts of this tutorial demonstrated how to use PCA to compress high-dimensional data into lower-dimensional data. I want to mention briefly that PCA can also take the compressed representation (the low-dimensional data) back to an approximation of the original high-dimensional data. If you are interested in the code that produces the figure below, check out my github (https://github.com/mGalarnyk/Python_Tutorials/blob/master/Sklearn/PCA/PCA_Image_Reconstruction_and_such.ipynb).

(Figure: original images (left) and approximations of the original data after PCA (right).)

Closing thoughts

This post could have been much longer, because PCA has many different uses. I hope it was helpful. My next machine learning tutorial walks through understanding decision trees for classification (https://towardsdatascience.com/understanding-decision-trees-for-classification-python-9663d683c952).

Using PCA from sklearn in Python

from sklearn.decomposition import PCA

PCA

Principal Components Analysis (PCA) is a data dimensionality-reduction technique used for data preprocessing. The general steps of PCA are: first zero-center the original data, then compute the covariance matrix, then compute the eigenvectors and eigenvalues of that covariance matrix; these eigenvectors form the new feature space.
sklearn.decomposition.PCA(n_components=None, copy=True, whiten=False)

Parameters:

n_components:
Meaning: the number n of principal components to keep in the PCA algorithm, i.e. the number of features retained.
Type: int or string; defaults to None, in which case all components are kept.
If an int, e.g. n_components=1, the original data is reduced to one dimension.
If a string, e.g. n_components='mle', the number of features n is chosen automatically so that the requested variance percentage is satisfied.

copy:
Type: bool, True or False; defaults to True.
Meaning: whether to copy the original training data when running the algorithm. If True, the values of the original training data do not change after running PCA, because the computation is done on a copy of the data. If False, the values of the original training data do change, because the reduction is computed in place on the original data.

whiten:
Type: bool; defaults to False.
Meaning: whitening, which gives every feature the same variance.

PCA attributes:

components_: the components with maximum variance.
explained_variance_ratio_: the variance percentage of each of the n retained components.
n_components_: the number n of retained components.
mean_:
noise_variance_:

PCA methods:

1. fit(X, y=None)
fit(X) trains the PCA model with data X.
The function returns the object it was called on; e.g. pca.fit(X) trains the pca object with X.
Note: fit() is the generic method in scikit-learn; every algorithm that needs training has a fit() method, and it is the "training" step of the algorithm. Because PCA is an unsupervised learning algorithm, y is naturally None here.

2. fit_transform(X)
Trains the PCA model with X and returns the reduced data.
newX = pca.fit_transform(X); newX is the reduced data.

3. inverse_transform()
Converts the reduced data back to the original data: X = pca.inverse_transform(newX).

4. transform(X)
Converts the data X into reduced data. Once the model is trained, transform can be used to reduce newly arriving data.

There are also get_covariance(), get_precision(), get_params(deep=True), score(X, y=None) and other methods, to be covered later when they are needed.

Example:

import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
newX = pca.fit_transform(X)          # equivalent to pca.fit(X) followed by pca.transform(X)
invX = pca.inverse_transform(newX)   # convert the reduced data back to the original data

print(X)
[[-1 -1]
 [-2 -1]
 [-3 -2]
 [ 1  1]
 [ 2  1]
 [ 3  2]]

print(newX)
array([[ 1.38340578,  0.2935787 ],
       [ 2.22189802, -0.25133484],
       [ 3.6053038 ,  0.04224385],
       [-1.38340578, -0.2935787 ],
       [-2.22189802,  0.25133484],
       [-3.6053038 , -0.04224385]])

print(invX)
[[-1 -1]
 [-2 -1]
 [-3 -2]
 [ 1  1]
 [ 2  1]
 [ 3  2]]

print(pca.explained_variance_ratio_)
[ 0.99244289  0.00755711]

The pca object we trained has n_components=2, i.e. it keeps 2 features. The first feature accounts for 0.99244289 of the total variance, which means it retains almost all of the information: the first component alone explains 99.24% of the dataset. We can therefore reduce to 1 dimension:

pca = PCA(n_components=1)
newX = pca.fit_transform(X)
print(pca.explained_variance_ratio_)
[ 0.99244289]
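The whiten parameter described above can also be checked numerically: with whiten=True the transformed components come out decorrelated with roughly unit variance. A minimal sketch, not from the original post (the toy data generation is an assumption):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))   # correlated features

Xw = PCA(n_components=3, whiten=True).fit_transform(X)

# Each whitened component has approximately unit variance and the components
# are uncorrelated, so the covariance matrix is close to the identity.
print(np.cov(Xw.T).round(2))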
2.5. Decomposing signals in components (matrix factorization problems)

2.5.1. Principal component analysis (PCA)

2.5.1.1. Exact PCA and probabilistic interpretation

PCA is used to decompose a multivariate dataset in a set of successive orthogonal components that explain a maximum amount of the variance. In scikit-learn, PCA is implemented as a transformer object that learns \(n\) components in its fit method, and can be used on new data to project it on these components.

PCA centers but does not scale the input data for each feature before applying the SVD. The optional parameter whiten=True makes it possible to project the data onto the singular space while scaling each component to unit variance. This is often useful if the models down-stream make strong assumptions on the isotropy of the signal: this is for example the case for Support Vector Machines with the RBF kernel and the K-Means clustering algorithm.

Below is an example of the iris dataset, which is comprised of 4 features, projected on the 2 dimensions that explain most variance:

The PCA object also provides a probabilistic interpretation of the PCA that can give a likelihood of data based on the amount of variance it explains. As such it implements a score method that can be used in cross-validation:

Examples:
PCA example with Iris Data-set
Comparison of LDA and PCA 2D projection of Iris dataset
Model selection with Probabilistic PCA and Factor Analysis (FA)

2.5.1.2. Incremental PCA

The PCA object is very useful, but has certain limitations for large datasets. The biggest limitation is that PCA only supports batch processing, which means all of the data to be processed must fit in main memory. The IncrementalPCA object uses a different form of processing and allows for partial computations which almost exactly match the results of PCA while processing the data in a minibatch fashion. IncrementalPCA makes it possible to implement out-of-core Principal Component Analysis either by:

Using its partial_fit method on chunks of data fetched sequentially from the local hard drive or a network database.
Calling its fit method on a memory mapped file using numpy.memmap.

IncrementalPCA only stores estimates of component and noise variances, in order to update explained_variance_ratio_ incrementally. This is why memory usage depends on the number of samples per batch, rather than the number of samples to be processed in the dataset.

As in PCA, IncrementalPCA centers but does not scale the input data for each feature before applying the SVD.

Examples:
Incremental PCA
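A minimal sketch of the out-of-core pattern described above, using partial_fit on chunks (here the chunks are just slices of an in-memory array; with real out-of-core data they would be read from disk or a memory-mapped file):

import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 20))

ipca = IncrementalPCA(n_components=5)
for chunk in np.array_split(X, 10):   # feed the data in 10 mini-batches
    ipca.partial_fit(chunk)

X_reduced = ipca.transform(X)
print(X_reduced.shape)                # (1000, 5)
print(ipca.explained_variance_ratio_.sum())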
2.5.1.3. PCA using randomized SVD

It is often interesting to project data to a lower-dimensional space that preserves most of the variance, by dropping the singular vectors of components associated with lower singular values.

For instance, if we work with 64x64 pixel gray-level pictures for face recognition, the dimensionality of the data is 4096 and it is slow to train an RBF support vector machine on such wide data. Furthermore we know that the intrinsic dimensionality of the data is much lower than 4096 since all pictures of human faces look somewhat alike. The samples lie on a manifold of much lower dimension (say around 200 for instance). The PCA algorithm can be used to linearly transform the data while both reducing the dimensionality and preserving most of the explained variance at the same time.

The class PCA used with the optional parameter svd_solver='randomized' is very useful in that case: since we are going to drop most of the singular vectors it is much more efficient to limit the computation to an approximated estimate of the singular vectors we will keep to actually perform the transform.

For instance, the following shows 16 sample portraits (centered around 0.0) from the Olivetti dataset. On the right hand side are the first 16 singular vectors reshaped as portraits. Since we only require the top 16 singular vectors of a dataset with size \(n_{samples} = 400\) and \(n_{features} = 64 \times 64 = 4096\), the computation time is less than 1s:

If we note \(n_{\max} = \max(n_{\mathrm{samples}}, n_{\mathrm{features}})\) and \(n_{\min} = \min(n_{\mathrm{samples}}, n_{\mathrm{features}})\), the time complexity of the randomized PCA is \(O(n_{\max}^2 \cdot n_{\mathrm{components}})\) instead of \(O(n_{\max}^2 \cdot n_{\min})\) for the exact method implemented in PCA.

The memory footprint of randomized PCA is also proportional to \(2 \cdot n_{\max} \cdot n_{\mathrm{components}}\) instead of \(n_{\max} \cdot n_{\min}\) for the exact method.

Note: the implementation of inverse_transform in PCA with svd_solver='randomized' is not the exact inverse transform of transform even when whiten=False (default).

Examples:
Faces recognition example using eigenfaces and SVMs
Faces dataset decompositions

References:
Algorithm 4.3 in “Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions”, Halko, et al., 2009
“An implementation of a randomized algorithm for principal component analysis”, A. Szlam et al., 2014
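A short sketch of the randomized solver on wide data, along the lines of the face example above (random data stands in for the images; the shapes are illustrative assumptions):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.normal(size=(400, 4096))      # e.g. 400 images of 64x64 pixels, flattened

# Only the top 16 components are requested, so the randomized solver
# avoids computing the full SVD of the 400 x 4096 matrix.
pca = PCA(n_components=16, svd_solver="randomized", random_state=0)
faces_reduced = pca.fit_transform(X)

print(faces_reduced.shape)            # (400, 16)
print(pca.components_.shape)          # (16, 4096), one "eigenface" per row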
2.5.1.4. Sparse principal components analysis (SparsePCA and MiniBatchSparsePCA)

SparsePCA is a variant of PCA, with the goal of extracting the set of sparse components that best reconstruct the data.

Mini-batch sparse PCA (MiniBatchSparsePCA) is a variant of SparsePCA that is faster but less accurate. The increased speed is reached by iterating over small chunks of the set of features, for a given number of iterations.

Principal component analysis (PCA) has the disadvantage that the components extracted by this method have exclusively dense expressions, i.e. they have non-zero coefficients when expressed as linear combinations of the original variables. This can make interpretation difficult. In many cases, the real underlying components can be more naturally imagined as sparse vectors; for example in face recognition, components might naturally map to parts of faces.

Sparse principal components yield a more parsimonious, interpretable representation, clearly emphasizing which of the original features contribute to the differences between samples.

The following example illustrates 16 components extracted using sparse PCA from the Olivetti faces dataset. It can be seen how the regularization term induces many zeros. Furthermore, the natural structure of the data causes the non-zero coefficients to be vertically adjacent. The model does not enforce this mathematically: each component is a vector \(h \in \mathbf{R}^{4096}\), and there is no notion of vertical adjacency except during the human-friendly visualization as 64x64 pixel images. The fact that the components shown below appear local is the effect of the inherent structure of the data, which makes such local patterns minimize reconstruction error. There exist sparsity-inducing norms that take into account adjacency and different kinds of structure; see [Jen09] for a review of such methods. For more details on how to use Sparse PCA, see the Examples section, below.

Note that there are many different formulations for the Sparse PCA problem. The one implemented here is based on [Mrl09]. The optimization problem solved is a PCA problem (dictionary learning) with an \(\ell_1\) penalty on the components:

\[\begin{split}(U^*, V^*) = \underset{U, V}{\operatorname{arg\,min\,}} & \frac{1}{2} ||X-UV||_{\text{Fro}}^2 + \alpha||V||_{1,1} \\ \text{subject to } & ||U_k||_2 <= 1 \text{ for all } 0 \leq k < n_{components}\end{split}\]

\(||.||_{\text{Fro}}\) stands for the Frobenius norm and \(||.||_{1,1}\) stands for the entry-wise matrix norm which is the sum of the absolute values of all the entries in the matrix. The sparsity-inducing \(||.||_{1,1}\) matrix norm also prevents learning components from noise when few training samples are available. The degree of penalization (and thus sparsity) can be adjusted through the hyperparameter alpha. Small values lead to a gently regularized factorization, while larger values shrink many coefficients to zero.

Note
While in the spirit of an online algorithm, the class MiniBatchSparsePCA does not implement partial_fit because the algorithm is online along the features direction, not the samples direction.

Examples:
Faces dataset decompositions

References:
[Mrl09] “Online Dictionary Learning for Sparse Coding”, J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009
[Jen09] “Structured Sparse Principal Component Analysis”, R. Jenatton, G. Obozinski, F. Bach, 2009

2.5.2. Kernel Principal Component Analysis (kPCA)

2.5.2.1. Exact Kernel PCA

KernelPCA is an extension of PCA which achieves non-linear dimensionality reduction through the use of kernels (see Pairwise metrics, Affinities and Kernels) [Scholkopf1997]. It has many applications including denoising, compression and structured prediction (kernel dependency estimation). KernelPCA supports both transform and inverse_transform.

Note
KernelPCA.inverse_transform relies on a kernel ridge to learn the function mapping samples from the PCA basis into the original feature space [Bakir2003]. Thus, the reconstruction obtained with KernelPCA.inverse_transform is an approximation. See the example linked below for more details.

Examples:
Kernel PCA

References:
[Scholkopf1997] Schölkopf, Bernhard, Alexander Smola, and Klaus-Robert Müller. “Kernel principal component analysis.” International conference on artificial neural networks. Springer, Berlin, Heidelberg, 1997.
[Bakir2003] Bakır, Gökhan H., Jason Weston, and Bernhard Schölkopf. “Learning to find pre-images.” Advances in neural information processing systems 16 (2003): 449-456.
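A minimal sketch of KernelPCA with an RBF kernel on a toy non-linear dataset, including the approximate inverse_transform mentioned in the note above (the dataset and kernel parameters are illustrative assumptions):

import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, _ = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10,
                 fit_inverse_transform=True)   # enables the approximate inverse_transform
X_kpca = kpca.fit_transform(X)
X_back = kpca.inverse_transform(X_kpca)        # approximate pre-images in the original space

print(X_kpca.shape)                            # (300, 2)
print(np.mean((X - X_back) ** 2))              # reconstruction error of the approximation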
In these cases, finding all the components with a full kPCA is a waste of computation time, as the data is mostly described by the first few components (e.g. n_components<=100). In other words, the centered Gram matrix that is eigendecomposed in the Kernel PCA fitting process has an effective rank that is much smaller than its size. This is a situation where approximate eigensolvers can provide a speedup with very little loss of precision.
Eigensolvers¶
The optional parameter eigen_solver='randomized' can be used to significantly reduce the computation time when the number of requested n_components is small compared with the number of samples. It relies on randomized decomposition methods to find an approximate solution in a shorter time. The time complexity of the randomized KernelPCA is \(O(n_{\mathrm{samples}}^2 \cdot n_{\mathrm{components}})\) instead of \(O(n_{\mathrm{samples}}^3)\) for the exact method implemented with eigen_solver='dense'. The memory footprint of randomized KernelPCA is also proportional to \(2 \cdot n_{\mathrm{samples}} \cdot n_{\mathrm{components}}\) instead of \(n_{\mathrm{samples}}^2\) for the exact method. Note: this technique is the same as in PCA using randomized SVD. In addition to the above two solvers, eigen_solver='arpack' can be used as an alternative way to get an approximate decomposition. In practice, this method only provides reasonable execution times when the number of components to find is extremely small. It is enabled by default when the desired number of components is less than 10 (strict) and the number of samples is more than 200 (strict). See KernelPCA for details.
References: dense solver: scipy.linalg.eigh documentation; randomized solver: Algorithm 4.3 in “Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions”, Halko, et al. (2009), and “An implementation of a randomized algorithm for principal component analysis”, A. Szlam et al. (2014); arpack solver: scipy.sparse.linalg.eigsh documentation, R. B. Lehoucq, D. C. Sorensen, and C. Yang (1998)
2.5.3. Truncated singular value decomposition and latent semantic analysis¶
TruncatedSVD implements a variant of singular value decomposition (SVD) that only computes the \(k\) largest singular values, where \(k\) is a user-specified parameter. TruncatedSVD is very similar to PCA, but differs in that the matrix \(X\) does not need to be centered. When the columnwise (per-feature) means of \(X\) are subtracted from the feature values, truncated SVD on the resulting matrix is equivalent to PCA.
About truncated SVD and latent semantic analysis (LSA)¶
When truncated SVD is applied to term-document matrices (as returned by CountVectorizer or TfidfVectorizer), this transformation is known as latent semantic analysis (LSA), because it transforms such matrices to a “semantic” space of low dimensionality. In particular, LSA is known to combat the effects of synonymy and polysemy (both of which roughly mean there are multiple meanings per word), which cause term-document matrices to be overly sparse and exhibit poor similarity under measures such as cosine similarity.
Note: LSA is also known as latent semantic indexing, LSI, though strictly that refers to its use in persistent indexes for information retrieval purposes.
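As an illustrative sketch of this LSA pipeline (the toy corpus and the choice of two components below are assumptions made for demonstration, not part of the documentation), a tf–idf matrix can be passed directly to TruncatedSVD:
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> from sklearn.decomposition import TruncatedSVD
>>> docs = ["the cat sat on the mat",
...         "the dog sat on the log",
...         "cats and dogs are pets"]
>>> tfidf = TfidfVectorizer(sublinear_tf=True, use_idf=True)
>>> X_tfidf = tfidf.fit_transform(docs)           # sparse term-document matrix
>>> svd = TruncatedSVD(n_components=2, random_state=0)
>>> X_lsa = svd.fit_transform(X_tfidf)            # dense array of shape (n_documents, 2)
>>> X_lsa.shape
(3, 2)
Because TruncatedSVD never centers the input, the sparse tf–idf matrix is used as-is, which is what makes it suitable for this setting.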
Mathematically, truncated SVD applied to training samples \(X\) produces a low-rank approximation of \(X\):
\[X \approx X_k = U_k \Sigma_k V_k^\top\]
After this operation, \(U_k \Sigma_k\) is the transformed training set with \(k\) features (called n_components in the API). To also transform a test set \(X\), we multiply it with \(V_k\):
\[X' = X V_k\]
Note: most treatments of LSA in the natural language processing (NLP) and information retrieval (IR) literature swap the axes of the matrix \(X\) so that it has shape n_features × n_samples. We present LSA in a different way that matches the scikit-learn API better, but the singular values found are the same. While the TruncatedSVD transformer works with any feature matrix, using it on tf–idf matrices is recommended over raw frequency counts in an LSA/document processing setting. In particular, sublinear scaling and inverse document frequency should be turned on (sublinear_tf=True, use_idf=True) to bring the feature values closer to a Gaussian distribution, compensating for LSA’s erroneous assumptions about textual data.
Examples: Clustering text documents using k-means
References: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze (2008), Introduction to Information Retrieval, Cambridge University Press, chapter 18: Matrix decompositions & latent semantic indexing
2.5.4. Dictionary Learning¶
2.5.4.1. Sparse coding with a precomputed dictionary¶
The SparseCoder object is an estimator that can be used to transform signals into sparse linear combinations of atoms from a fixed, precomputed dictionary such as a discrete wavelet basis. This object therefore does not implement a fit method. The transformation amounts to a sparse coding problem: finding a representation of the data as a linear combination of as few dictionary atoms as possible. All variations of dictionary learning implement the following transform methods, controllable via the transform_method initialization parameter:
Orthogonal matching pursuit (Orthogonal Matching Pursuit (OMP))
Least-angle regression (Least Angle Regression)
Lasso computed by least-angle regression
Lasso using coordinate descent (Lasso)
Thresholding
Thresholding is very fast but it does not yield accurate reconstructions. Thresholded codes have nonetheless been shown to be useful in the literature for classification tasks. For image reconstruction tasks, orthogonal matching pursuit yields the most accurate, unbiased reconstruction. The dictionary learning objects offer, via the split_code parameter, the possibility to separate the positive and negative values in the results of sparse coding. This is useful when dictionary learning is used for extracting features that will be used for supervised learning, because it allows the learning algorithm to assign different weights to the negative loadings of a particular atom than to the corresponding positive loading. The split code for a single sample has length 2 * n_components and is constructed using the following rule: first, the regular code of length n_components is computed. Then, the first n_components entries of the split_code are filled with the positive part of the regular code vector. The second half of the split code is filled with the negative part of the code vector, only with a positive sign. Therefore, the split_code is non-negative.
Examples: Sparse coding with a precomputed dictionary
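As an illustrative sketch of sparse coding with a precomputed dictionary (the random dictionary below stands in for a real basis such as wavelets and is purely an assumption for demonstration):
>>> import numpy as np
>>> from sklearn.decomposition import SparseCoder
>>> rng = np.random.RandomState(0)
>>> D = rng.randn(8, 5)                                # 8 atoms with 5 features each
>>> D /= np.linalg.norm(D, axis=1, keepdims=True)      # unit-norm atoms
>>> X = rng.randn(3, 5)                                # 3 signals to encode
>>> coder = SparseCoder(dictionary=D,
...                     transform_algorithm='omp',     # orthogonal matching pursuit
...                     transform_n_nonzero_coefs=2)   # at most 2 active atoms per signal
>>> code = coder.transform(X)                          # no fit step: the dictionary is fixed
>>> code.shape
(3, 8)
Each row of code contains at most two non-zero coefficients, i.e. each signal is approximated by a combination of at most two atoms.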
2.5.4.2. Generic dictionary learning¶
Dictionary learning (DictionaryLearning) is a matrix factorization problem that amounts to finding a (usually overcomplete) dictionary that will perform well at sparsely encoding the fitted data. Representing data as sparse combinations of atoms from an overcomplete dictionary is suggested to be the way the mammalian primary visual cortex works. Consequently, dictionary learning applied on image patches has been shown to give good results in image processing tasks such as image completion, inpainting and denoising, as well as for supervised recognition tasks. Dictionary learning is an optimization problem solved by alternately updating the sparse code, as a solution to multiple Lasso problems, considering the dictionary fixed, and then updating the dictionary to best fit the sparse code.
\[\begin{split}(U^*, V^*) = \underset{U, V}{\operatorname{arg\,min\,}} & \frac{1}{2} ||X-UV||_{\text{Fro}}^2+\alpha||U||_{1,1} \\ \text{subject to } & ||V_k||_2 \leq 1 \text{ for all } 0 \leq k < n_{\mathrm{atoms}}\end{split}\]
\(||.||_{\text{Fro}}\) stands for the Frobenius norm and \(||.||_{1,1}\) stands for the entry-wise matrix norm which is the sum of the absolute values of all the entries in the matrix. After using such a procedure to fit the dictionary, the transform is simply a sparse coding step that shares the same implementation with all dictionary learning objects (see Sparse coding with a precomputed dictionary). It is also possible to constrain the dictionary and/or code to be positive to match constraints that may be present in the data. Below are the faces with different positivity constraints applied. Red indicates negative values, blue indicates positive values, and white represents zeros. The following image shows what a dictionary learned from 4x4 pixel image patches, extracted from part of the image of a raccoon face, looks like.
Examples: Image denoising using dictionary learning
References: “Online dictionary learning for sparse coding”, J. Mairal, F. Bach, J. Ponce, G. Sapiro, 2009
2.5.4.3. Mini-batch dictionary learning¶
MiniBatchDictionaryLearning implements a faster, but less accurate version of the dictionary learning algorithm that is better suited for large datasets. By default, MiniBatchDictionaryLearning divides the data into mini-batches and optimizes in an online manner by cycling over the mini-batches for the specified number of iterations. However, at the moment it does not implement a stopping condition. The estimator also implements partial_fit, which updates the dictionary by iterating only once over a mini-batch. This can be used for online learning when the data is not readily available from the start, or when the data does not fit into memory.
Clustering for dictionary learning: note that when using dictionary learning to extract a representation (e.g. for sparse coding), clustering can be a good proxy to learn the dictionary. For instance the MiniBatchKMeans estimator is computationally efficient and implements on-line learning with a partial_fit method.
Example: Online learning of a dictionary of parts of faces
2.5.5. Factor Analysis¶
In unsupervised learning we only have a dataset \(X = \{x_1, x_2, \dots, x_n \}\). How can this dataset be described mathematically? A very simple continuous latent variable model for \(X\) is
\[x_i = W h_i + \mu + \epsilon\]
The vector \(h_i\) is called “latent” because it is unobserved.
\(\epsilon\) is considered a noise term distributed according to a Gaussian with mean 0 and covariance \(\Psi\) (i.e. \(\epsilon \sim \mathcal{N}(0, \Psi)\)), and \(\mu\) is some arbitrary offset vector. Such a model is called “generative” as it describes how \(x_i\) is generated from \(h_i\). If we use all the \(x_i\)’s as columns to form a matrix \(\mathbf{X}\) and all the \(h_i\)’s as columns of a matrix \(\mathbf{H}\), then we can write (with suitably defined \(\mathbf{M}\) and \(\mathbf{E}\)):
\[\mathbf{X} = W \mathbf{H} + \mathbf{M} + \mathbf{E}\]
In other words, we have decomposed the matrix \(\mathbf{X}\). If \(h_i\) is given, the above equation automatically implies the following probabilistic interpretation:
\[p(x_i|h_i) = \mathcal{N}(Wh_i + \mu, \Psi)\]
For a complete probabilistic model we also need a prior distribution for the latent variable \(h\). The most straightforward assumption (based on the nice properties of the Gaussian distribution) is \(h \sim \mathcal{N}(0, \mathbf{I})\). This yields a Gaussian as the marginal distribution of \(x\):
\[p(x) = \mathcal{N}(\mu, WW^T + \Psi)\]
Now, without any further assumptions the idea of having a latent variable \(h\) would be superfluous – \(x\) can be completely modelled with a mean and a covariance. We need to impose some more specific structure on one of these two parameters. A simple additional assumption regards the structure of the error covariance \(\Psi\):
\(\Psi = \sigma^2 \mathbf{I}\): this assumption leads to the probabilistic model of PCA.
\(\Psi = \mathrm{diag}(\psi_1, \psi_2, \dots, \psi_n)\): this model is called FactorAnalysis, a classical statistical model. The matrix W is sometimes called the “factor loading matrix”.
Both models essentially estimate a Gaussian with a low-rank covariance matrix. Because both models are probabilistic, they can be integrated in more complex models, e.g. Mixture of Factor Analysers. One gets very different models (e.g. FastICA) if non-Gaussian priors on the latent variables are assumed. Factor analysis can produce similar components (the columns of its loading matrix) to PCA. However, one cannot make any general statements about these components (e.g. whether they are orthogonal). The main advantage of Factor Analysis over PCA is that it can model the variance in every direction of the input space independently (heteroscedastic noise). This allows better model selection than probabilistic PCA in the presence of heteroscedastic noise. Factor Analysis is often followed by a rotation of the factors (with the parameter rotation), usually to improve interpretability. For example, Varimax rotation maximizes the sum of the variances of the squared loadings, i.e., it tends to produce sparser factors, which are influenced by only a few features each (the “simple structure”). See e.g., the first example below.
Examples: Factor Analysis (with rotation) to visualize patterns; Model selection with Probabilistic PCA and Factor Analysis (FA)
2.5.6. Independent component analysis (ICA)¶
Independent component analysis separates a multivariate signal into additive subcomponents that are maximally independent. It is implemented in scikit-learn using the Fast ICA algorithm. Typically, ICA is not used for reducing dimensionality but for separating superimposed signals. Since the ICA model does not include a noise term, for the model to be correct, whitening must be applied. This can be done internally using the whiten argument or manually using one of the PCA variants.
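As an illustrative sketch of this blind source separation setup (the toy sources and the mixing matrix below are assumptions for demonstration, not taken from the scikit-learn example gallery):
>>> import numpy as np
>>> from sklearn.decomposition import FastICA
>>> t = np.linspace(0, 8, 2000)
>>> s1 = np.sin(2 * t)                            # sinusoidal source
>>> s2 = np.sign(np.sin(3 * t))                   # square-wave source
>>> S = np.c_[s1, s2]
>>> A = np.array([[1.0, 1.0], [0.5, 2.0]])        # mixing matrix
>>> X = S @ A.T                                   # observed mixtures
>>> ica = FastICA(n_components=2, whiten='unit-variance', random_state=0)
>>> S_estimated = ica.fit_transform(X)            # recovered sources (up to sign, scale and order)
>>> ica.mixing_.shape                             # estimated mixing matrix
(2, 2)
Here whiten='unit-variance' performs the whitening step internally, as mentioned above; alternatively the data could be whitened beforehand with one of the PCA variants and whiten=False passed instead.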
It is classically used to separate mixed signals (a problem known as blind source separation), as in the example below. ICA can also be used as yet another non-linear decomposition that finds components with some sparsity.
Examples: Blind source separation using FastICA; FastICA on 2D point clouds; Faces dataset decompositions
2.5.7. Non-negative matrix factorization (NMF or NNMF)¶
2.5.7.1. NMF with the Frobenius norm¶
NMF [1] is an alternative approach to decomposition that assumes that the data and the components are non-negative. NMF can be plugged in instead of PCA or its variants, in the cases where the data matrix does not contain negative values. It finds a decomposition of samples \(X\) into two matrices \(W\) and \(H\) of non-negative elements, by optimizing the distance \(d\) between \(X\) and the matrix product \(WH\). The most widely used distance function is the squared Frobenius norm, which is an obvious extension of the Euclidean norm to matrices:
\[d_{\mathrm{Fro}}(X, Y) = \frac{1}{2} ||X - Y||_{\mathrm{Fro}}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2\]
Unlike PCA, the representation of a vector is obtained in an additive fashion, by superimposing the components, without subtracting. Such additive models are efficient for representing images and text. It has been observed in [Hoyer, 2004] [2] that, when carefully constrained, NMF can produce a parts-based representation of the dataset, resulting in interpretable models. The following example displays 16 sparse components found by NMF from the images in the Olivetti faces dataset, in comparison with the PCA eigenfaces. The init attribute determines the initialization method applied, which has a great impact on the performance of the method. NMF implements the method Nonnegative Double Singular Value Decomposition. NNDSVD [4] is based on two SVD processes, one approximating the data matrix, the other approximating positive sections of the resulting partial SVD factors utilizing an algebraic property of unit rank matrices. The basic NNDSVD algorithm is better suited to sparse factorization. Its variants NNDSVDa (in which all zeros are set equal to the mean of all elements of the data) and NNDSVDar (in which the zeros are set to random perturbations less than the mean of the data divided by 100) are recommended in the dense case. Note that the Multiplicative Update (‘mu’) solver cannot update zeros present in the initialization, so it leads to poorer results when used jointly with the basic NNDSVD algorithm, which introduces a lot of zeros; in this case, NNDSVDa or NNDSVDar should be preferred. NMF can also be initialized with correctly scaled random non-negative matrices by setting init="random". An integer seed or a RandomState can also be passed to random_state to control reproducibility. In NMF, L1 and L2 priors can be added to the loss function in order to regularize the model. The L2 prior uses the Frobenius norm, while the L1 prior uses an elementwise L1 norm. As in ElasticNet, we control the combination of L1 and L2 with the l1_ratio (\(\rho\)) parameter, and the intensity of the regularization with the alpha_W and alpha_H (\(\alpha_W\) and \(\alpha_H\)) parameters. The priors are scaled by the number of samples (\(n\_samples\)) for H and the number of features (\(n\_features\)) for W, to keep their impact balanced with respect to one another and to the data fit term, and as independent as possible of the size of the training set.
The prior terms are then:
\[(\alpha_W \rho ||W||_1 + \frac{\alpha_W(1-\rho)}{2} ||W||_{\mathrm{Fro}}^2) * n\_features + (\alpha_H \rho ||H||_1 + \frac{\alpha_H(1-\rho)}{2} ||H||_{\mathrm{Fro}}^2) * n\_samples\]
and the regularized objective function is:
\[d_{\mathrm{Fro}}(X, WH) + (\alpha_W \rho ||W||_1 + \frac{\alpha_W(1-\rho)}{2} ||W||_{\mathrm{Fro}}^2) * n\_features + (\alpha_H \rho ||H||_1 + \frac{\alpha_H(1-\rho)}{2} ||H||_{\mathrm{Fro}}^2) * n\_samples\]
2.5.7.2. NMF with a beta-divergence¶
As described previously, the most widely used distance function is the squared Frobenius norm, which is an obvious extension of the Euclidean norm to matrices:
\[d_{\mathrm{Fro}}(X, Y) = \frac{1}{2} ||X - Y||_{\mathrm{Fro}}^2 = \frac{1}{2} \sum_{i,j} (X_{ij} - {Y}_{ij})^2\]
Other distance functions can be used in NMF as, for example, the (generalized) Kullback-Leibler (KL) divergence, also referred to as I-divergence:
\[d_{KL}(X, Y) = \sum_{i,j} (X_{ij} \log(\frac{X_{ij}}{Y_{ij}}) - X_{ij} + Y_{ij})\]
Or, the Itakura-Saito (IS) divergence:
\[d_{IS}(X, Y) = \sum_{i,j} (\frac{X_{ij}}{Y_{ij}} - \log(\frac{X_{ij}}{Y_{ij}}) - 1)\]
These three distances are special cases of the beta-divergence family, with \(\beta = 2, 1, 0\) respectively [6]. The beta-divergence is defined by:
\[d_{\beta}(X, Y) = \sum_{i,j} \frac{1}{\beta(\beta - 1)}(X_{ij}^\beta + (\beta-1)Y_{ij}^\beta - \beta X_{ij} Y_{ij}^{\beta - 1})\]
Note that this definition is not valid if \(\beta \in (0; 1)\), yet it can be continuously extended to the definitions of \(d_{KL}\) and \(d_{IS}\) respectively.
NMF implemented solvers¶
NMF implements two solvers, using Coordinate Descent (‘cd’) [5] and Multiplicative Update (‘mu’) [6]. The ‘mu’ solver can optimize every beta-divergence, including of course the Frobenius norm (\(\beta=2\)), the (generalized) Kullback-Leibler divergence (\(\beta=1\)) and the Itakura-Saito divergence (\(\beta=0\)). Note that for \(\beta \in (1; 2)\), the ‘mu’ solver is significantly faster than for other values of \(\beta\). Note also that with a negative (or 0, i.e. ‘itakura-saito’) \(\beta\), the input matrix cannot contain zero values. The ‘cd’ solver can only optimize the Frobenius norm. Due to the underlying non-convexity of NMF, the different solvers may converge to different minima, even when optimizing the same distance function. NMF is best used with the fit_transform method, which returns the matrix W. The matrix H is stored in the fitted model in the components_ attribute; the method transform will decompose a new matrix X_new based on these stored components:
>>> import numpy as np
>>> X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import NMF
>>> model = NMF(n_components=2, init='random', random_state=0)
>>> W = model.fit_transform(X)
>>> H = model.components_
>>> X_new = np.array([[1, 0], [1, 6.1], [1, 0], [1, 4], [3.2, 1], [0, 4]])
>>> W_new = model.transform(X_new)
Examples: Faces dataset decompositions; Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation
2.5.7.3. Mini-batch Non-negative Matrix Factorization¶
MiniBatchNMF [7] implements a faster, but less accurate version of the non-negative matrix factorization (i.e. NMF), better suited for large datasets. By default, MiniBatchNMF divides the data into mini-batches and optimizes the NMF model in an online manner by cycling over the mini-batches for the specified number of iterations. The batch_size parameter controls the size of the batches.
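As an illustrative sketch of this mini-batch variant (the random non-negative data and the parameter values below are assumptions for demonstration):
>>> import numpy as np
>>> from sklearn.decomposition import MiniBatchNMF
>>> rng = np.random.RandomState(0)
>>> X = np.abs(rng.randn(100, 20))                  # toy non-negative data
>>> mbnmf = MiniBatchNMF(n_components=5, batch_size=10,
...                      init='nndsvda', random_state=0)
>>> W = mbnmf.fit_transform(X)                      # shape (100, 5)
>>> H = mbnmf.components_                           # shape (5, 20)
>>> W.shape, H.shape
((100, 5), (5, 20))
The smaller the batch_size, the more mini-batch updates are performed per pass over the data.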
In order to speed up the mini-batch algorithm it is also possible to scale past batches, giving them less importance than newer batches. This is done by introducing a so-called forgetting factor controlled by the forget_factor parameter. The estimator also implements partial_fit, which updates H by iterating only once over a mini-batch. This can be used for online learning when the data is not readily available from the start, or when the data does not fit into memory.
References:
[1] “Learning the parts of objects by non-negative matrix factorization”, D. Lee, S. Seung, 1999
[2] “Non-negative Matrix Factorization with Sparseness Constraints”, P. Hoyer, 2004
[4] “SVD based initialization: A head start for nonnegative matrix factorization”, C. Boutsidis, E. Gallopoulos, 2008
[5] “Fast local algorithms for large scale nonnegative matrix and tensor factorizations”, A. Cichocki, A. Phan, 2009
[6] “Algorithms for nonnegative matrix factorization with the beta-divergence”, C. Fevotte, J. Idier, 2011
[7] “Online algorithms for nonnegative matrix factorization with the Itakura-Saito divergence”, A. Lefevre, F. Bach, C. Fevotte, 2011
2.5.8. Latent Dirichlet Allocation (LDA)¶
Latent Dirichlet Allocation is a generative probabilistic model for collections of discrete data such as text corpora. It is also a topic model that is used for discovering abstract topics from a collection of documents. The graphical model of LDA is a three-level generative model:
Note on the notations presented in the graphical model above, which can be found in Hoffman et al. (2013): the corpus is a collection of \(D\) documents; a document is a sequence of \(N\) words; there are \(K\) topics in the corpus; the boxes represent repeated sampling. In the graphical model, each node is a random variable and has a role in the generative process. A shaded node indicates an observed variable and an unshaded node indicates a hidden (latent) variable. In this case, words in the corpus are the only data that we observe. The latent variables determine the random mixture of topics in the corpus and the distribution of words in the documents. The goal of LDA is to use the observed words to infer the hidden topic structure.
Details on modeling text corpora¶
When modeling text corpora, the model assumes the following generative process for a corpus with \(D\) documents and \(K\) topics, with \(K\) corresponding to n_components in the API:
For each topic \(k \in K\), draw \(\beta_k \sim \mathrm{Dirichlet}(\eta)\). This provides a distribution over the words, i.e. the probability of a word appearing in topic \(k\). \(\eta\) corresponds to topic_word_prior.
For each document \(d \in D\), draw the topic proportions \(\theta_d \sim \mathrm{Dirichlet}(\alpha)\). \(\alpha\) corresponds to doc_topic_prior.
For each word \(i\) in document \(d\): draw the topic assignment \(z_{di} \sim \mathrm{Multinomial}(\theta_d)\); draw the observed word \(w_{ij} \sim \mathrm{Multinomial}(\beta_{z_{di}})\).
For parameter estimation, the posterior distribution is:
\[p(z, \theta, \beta |w, \alpha, \eta) = \frac{p(z, \theta, \beta|\alpha, \eta)}{p(w|\alpha, \eta)}\]
Since the posterior is intractable, the variational Bayes method uses a simpler distribution \(q(z,\theta,\beta | \lambda, \phi, \gamma)\) to approximate it, and those variational parameters \(\lambda\), \(\phi\), \(\gamma\) are optimized to maximize the Evidence Lower Bound (ELBO):
\[\log\: P(w | \alpha, \eta) \geq L(w,\phi,\gamma,\lambda) \overset{\triangle}{=} E_{q}[\log\:p(w,z,\theta,\beta|\alpha,\eta)] - E_{q}[\log\:q(z, \theta, \beta)]\]
Maximizing the ELBO is equivalent to minimizing the Kullback-Leibler (KL) divergence between \(q(z,\theta,\beta)\) and the true posterior \(p(z, \theta, \beta |w, \alpha, \eta)\). LatentDirichletAllocation implements the online variational Bayes algorithm and supports both online and batch update methods. While the batch method updates variational variables after each full pass through the data, the online method updates variational variables from mini-batch data points.
Note: although the online method is guaranteed to converge to a local optimum point, the quality of the optimum point and the speed of convergence may depend on the mini-batch size and on attributes related to the learning rate setting.
When LatentDirichletAllocation is applied on a “document-term” matrix, the matrix will be decomposed into a “topic-term” matrix and a “document-topic” matrix. While the “topic-term” matrix is stored as components_ in the model, the “document-topic” matrix can be calculated with the transform method. LatentDirichletAllocation also implements a partial_fit method, which is used when data can be fetched sequentially.
Examples: Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation
References: “Latent Dirichlet Allocation”, D. Blei, A. Ng, M. Jordan, 2003; “Online Learning for Latent Dirichlet Allocation”, M. Hoffman, D. Blei, F. Bach, 2010; “Stochastic Variational Inference”, M. Hoffman, D. Blei, C. Wang, J. Paisley, 2013; “The varimax criterion for analytic rotation in factor analysis”, H. F. Kaiser, 1958
See also Dimensionality reduction for dimensionality reduction with Neighborhood Components Analysis.

Sklearn data visualization: Principal Component Analysis (PCA) - 知乎 (by 吴吃辣, www.qikegu.com)
Principal component analysis (PCA) is a dimensionality reduction method commonly used to reduce the number of dimensions of large datasets, converting a large set of variables into a smaller set that still contains most of the information of the original set. Reducing the number of variables in a dataset naturally comes at the cost of some accuracy; the benefit of dimensionality reduction is trading a little precision for simplicity, because smaller datasets are easier to explore and visualize, and they let machine learning algorithms analyze the data more easily and quickly without having to deal with irrelevant variables. In short, the idea of principal component analysis (PCA) is simple: reduce the number of variables in a dataset while preserving as much information as possible. With scikit-learn, it is easy to run PCA on the data:
# Create a randomized PCA model with two components
randomized_pca = PCA(n_components=2, svd_solver='randomized')
# Fit the data and transform it with the model
reduced_data_rpca = randomized_pca.fit_transform(digits.data)
# Create a regular PCA model
pca = PCA(n_components=2)
# Fit the data and transform it with the model
reduced_data_pca = pca.fit_transform(digits.data)
# Check the shape
reduced_data_pca.shape
# Print the data
print(reduced_data_rpca)
print(reduced_data_pca)
Output:
[[ -1.25946586 21.27488217]
 [ 7.95761214 -20.76870381]
 [ 6.99192224 -9.95598251]
 ...
 [ 10.80128338 -6.96025076]
 [ -4.87209834 12.42395157]
 [ -0.34439091 6.36555458]]
[[ -1.2594653 21.27488157]
 [ 7.95761471 -20.76871125]
 [ 6.99191791 -9.95597343]
 ...
 [ 10.80128002 -6.96024527]
 [ -4.87209081 12.42395739]
 [ -0.34439546 6.36556369]]
The randomized PCA model performs better when the number of dimensions is large. You can compare the results of the regular PCA model with those of the randomized PCA model and see how they differ. Telling the model to keep two components ensures that we have two-dimensional data to plot. Now we can draw a scatter plot to visualize the data:
colors = ['black', 'blue', 'purple', 'yellow', 'white', 'red', 'lime', 'cyan', 'orange', 'gray']
# Draw a scatter plot of the PCA results
for i in range(len(colors)):
    x = reduced_data_rpca[:, 0][digits.target == i]
    y = reduced_data_rpca[:, 1][digits.target == i]
    plt.scatter(x, y, c=colors[i])
# Set the legend: digits 0-9 are shown in different colors
plt.legend(digits.target_names, bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
# Set the axis labels
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
# Set the title
plt.title("PCA Scatter Plot")
# Show the figure
plt.show()
The resulting plot is displayed.

Principal component analysis (PCA) - scikit-learn Chinese community (translation provided by CDA数据科学研究院)
Principal component analysis (PCA)¶
These figures help illustrate how a point cloud can be very flat in one direction, which is where PCA comes in to choose a direction that is not flat.
print(__doc__)

# Authors: Gael Varoquaux
#          Jaques Grobler
#          Kevin Hughes
# License: BSD 3 clause

from sklearn.decomposition import PCA

from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# #############################################################################
# Create the data

e = np.exp(1)
np.random.seed(4)


def pdf(x):
    return 0.5 * (stats.norm(scale=0.25 / e).pdf(x) + stats.norm(scale=4 / e).pdf(x))


y = np.random.normal(scale=0.5, size=(30000))
x = np.random.normal(scale=0.5, size=(30000))
z = np.random.normal(scale=0.1, size=len(x))

density = pdf(x) * pdf(y)
pdf_z = pdf(5 * z)
density *= pdf_z

a = x + y
b = 2 * y
c = a - b + z

norm = np.sqrt(a.var() + b.var())
a /= norm
b /= norm


# #############################################################################
# Plot the figures
def plot_figs(fig_num, elev, azim):
    fig = plt.figure(fig_num, figsize=(4, 3))
    plt.clf()
    ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=elev, azim=azim)

    ax.scatter(a[::10], b[::10], c[::10], c=density[::10], marker='+', alpha=.4)
    Y = np.c_[a, b, c]

    # Using SciPy's SVD, this would be:
    # _, pca_score, V = scipy.linalg.svd(Y, full_matrices=False)
    pca = PCA(n_components=3)
    pca.fit(Y)
    pca_score = pca.explained_variance_ratio_
    V = pca.components_

    x_pca_axis, y_pca_axis, z_pca_axis = 3 * V.T

    x_pca_plane = np.r_[x_pca_axis[:2], - x_pca_axis[1::-1]]
    y_pca_plane = np.r_[y_pca_axis[:2], - y_pca_axis[1::-1]]
    z_pca_plane = np.r_[z_pca_axis[:2], - z_pca_axis[1::-1]]
    x_pca_plane.shape = (2, 2)
    y_pca_plane.shape = (2, 2)
    z_pca_plane.shape = (2, 2)
    ax.plot_surface(x_pca_plane, y_pca_plane, z_pca_plane)
    ax.w_xaxis.set_ticklabels([])
    ax.w_yaxis.set_ticklabels([])
    ax.w_zaxis.set_ticklabels([])


elev = -40
azim = -80
plot_figs(1, elev, azim)

elev = 30
azim = 20
plot_figs(2, elev, azim)

plt.show()
Total running time of the script: (0 minutes 0.198 seconds)
Download Python source code: plot_pca_3d.py
Download Jupyter notebook: plot_pca_3d.ipynb

Learning principal component analysis (PCA) with scikit-learn - 刘建平Pinard - 博客园
In the post "A summary of the principles of principal component analysis (PCA)" we summarized the theory of PCA; below we summarize how to use the scikit-learn toolkit to perform PCA dimensionality reduction.
1. An overview of the scikit-learn PCA classes
In scikit-learn, the PCA-related classes all live in the sklearn.decomposition package. The most commonly used PCA class is sklearn.decomposition.PCA, and its usage is what we mainly cover below.
Besides the PCA class, another commonly used PCA-related class is KernelPCA, which we also discussed in the theory post. It is mainly used for dimensionality reduction of non-linear data and relies on the kernel trick, so when using it you need to choose a suitable kernel function and tune its parameters.
Another commonly used class is IncrementalPCA, which mainly addresses the memory limits of a single machine. Sometimes the number of samples may be in the millions and the dimensionality in the thousands, so fitting the data directly may exhaust memory; in that case IncrementalPCA can be used. IncrementalPCA first splits the data into multiple batches and then calls partial_fit on each batch in turn, step by step arriving at the final optimal dimensionality reduction.
There are also SparsePCA and MiniBatchSparsePCA. Their main difference from the PCA class described above is that they use L1 regularization, which drives the contribution of many non-principal components to zero, so that during the reduction we only need to deal with the relatively important components and avoid the influence of noise-like factors. The difference between SparsePCA and MiniBatchSparsePCA is that MiniBatchSparsePCA uses a subset of the features and a given number of iterations to perform the reduction, in order to alleviate the slowness of the decomposition on large samples, at the possible cost of some accuracy. Using SparsePCA and MiniBatchSparsePCA requires tuning the L1 regularization parameter.
2. The parameters of sklearn.decomposition.PCA
Below we explain how to use scikit-learn for PCA, based mainly on sklearn.decomposition.PCA. The PCA class needs essentially no tuning; generally we only need to specify the target number of dimensions, or a threshold on the fraction of the total variance of the original features that the retained principal components should explain.
Here is an introduction to the main parameters of sklearn.decomposition.PCA:
1) n_components: this parameter specifies the number of feature dimensions to keep after PCA. The most common usage is to set the target number of dimensions directly, in which case n_components is an integer greater than or equal to 1. We can also specify a minimum threshold on the fraction of variance the principal components must explain and let the PCA class decide the number of dimensions from the feature variances, in which case n_components is a number in (0, 1]. We can also set the parameter to "mle", in which case the PCA class uses the MLE algorithm to choose a number of principal components based on the variance distribution of the features. Finally, we can use the default, i.e. not pass n_components, in which case n_components = min(n_samples, n_features).
2) whiten: whether to perform whitening. Whitening normalizes each feature of the reduced data so that its variance is 1. For PCA itself whitening is generally unnecessary, but if there is further processing after the PCA step it can be worth considering. The default is False, i.e. no whitening.
3) svd_solver: the method used for the singular value decomposition (SVD). Since the eigendecomposition is a special case of the SVD, PCA libraries are generally implemented on top of an SVD. There are four possible values: {'auto', 'full', 'arpack', 'randomized'}. 'randomized' is generally suited to PCA on data with many samples and many dimensions but a relatively low fraction of principal components; it uses randomized algorithms to speed up the SVD. 'full' is the SVD in the traditional sense, using the corresponding SciPy implementation. 'arpack' has a similar use case to 'randomized'; the difference is that 'randomized' uses scikit-learn's own SVD implementation while 'arpack' directly uses SciPy's sparse SVD implementation. The default is 'auto', which lets the PCA class weigh the three algorithms above and pick a suitable SVD solver. Generally, the default value is good enough.
Besides these input parameters, two attributes of the PCA class are worth attention. The first is explained_variance_, the variance explained by each retained principal component: the larger the variance, the more important the component. The second is explained_variance_ratio_, the fraction of the total variance explained by each retained principal component: the larger the ratio, the more important the component.
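Before the worked example in the next section, here is an illustrative sketch of the IncrementalPCA batching described in section 1 (the toy data and the number of batches are assumptions for demonstration, not from the original post):
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.randn(10000, 50)               # pretend this is too large to decompose in one go

ipca = IncrementalPCA(n_components=5)
for batch in np.array_split(X, 10):    # feed the data in 10 chunks
    ipca.partial_fit(batch)

X_reduced = ipca.transform(X)          # shape (10000, 5)
print(ipca.explained_variance_ratio_)
Each call to partial_fit only needs one batch in memory, which is what makes this class suitable when the full dataset does not fit in RAM.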
3. A PCA example
Let's now learn how to use the PCA class in scikit-learn with an example. To make visualization easy and give an intuitive picture, we use three-dimensional data and reduce its dimensionality. The full code is on my github: https://github.com/ljpzzz/machinelearning/blob/master/classic-machine-learning/pca.ipynb
First we generate random data and visualize it; the code is as follows:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
from sklearn.datasets import make_blobs  # formerly sklearn.datasets.samples_generator
# X holds the sample features, y the cluster labels: 10000 samples, 3 features each, 4 clusters
X, y = make_blobs(n_samples=10000, n_features=3,
                  centers=[[3, 3, 3], [0, 0, 0], [1, 1, 1], [2, 2, 2]],
                  cluster_std=[0.2, 0.1, 0.2, 0.2], random_state=9)
fig = plt.figure()
ax = Axes3D(fig, rect=[0, 0, 1, 1], elev=30, azim=20)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')
The distribution of the three-dimensional data looks like this.
First, instead of reducing the dimensionality, we only project the data and look at the variance distribution of the three projected dimensions; the code is as follows:
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
The output is:
[ 0.98318212  0.00850037  0.00831751]
[ 3.78483785  0.03272285  0.03201892]
The variance ratios of the three projected feature dimensions are roughly 98.3% : 0.8% : 0.8%; the first projected feature accounts for the vast majority of the variance.
Now we reduce the dimensionality from three dimensions to two; the code is as follows:
pca = PCA(n_components=2)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
The output is:
[ 0.98318212  0.00850037]
[ 3.78483785  0.03272285]
This result is expected: since the variances of the three projected feature dimensions above are [ 3.78483785  0.03272285  0.03201892], projecting to two dimensions necessarily keeps the first two features and discards the third.
To get an intuitive picture, let's look at the distribution of the transformed data; the code is as follows:
X_new = pca.transform(X)
plt.scatter(X_new[:, 0], X_new[:, 1], marker='o')
plt.show()
The output plot is shown below. You can see that the four clusters from the earlier 3D plot are still clearly visible after the dimensionality reduction.
Now let's see what happens when, instead of specifying the target dimensionality directly, we specify the fraction of variance the retained principal components must explain.
pca = PCA(n_components=0.95)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.n_components_)
We required the principal components to explain at least 95% of the variance; the output is:
[ 0.98318212]
[ 3.78483785]
1
Only the first projected feature is kept. This is easy to understand: our first principal component explains as much as 98% of the variance, so this single feature dimension already satisfies the 95% threshold. Now let's try a 99% threshold; the code is as follows:
pca = PCA(n_components=0.99)
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.n_components_)
The output is now:
[ 0.98318212  0.00850037]
[ 3.78483785  0.03272285]
2
This result is also easy to understand: the first principal component explains 98.3% of the variance and the second explains 0.8%, and together they satisfy our threshold.
Finally, let's see what happens when we let the MLE algorithm choose the number of dimensions itself; the code is as follows:
pca = PCA(n_components='mle')
pca.fit(X)
print(pca.explained_variance_ratio_)
print(pca.explained_variance_)
print(pca.n_components_)
The output is:
[ 0.98318212]
[ 3.78483785]
1
Since the first projected feature of our data explains as much as 98.3% of the variance, the MLE algorithm keeps only the first feature.
(Feel free to repost with attribution. Comments and questions are welcome: liujianping-ok@163.com)
posted @ 2017-01-02 20:55 by 刘建平Pinard

PCA example with Iris Data-set — scikit-learn 1.4.1 documentation
PCA example with Iris Data-set¶
Principal Component Analysis applied to the Iris dataset. See here for more information on this dataset.
# Code source: Gaël Varoquaux
# License: BSD 3 clause

import matplotlib.pyplot as plt

# unused but required import for doing 3d projections with matplotlib < 3.2
import mpl_toolkits.mplot3d  # noqa: F401
import numpy as np

from sklearn import datasets, decomposition

np.random.seed(5)

iris = datasets.load_iris()
X = iris.data
y = iris.target

fig = plt.figure(1, figsize=(4, 3))
plt.clf()

ax = fig.add_subplot(111, projection="3d", elev=48, azim=134)
ax.set_position([0, 0, 0.95, 1])

plt.cla()
pca = decomposition.PCA(n_components=3)
pca.fit(X)
X = pca.transform(X)

for name, label in [("Setosa", 0), ("Versicolour", 1), ("Virginica", 2)]:
    ax.text3D(
        X[y == label, 0].mean(),
        X[y == label, 1].mean() + 1.5,
        X[y == label, 2].mean(),
        name,
        horizontalalignment="center",
        bbox=dict(alpha=0.5, edgecolor="w", facecolor="w"),
    )

# Reorder the labels to have colors matching the cluster results
y = np.choose(y, [1, 2, 0]).astype(float)
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=y, cmap=plt.cm.nipy_spectral, edgecolor="k")

ax.xaxis.set_ticklabels([])
ax.yaxis.set_ticklabels([])
ax.zaxis.set_ticklabels([])

plt.show()
Total running time of the script: (0 minutes 0.095 seconds)
Download Jupyter notebook: plot_pca_iris.ipynb
Download Python source code: plot_pca_iris.py