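Ultimate Summary! 100 Core Python Operations!!

Digest 2024-09-14 16:26 Beijing

Hi everyone~

Today I'm sharing a comprehensive and important topic: core Python operations.

These operations center on data science and cover NumPy, Pandas, and machine learning libraries such as sklearn, PyTorch, and TensorFlow.

Let's take a look. Bookmark this page so you can look things up whenever you need to~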

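1. Import libraries and set default options

Introduction:

Import the libraries commonly used for data science in Python and set some defaults, such as showing all columns and disabling scientific notation.

Example:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

pd.set_option('display.max_columns', None)  # show all columns
pd.set_option('display.float_format', lambda x: '%.3f' % x)  # disable scientific notation
sns.set(style="whitegrid")  # set the default Seaborn style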

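2. Create a multidimensional NumPy array and inspect its attributes

Introduction:

Create a 2x3 NumPy array and check its shape, number of dimensions, and data type.

Example:

arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr.shape)  # (2, 3)
print(arr.ndim)   # 2
print(arr.dtype)  # int64 (int32 on some platforms)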

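3. Basic NumPy array operations

Introduction:

Perform basic element-wise math on NumPy arrays, such as addition, subtraction, multiplication, and division.

Example:

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

sum_arr = arr1 + arr2  # element-wise addition [5, 7, 9]
mul_arr = arr1 * arr2  # element-wise multiplication [4, 10, 18]
exp_arr = np.exp(arr1)  # exponential [2.718, 7.389, 20.086]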
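
The element-wise rules above also apply when the shapes differ but are compatible, which NumPy calls broadcasting. The following is a minimal sketch; the array names `matrix`, `row`, `shifted`, and `centered` are made up for illustration.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])

# The 1-D row is broadcast across both rows of the 2x3 matrix.
shifted = matrix + row

# Column means have shape (3,), so the subtraction centers each column.
centered = matrix - matrix.mean(axis=0)

print(shifted)
print(centered)
```

Broadcasting avoids writing explicit loops and is usually the idiomatic way to combine an array with a per-column or per-row quantity.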

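4. Generate a random matrix

Introduction:

Generate a 3x3 matrix of random integers within a given range (the upper bound is exclusive).

Example: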

random_matrix = np.random.randint(1, 100, size=(3, 3))
print(random_matrix)

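5. Create a Pandas DataFrame and view basic information

Introduction:

Create a DataFrame and view its first rows, data types, and summary statistics.

Example:

data = {'Name': ['Tom', 'Jerry', 'Spike'],
        'Age': [25, 30, 22],
        'Score': [85.5, 90.1, 78.3]}
df = pd.DataFrame(data)

print(df.head())  # first few rows
df.info()  # data types and non-null counts (info() prints directly and returns None)
print(df.describe())  # descriptive statistics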

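6. Read a CSV file and handle missing values

Introduction:

Read data from a CSV file and handle missing values, either by filling them or by dropping the affected rows.

Example:

df = pd.read_csv('data.csv')

# Count missing values per column
print(df.isnull().sum())

# Fill missing values with the column mean
# (assign back instead of inplace=True on a column, which is unreliable in recent pandas)
df['column_name'] = df['column_name'].fillna(df['column_name'].mean())

# Or drop rows that contain missing values
df.dropna(inplace=True)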
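
Since `data.csv` is not provided, here is a small self-contained sketch of the same steps on an in-memory DataFrame. The frame `df_demo` and its columns are invented for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in each column.
df_demo = pd.DataFrame({
    "Age": [25.0, np.nan, 22.0, 31.0],
    "City": ["Beijing", "Shanghai", None, "Shenzhen"],
})

# Number of missing values per column.
missing_counts = df_demo.isnull().sum()

# Option 1: fill the numeric column with its mean (computed from non-missing values).
filled = df_demo.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())

# Option 2: drop every row that has at least one missing value.
dropped = df_demo.dropna()

print(missing_counts)
print(filled)
print(dropped)
```

Filling keeps all rows but can distort the distribution, while dropping keeps only complete rows; which one fits depends on how much data is missing.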

7. Filter data with Pandas

Introduction:

Filter rows of a DataFrame with a boolean condition.

Example:

df_filtered = df[df['Age'] > 25]  # keep rows where Age is greater than 25

8. Pandas groupby

Introduction:

Group a DataFrame by a column, typically to compute aggregate statistics.

Example:

grouped = df.groupby('Category')
mean_scores = grouped['Score'].mean()  # mean score per category
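
The earlier example DataFrame has no `Category` column, so the sketch below builds a tiny hypothetical frame (`scores_demo`) and computes several aggregates in one call with `agg`.

```python
import pandas as pd

# Hypothetical data: two categories with a few scores each.
scores_demo = pd.DataFrame({
    "Category": ["A", "A", "B", "B", "B"],
    "Score": [80, 90, 70, 75, 95],
})

# One row per category, one column per aggregate.
summary = scores_demo.groupby("Category")["Score"].agg(["mean", "max", "count"])
print(summary)
```

Passing a list to `agg` is a convenient way to get a compact summary table instead of calling each aggregation separately.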

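9. Pandas pivot tables

Introduction:

Create a pivot table to summarize and analyze data.

Example:

# Passing the string 'mean' avoids the deprecation warning that np.mean triggers in recent pandas
pivot_table = pd.pivot_table(df, values='Score', index='Category', columns='Gender', aggfunc='mean')
print(pivot_table)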

10. Data visualization - basic Matplotlib plotting

Introduction:

Draw a simple line chart with Matplotlib.

Example:

x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()

11. Seaborn visualization - linear regression plot

Introduction:

Draw a scatter plot with a fitted regression line using Seaborn.

Example:

sns.lmplot(x='Age', y='Score', data=df, height=6, aspect=1.5)
plt.show()

12. Multiple subplots with Matplotlib

Introduction:

Draw several subplots on the same figure.

Example:

fig, axes = plt.subplots(2, 2, figsize=(10, 10))

axes[0, 0].plot(x, y)
axes[0, 1].plot(x, np.cos(x))
axes[1, 0].plot(x, np.tan(x))
axes[1, 1].plot(x, -y)

plt.show()

13. Visualize a distribution with Seaborn

Introduction:

Plot the distribution of a variable to see its shape at a glance.

Example:

sns.histplot(df['Score'], kde=True)
plt.show()

14. Handle dates with Pandas

Introduction:

Convert strings to datetime values and extract date components.

Example:

df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month

15. Merge DataFrames with Pandas

Introduction:

Combine two DataFrames with merge, similar to a JOIN in SQL.

Example:

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value': [4, 5, 6]})

merged_df = pd.merge(df1, df2, on='key', how='inner')  # inner join
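
To make the join behaviour concrete, the sketch below reuses the same two frames and compares an inner join with an outer join. The names `left`, `right`, `inner`, and `outer` and the suffix choices are illustrative assumptions.

```python
import pandas as pd

left = pd.DataFrame({"key": ["A", "B", "C"], "value": [1, 2, 3]})
right = pd.DataFrame({"key": ["A", "B", "D"], "value": [4, 5, 6]})

# Inner join: only keys present in both frames survive.
# suffixes renames the clashing "value" columns.
inner = pd.merge(left, right, on="key", how="inner", suffixes=("_left", "_right"))

# Outer join: all keys are kept; indicator=True records where each row came from.
outer = pd.merge(left, right, on="key", how="outer",
                 suffixes=("_left", "_right"), indicator=True)

print(inner)
print(outer)
```

The `_merge` column produced by `indicator=True` is a quick way to audit which rows failed to match on either side.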

16. Pivot tables with hierarchical indexes

Introduction:

Use a pivot table with a multi-level (hierarchical) row index to aggregate data.

Example:

pivot = pd.pivot_table(df, values='Sales', index=['Region', 'Product'], columns='Year', aggfunc='sum')

17. Handle categorical variables

Introduction:

Convert categorical variables to numeric form, for example with dummy variables.

Example:

df = pd.get_dummies(df, columns=['Category'], drop_first=True)
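
As a rough illustration of what `drop_first=True` does, the sketch below encodes a hypothetical three-level column. The frame `cat_demo` is invented for this example.

```python
import pandas as pd

cat_demo = pd.DataFrame({
    "Score": [1.0, 2.0, 3.0, 4.0],
    "Category": ["A", "B", "C", "A"],
})

# With drop_first=True the first level ("A") becomes the implicit baseline,
# so three levels are encoded with two indicator columns.
encoded = pd.get_dummies(cat_demo, columns=["Category"], drop_first=True)
print(encoded)
```

Dropping one level avoids perfectly collinear indicator columns, which matters mostly for linear models.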

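18. Correlation matrix and heatmap

Introduction:

Compute the correlations of a DataFrame and plot them as a heatmap to show linear relationships between variables.

Example:

corr_matrix = df.corr(numeric_only=True)  # skip non-numeric columns, which raise an error in pandas 2.x
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.show()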

19. Split into training and test sets

Introduction:

Use sklearn to split a dataset into a training set and a test set.

Example:

from sklearn.model_selection import train_test_split

X = df[['Age', 'Score']]
y = df['Outcome']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

20. Build a linear regression model

Introduction:

Build and train a linear regression model with sklearn.

Example:

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

predictions = model.predict(X_test)

21. Model evaluation - mean squared error

Introduction:

Compute the mean squared error (MSE) to evaluate model performance.

Example:

from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_test, predictions)
print(f'MSE: {mse:.3f}')

22. Cross-validation

Introduction:

Use cross-validation to assess a model's stability and generalization performance.

Example:

from sklearn.model_selection import cross_val_score

scores = cross_val_score(model, X, y, cv=5, scoring='neg_mean_squared_error')
print(f'Cross-validated MSE: {-scores.mean():.3f}')

23. Standardize data

Introduction:

Standardize features so that they are on the same scale.

Example:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
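
When the data is later split into training and test sets, a common practice is to fit the scaler on the training portion only and reuse its statistics on the test portion. The sketch below uses synthetic data; `X_demo`, `X_tr`, and `X_te` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features with a large mean and spread.
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 2))

X_tr, X_te = train_test_split(X_demo, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean and std from the training data only
X_te_scaled = scaler.transform(X_te)      # apply the training statistics to the test data

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
```

Fitting on the full dataset lets information from the test set leak into preprocessing, which can make evaluation scores look better than they are.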

24. Decision tree model

Introduction:

Build a decision tree classifier with sklearn.

Example:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

25. Random forest model

Introduction:

Classify with the random forest algorithm.

Example:

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

accuracy = rf.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

26. Feature importance

Introduction:

Use a trained random forest to identify the most important features.

Example:

feature_importances = rf.feature_importances_
print(feature_importances)

27. Principal component analysis (PCA)

Introduction:

Use PCA to reduce the dimensionality of the data.

Example:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

28. K-Means clustering

Introduction:

Use the K-Means algorithm for unsupervised clustering.

Example:

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

labels = kmeans.labels_

29. Evaluate clustering results

Introduction:

Compute the silhouette score to evaluate clustering quality.

Example:

from sklearn.metrics import silhouette_score

score = silhouette_score(X, labels)
print(f'Silhouette Score: {score:.3f}')
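
One common use of the silhouette score is comparing several candidate values of `n_clusters`. The sketch below does this on synthetic blobs; the dataset and the range of `k` are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data generated around three centers.
X_blobs, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

scores = {}
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    cluster_labels = km.fit_predict(X_blobs)
    scores[k] = silhouette_score(X_blobs, cluster_labels)

# The k with the highest silhouette score is a reasonable candidate.
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The silhouette score ranges from -1 to 1, and higher values indicate tighter, better separated clusters.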

30. Logistic regression model

Introduction:

Build a logistic regression model for classification.

Example:

from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()
log_reg.fit(X_train, y_train)

accuracy = log_reg.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

31. Grid search

Introduction:

Tune model hyperparameters with grid search to find the best parameter combination.

Example:

from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]}
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(f'Best parameters: {grid_search.best_params_}')
print(f'Best score: {grid_search.best_score_}')

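32. Randomized search

Introduction:

Randomized search samples hyperparameter combinations at random. It is usually faster than grid search and suits large search spaces.

Example:

from sklearn.model_selection import RandomizedSearchCV

param_dist = {'n_estimators': [50, 100, 200], 'max_depth': [None, 10, 20]}
# The space above has only 9 combinations, so n_iter is kept below 9
random_search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist, n_iter=5, cv=5, random_state=42)
random_search.fit(X_train, y_train)

print(f'Best parameters: {random_search.best_params_}')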

33. XGBoost model

Introduction:

Use XGBoost for gradient-boosted classification.

Example:

from xgboost import XGBClassifier

xgb_model = XGBClassifier(n_estimators=100, random_state=42)
xgb_model.fit(X_train, y_train)

accuracy = xgb_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

34. LightGBM model

Introduction:

Use LightGBM for fast gradient-boosted classification.

Example:

import lightgbm as lgb

lgb_model = lgb.LGBMClassifier(n_estimators=100, random_state=42)
lgb_model.fit(X_train, y_train)

accuracy = lgb_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

35. CatBoost model

Introduction:

Use CatBoost, a gradient boosting model designed to handle categorical features.

Example:

from catboost import CatBoostClassifier

cat_model = CatBoostClassifier(n_estimators=100, random_state=42, verbose=0)
cat_model.fit(X_train, y_train)

accuracy = cat_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

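36. Support vector machine (SVM) classification

Introduction:

Use an SVM for binary classification; it works well on high-dimensional data.

Example: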

from sklearn.svm import SVC

svm_model = SVC(kernel='linear', C=1)
svm_model.fit(X_train, y_train)

accuracy = svm_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

37. K-nearest neighbors (KNN) classification

Introduction:

Classify with the KNN algorithm.

Example:

from sklearn.neighbors import KNeighborsClassifier

knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train, y_train)

accuracy = knn_model.score(X_test, y_test)
print(f'Accuracy: {accuracy:.3f}')

38. Polynomial regression

Introduction:

Use polynomial regression to model nonlinear relationships.

Example:

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)

39. Ridge regression (L2 regularization)

Introduction:

Use ridge regression (L2 regularization) to reduce overfitting.

Example:

from sklearn.linear_model import Ridge

ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)

40. Lasso regression (L1 regularization)

Introduction:

Use Lasso regression (L1 regularization) for feature selection.

Example:

from sklearn.linear_model import Lasso

lasso_model = Lasso(alpha=0.1)
lasso_model.fit(X_train, y_train)

41. ElasticNet regression

Introduction:

ElasticNet regression combines L1 and L2 regularization.

Example:

from sklearn.linear_model import ElasticNet

enet_model = ElasticNet(alpha=0.1, l1_ratio=0.7)
enet_model.fit(X_train, y_train)

42. Stochastic Gradient Descent (SGD) classification

Introduction:

Use SGD for large-scale linear classification.

Example:

from sklearn.linear_model import SGDClassifier

sgd_model = SGDClassifier(max_iter=1000, tol=1e-3)
sgd_model.fit(X_train, y_train)

43. DBSCAN density-based clustering

Introduction:

Use DBSCAN for density-based clustering; it can find clusters with non-convex shapes.

Example:

from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.5, min_samples=5)
labels = dbscan.fit_predict(X)

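44. Hierarchical clustering

Introduction:

Use hierarchical clustering for unsupervised learning and visualize the cluster hierarchy as a dendrogram.

Example: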

from scipy.cluster.hierarchy import dendrogram, linkage

linked = linkage(X, method='ward')
dendrogram(linked)
plt.show()

45. Isolation forest (anomaly detection)

Introduction:

Use an isolation forest to detect anomalies.

Example:

from sklearn.ensemble import IsolationForest

iso_forest = IsolationForest(contamination=0.1)
iso_forest.fit(X)

anomalies = iso_forest.predict(X)

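46. Principal component analysis (PCA) visualization

Introduction:

Visualize the PCA result to show how the data is distributed after dimensionality reduction.

Example: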

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.title('PCA Visualization')
plt.show()

47. t-SNE visualization

Introduction:

Use t-SNE to reduce dimensionality and visualize the distribution of high-dimensional data.

Example:

from sklearn.manifold import TSNE

tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X)

plt.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y)
plt.title('t-SNE Visualization')
plt.show()

48. Plot the ROC curve

Introduction:

Plot the Receiver Operating Characteristic (ROC) curve to evaluate a binary classifier.

Example:

from sklearn.metrics import roc_curve, auc

# predict_proba needs a classifier; log_reg is the logistic regression model from item 30
y_prob = log_reg.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.plot(fpr, tpr, label=f'AUC = {roc_auc:.3f}')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend()
plt.show()

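49. Confusion matrix

Introduction:

Use a confusion matrix to evaluate a classification model.

Example:

from sklearn.metrics import confusion_matrix

# Use a classifier here (log_reg from item 30), not the regression model from item 20
y_pred = log_reg.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d')
plt.title('Confusion Matrix')
plt.show()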

50. Precision, recall, and F1 score

Introduction:

Compute the precision, recall, and F1 score of a classification model to evaluate its performance.

Example:

from sklearn.metrics import precision_score, recall_score, f1_score

# Use a classifier here (log_reg from item 30), not the regression model from item 20
y_pred = log_reg.predict(X_test)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f'Precision: {precision:.3f}')
print(f'Recall: {recall:.3f}')
print(f'F1 Score: {f1:.3f}')
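
To connect these metrics to the underlying counts, here is a small worked sketch on hand-made binary labels. The label lists are invented; they give 3 true positives, 1 false positive, and 1 false negative.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true_demo = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 0]

# precision = TP / (TP + FP) = 3 / 4
precision_demo = precision_score(y_true_demo, y_pred_demo)
# recall = TP / (TP + FN) = 3 / 4
recall_demo = recall_score(y_true_demo, y_pred_demo)
# F1 is the harmonic mean of precision and recall
f1_demo = f1_score(y_true_demo, y_pred_demo)

print(precision_demo, recall_demo, f1_demo)
```

With the default `average='binary'` the scores refer to the positive class only, whereas `average='weighted'` in the example above averages over all classes weighted by their support.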

51. Feature selection - model-based selection

Introduction:

Select features based on a model's feature importances.

Example:

from sklearn.feature_selection import SelectFromModel

selector = SelectFromModel(rf, threshold='mean')
X_selected = selector.fit_transform(X_train, y_train)

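52. Cross-validation - stratified K-fold

Introduction:

Use stratified K-fold cross-validation so that each fold keeps roughly the same class distribution.

Example:

from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5)
for train_index, test_index in skf.split(X, y):
    # X and y are pandas objects here, so select rows by position with .iloc
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]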
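
The sketch below checks the stratification property on a small imbalanced dataset. `X_demo` and `y_demo` are hypothetical, with 15 negative and 5 positive samples.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold

X_demo = pd.DataFrame({"feature": np.arange(20)})
y_demo = pd.Series([0] * 15 + [1] * 5)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_positive_counts = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    # split() yields positional indices, so use .iloc on pandas objects.
    X_tr, X_te = X_demo.iloc[train_idx], X_demo.iloc[test_idx]
    y_tr, y_te = y_demo.iloc[train_idx], y_demo.iloc[test_idx]
    fold_positive_counts.append(int(y_te.sum()))

# Each test fold of 4 samples receives exactly one positive sample.
print(fold_positive_counts)
```

With plain `KFold` on the same ordered data, some folds could contain no positive samples at all, which makes per-fold metrics unreliable.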

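53. Standardization and normalization

Introduction:

Standardize data (z-score) and normalize it (min-max scaling).

Example:

from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Standardization: zero mean, unit variance
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Normalization: rescale to the [0, 1] range
minmax_scaler = MinMaxScaler()
X_normalized = minmax_scaler.fit_transform(X)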

54. Data splitting - custom split

Introduction:

Split a dataset by a custom condition, for example by year.

Example:

train_df = df[df['Year'] < 2020]
test_df = df[df['Year'] >= 2020]

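55. Time series analysis - autocorrelation plot

Introduction:

Plot the autocorrelation function to analyze how a time series correlates with its own lags.

Example: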

from statsmodels.graphics.tsaplots import plot_acf

plot_acf(df['value'])
plt.show()

56. Time series analysis - rolling mean

Introduction:

Compute and plot a rolling mean to smooth a time series.

Example:

df['Rolling_Mean'] = df['value'].rolling(window=12).mean()
df[['value', 'Rolling_Mean']].plot()
plt.show()
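
A short numeric sketch may help show how the window behaves at the start of a series. The series below is invented, and a window of 3 is used instead of 12 to keep it readable.

```python
import pandas as pd

series = pd.Series([1, 2, 3, 4, 5, 6], dtype=float)

# The first window-1 values are NaN because the window is not yet full.
rolling = series.rolling(window=3).mean()

# min_periods=1 averages over whatever values are available so far.
rolling_partial = series.rolling(window=3, min_periods=1).mean()

print(rolling.tolist())
print(rolling_partial.tolist())
```

With `window=12` on monthly data, the first 11 values of the rolling mean are therefore missing unless `min_periods` is lowered.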

57. Data processing - apply a function

Introduction:

Apply a custom function to a column of a Pandas DataFrame.

Example:

def custom_function(x):
    return x * 2

df['new_column'] = df['column_name'].apply(custom_function)

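58. Data processing - aggregation functions in pivot tables

Introduction:

Use a pivot table for more complex aggregation, such as a different aggregation function per column.

Example:

# Every column named in aggfunc must also be listed in values
pivot_table = pd.pivot_table(df, values=['Sales', 'Profit'], index='Region', columns='Product', aggfunc={'Sales': 'sum', 'Profit': 'mean'})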
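
The sketch below runs a per-column aggregation on a tiny hypothetical sales table so the resulting layout is visible. All values in `sales_demo` are made up.

```python
import pandas as pd

sales_demo = pd.DataFrame({
    "Region": ["East", "East", "West", "West"],
    "Product": ["A", "B", "A", "A"],
    "Sales": [100, 150, 200, 50],
    "Profit": [10, 30, 20, 40],
})

# Sum the sales and average the profit for each Region/Product pair.
pivot_demo = pd.pivot_table(
    sales_demo,
    values=["Sales", "Profit"],
    index="Region",
    columns="Product",
    aggfunc={"Sales": "sum", "Profit": "mean"},
)
print(pivot_demo)
```

The result has two-level columns such as `("Sales", "A")`, and combinations that never occur (here West with product B) show up as NaN.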

59. Cross-tabulation

Introduction:

Create a cross-tabulation to analyze the relationship between categorical variables.

Example:

crosstab = pd.crosstab(df['Category'], df['Outcome'])

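60. Data processing - data cleaning

Introduction:

Handle duplicate rows and outliers.

Example:

df = df.drop_duplicates()  # remove duplicate rows
# threshold is an upper bound you choose for the column; rows at or above it are treated as outliers
df = df[df['column_name'] < threshold]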

61. Distribution fitting - normal distribution

Introduction:

Fit a normal distribution to data with scipy.

Example:

from scipy import stats

mu, std = stats.norm.fit(df['value'])

62. Linear models - polynomial regression

Introduction:

Extend a linear model to handle nonlinear data.

Example:

from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

model = LinearRegression()
model.fit(X_poly, y)

63. Deep learning - TensorFlow basics

Introduction:

Build a basic deep learning model with TensorFlow.

Example:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Dense(64, activation='relu', input_shape=(input_dim,)),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=5)

64. Deep learning - Keras basics

Introduction:

Build and train a deep learning model with Keras.

Example:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(128, activation='relu', input_shape=(input_dim,)),
    Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)

65. Save and load a model

Introduction:

Save and load a deep learning model.

Example:

model.save('my_model.h5')  # save the model
loaded_model = tf.keras.models.load_model('my_model.h5')  # load the model

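66. Model evaluation - confusion matrix and classification report

Introduction:

Evaluate model performance and generate a classification report.

Example:

from sklearn.metrics import classification_report

# classification_report needs class labels; clf is the decision tree from item 24.
# For a Keras softmax model, use model.predict(X_test).argmax(axis=1) instead.
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))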

67. Hyperparameter tuning - Bayesian optimization

Introduction:

Tune hyperparameters with Bayesian optimization.

Example:

from skopt import BayesSearchCV

bayes_search = BayesSearchCV(estimator=rf, search_spaces={'n_estimators': (50, 200), 'max_depth': (5, 30)}, n_iter=50)
bayes_search.fit(X_train, y_train)

print(f'Best parameters: {bayes_search.best_params_}')

68. Time series - seasonal decomposition

Introduction:

Decompose a time series into trend, seasonal, and residual components.

Example:

from statsmodels.tsa.seasonal import seasonal_decompose

decomposition = seasonal_decompose(df['value'], model='additive', period=12)
decomposition.plot()
plt.show()

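69. Time series - ARIMA model

Introduction:

Forecast a time series with an ARIMA model.

Example:

# The old statsmodels.tsa.arima_model module was removed; use statsmodels.tsa.arima.model
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(df['value'], order=(5, 1, 0))
model_fit = model.fit()

forecast = model_fit.forecast(steps=10)  # returns the forecast values directly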

70. Anomaly detection - LOF (Local Outlier Factor)

Introduction:

Use LOF to detect anomalies.

Example:

from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=20)
outliers = lof.fit_predict(X)

71. Covariance matrix

Introduction:

Compute the covariance matrix of the data to understand linear relationships between variables.

Example:

covariance_matrix = np.cov(df[['x1', 'x2']].T)

72. Conditional probability

Introduction:

Compute conditional probabilities for categorical variables.

Example:

conditional_prob = pd.crosstab(df['Category'], df['Outcome'], normalize='index')
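
The sketch below shows the same call on a tiny hypothetical dataset, where each row of the result is the distribution of `Outcome` given a `Category`. The frame `cp_demo` is invented for illustration.

```python
import pandas as pd

cp_demo = pd.DataFrame({
    "Category": ["A", "A", "A", "B", "B"],
    "Outcome": [1, 0, 1, 0, 0],
})

# normalize='index' divides each row by its row total,
# so cell (c, o) estimates P(Outcome = o | Category = c).
cond = pd.crosstab(cp_demo["Category"], cp_demo["Outcome"], normalize="index")
print(cond)
```

Here P(Outcome = 1 | Category = A) is 2/3, and every row sums to 1 because it is a conditional distribution.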

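73. Information gain

Introduction:

Estimate the information gain (mutual information) between each feature and the target for feature selection.

Example: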

from sklearn.feature_selection import mutual_info_classif

mi = mutual_info_classif(X, y)

74. Normality test - Shapiro-Wilk test

Introduction:

Use the Shapiro-Wilk test to check whether data follows a normal distribution.

Example:

from scipy.stats import shapiro

stat, p_value = shapiro(df['value'])

75. Analysis of variance (ANOVA)

Introduction:

Run an analysis of variance to compare the means of different groups.

Example:

from scipy.stats import f_oneway

f_stat, p_value = f_oneway(df['group1'], df['group2'], df['group3'])

76. Bootstrapping

Introduction:

Use the bootstrap for model evaluation and uncertainty estimation.

Example:

from sklearn.utils import resample

bootstrapped_samples = resample(df, n_samples=1000, random_state=42)
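
A single resample is rarely useful on its own; uncertainty estimates come from repeating it many times. The sketch below estimates a 95% percentile interval for a mean on synthetic data. The data, the 1000 repetitions, and the percentile method are illustrative assumptions.

```python
import numpy as np
from sklearn.utils import resample

# Synthetic sample of 200 values.
rng = np.random.default_rng(42)
values = rng.normal(loc=100.0, scale=15.0, size=200)

# Draw 1000 bootstrap samples (with replacement) and record each sample mean.
boot_means = np.array([
    resample(values, replace=True, n_samples=len(values), random_state=seed).mean()
    for seed in range(1000)
])

# Percentile interval: the middle 95% of the bootstrap means.
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={values.mean():.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The same loop works for other statistics, such as a median or a model's test score, by swapping out `.mean()`.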

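77. Bayesian networks

Introduction:

Use a Bayesian network for probabilistic inference.

Example:

# Note: this import targets the pre-1.0 pomegranate API; newer releases reorganized the package
from pomegranate import BayesianNetwork

model = BayesianNetwork.from_samples(X, algorithm='chow-liu')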

78. Decision tree visualization

Introduction:

Visualize a decision tree to understand how the model makes decisions.

Example:

from sklearn.tree import export_graphviz
import graphviz

dot_data = export_graphviz(clf, out_file=None, feature_names=X.columns, class_names=['0', '1'], filled=True, rounded=True)
graph = graphviz.Source(dot_data)
graph.render('decision_tree')

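79. Heatmap

Introduction:

Use a heatmap to show correlations or frequencies in the data.

Example:

import seaborn as sns

sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')  # numeric columns only
plt.show()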

80. 3D scatter plot

Introduction:

Draw a three-dimensional scatter plot to visualize 3D data.

Example:

from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(df['x1'], df['x2'], df['x3'])
plt.show()

81. Violin plot

Introduction:

Use a violin plot to show the density of a distribution.

Example:

sns.violinplot(x='Category', y='Value', data=df)
plt.show()

82. Box plot

Introduction:

Use a box plot to show the distribution of the data and its outliers.

Example:

sns.boxplot(x='Category', y='Value', data=df)
plt.show()

83. Histogram

Introduction:

Draw a histogram to show how the data is distributed.

Example:

df['value'].hist(bins=30)
plt.show()

84. KDE (kernel density estimation)

Introduction:

Draw a KDE plot to estimate the probability density function of the data.

Example:

sns.kdeplot(df['value'])
plt.show()

85. Plot model performance

Introduction:

Use plots such as a learning curve to show model performance.

Example:

from sklearn.model_selection import learning_curve

train_sizes, train_scores, test_scores = learning_curve(model, X, y, cv=5)

plt.plot(train_sizes, train_scores.mean(axis=1), 'o-', label='Training score')
plt.plot(train_sizes, test_scores.mean(axis=1), 'o-', label='Test score')
plt.xlabel('Training examples')
plt.ylabel('Score')
plt.title('Learning Curve')
plt.legend()
plt.show()

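86. Visualize model coefficients

Introduction:

Visualize the coefficients of a linear model to understand how each feature affects the prediction.

Example: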

coef = model.coef_
plt.bar(range(len(coef)), coef)
plt.xlabel('Feature index')
plt.ylabel('Coefficient value')
plt.title('Model Coefficients')
plt.show()

87. RNN basics

Introduction:

Build a simple recurrent neural network (RNN) for sequence prediction.

Example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

model = Sequential([
    SimpleRNN(50, input_shape=(timesteps, features)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=10)

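88. LSTM network

Introduction:

Use a long short-term memory (LSTM) network to predict sequence data.

Example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense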

model = Sequential([
    LSTM(50, input_shape=(timesteps, features)),
    Dense(1)
])

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=10)

89. Data augmentation

Introduction:

Apply data augmentation to image data when training a model.

Example:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

datagen.fit(X_train)

90. Graph neural network (GNN) basics

Introduction:

Use a graph neural network to process graph-structured data.

Example:

import torch
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    # num_features and num_classes are passed in instead of relying on undefined globals
    def __init__(self, num_features, num_classes):
        super().__init__()
        self.conv1 = GCNConv(num_features, 64)
        self.conv2 = GCNConv(64, num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index
        x = self.conv1(x, edge_index)
        x = x.relu()
        x = self.conv2(x, edge_index)
        return x

91. Autoencoder

Introduction:

Build an autoencoder for dimensionality reduction and feature learning.

Example:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

input_layer = Input(shape=(input_dim,))
encoded = Dense(64, activation='relu')(input_layer)
decoded = Dense(input_dim, activation='sigmoid')(encoded)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(X_train, X_train, epochs=50)

92. Generative adversarial network (GAN)

Introduction:

Use a GAN to generate new data samples.

Example:

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Generator
noise = Input(shape=(100,))
x = Dense(128, activation='relu')(noise)
generated_image = Dense(784, activation='sigmoid')(x)

generator = Model(noise, generated_image)

# Discriminator
image = Input(shape=(784,))
x = Dense(128, activation='relu')(image)
validity = Dense(1, activation='sigmoid')(x)

discriminator = Model(image, validity)

93. Image classification - convolutional neural network (CNN)

Introduction:

Build a convolutional neural network for image classification.

Example:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(num_classes, activation='softmax')
])

# categorical_crossentropy expects one-hot encoded labels
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10)

94. Anomaly detection - Isolation Forest

Introduction:

Use Isolation Forest to detect anomalies.

Example:

from sklearn.ensemble import IsolationForest

iso_forest = IsolationForest(contamination=0.1)
outliers = iso_forest.fit_predict(X)

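95. Model ensembling - random forest and gradient boosting

Introduction:

Combine a random forest and a gradient boosting model into an ensemble.

Example: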

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier

rf = RandomForestClassifier(n_estimators=100)
gb = GradientBoostingClassifier(n_estimators=100)

ensemble_model = VotingClassifier(estimators=[('rf', rf), ('gb', gb)], voting='soft')
ensemble_model.fit(X_train, y_train)

96. Principal component analysis (PCA) visualization

Introduction:

Reduce dimensionality with PCA and visualize the data.

Example:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y)
plt.xlabel('PCA 1')
plt.ylabel('PCA 2')
plt.title('PCA of dataset')
plt.show()

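97. Feature importance visualization

Introduction:

Visualize feature importance scores.

Example:

# feature_importances_ exists on tree-based models such as the fitted random forest rf
feature_importances = rf.feature_importances_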
plt.bar(range(len(feature_importances)), feature_importances)
plt.xlabel('Feature index')
plt.ylabel('Importance')
plt.title('Feature Importances')
plt.show()

98. Hyperparameter search - grid search

Introduction:

Optimize hyperparameters with grid search.

Example:

from sklearn.model_selection import GridSearchCV

param_grid = {'n_estimators': [50, 100, 200], 'max_depth': [10, 20, 30]}
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5)
grid_search.fit(X_train, y_train)

print(f'Best parameters: {grid_search.best_params_}')

99. Time series forecasting - SARIMA

Introduction:

Use SARIMA to forecast seasonal time series.

Example:

from statsmodels.tsa.statespace.sarimax import SARIMAX

model = SARIMAX(df['value'], order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
model_fit = model.fit(disp=0)

forecast = model_fit.forecast(steps=10)

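100. Text data - word cloud

Introduction:

Use a word cloud to visualize the keywords in text data.

Example:

from wordcloud import WordCloud

# Drop missing values and cast to str so join() does not fail on NaN
text = ' '.join(df['text_column'].dropna().astype(str))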
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

Finally

That wraps up the 100 core Python operations for data science, covering NumPy, Pandas, visualization, and machine learning libraries such as sklearn, PyTorch, and TensorFlow.


http://mp.weixin.qq.com/s?__biz=MzAwNTkyNTUxMA==&mid=2247490373&idx=1&sn=1b7512fe86258918f4f868bcc5a39802

机器学习和人工智能AI
