Browse source

Tidy up the clustering code

liudawei 1 year ago
parent
commit
4ec040a9f3
81 changed files with 3457 additions and 0 deletions
  1. +3 -0    .gitignore
  2. +6 -0    README.md
  3. +884 -0  light-fcm-clustering/Fuzzy-C-Means/.ipynb_checkpoints/Fuzzy C-Means develop-checkpoint.ipynb
  4. +14 -0   light-fcm-clustering/Fuzzy-C-Means/FCM.md
  5. +884 -0  light-fcm-clustering/Fuzzy-C-Means/Fuzzy C-Means develop.ipynb
  6. +188 -0  light-fcm-clustering/Fuzzy-C-Means/PlotFunctions.py
  7. +300 -0  light-fcm-clustering/Fuzzy-C-Means/fuzzy_c_means.py
  8. +151 -0  light-fcm-clustering/Fuzzy-C-Means/iris_data.csv
  9. BIN      light-fcm-clustering/Fuzzy-C-Means/results/final_clusters.png
  10. BIN     light-fcm-clustering/Fuzzy-C-Means/results/final_clusters2.png
  11. BIN     light-fcm-clustering/Fuzzy-C-Means/results/initial_random.png
  12. BIN     light-fcm-clustering/Fuzzy-C-Means/results/initial_random2.png
  13. BIN     light-fcm-clustering/Fuzzy-C-Means/results/notes1.PNG
  14. BIN     light-fcm-clustering/Fuzzy-C-Means/results/notes2.PNG
  15. BIN     light-fcm-clustering/Fuzzy-C-Means/results/notes3.PNG
  16. +82 -0  light-fcm-clustering/env_data.py
  17. +44 -0  light-fcm-clustering/visual.py
  18. +14 -0  wind-fft-clustering/.gitignore
  19. +6 -0   wind-fft-clustering/README.md
  20. +110 -0 wind-fft-clustering/cluster_analysis.py
  21. +163 -0 wind-fft-clustering/data_add.py
  22. +483 -0 wind-fft-clustering/data_analysis.py
  23. +78 -0  wind-fft-clustering/data_clean.py
  24. +21 -0  wind-fft-clustering/聚类结果说明/README.md
  25. +11 -0  wind-fft-clustering/聚类结果说明/cluster/README.md
  26. BIN     wind-fft-clustering/聚类结果说明/cluster/cluster_1.png
  27. BIN     wind-fft-clustering/聚类结果说明/cluster/cluster_2.png
  28. BIN     wind-fft-clustering/聚类结果说明/cluster/cluster_3.png
  29. BIN     wind-fft-clustering/聚类结果说明/cluster/cluster_4.png
  30. BIN     wind-fft-clustering/聚类结果说明/fft/10_turbine_fft.png
  31. BIN     wind-fft-clustering/聚类结果说明/fft/11_turbine_fft.png
  32. BIN     wind-fft-clustering/聚类结果说明/fft/12_turbine_fft.png
  33. BIN     wind-fft-clustering/聚类结果说明/fft/13_turbine_fft.png
  34. BIN     wind-fft-clustering/聚类结果说明/fft/14_turbine_fft.png
  35. BIN     wind-fft-clustering/聚类结果说明/fft/15_turbine_fft.png
  36. BIN     wind-fft-clustering/聚类结果说明/fft/16_turbine_fft.png
  37. BIN     wind-fft-clustering/聚类结果说明/fft/17_turbine_fft.png
  38. BIN     wind-fft-clustering/聚类结果说明/fft/18_turbine_fft.png
  39. BIN     wind-fft-clustering/聚类结果说明/fft/19_turbine_fft.png
  40. BIN     wind-fft-clustering/聚类结果说明/fft/1_turbine_fft.png
  41. BIN     wind-fft-clustering/聚类结果说明/fft/20_turbine_fft.png
  42. BIN     wind-fft-clustering/聚类结果说明/fft/21_turbine_fft.png
  43. BIN     wind-fft-clustering/聚类结果说明/fft/22_turbine_fft.png
  44. BIN     wind-fft-clustering/聚类结果说明/fft/23_turbine_fft.png
  45. BIN     wind-fft-clustering/聚类结果说明/fft/24_turbine_fft.png
  46. BIN     wind-fft-clustering/聚类结果说明/fft/25_turbine_fft.png
  47. BIN     wind-fft-clustering/聚类结果说明/fft/26_turbine_fft.png
  48. BIN     wind-fft-clustering/聚类结果说明/fft/27_turbine_fft.png
  49. BIN     wind-fft-clustering/聚类结果说明/fft/28_turbine_fft.png
  50. BIN     wind-fft-clustering/聚类结果说明/fft/29_turbine_fft.png
  51. BIN     wind-fft-clustering/聚类结果说明/fft/2_turbine_fft.png
  52. BIN     wind-fft-clustering/聚类结果说明/fft/30_turbine_fft.png
  53. BIN     wind-fft-clustering/聚类结果说明/fft/31_turbine_fft.png
  54. BIN     wind-fft-clustering/聚类结果说明/fft/32_turbine_fft.png
  55. BIN     wind-fft-clustering/聚类结果说明/fft/33_turbine_fft.png
  56. BIN     wind-fft-clustering/聚类结果说明/fft/34_turbine_fft.png
  57. BIN     wind-fft-clustering/聚类结果说明/fft/35_turbine_fft.png
  58. BIN     wind-fft-clustering/聚类结果说明/fft/36_turbine_fft.png
  59. BIN     wind-fft-clustering/聚类结果说明/fft/37_turbine_fft.png
  60. BIN     wind-fft-clustering/聚类结果说明/fft/38_turbine_fft.png
  61. BIN     wind-fft-clustering/聚类结果说明/fft/39_turbine_fft.png
  62. BIN     wind-fft-clustering/聚类结果说明/fft/3_turbine_fft.png
  63. BIN     wind-fft-clustering/聚类结果说明/fft/40_turbine_fft.png
  64. BIN     wind-fft-clustering/聚类结果说明/fft/41_turbine_fft.png
  65. BIN     wind-fft-clustering/聚类结果说明/fft/42_turbine_fft.png
  66. BIN     wind-fft-clustering/聚类结果说明/fft/43_turbine_fft.png
  67. BIN     wind-fft-clustering/聚类结果说明/fft/44_turbine_fft.png
  68. BIN     wind-fft-clustering/聚类结果说明/fft/45_turbine_fft.png
  69. BIN     wind-fft-clustering/聚类结果说明/fft/46_turbine_fft.png
  70. BIN     wind-fft-clustering/聚类结果说明/fft/47_turbine_fft.png
  71. BIN     wind-fft-clustering/聚类结果说明/fft/48_turbine_fft.png
  72. BIN     wind-fft-clustering/聚类结果说明/fft/49_turbine_fft.png
  73. BIN     wind-fft-clustering/聚类结果说明/fft/4_turbine_fft.png
  74. BIN     wind-fft-clustering/聚类结果说明/fft/5_turbine_fft.png
  75. BIN     wind-fft-clustering/聚类结果说明/fft/6_turbine_fft.png
  76. BIN     wind-fft-clustering/聚类结果说明/fft/7_turbine_fft.png
  77. BIN     wind-fft-clustering/聚类结果说明/fft/8_turbine_fft.png
  78. BIN     wind-fft-clustering/聚类结果说明/fft/9_turbine_fft.png
  79. +15 -0  wind-fft-clustering/聚类结果说明/fft/README.md
  80. BIN     wind-fft-clustering/聚类结果说明/turbine_cluster.png
  81. BIN     wind-fft-clustering/聚类结果说明/风机标签与风机名称对应表.xlsx

+ 3 - 0
.gitignore

@@ -0,0 +1,3 @@
+**/__pycache__
+**/.idea
+wind-fft-clustering/analysis_img/

+ 6 - 0
README.md

@@ -0,0 +1,6 @@
+## Power Forecasting System
+
+Clustering project.
+
+
+

File diff suppressed because it is too large
+ 884 - 0
light-fcm-clustering/Fuzzy-C-Means/.ipynb_checkpoints/Fuzzy C-Means develop-checkpoint.ipynb


+ 14 - 0
light-fcm-clustering/Fuzzy-C-Means/FCM.md

@@ -0,0 +1,14 @@
+
+
+
+
+---
+
+>Reference:
+>
+>1. https://blog.csdn.net/on2way/article/details/47087201
+>2. https://www.programmersought.com/article/9666746636/
+>3. https://www.kaggle.com/prateekk94/fuzzy-c-means-clustering-on-iris-dataset
+>4. https://www.datanovia.com/en/lessons/fuzzy-clustering-essentials/fuzzy-c-means-clustering-algorithm/
+>5. https://github.com/theimageprocessingguy/Fuzzy-C-Means-Python
+

File diff suppressed because it is too large
+ 884 - 0
light-fcm-clustering/Fuzzy-C-Means/Fuzzy C-Means develop.ipynb


+ 188 - 0
light-fcm-clustering/Fuzzy-C-Means/PlotFunctions.py

@@ -0,0 +1,188 @@
+# -*- coding: utf-8 -*-
+"""
+Created on Sat Jun  5 00:24:23 2021
+
+@author: 34123
+"""
+import matplotlib.pyplot as plt
+import numpy as np
+import random
+from scipy.stats import multivariate_normal
+
+
+def plot_random_init_iris_sepal(df_full):
+    sepal_df = df_full.iloc[:,0:2]
+    sepal_df = np.array(sepal_df)
+    
+    m1 = random.choice(sepal_df)
+    m2 = random.choice(sepal_df)
+    m3 = random.choice(sepal_df)
+
+    cov1 = np.cov(np.transpose(sepal_df))
+    cov2 = np.cov(np.transpose(sepal_df))
+    cov3 = np.cov(np.transpose(sepal_df))
+    
+    x1 = np.linspace(4,8,150)  
+    x2 = np.linspace(1.5,4.5,150)
+    X, Y = np.meshgrid(x1,x2) 
+
+    Z1 = multivariate_normal(m1, cov1)  
+    Z2 = multivariate_normal(m2, cov2)
+    Z3 = multivariate_normal(m3, cov3)
+    
+    # a new array of given shape and type, without initializing entries
+    pos = np.empty(X.shape + (2,))
+    pos[:, :, 0] = X; pos[:, :, 1] = Y   
+
+    plt.figure(figsize=(10,10))
+    plt.scatter(sepal_df[:,0], sepal_df[:,1], marker='o')     
+    plt.contour(X, Y, Z1.pdf(pos), colors="r" ,alpha = 0.5) 
+    plt.contour(X, Y, Z2.pdf(pos), colors="b" ,alpha = 0.5) 
+    plt.contour(X, Y, Z3.pdf(pos), colors="g" ,alpha = 0.5)
+    # making both the axis equal
+    plt.axis('equal')                                                                 
+    plt.xlabel('Sepal Length', fontsize=16)
+    plt.ylabel('Sepal Width', fontsize=16)
+    plt.title('Initial Random Clusters(Sepal)', fontsize=22)
+    plt.grid()
+    plt.show()
+    
+
+def plot_random_init_iris_petal(df_full):
+    petal_df = df_full.iloc[:,2:4]
+    petal_df = np.array(petal_df)
+    
+    m1 = random.choice(petal_df)
+    m2 = random.choice(petal_df)
+    m3 = random.choice(petal_df)
+    cov1 = np.cov(np.transpose(petal_df))
+    cov2 = np.cov(np.transpose(petal_df))
+    cov3 = np.cov(np.transpose(petal_df))
+
+    x1 = np.linspace(-1,7,150)
+    x2 = np.linspace(-1,4,150)
+    X, Y = np.meshgrid(x1,x2) 
+
+    Z1 = multivariate_normal(m1, cov1)  
+    Z2 = multivariate_normal(m2, cov2)
+    Z3 = multivariate_normal(m3, cov3)
+
+    pos = np.empty(X.shape + (2,))
+    pos[:, :, 0] = X; pos[:, :, 1] = Y   
+
+    plt.figure(figsize=(10,10))
+    plt.scatter(petal_df[:,0], petal_df[:,1], marker='o')     
+    plt.contour(X, Y, Z1.pdf(pos), colors="r" ,alpha = 0.5) 
+    plt.contour(X, Y, Z2.pdf(pos), colors="b" ,alpha = 0.5) 
+    plt.contour(X, Y, Z3.pdf(pos), colors="g" ,alpha = 0.5) 
+    plt.axis('equal') 
+    plt.xlabel('Petal Length', fontsize=16) 
+    plt.ylabel('Petal Width', fontsize=16)
+    plt.title('Initial Random Clusters(Petal)', fontsize=22)
+    plt.grid()
+    plt.show()
+    
+    
+def plot_cluster_iris_sepal(df_full, labels, centers):
+    # finding mode
+    seto = max(set(labels[0:50]), key=labels[0:50].count) # 2
+    vers = max(set(labels[50:100]), key=labels[50:100].count) # 1
+    virg = max(set(labels[100:]), key=labels[100:].count) # 0
+    
+    # sepal
+    s_mean_clus1 = np.array([centers[seto][0],centers[seto][1]])
+    s_mean_clus2 = np.array([centers[vers][0],centers[vers][1]])
+    s_mean_clus3 = np.array([centers[virg][0],centers[virg][1]])
+    
+    values = np.array(labels) #label
+
+    # search all 3 species
+    searchval_seto = seto
+    searchval_vers = vers
+    searchval_virg = virg
+
+    # index of all 3 species
+    ii_seto = np.where(values == searchval_seto)[0]
+    ii_vers = np.where(values == searchval_vers)[0]
+    ii_virg = np.where(values == searchval_virg)[0]
+    ind_seto = list(ii_seto)
+    ind_vers = list(ii_vers)
+    ind_virg = list(ii_virg)
+    
+    sepal_df = df_full.iloc[:,0:2]
+    
+    seto_df = sepal_df[sepal_df.index.isin(ind_seto)]
+    vers_df = sepal_df[sepal_df.index.isin(ind_vers)]
+    virg_df = sepal_df[sepal_df.index.isin(ind_virg)]
+    
+    cov_seto = np.cov(np.transpose(np.array(seto_df)))
+    cov_vers = np.cov(np.transpose(np.array(vers_df)))
+    cov_virg = np.cov(np.transpose(np.array(virg_df)))
+    
+    sepal_df = np.array(sepal_df)
+    
+    x1 = np.linspace(4,8,150)  
+    x2 = np.linspace(1.5,4.5,150)
+    X, Y = np.meshgrid(x1,x2) 
+
+    Z1 = multivariate_normal(s_mean_clus1, cov_seto)  
+    Z2 = multivariate_normal(s_mean_clus2, cov_vers)
+    Z3 = multivariate_normal(s_mean_clus3, cov_virg)
+
+    pos = np.empty(X.shape + (2,))
+    pos[:, :, 0] = X; pos[:, :, 1] = Y   
+
+    plt.figure(figsize=(10,10))                                                          
+    plt.scatter(sepal_df[:,0], sepal_df[:,1], marker='o')     
+    plt.contour(X, Y, Z1.pdf(pos), colors="r" ,alpha = 0.5) 
+    plt.contour(X, Y, Z2.pdf(pos), colors="b" ,alpha = 0.5) 
+    plt.contour(X, Y, Z3.pdf(pos), colors="g" ,alpha = 0.5) 
+    plt.axis('equal')                                                                  
+    plt.xlabel('Sepal Length', fontsize=16)
+    plt.ylabel('Sepal Width', fontsize=16)
+    plt.title('Final Clusters(Sepal)', fontsize=22)  
+    plt.grid()
+    plt.show()
+    
+    
+    
+def plot_cluster_iris_petal(df_full, labels, centers):
+    # finding mode, as in the sepal plot
+    seto = max(set(labels[0:50]), key=labels[0:50].count)
+    vers = max(set(labels[50:100]), key=labels[50:100].count)
+    virg = max(set(labels[100:]), key=labels[100:].count)
+
+    # index lists of the 3 clusters
+    values = np.array(labels)
+    ind_seto = list(np.where(values == seto)[0])
+    ind_vers = list(np.where(values == vers)[0])
+    ind_virg = list(np.where(values == virg)[0])
+
+    # petal
+    p_mean_clus1 = np.array([centers[seto][2], centers[seto][3]])
+    p_mean_clus2 = np.array([centers[vers][2], centers[vers][3]])
+    p_mean_clus3 = np.array([centers[virg][2], centers[virg][3]])
+    
+    petal_df = df_full.iloc[:,2:4]
+    
+    seto_df = petal_df[petal_df.index.isin(ind_seto)]
+    vers_df = petal_df[petal_df.index.isin(ind_vers)]
+    virg_df = petal_df[petal_df.index.isin(ind_virg)]
+    
+    cov_seto = np.cov(np.transpose(np.array(seto_df)))
+    cov_vers = np.cov(np.transpose(np.array(vers_df)))
+    cov_virg = np.cov(np.transpose(np.array(virg_df)))
+    
+    petal_df = np.array(petal_df) 
+    
+    x1 = np.linspace(0.5,7,150)  
+    x2 = np.linspace(-1,4,150)
+    X, Y = np.meshgrid(x1,x2) 
+
+    Z1 = multivariate_normal(p_mean_clus1, cov_seto)  
+    Z2 = multivariate_normal(p_mean_clus2, cov_vers)
+    Z3 = multivariate_normal(p_mean_clus3, cov_virg)
+
+    pos = np.empty(X.shape + (2,))
+    pos[:, :, 0] = X; pos[:, :, 1] = Y   
+
+    plt.figure(figsize=(10,10))                                                         
+    plt.scatter(petal_df[:,0], petal_df[:,1], marker='o')     
+    plt.contour(X, Y, Z1.pdf(pos), colors="r" ,alpha = 0.5) 
+    plt.contour(X, Y, Z2.pdf(pos), colors="b" ,alpha = 0.5) 
+    plt.contour(X, Y, Z3.pdf(pos), colors="g" ,alpha = 0.5) 
+    plt.axis('equal')                                               
+    plt.xlabel('Petal Length', fontsize=16)
+    plt.ylabel('Petal Width', fontsize=16)
+    plt.title('Final Clusters(Petal)', fontsize=22)
+    plt.grid()
+    plt.show()

+ 300 - 0
light-fcm-clustering/Fuzzy-C-Means/fuzzy_c_means.py

@@ -0,0 +1,300 @@
+# -*- coding: utf-8 -*-
+import os
+import pandas as pd
+import numpy as np
+import random
+import operator
+import math
+from copy import deepcopy
+import matplotlib.pyplot as plt
+from sklearn.model_selection import train_test_split
+# # draw grid lines below the curves
+# plt.rcParams['axes.axisbelow'] = False
+plt.style.use('fivethirtyeight') # 'ggplot'
+
+from PlotFunctions import plot_random_init_iris_sepal, plot_random_init_iris_petal, plot_cluster_iris_sepal, plot_cluster_iris_petal
+
+
+from sklearn.datasets import load_iris
+
+
+def load_iris_data():
+    data = load_iris()
+    # feature columns of the iris dataset
+    features = data['data']
+    # labels of the iris dataset
+    target = data['target']
+    # add an axis so the labels can be concatenated with the features
+    target = target[:, np.newaxis]
+
+    target_names = data['target_names']
+    target_dicts = dict(zip(np.unique(target), target_names))
+
+    # copy, so the feature-name list is not modified in place
+    feature_names = data['feature_names'].copy() # deepcopy(data['feature_names'])
+    feature_names.append('label')
+    
+    df_full = pd.DataFrame(data = np.concatenate([features, target], axis=1), 
+                           columns=feature_names)
+    # save the dataset
+    df_full.to_csv(str(os.getcwd()) + '/iris_data.csv', index=None)
+    
+    columns = list(df_full.columns)
+    features = columns[:len(columns)-1]
+    class_labels = list(df_full[columns[-1]])
+    df = df_full[features]
+    
+    return df_full, df, class_labels, target_dicts
+
+
+def load_env_data():
+    path = '../xiangzhou/features.csv'
+    env = pd.read_csv(path)
+    return env
+
+# Initialize the membership matrix U
+def init_fuzzy_matrix(n_sample, c):
+    """
+    Randomly initialize the membership matrix; each sample's memberships sum to 1
+    ----
+    param n_sample: number of samples
+    param c: number of clusters
+    """
+    # membership matrix over all samples, shape = [n_sample, c]
+    fuzzy_matrix = []
+
+    for i in range(n_sample):
+        # generate c random numbers; random.random() draws a real number in [0, 1)
+        random_list = [random.random() for _ in range(c)]
+        sum_of_random = sum(random_list)
+        # normalized random list:
+        # the fuzzy memberships of a single sample
+        norm_random_list = [x/sum_of_random for x in random_list]
+        # index of the largest membership
+        one_of_random_index = norm_random_list.index(max(norm_random_list))
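+        # the loop below one-hots the row (largest membership -> 1, the rest -> 0),
+        # so this "random" init is effectively a hard assignment; rows still sum to 1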
+        
+        for j in range(0, len(norm_random_list)):
+            if(j == one_of_random_index):
+                norm_random_list[j] = 1
+            else:
+                norm_random_list[j] = 0
+                
+        fuzzy_matrix.append(norm_random_list)
+    
+    return fuzzy_matrix
+
+
+# Compute the FCM cluster centers
+def cal_cluster_centers(df, fuzzy_matrix, n_sample, c, m):
+    """
+    param df: feature set of the dataset, without the label column
+    param fuzzy_matrix: membership matrix
+    param c: number of clusters
+    param m: fuzzifier (weighting exponent)
+    """
+    # the * character is the unpacking operator:
+    # zip(*fuzzy_matrix) walks fuzzy_matrix column by column;
+    # list(zip(*fuzzy_matrix)) holds one tuple per column
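+    # Each center is the membership-weighted mean of the samples:
+    #   c_j = sum_i(u_ij^m * x_i) / sum_i(u_ij^m)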
+    fuzzy_mat_ravel = list(zip(*fuzzy_matrix))
+    
+    cluster_centers = []
+    
+    # loop over the clusters
+    for j in range(c):
+        # memberships of all samples for one cluster (a column of the membership matrix)
+        fuzzy_one_dim_list = list(fuzzy_mat_ravel[j])
+        # raise the memberships to the power m
+        m_fuzzy_one_dim_list = [p ** m for p in fuzzy_one_dim_list]
+        # sum of memberships: the denominator of the cluster-center formula
+        denominator = sum(m_fuzzy_one_dim_list)
+
+        numerator_list = []
+
+        # loop over all samples to accumulate the numerator
+        for i in range(n_sample):
+            # take one sample
+            sample = list(df.iloc[i])
+            # numerator term: the sample scaled by its membership to the power m
+            mul_sample_fuzzy = [m_fuzzy_one_dim_list[i] * val for val in sample]
+            numerator_list.append(mul_sample_fuzzy)
+        # sum the terms column-wise to obtain the numerator
+        numerator = map(sum, list(zip(*numerator_list)))
+        cluster_center = [val/denominator for val in numerator]
+        cluster_centers.append(cluster_center)
+        
+    return cluster_centers
+
+# Update the membership matrix, see Eq. (8)
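+# Eq. (8) is the standard FCM membership update:
+#   u_ij = 1 / sum_k( (||x_i - c_j|| / ||x_i - c_k||)^(2/(m-1)) )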
+def update_fuzzy_matrix(df, fuzzy_matrix, n_sample, c, m, cluster_centers):
+    # exponent used in the denominator
+    order = float(2 / (m - 1))
+    # loop over the samples
+    for i in range(n_sample):
+        # a single sample
+        sample = list(df.iloc[i])
+        # distances between the sample and every cluster center
+        distances = [np.linalg.norm(  np.array(list(  map(operator.sub, sample, cluster_centers[j])  ))  ) \
+                     for j in range(c)]
+        for j in range(c):
+            # denominator of the update formula
+            denominator = sum([math.pow(float(distances[j]/distances[val]), order) for val in range(c)])
+            fuzzy_matrix[i][j] = float(1 / denominator)
+            
+    return fuzzy_matrix  #, distances
+
+
+# Assign each sample to a cluster
+def get_clusters(fuzzy_matrix, n_sample, iter, max_iter):
+    # the dimension with the largest membership becomes the final cluster label;
+    # on the last iteration, samples whose best membership is below 0.15 are flagged for removal
+    cluster_labels, delete_labels = [], []
+    for i in range(n_sample):
+        max_val, idx = max( (val, idx) for (idx, val) in enumerate(fuzzy_matrix[i]) )
+        cluster_labels.append(idx)
+        if iter == max_iter-1:
+            print("max_val = ", max_val)
+            if max_val < 0.15:
+                delete_labels.append(i)
+    return cluster_labels, delete_labels
+
+
+# Fuzzy c-means clustering algorithm
+def fuzzy_c_means(df, fuzzy_matrix, n_sample, c, m, max_iter, init_method='random'):
+    """
+    param init_method: how the cluster centers are initialized
+            - random: centers are computed from the randomly initialized membership matrix
+            - multi_normal: centers are sampled from a multivariate Gaussian
+    """
+    # number of features per sample
+    n_features = df.shape[-1]
+    # initialize the membership matrix
+    fuzzy_matrix = init_fuzzy_matrix(n_sample, c)
+    # iteration counter
+    current_iter = 0
+    # initial cluster centers
+    init_cluster_centers = []
+    cluster_centers = []
+    # per-iteration cluster labels; each iteration stores every sample's cluster
+    max_iter_cluster_labels = []
+    # choose the initialization method
+    if init_method == 'multi_normal':
+        # mean vector
+        mean = [0] * n_features
+        # covariance of the multivariate Gaussian: the identity matrix
+        cov = np.identity(n_features)
+        for i in range(0, c):
+            init_cluster_centers.append(list(np.random.multivariate_normal(mean, cov) )  )
+#     else:
+#         init_cluster_centers = [[0.1] * n_features ] * c
+        
+    print(init_cluster_centers)
+    
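+    # NOTE: the loop always runs for max_iter iterations; a typical stopping rule
+    # would also break once the membership matrix changes by less than some epsilon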
+    while current_iter < max_iter:
+        if current_iter == 0 and init_method == 'multi_normal':
+            cluster_centers = init_cluster_centers
+        else:
+            cluster_centers = cal_cluster_centers(df, fuzzy_matrix, n_sample, c, m)
+        fuzzy_matrix = update_fuzzy_matrix(df, fuzzy_matrix, n_sample, c, m, cluster_centers)
+        cluster_labels, delete_labels = get_clusters(fuzzy_matrix, n_sample, iter=current_iter, max_iter=max_iter)
+        max_iter_cluster_labels.append(cluster_labels)
+        
+        current_iter += 1
+        
+        print('-' * 32)
+        print("Fuzzy Matrix U:\n")
+        print(np.array(fuzzy_matrix))
+        
+    return cluster_centers, cluster_labels, max_iter_cluster_labels, delete_labels
+
+
+def process_nwp(labels, delete_labels):
+    env = load_env_data()
+    nwps = pd.read_csv('../xiangzhou/NWP.csv')
+    nwp_1, nwp_2, nwp_3, nwp_4 = [], [], [], []
+    nwps['C_TIME'] = pd.to_datetime(nwps['C_TIME'])
+    for index, nwp in nwps.iterrows():
+        time = nwp['C_TIME'].strftime('%Y-%m-%d %H:00:00')
+        if len(env[env['C_TIME'].values == time].index) == 0:
+            print("nwp此时的时间点在环境数据中找不到:", nwp['C_TIME'])
+            continue
+        row = env[env['C_TIME'].values == time].index[0]
+        cls = labels[row]
+        if row in delete_labels:
+            continue
+        if cls == 0:
+            nwp_1.append(nwp)
+        elif cls == 1:
+            nwp_2.append(nwp)
+        elif cls == 2:
+            nwp_3.append(nwp)
+        elif cls == 3:
+            nwp_4.append(nwp)
+
+    nwp_1 = pd.concat(nwp_1, axis=1).T.reset_index(drop=True)
+    nwp_2 = pd.concat(nwp_2, axis=1).T.reset_index(drop=True)
+    nwp_3 = pd.concat(nwp_3, axis=1).T.reset_index(drop=True)
+    nwp_4 = pd.concat(nwp_4, axis=1).T.reset_index(drop=True)
+
+    nwp1_train, nwp1_test = train_test_split(nwp_1, test_size=0.1,
+                                             random_state=7,
+                                             shuffle=False)
+    nwp1_test['label'] = 1
+    nwp2_train, nwp2_test = train_test_split(nwp_2, test_size=0.1,
+                                             random_state=7,
+                                             shuffle=False)
+    nwp2_test['label'] = 2
+    nwp3_train, nwp3_test = train_test_split(nwp_3, test_size=0.1,
+                                             random_state=7,
+                                             shuffle=False)
+    nwp3_test['label'] = 3
+    nwp4_train, nwp4_test = train_test_split(nwp_4, test_size=0.1,
+                                             random_state=7,
+                                             shuffle=False)
+    nwp4_test['label'] = 4
+    data_test = pd.concat([nwp1_test, nwp2_test, nwp3_test, nwp4_test])
+    data_test.to_csv('../xiangzhou/Dataset_training/nwp_test.csv', index=False)
+    nwp1_train.to_csv('../xiangzhou/Dataset_training/nwp_1.csv', index=False)
+    nwp2_train.to_csv('../xiangzhou/Dataset_training/nwp_2.csv', index=False)
+    nwp3_train.to_csv('../xiangzhou/Dataset_training/nwp_3.csv', index=False)
+    nwp4_train.to_csv('../xiangzhou/Dataset_training/nwp_4.csv', index=False)
+    data_train = pd.concat([nwp1_train, nwp2_train, nwp3_train, nwp4_train])
+    data_train.to_csv('../xiangzhou/Dataset_training/nwp_train.csv', index=False)
+
+if __name__ == '__main__':
+    # df_full, df, class_labels, target_dicts = load_iris_data()
+    df = load_env_data().iloc[:, 1:]
+
+    # number of clusters (the iris dataset has 3 classes; 4 clusters are used for the env data)
+    c = 4
+    # maximum number of iterations, to avoid an endless loop
+    max_iter = 20
+    # number of samples
+    n_sample = len(df)
+    # fuzzifier m; the literature suggests values in [1.5, 2.5]
+    m = 1.7
+    
+    fuzzy_matrix = init_fuzzy_matrix(n_sample, c)
+    centers, labels, acc, delete_labels = fuzzy_c_means(df,
+                                     fuzzy_matrix, 
+                                     n_sample, 
+                                     c, 
+                                     m, 
+                                     max_iter, 
+                                     init_method='multi_normal') # multi_normal, random
+    process_nwp(labels, delete_labels)
+    from visual import cluster_scatter
+    cluster_scatter(x=df.values, y=labels)
+    # plot_random_init_iris_sepal(df)
+    # plot_random_init_iris_petal(df)
+    # plot_cluster_iris_sepal(df, labels, centers)
+    # plot_cluster_iris_petal(df, labels, centers)
+    
+    
+    
+    
+    
+    
+    
+    

+ 151 - 0
light-fcm-clustering/Fuzzy-C-Means/iris_data.csv

@@ -0,0 +1,151 @@
+sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),label
+5.1,3.5,1.4,0.2,0.0
+4.9,3.0,1.4,0.2,0.0
+4.7,3.2,1.3,0.2,0.0
+4.6,3.1,1.5,0.2,0.0
+5.0,3.6,1.4,0.2,0.0
+5.4,3.9,1.7,0.4,0.0
+4.6,3.4,1.4,0.3,0.0
+5.0,3.4,1.5,0.2,0.0
+4.4,2.9,1.4,0.2,0.0
+4.9,3.1,1.5,0.1,0.0
+5.4,3.7,1.5,0.2,0.0
+4.8,3.4,1.6,0.2,0.0
+4.8,3.0,1.4,0.1,0.0
+4.3,3.0,1.1,0.1,0.0
+5.8,4.0,1.2,0.2,0.0
+5.7,4.4,1.5,0.4,0.0
+5.4,3.9,1.3,0.4,0.0
+5.1,3.5,1.4,0.3,0.0
+5.7,3.8,1.7,0.3,0.0
+5.1,3.8,1.5,0.3,0.0
+5.4,3.4,1.7,0.2,0.0
+5.1,3.7,1.5,0.4,0.0
+4.6,3.6,1.0,0.2,0.0
+5.1,3.3,1.7,0.5,0.0
+4.8,3.4,1.9,0.2,0.0
+5.0,3.0,1.6,0.2,0.0
+5.0,3.4,1.6,0.4,0.0
+5.2,3.5,1.5,0.2,0.0
+5.2,3.4,1.4,0.2,0.0
+4.7,3.2,1.6,0.2,0.0
+4.8,3.1,1.6,0.2,0.0
+5.4,3.4,1.5,0.4,0.0
+5.2,4.1,1.5,0.1,0.0
+5.5,4.2,1.4,0.2,0.0
+4.9,3.1,1.5,0.2,0.0
+5.0,3.2,1.2,0.2,0.0
+5.5,3.5,1.3,0.2,0.0
+4.9,3.6,1.4,0.1,0.0
+4.4,3.0,1.3,0.2,0.0
+5.1,3.4,1.5,0.2,0.0
+5.0,3.5,1.3,0.3,0.0
+4.5,2.3,1.3,0.3,0.0
+4.4,3.2,1.3,0.2,0.0
+5.0,3.5,1.6,0.6,0.0
+5.1,3.8,1.9,0.4,0.0
+4.8,3.0,1.4,0.3,0.0
+5.1,3.8,1.6,0.2,0.0
+4.6,3.2,1.4,0.2,0.0
+5.3,3.7,1.5,0.2,0.0
+5.0,3.3,1.4,0.2,0.0
+7.0,3.2,4.7,1.4,1.0
+6.4,3.2,4.5,1.5,1.0
+6.9,3.1,4.9,1.5,1.0
+5.5,2.3,4.0,1.3,1.0
+6.5,2.8,4.6,1.5,1.0
+5.7,2.8,4.5,1.3,1.0
+6.3,3.3,4.7,1.6,1.0
+4.9,2.4,3.3,1.0,1.0
+6.6,2.9,4.6,1.3,1.0
+5.2,2.7,3.9,1.4,1.0
+5.0,2.0,3.5,1.0,1.0
+5.9,3.0,4.2,1.5,1.0
+6.0,2.2,4.0,1.0,1.0
+6.1,2.9,4.7,1.4,1.0
+5.6,2.9,3.6,1.3,1.0
+6.7,3.1,4.4,1.4,1.0
+5.6,3.0,4.5,1.5,1.0
+5.8,2.7,4.1,1.0,1.0
+6.2,2.2,4.5,1.5,1.0
+5.6,2.5,3.9,1.1,1.0
+5.9,3.2,4.8,1.8,1.0
+6.1,2.8,4.0,1.3,1.0
+6.3,2.5,4.9,1.5,1.0
+6.1,2.8,4.7,1.2,1.0
+6.4,2.9,4.3,1.3,1.0
+6.6,3.0,4.4,1.4,1.0
+6.8,2.8,4.8,1.4,1.0
+6.7,3.0,5.0,1.7,1.0
+6.0,2.9,4.5,1.5,1.0
+5.7,2.6,3.5,1.0,1.0
+5.5,2.4,3.8,1.1,1.0
+5.5,2.4,3.7,1.0,1.0
+5.8,2.7,3.9,1.2,1.0
+6.0,2.7,5.1,1.6,1.0
+5.4,3.0,4.5,1.5,1.0
+6.0,3.4,4.5,1.6,1.0
+6.7,3.1,4.7,1.5,1.0
+6.3,2.3,4.4,1.3,1.0
+5.6,3.0,4.1,1.3,1.0
+5.5,2.5,4.0,1.3,1.0
+5.5,2.6,4.4,1.2,1.0
+6.1,3.0,4.6,1.4,1.0
+5.8,2.6,4.0,1.2,1.0
+5.0,2.3,3.3,1.0,1.0
+5.6,2.7,4.2,1.3,1.0
+5.7,3.0,4.2,1.2,1.0
+5.7,2.9,4.2,1.3,1.0
+6.2,2.9,4.3,1.3,1.0
+5.1,2.5,3.0,1.1,1.0
+5.7,2.8,4.1,1.3,1.0
+6.3,3.3,6.0,2.5,2.0
+5.8,2.7,5.1,1.9,2.0
+7.1,3.0,5.9,2.1,2.0
+6.3,2.9,5.6,1.8,2.0
+6.5,3.0,5.8,2.2,2.0
+7.6,3.0,6.6,2.1,2.0
+4.9,2.5,4.5,1.7,2.0
+7.3,2.9,6.3,1.8,2.0
+6.7,2.5,5.8,1.8,2.0
+7.2,3.6,6.1,2.5,2.0
+6.5,3.2,5.1,2.0,2.0
+6.4,2.7,5.3,1.9,2.0
+6.8,3.0,5.5,2.1,2.0
+5.7,2.5,5.0,2.0,2.0
+5.8,2.8,5.1,2.4,2.0
+6.4,3.2,5.3,2.3,2.0
+6.5,3.0,5.5,1.8,2.0
+7.7,3.8,6.7,2.2,2.0
+7.7,2.6,6.9,2.3,2.0
+6.0,2.2,5.0,1.5,2.0
+6.9,3.2,5.7,2.3,2.0
+5.6,2.8,4.9,2.0,2.0
+7.7,2.8,6.7,2.0,2.0
+6.3,2.7,4.9,1.8,2.0
+6.7,3.3,5.7,2.1,2.0
+7.2,3.2,6.0,1.8,2.0
+6.2,2.8,4.8,1.8,2.0
+6.1,3.0,4.9,1.8,2.0
+6.4,2.8,5.6,2.1,2.0
+7.2,3.0,5.8,1.6,2.0
+7.4,2.8,6.1,1.9,2.0
+7.9,3.8,6.4,2.0,2.0
+6.4,2.8,5.6,2.2,2.0
+6.3,2.8,5.1,1.5,2.0
+6.1,2.6,5.6,1.4,2.0
+7.7,3.0,6.1,2.3,2.0
+6.3,3.4,5.6,2.4,2.0
+6.4,3.1,5.5,1.8,2.0
+6.0,3.0,4.8,1.8,2.0
+6.9,3.1,5.4,2.1,2.0
+6.7,3.1,5.6,2.4,2.0
+6.9,3.1,5.1,2.3,2.0
+5.8,2.7,5.1,1.9,2.0
+6.8,3.2,5.9,2.3,2.0
+6.7,3.3,5.7,2.5,2.0
+6.7,3.0,5.2,2.3,2.0
+6.3,2.5,5.0,1.9,2.0
+6.5,3.0,5.2,2.0,2.0
+6.2,3.4,5.4,2.3,2.0
+5.9,3.0,5.1,1.8,2.0

BIN
light-fcm-clustering/Fuzzy-C-Means/results/final_clusters.png


BIN
light-fcm-clustering/Fuzzy-C-Means/results/final_clusters2.png


BIN
light-fcm-clustering/Fuzzy-C-Means/results/initial_random.png


BIN
light-fcm-clustering/Fuzzy-C-Means/results/initial_random2.png


BIN
light-fcm-clustering/Fuzzy-C-Means/results/notes1.PNG


BIN
light-fcm-clustering/Fuzzy-C-Means/results/notes2.PNG


BIN
light-fcm-clustering/Fuzzy-C-Means/results/notes3.PNG


+ 82 - 0
light-fcm-clustering/env_data.py

@@ -0,0 +1,82 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# time: 2023/6/12 13:24
+# file: env_data.py
+# author: David
+# company: shenyang JY
+import datetime
+import math
+import pandas as pd
+import numpy as np
+
+
+def process_env_data():
+    path = './xiangzhou/weather/weather-1-process.csv'
+    envn = pd.read_csv(path, usecols=['C_TIME', 'C_GLOBALR', 'C_DIFFUSER', 'C_RH'])     # C_GLOBALR: global radiation, C_DIFFUSER: diffuse radiation, C_RH: relative humidity
+    envn['C_TIME'] = pd.to_datetime(envn['C_TIME'])
+    path1 = './xiangzhou/power5.csv'
+    power = pd.read_csv(path1, usecols=['C_TIME', 'C_REAL_VALUE'])
+    power['C_TIME'] = pd.to_datetime(power['C_TIME'])
+    envn = pd.merge(envn, power, on='C_TIME')
+    envn_filter = envn[envn['C_GLOBALR'] > 0].reset_index(drop=True)  # drop night-time rows (keep global radiation > 0)
+    envn = normalize(envn)
+    pre = envn_filter.iloc[0, 0].hour
+    envn.set_index('C_TIME', inplace=True)
+    envs, env = [], []
+    for index, row in envn_filter.iterrows():
+        if pre != row['C_TIME'].hour:
+            con = pd.concat(env, axis=1).T
+            # incomplete hour: points went missing in the join, or it is around sunrise/sunset
+            if len(con) != 12:
+                con = envn.loc[str(con.iloc[0, 0])[:-6]].reset_index()
+                print("incomplete hour at:", row['C_TIME'], "new length:", len(con))
+            envs.append(con.reset_index(drop=True))
+            pre = row['C_TIME'].hour
+            env = [row]
+        else:
+            env.append(row)
+    return envs
+
+
+def envn_features(envs, path):
+    for i, env in enumerate(envs):
+        zero_indexs = env[env['C_GLOBALR'] == 0].index
+        print("----", env)
+        if len(zero_indexs) > 0:
+            env.iloc[zero_indexs, env.columns.get_loc('C_GLOBALR')] = 0.1
+        print("++++", env)
+        x = list(map(lambda x,y: x/y, env['C_DIFFUSER'], env['C_GLOBALR']))
+        f1 = round(np.mean(x), 2)
+        env['diff1'] = env['C_REAL_VALUE'].diff()
+        env['diff_1'] = env['C_REAL_VALUE'].diff(-1)
+        ei = (env['diff1']*env['diff_1']).tolist()[1:-1]
+        ei = [1 if e > 0 else 0 for e in ei]
+        f2 = round(np.mean(ei), 2)
+        f3 = round(np.mean(env['C_RH'].tolist()), 2)
+        time = env.iloc[-1]['C_TIME'].replace(minute=0)
+        time += datetime.timedelta(hours=1)     # the features describe the next hour's environment
+        envs[i] = [time, f1, f2, f3]
+    envn_features = pd.DataFrame(envs, columns=['C_TIME', 'f1', 'f2', 'f3'])
+    # envn_features = normalize(envn_features)
+    envn_features.to_csv(path, index=False)
+
+
+def normalize(df):
+    """
+    Z-score normalization; C_TIME is left unnormalized for now
+    :param df:
+    :return: the normalized DataFrame
+    """
+    df1 = df.iloc[:, 1:]
+    mean = np.mean(df1, axis=0)  # column means
+    std = np.std(df1, axis=0)  # standard deviations
+    print("normalization parameters - mean: {}, std: {}".format(mean.to_dict(), std.to_dict()))
+    df_Zscore = df1.apply(lambda x: np.around((x - x.mean())/math.sqrt(sum((x-x.mean())**2/len(x))), decimals=2))
+    df_Zscore.insert(0, 'C_TIME', df["C_TIME"])
+    return df_Zscore
+
+
+if __name__ == '__main__':
+    feaP = './xiangzhou/features.csv'
+    envs = process_env_data()
+    envn_features(envs, feaP)

+ 44 - 0
light-fcm-clustering/visual.py

@@ -0,0 +1,44 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# time: 2023/6/28 16:03
+# file: visual.py
+# author: David
+# company: shenyang JY
+
+
+import matplotlib.pyplot as plt
+from sklearn.decomposition import PCA  # PCA for dimensionality reduction
+from sklearn.datasets import load_iris
+import numpy as np
+
+
+def cluster_scatter(x, y):
+    # dimensionality reduction with PCA
+    pca = PCA(n_components=2)  # keep 2 principal components
+    reduced_x = pca.fit_transform(x)  # project the samples
+    # reduced_x = np.dot(reduced_x, pca.components_) + pca.mean_  # reconstruct the data
+
+    # plot
+    red_x, red_y = [], []
+    blue_x, blue_y = [], []
+    green_x, green_y = [], []
+    yellow_x, yellow_y = [], []
+    # print(reduced_x)
+    for i in range(len(reduced_x)):
+      if y[i] == 0:
+        red_x.append(reduced_x[i][0])
+        red_y.append(reduced_x[i][1])
+      elif y[i] == 1:
+        blue_x.append(reduced_x[i][0])
+        blue_y.append(reduced_x[i][1])
+      elif y[i] == 2:
+        green_x.append(reduced_x[i][0])
+        green_y.append(reduced_x[i][1])
+      elif y[i] == 3:
+        yellow_x.append(reduced_x[i][0])
+        yellow_y.append(reduced_x[i][1])
+    plt.scatter(red_x, red_y, c='r', marker='.')
+    plt.scatter(blue_x, blue_y, c='b', marker='.')
+    plt.scatter(green_x, green_y, c='g', marker='.')
+    plt.scatter(yellow_x, yellow_y, c='y', marker='.')
+    plt.show()

+ 14 - 0
wind-fft-clustering/.gitignore

@@ -0,0 +1,14 @@
+*/__pycache__
+/__pycache__
+/analysis_img
+/.idea
+/checkpoint
+/log
+/data
+/figure
+*.log
+*.swp
+/log
+/data
+
+

+ 6 - 0
wind-fft-clustering/README.md

@@ -0,0 +1,6 @@
+## Power Forecasting System
+
+This module clusters wind turbines to find shared patterns between them.
+
+
+

+ 110 - 0
wind-fft-clustering/cluster_analysis.py

@@ -0,0 +1,110 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+# time: 2023/5/11 14:43
+# file: cluster_power.py
+# author: David
+# company: shenyang JY
+
+import os
+import re
+import numpy as np
+import pandas as pd
+
+
+def read_cfs(cfs, input_path, output_path, is_folder=False):
+    if not os.path.exists(output_path):
+        os.makedirs(output_path)
+    dfs = {}
+    for j, ids in cfs.items():
+        if is_folder:
+            dirname = input_path.split('/')[-1]
+            x = re.findall('(?<=Continuous_Turbine_Data_).*?(?=_)',dirname)[0]
+            dfs_j = [pd.read_csv(os.path.join(input_path, f"turbine-{id}_{int(x)}.csv")) for id in ids]
+        else:
+            dfs_j = [pd.read_csv(os.path.join(input_path, f"turbine-{id}.csv")) for id in ids]
+        dfj, time_series = dfs_j[0].loc[:, ['C_TIME', 'C_WS', 'C_ACTIVE_POWER']], dfs_j[0]['C_TIME']
+        for df in dfs_j[1:]:
+            if not df['C_TIME'].equals(time_series):
+                print("Timestamps differ between turbines!")
+                raise ValueError
+            dfj['C_ACTIVE_POWER'] += df['C_ACTIVE_POWER']
+            dfj['C_WS'] += df['C_WS']
+        dfj['C_WS'] /= len(dfs_j)
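+        # power is summed over the cluster's turbines while wind speed is averaged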
+        dfj.rename(columns=({'C_ACTIVE_POWER':'C_ACTIVE_POWER'+str(j), 'C_WS': 'C_WS'+str(j)}), inplace=True)
+        dfj[20:].to_csv(os.path.join(output_path, 'cluster_' + str(j) + '.csv'), index=False)
+        dfs[j] = dfj
+    return dfs
+
+
+def get_cfs(cluster, turbine_id):
+    cfs = {}
+    for j in range(1, max(cluster) + 1):
+        arr_j = np.where(cluster == j)[0]  # indices in `cluster` belonging to group j
+        cfs.setdefault(j, [turbine_id[k] for k in arr_j])
+    for key, value in cfs.items():
+        print("Group {}: {}".format(key, cfs[key]))
+    return cfs
+
+
+def cluster_data_indep(dfs_cluster, root_path):
+    df_power = pd.read_csv(root_path + "power.csv")
+    df_nwp = pd.read_csv(root_path + "NWP.csv",
+                         usecols=["C_TIME", "C_WS100", "C_WS170"])
+    df_all = pd.concat([df_power.set_index("C_TIME"), df_nwp.set_index("C_TIME"),
+                        dfs_cluster], axis=1, join="inner")
+    return df_all
+
+
+def cluster_power_list_file(cluster, turbine_id, input_path, output_path):
+    """
+    Sum cluster power from a flat list of turbine-*.csv files
+    cluster: clustering result
+    turbine_id: turbine IDs
+    input_path: input path, where output_filtered_csv_files lives
+    output_path: outputs each cluster's power plus the combined cluster_data
+    """
+    if not os.path.exists(output_path):
+        os.makedirs(output_path)
+
+    cfs = get_cfs(cluster, turbine_id)
+    dfs = read_cfs(cfs, input_path, output_path)
+    dfs_cluster = pd.concat([df.set_index("C_TIME") for df in dfs.values()], join='inner', axis=1)
+    dfs_cluster['SUM'] = dfs_cluster.filter(like='C_ACTIVE_POWER').sum(axis=1)
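+    # 'SUM' totals the per-cluster power columns into one farm-level series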
+    dfs_cluster = cluster_data_indep(dfs_cluster, '../data-process/data/')
+    dfs_cluster.reset_index().to_csv(os.path.join(output_path, 'cluster_data.csv'), index=False)
+
+
+def cluster_power_list_folder(cluster, turbine_id, input_path, output_path):
+    """
+    Sum cluster power from multiple folders of nested turbine-*.csv files
+    cluster: clustering result
+    turbine_id: turbine IDs
+    input_path: input path, where continuous_data lives
+    output_path: outputs each cluster's power plus the combined cluster_data
+    """
+    if not os.path.exists(output_path):
+        os.makedirs(output_path)
+    continuous_list = [os.path.join(input_path, path) for path in os.listdir(input_path)]
+    cfs = get_cfs(cluster, turbine_id)
+    for con in continuous_list:
+        dirname = con.split('/')[-1]
+        output = os.path.join(output_path, dirname)
+        dfs = read_cfs(cfs, con, output, True)
+        dfs_cluster = pd.concat([df.set_index("C_TIME") for df in dfs.values()], join='inner', axis=1)
+        dfs_cluster.reset_index().to_csv(os.path.join(output, 'cluster_data.csv'), index=False)
+
+
+if __name__ == '__main__':
+    turbine_id = list(range(102, 162))
+    cluster = np.array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
+    cluster[42] = 1
+    output_path = '../data-process/data/cluster_power/'
+
+    cluster_power_list_file(cluster, turbine_id,
+                            input_path='../data-process/data/output_filtered_csv_files/', output_path=output_path)
+    cluster_power_list_folder(cluster, turbine_id, input_path='../data-process/data/continuous_data/',
+                              output_path=output_path)
+

+ 163 - 0
wind-fft-clustering/data_add.py

@@ -0,0 +1,163 @@
+# -*- coding: utf-8 -*-
+
+
+import pandas as pd
+import matplotlib.pyplot as plt
+import matplotlib.dates as mdates
+from sklearn.preprocessing import MinMaxScaler
+import os
+
+
+# location of cluster_power
+root_path = "../data-process/data/"
+# mean nacelle wind speed of clusters 1 and 2, overall mean nacelle wind speed,
+# NWP wind speeds, and actual power
+add_cols = ["C_WS_1", "C_WS_2", "C_WS_ALL",
+            "C_WS100", "C_WS170", "power", "C_REAL_VALUE"]
+
+
+# join several tables together to obtain the columns above
+def data_process():
+    id1 = [142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155,
+           156, 157, 158, 159, 160, 161]
+    id2 = [102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115,
+           116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
+           130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141]
+
+
+    df_power = pd.read_csv(root_path + "power.csv")
+    df_nwp = pd.read_csv(root_path + "NWP.csv",
+                         usecols=["C_TIME", "C_WS100", "C_WS170"])
+    # df_nwp_power = pd.merge(df_power, df_nwp, on="C_TIME", how="inner")
+
+    turbine_path = root_path + "output_filtered_csv_files/"
+    df_turbine = pd.read_csv(
+        turbine_path + "turbine-102.csv", usecols=["C_TIME"])
+    df_turbine["C_WS_1"] = [0] * len(df_turbine)
+    df_turbine["C_WS_2"] = [0] * len(df_turbine)
+    df_turbine["C_WS_ALL"] = [0] * len(df_turbine)
+    df_turbine["power"] = [0] * len(df_turbine)
+
+    for ids in id1:
+        df_temp = pd.read_csv(turbine_path + f"turbine-{ids}.csv")
+        # if len(df_temp) != len(df_turbine):
+        #     print("false")
+        df_turbine["C_WS_1"] += df_temp["C_WS"]
+        df_turbine["C_WS_ALL"] += df_temp["C_WS"]
+        df_turbine["power"] += df_temp["C_ACTIVE_POWER"]
+
+    df_turbine["C_WS_1"] /= len(id1)
+
+    for ids in id2:
+        df_temp = pd.read_csv(turbine_path + f"turbine-{ids}.csv")
+        # if len(df_temp) != len(df_turbine):
+        #     print("false")
+        df_turbine["C_WS_2"] += df_temp["C_WS"]
+        df_turbine["C_WS_ALL"] += df_temp["C_WS"]
+        df_turbine["power"] += df_temp["C_ACTIVE_POWER"]
+
+    df_turbine["C_WS_2"] /= len(id2)
+    df_turbine["C_WS_ALL"] /= (len(id1) + len(id2))
+    df_turbine["power"] /= (len(id1) + len(id2))
+
+    df_all = pd.concat([df_power.set_index("C_TIME"), df_nwp.set_index("C_TIME"),
+                        df_turbine.set_index("C_TIME")], axis=1, join="inner").reset_index()
+    df_all = df_all.reindex(columns=["C_TIME"] + add_cols)
+    # df_all.drop(columns="power", inplace=True)
+    df_all.to_csv(root_path + "df_all.csv", index=False)
+
+
+# data_process()
+
+
+# append the add_cols columns to cluster_data.csv, producing cluster_data_1.csv
+def data_add(dirname, filename):
+    df_temp = pd.read_csv(dirname + filename)
+    df_all = pd.read_csv(root_path + "df_all.csv")
+    df = pd.merge(df_all, df_temp, on="C_TIME", how="inner")
+    df = df.reindex(columns=["C_TIME", "power_1",
+                    "power_2"] + add_cols + ["SUM"])
+    df.to_csv(dirname + "cluster_data_1.csv", index=False)
+
+
+# plot curves over time
+def show_curve(dirname, filename, series1, series2):
+    df = pd.read_csv(dirname + filename)
+
+    cols = df.columns[1:]
+    scaler = MinMaxScaler()
+    # min-max normalization
+    df[cols] = scaler.fit_transform(df[cols])
+
+    c_time = pd.to_datetime(df["C_TIME"])
+
+    plt.figure(figsize=(12, 8), dpi=100)
+
+    plt.plot(c_time, df[series1], label=series1)
+    plt.plot(c_time, df[series2], label=series2)
+
+    plt.legend()
+    date_format = mdates.DateFormatter('%Y-%m-%d %H:%M')
+    plt.gca().xaxis.set_major_formatter(date_format)
+    plt.xticks(rotation=30)
+
+    # save before show(); show() destroys the figure, which would leave a blank image
+    plt.savefig(dirname + "curve_" + series1 + "_" + series2 + ".png")
+    plt.show()
+    plt.close()
+
+
+# plot the S-shaped power curve (scatter)
+def show_scatter(dirname, filename, series1, series2, series3):
+    df = pd.read_csv(dirname + filename)
+
+    cols = df.columns[1:]
+    scaler = MinMaxScaler()
+    # min-max normalization (disabled)
+    # df[cols] = scaler.fit_transform(df[cols])
+
+    plt.figure(figsize=(10, 8), dpi=100)
+    point_size = 10
+    plt.scatter(df[series1], df[series3], label=series1, s=point_size)
+    plt.scatter(df[series2], df[series3], label=series2, s=point_size)
+    plt.xlabel(series1 + " / " + series2)
+    plt.ylabel(series3)
+
+    plt.legend()
+    # save before show(), as in show_curve
+    plt.savefig(dirname + "scatter_" + series1 +
+                "_" + series2 + "_" + series3 + ".png")
+    plt.show()
+    plt.close()
+
+
+# %%
+if __name__ == "__main__":
+    cluster_path = root_path + "cluster_power/"
+    # add the extra columns
+    data_add(cluster_path, "cluster_data.csv")
+
+    for root, dirs, files in os.walk(cluster_path):
+        for sub_dir in dirs:
+            subdir_path = os.path.join(root, sub_dir)
+            # print(subdir_path)
+            # file_path = os.path.join(subdir_path, "cluster_data.csv")
+            # print(file_path)
+
+            data_add(subdir_path + '/', "cluster_data.csv")
+
+    # %% plot curves
+    show_curve(cluster_path, "cluster_data_1.csv", "SUM", "C_WS_ALL")
+    for root, dirs, files in os.walk(cluster_path):
+        for sub_dir in dirs:
+            subdir_path = os.path.join(root, sub_dir)
+            show_curve(subdir_path + "/",
+                       "cluster_data_1.csv", "SUM", "C_WS_ALL")
+            # show_curve(subdir_path + "/", "cluster_data_1.csv", "power_1", "C_WS_1")
+            # show_curve(subdir_path + "/", "cluster_data_1.csv", "power_2", "C_WS_2")
+
+    # %% scatter plots (S-curve)
+    show_scatter(cluster_path, "cluster_data_1.csv",
+                 "C_WS_ALL", "C_WS100", "SUM")
+    for root, dirs, files in os.walk(cluster_path):
+        for sub_dir in dirs:
+            subdir_path = os.path.join(root, sub_dir)
+            show_scatter(subdir_path + "/", "cluster_data_1.csv",
+                         "C_WS_ALL", "C_WS100", "SUM")

+ 483 - 0
wind-fft-clustering/data_analysis.py

@@ -0,0 +1,483 @@
+# !usr/bin/env python
+# -*- coding:utf-8 _*-
+"""
+@Author:Lijiaxing
+ 
+@File:data_analysis.py
+@Time:2023/4/24 15:16
+
+"""
+import os.path
+
+import pandas as pd
+# from mpl_toolkits.basemap import Basemap
+from scipy.signal import savgol_filter
+import numpy as np
+import matplotlib.pyplot as plt
+from scipy.cluster.hierarchy import dendrogram, linkage, fcluster
+from sklearn.metrics import silhouette_samples, silhouette_score
+
+def paint_others(y):
+    """ Plot a generic series """
+    plt.plot([j for j in range(len(y))], y)
+    # add labels
+    plt.xlabel('x')
+    plt.ylabel('y')
+
+    # show the figure
+    plt.show()
+
+
+def compute_cos_similarity(a, b):
+    """
+    Compute the cosine similarity of two vectors
+    :param a: vector a
+    :param b: vector b
+    :return: the cosine similarity value
+    """
+    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
+
+
+def compute_pearsonr(a):
+    """
+    Compute the Pearson correlation coefficients of the data and return the similarity matrix
+    :param a: an n*m matrix, n samples of dimension m
+    :return: the similarity matrix, an n*n matrix
+    """
+    return np.corrcoef(a)
+
+
+def compute_distance(a, b):
+    """
+    Compute the Euclidean distance between two vectors
+    :param a:
+    :param b:
+    :return: the Euclidean distance between a and b
+    """
+    return np.linalg.norm(a - b)
+
+
+def hierarchical_clustering(data, threshold, similarity_func):
+    """
+    Hierarchical clustering, using linkage and fcluster from scipy.cluster.hierarchy
+    :param data: 2-D data, an n*m matrix of n samples with dimension m
+    :param threshold: distance threshold at which the linkage tree is cut into clusters;
+                      tune it as needed, and note it may exceed 1
+    :param similarity_func: function computing pairwise similarity; it can be replaced, but
+                            substituting a distance function requires changes inside this function
+    :return: clustering result, an n*1 array whose entries are each sample's cluster id
+    """
+    # similarity matrix of the data
+    similarity_matrix = similarity_func(data)
+
+    # distance matrix of the data
+    distance_matrix = 1 - similarity_matrix
+
+    # run hierarchical clustering and return the result
+    Z = linkage(distance_matrix, method='ward')
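+    # NOTE: linkage() treats a 2-D array as raw observations rather than a precomputed
+    # distance matrix; to cluster on the distances themselves, pass
+    # scipy.spatial.distance.squareform(distance_matrix)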
+    # cut the tree at the distance threshold
+    clusters = fcluster(Z, t=threshold, criterion='distance')
+    # draw the dendrogram
+    fig = plt.figure(figsize=(5, 3))
+    dn = dendrogram(Z)
+    plt.show()
+    # clusters[42] = 1
+    silhouette = silhouette_samples(np.abs(distance_matrix), clusters, metric='euclidean')
+    silhouette1 = silhouette_score(np.abs(distance_matrix), clusters, metric='euclidean')
+    print(f"平均轮廓系数为:{silhouette1}, 单个样本的轮廓系数:{silhouette}")
+    return clusters
+
+
+class DataAnalysis:
+    """
+    Data analysis class
+    """
+
+    def __init__(self, data_length, data_start, data_end):
+        """
+        Initialization
+        :param data_length: length of the analyzed data segment
+        :param data_start: start position of the analyzed data segment
+        :param data_end: end position of the analyzed data segment
+        """
+        # raw turbine data after FFT filtering
+        self.ori_turbine_fft = None
+        # raw turbine wind-speed segments
+        self.ori_turbine_pic = None
+        # clustering result
+        self.cluster = None
+        # smoothed wind-speed differences
+        self.smooth_turbine_diff = None
+        # sign changes of the wind-speed differences
+        self.diff_change = None
+        # wind-speed differences
+        self.turbine_diff = None
+        # all turbine data
+        self.turbine = None
+        # ordering of the turbine ids
+        self.turbine_id = list(range(102, 162))
+        # b1b4 = [142, 143, 144, 145]
+        # self.turbine_id = [id for id in self.turbine_id if id not in b1b4]
+        # 15-minute turbine power data
+        self.power_15min = None
+        # turbine latitude/longitude info
+        self.info = None
+        # length of the data used
+        self.data_length = data_length
+        # start position of the data used
+        self.data_start = data_start
+        # end position of the data used
+        self.data_end = data_end
+        # load the data
+        self.load_data(normalize=True)
+        # compute the wind-speed differences
+        self.compute_turbine_diff()
+
+    def load_data(self, normalize=False):
+        """
+        Load the data
+        :return:
+        """
+        self.info = pd.read_csv('../data-process/data/风机信息.csv', encoding='utf-8')
+        # power_15min = pd.read_csv('../data/power_15min.csv')
+        # for i in range(len(power_15min)):
+        #     if power_15min.loc[i, 'C_REAL_VALUE'] == -9999:
+        #         # makes the missing-data positions easy to spot in the curve
+        #         power_15min.loc[i, 'C_REAL_VALUE'] = -34.56789
+        # self.power_15min = power_15min
+        turbine_path = '../data-process/data/output_filtered_csv_files/turbine-{}.csv'
+        self.turbine, turbines = {}, []
+        for i in self.turbine_id:
+            self.turbine[i] = pd.read_csv(turbine_path.format(i))[20:].reset_index(drop=True)
+        if normalize is True:
+            self.normalize()
+
+    def normalize(self):
+        turbines = [self.turbine[i].values[:, 1:].astype(np.float32) for i in self.turbine_id]
+        turbines = np.vstack(turbines)
+        mean, std = np.mean(turbines, axis=0), np.std(turbines, axis=0)
+        for i in self.turbine_id:
+            c_time = self.turbine[i]['C_TIME']
+            self.turbine[i] = (self.turbine[i].iloc[:, 1:] - mean) / std
+            self.turbine[i].insert(loc=0, column='C_TIME', value=c_time)
+        return self.turbine
+
+    def compute_turbine_diff(self):
+        """
+        Compute the turbine wind-speed differences
+        :return:
+        """
+        turbine_diff = []
+        ori_turbine_pic = []
+        for turbine_i in self.turbine_id:
+            ori = np.array(self.turbine[turbine_i]['C_WS'].values[self.data_start:self.data_end + 1])
+            diff_array = np.diff(ori)
+            smoothness_value = np.std(diff_array)
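+            # std of the first difference is used as the smoothness score (lower = smoother)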
+            print("turbine-{}的平滑度是:{}".format(turbine_i, round(smoothness_value, 2)))
+            turbine_diff.append(diff_array)
+            ori_turbine_pic.append(self.turbine[turbine_i]['C_WS'].values[self.data_start:self.data_end])
+        self.ori_turbine_pic = ori_turbine_pic
+        self.turbine_diff = turbine_diff
+
+        diff_change = []
+        for diff_i in turbine_diff:
+            single_diff_change = []
+            for diff_i_i in diff_i:
+                if diff_i_i > 0:
+                    single_diff_change.append(1)
+                elif diff_i_i < 0:
+                    single_diff_change.append(-1)
+                else:
+                    single_diff_change.append(0)
+            diff_change.append(single_diff_change)
+        self.diff_change = diff_change
+        self.ori_turbine_fft = [self.turbine_fft(i + 1) for i in range(len(self.ori_turbine_pic))]
+
+        # smoothing
+        self.turbine_smooth(window_size=21)
+
+    def paint_map(self):
+        """
+        Plot the turbine latitude/longitude map (needs the Basemap import commented out at the top)
+        :return:
+        """
+        lats = self.info['纬度'].values
+        lons = self.info['经度'].values
+        map = Basemap()
+
+        # draw coastlines and country borders
+        map.drawcoastlines()
+        map.drawcountries()
+
+        # draw meridians and parallels
+        map.drawmeridians(range(0, 360, 30))
+        map.drawparallels(range(-90, 90, 30))
+
+        # plot the turbine positions
+
+        x, y = map(lons, lats)
+        map.plot(x, y, 'bo', markersize=10)
+
+        # show the figure
+        plt.show()
+
+    def paint_power15min(self):
+        """
+        Plot the 15-minute power curve
+        :return:
+        """
+
+        plt.plot(self.power_15min['C_REAL_VALUE'])
+
+        # set the title and axis labels
+        plt.title('Data Time Change Curve')
+        plt.xlabel('Date')
+        plt.ylabel('Value')
+
+        # show the figure
+        plt.show()
+
+    def paint_lats_lons(self):
+        """
+        Plot the latitude/longitude scatter
+        :return:
+        """
+        x = self.info['纬度'].values
+        y = self.info['经度'].values
+
+        # scatter plot
+        fig, ax = plt.subplots()
+        plt.scatter(x, y)
+
+        for i, txt in enumerate(self.info['id'].values):
+            ax.annotate(txt, (x[i], y[i]))
+
+        # set the title and axis labels
+        plt.xlabel('lats')
+        plt.ylabel('lons')
+
+        # show the figure
+        plt.show()
+
+    def similarity_score(self, turbine_diff, threshold=0.5):
+        """
+        Score pairwise cosine similarity and return, per turbine, the indices whose similarity exceeds the threshold
+        :param turbine_diff: matrix to compare, n*m with n rows of dimension m
+        :param threshold: similarity threshold
+        :return: the index matrix of the similarity computation
+        """
+        similarity = {i: [] for i in range(49)}  # NOTE: the turbine count is hard-coded to 49 here
+        similarity_index = {i: [] for i in range(49)}
+        for turbine_i in range(49):
+            for turbine_j in range(49):
+                cos_similarity = compute_cos_similarity(turbine_diff[turbine_i], turbine_diff[turbine_j])
+                similarity[turbine_i].append(cos_similarity)
+                if cos_similarity > threshold:
+                    similarity_index[turbine_i].append(turbine_j)
+        return similarity_index
+
+    def paint_turbine(self, paint_default=True):
+        """
+        Plot the per-cluster curves (the geographic scatter below is commented out)
+        :param paint_default: default True, plot the line chart of every cluster after clustering
+        :return: None
+        """
+
+        # y = self.info['纬度'].values
+        # x = self.info['经度'].values
+        #
+        # fig, ax = plt.subplots(figsize=(15, 15))
+        #
+        # plt.scatter(x, y, c=self.cluster)
+        # for i, txt in enumerate(self.info['C_ID'].values):
+        #     ax.annotate(txt, (x[i], y[i]))
+
+        # set the title and axis labels
+        # plt.xlabel('lons')
+        # plt.ylabel('lats')
+        # plt.legend()
+        #
+        # # 显示图表
+        # plt.savefig('analysis_img/turbine_cluster.png')
+        # plt.show()
+
+        plt.figure(figsize=(20, 10))
+        cmap = plt.get_cmap('viridis')
+        linestyle= ['solid', 'dashed']
+        for i in range(max(self.cluster)):
+            cluster, cluster_fft = [], []
+            for j, item in enumerate(self.cluster):
+                if item == i + 1:
+                    cluster.append(self.ori_turbine_pic[j])
+                    cluster_fft.append(self.ori_turbine_fft[j])
+            cluster_fft = np.average(cluster_fft, axis=0)
+            cluster = np.average(cluster, axis=0)
+            diff_array = np.diff(cluster)
+            smoothness_value = np.std(diff_array)
+            print("聚类-{}的平滑度是:{}".format(i+1, smoothness_value))
+            color = cmap(i*200)
+            plt.subplot(2, 1, 1)
+            plt.plot([j for j in range(len(cluster))], cluster, color=color, label='cluster'+str(i), linestyle=linestyle[i])
+            plt.subplot(2, 1, 2)
+            plt.plot([j for j in range(len(cluster_fft))], cluster_fft, color=color, label='cluster'+str(i), linestyle=linestyle[i])
+
+        # add the legend
+        plt.legend()
+        # show the figure
+        plt.savefig('analysis_img/cluster/clusters.png')
+        plt.show()
+        if paint_default:
+            for i in range(max(self.cluster)):
+                self.paint_turbine_k(i + 1)  # plot each turbine's curve within cluster k
+
+
+
+    def turbine_smooth(self, window_size=50):
+        """
+        Smooth the data (a Savitzky-Golay filter; the moving-average version is commented out below).
+
+        Parameters:
+        data -- data to smooth, numpy array
+        window_size -- sliding window size, integer
+
+        Returns:
+        smooth_data -- smoothed data, numpy array
+        """
+
+        # weights = np.repeat(1.0, window_size) / window_size
+        smooth_data = []
+        for turbine_diff_i in self.turbine_diff:
+            smooth_y = savgol_filter(turbine_diff_i, window_length=window_size, polyorder=3)
+            smooth_data.append(smooth_y)
+        #     smooth_data.append(np.convolve(turbine_diff_i, weights, 'valid'))
+        self.smooth_turbine_diff = smooth_data
+
+    def paint_turbine_k(self, k):
+        """
+        Plot the turbine curves of cluster k
+        :param k:
+        :return:
+        """
+        pic_label = []
+        y = []
+        plt.figure(figsize=(20, 10))
+        cmap = plt.get_cmap('viridis')
+        for i, item in enumerate(self.cluster):
+            if item == k:
+                pic_label.append('turbine-' + str(self.turbine_id[i]))
+                y.append(self.ori_turbine_fft[i])
+        for i in range(len(y)):
+            color = cmap(i / 10)
+            plt.plot([j for j in range(len(y[i]))], y[i], color=color, label=pic_label[i])
+        # add labels and a title
+        plt.xlabel('x')
+        plt.ylabel('y')
+        plt.title('Cluster {}'.format(k))
+
+        # add the legend
+        plt.legend()
+        # show the figure
+        plt.savefig('analysis_img/cluster/cluster_{}.png'.format(k))
+        plt.show()
+
+    def turbine_fft(self, k):
+        """
+        Fourier-transform the k-th turbine's raw series and clean it in the frequency domain
+        (the before/after plots below are commented out)
+        :param k: load-order index of the turbine, starting from 1
+        :return: the FFT-cleaned signal (complex ndarray)
+        """
+        y = self.ori_turbine_pic
+        t = np.linspace(0, 1, self.data_length)
+        signal = y[k - 1]
+
+        # forward Fourier transform
+        freq = np.fft.fftfreq(len(signal), t[1] - t[0])
+        spectrum = np.fft.fft(signal)
+        spectrum_abs = np.abs(spectrum)
+        threshold = np.percentile(spectrum_abs, 98)
+        indices = spectrum_abs > threshold
+        spectrum_clean = indices * spectrum
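+        # keep only the strongest ~2% of frequency components; the rest are zeroed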
+
+        # inverse Fourier transform
+        signal_clean = np.fft.ifft(spectrum_clean)
+        # plt.figure(figsize=(20, 10))
+        #
+        # # plot the time-domain signal
+        # plt.subplot(4, 1, 1)
+        # plt.plot(t, signal)
+        # plt.title(self.turbine_id[k-1])
+        #
+        # # plot the frequency-domain signal
+        # plt.subplot(4, 1, 2)
+        # plt.plot(freq, np.abs(spectrum))
+        #
+        # # plot the filtered frequency-domain signal
+        # plt.subplot(4, 1, 3)
+        # plt.plot(freq, np.abs(spectrum_clean))
+        #
+        # # plot the filtered time-domain signal
+        # plt.subplot(4, 1, 4)
+        # plt.plot(t, signal_clean)
+        #
+        # plt.savefig('analysis_img/fft/{}_turbine_fft.png'.format(self.turbine_id[k-1]))
+        # plt.show()
+        return signal_clean
+
+    def paint_double(self, i, j):
+        """
+        Plot a comparison of two turbines' transformed data
+        :param i: load-order index of the turbine, starting from 1
+        :param j: load-order index of the turbine, starting from 1
+        :return:
+        """
+        y = self.ori_turbine_fft
+        x = [index for index in range(self.data_length)]
+        data_i = y[i - 1]
+        data_j = y[j - 1]
+
+        plt.figure(figsize=(20, 10))
+        plt.plot(x, data_i, label='turbine {}'.format(self.turbine_id[i - 1]), linestyle='solid')
+        plt.plot(x, data_j, label='turbine {}'.format(self.turbine_id[j - 1]), linestyle='dashed')
+
+        plt.title('{} and {}'.format(i, j))
+        plt.legend()
+        plt.savefig('analysis_img/{}_{}_turbine.png'.format(self.turbine_id[i - 1], self.turbine_id[j - 1]))
+        plt.show()
+
+    def process_ori_data(self):
+        """
+        Process the raw data: cluster and plot
+        :return:
+        """
+        self.turbine_clusters(self.ori_turbine_fft)
+        self.paint_turbine()
+
+    def turbine_clusters(self, data=None):
+        """
+        Cluster the turbine data; the result is stored in self.cluster
+        :param data: None by default; other data may be clustered instead, which is reflected in the plots.
+        Data format: 2-D n*m, n rows of dimension m
+        :return: None
+        """
+        if data is None:
+            cluster = hierarchical_clustering(self.turbine_diff, threshold=1.4,
+                                              similarity_func=compute_pearsonr)  # hierarchical clustering
+        else:
+            cluster = hierarchical_clustering(data, threshold=0.8,
+                                              similarity_func=compute_pearsonr)
+        self.cluster = cluster
+        # Persist the cluster assignments by grouping the per-turbine power files
+        from cluster_analysis import cluster_power_list_file, cluster_power_list_folder
+
+        output_path = '../data-process/data/cluster_power/'
+
+        cluster_power_list_file(self.cluster, self.turbine_id,
+                                input_path='../data-process/data/output_filtered_csv_files/', output_path=output_path)
+        cluster_power_list_folder(self.cluster, self.turbine_id, input_path='../data-process/data/continuous_data/',
+                                  output_path=output_path)
+
+
+data_analysis = DataAnalysis(data_length=9773,
+                             data_start=0,
+                             data_end=9773)
+
+data_analysis.process_ori_data()
+data_analysis.paint_double(1, 56)
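
A note for readers of this diff: `hierarchical_clustering` and `compute_pearsonr` are defined elsewhere in this changeset and are not shown here. As a rough sketch of how such a pair could be implemented — assuming average-linkage agglomeration over a `1 - Pearson` distance, which the `similarity_func` argument suggests but the source does not confirm — consider:

```python
# Hypothetical sketch only -- not the implementation shipped in this repository.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def compute_pearsonr(a, b):
    """Pearson correlation of two 1-D series; higher means more similar."""
    return np.corrcoef(a, b)[0, 1]


def hierarchical_clustering(data, threshold, similarity_func):
    """Average-linkage agglomerative clustering; returns 1-based cluster labels."""
    n = len(data)
    dist = np.zeros((n, n))
    for p in range(n):
        for q in range(p + 1, n):
            # Turn a similarity in [-1, 1] into a distance in [0, 2]
            d = 1.0 - similarity_func(data[p], data[q])
            dist[p, q] = dist[q, p] = d
    z = linkage(squareform(dist, checks=False), method='average')
    return fcluster(z, t=threshold, criterion='distance').tolist()
```

Read this way, the thresholds above (1.4 for the raw differences, 0.8 for the FFT-filtered data) are cut heights on the dendrogram: a larger threshold merges more aggressively and produces fewer clusters.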

+ 78 - 0
wind-fft-clustering/data_clean.py

@@ -0,0 +1,78 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+"""
+@Author:Lijiaxing
+ 
+@File:data_clean.py
+@Time:2023/4/26 18:06
+
+"""
+import os.path
+
+import numpy as np
+import pandas as pd
+import matplotlib.pyplot as plt
+
+
+def paint_data(clean_data, clean_index=1):
+    """Plot the cleaned series and save it as data_${clean_index}.png."""
+    x = range(len(clean_data))
+    plt.figure(figsize=(20, 10))
+    plt.title('clean_{}'.format(clean_index))
+
+    # Plot the cleaned curve
+    plt.plot(x, clean_data, color='red', label='clean_data')
+    plt.legend()
+
+    plt.savefig('data_{}.png'.format(clean_index))
+    plt.show()
+
+
+class clean_file:
+    """
+    Clean raw CSV data files.
+    """
+
+    def __init__(self, output_path='./'):
+        """
+        :param output_path: directory for the cleaned data; pass a path only, without a file name
+        """
+        self.data = []
+        output_path = os.path.join(output_path, 'clean_data')
+        if not os.path.exists(output_path):
+            os.makedirs(output_path)
+        self.output_path = output_path
+
+    def clean_data(self, file_path, clean_name, clean_value, multi_value=False, clean_index=1, paint=True):
+        """
+        Clean the data.
+        Replace -9999 / -99 sentinel values by interpolation and save a plot of the
+        cleaned data under output_path.
+        :param paint: whether to plot the cleaned data
+        :param multi_value: if True, clean_value is a list of sentinel values
+        :param clean_value: sentinel value(s) marking anomalies in the data
+        :param clean_name: name of the column to clean
+        :param file_path: path of the CSV file to clean
+        :param clean_index: the output file is named clean_${clean_index}.csv
+        :return: None
+        """
+        data = pd.read_csv(file_path)
+        # Mark sentinel values as NaN; use .loc to avoid pandas chained-assignment issues
+        if multi_value:
+            mask = data[clean_name].isin(clean_value)
+        else:
+            mask = data[clean_name] == clean_value
+        data.loc[mask, clean_name] = np.nan
+
+        # Fill the gaps by interpolation and write the cleaned file
+        data[clean_name] = data[clean_name].interpolate()
+        data.to_csv(os.path.join(self.output_path, 'clean_{}.csv'.format(clean_index)), index=False)
+        if paint:
+            paint_data(data[clean_name].values, clean_index)
+
+
+# Usage example
+cleaner = clean_file(output_path='Dataset_training/power/')
+for i in range(6):
+    cleaner.clean_data(file_path='Dataset_training/power/power_{}.csv'.format(i), clean_name='C_REAL_VALUE',
+                       clean_value=[-9999.0, -99], multi_value=True,
+                       clean_index=i)
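
As a quick illustration of the NaN-then-interpolate strategy that `clean_data` applies to the `-9999`/`-99` sentinels, here is a tiny standalone snippet with made-up numbers (pandas' `interpolate` defaults to linear interpolation):

```python
import numpy as np
import pandas as pd

# Made-up sample: -9999 and -99 mark missing power readings
s = pd.Series([1.0, -9999.0, 3.0, -99.0, -99.0, 6.0])
s[s.isin([-9999.0, -99.0])] = np.nan
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```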

+ 21 - 0
wind-fft-clustering/聚类结果说明/README.md

@@ -0,0 +1,21 @@
+# Clustering Results
+
+---
+
+## Directory structure
+
+|------ cluster                                  # clustering results
+|------------- cluster_1.png                     # trend plot of the clustered data; the index is the cluster label
+|------------- cluster_2.png
+|------------- cluster_3.png
+|------------- cluster_4.png
+|------ fft                                      # FFT filtering results
+|------------- index_turbine_fft.png             # data comparison before and after FFT filtering
+|------------- index_turbine_fft.png
+|------------- index_turbine_fft.png
+|------------- ......
+|------ turbine_cluster.png                      # the clusters shown on the latitude/longitude location map
+|------ 风机标签与风机名称对应表.xlsx             # mapping between the indices used and the turbine names
+
+
+

+ 11 - 0
wind-fft-clustering/聚类结果说明/cluster/README.md

@@ -0,0 +1,11 @@
+# Cluster File Description
+
+---
+
+![Sample image, for cluster 1](./cluster_1.png)
+
+## Title: "1" is the cluster label
+
+## Legend: turbine indices belonging to this cluster, numbered from 1 (1-49)
+
+## Y-axis: the corresponding power values

BIN=BIN
wind-fft-clustering/聚类结果说明/cluster/cluster_1.png


BIN=BIN
wind-fft-clustering/聚类结果说明/cluster/cluster_2.png


BIN=BIN
wind-fft-clustering/聚类结果说明/cluster/cluster_3.png


BIN=BIN
wind-fft-clustering/聚类结果说明/cluster/cluster_4.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/10_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/11_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/12_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/13_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/14_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/15_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/16_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/17_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/18_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/19_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/1_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/20_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/21_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/22_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/23_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/24_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/25_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/26_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/27_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/28_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/29_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/2_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/30_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/31_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/32_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/33_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/34_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/35_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/36_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/37_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/38_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/39_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/3_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/40_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/41_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/42_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/43_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/44_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/45_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/46_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/47_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/48_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/49_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/4_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/5_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/6_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/7_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/8_turbine_fft.png


BIN=BIN
wind-fft-clustering/聚类结果说明/fft/9_turbine_fft.png


+ 15 - 0
wind-fft-clustering/聚类结果说明/fft/README.md

@@ -0,0 +1,15 @@
+# FFT Image Description
+
+---
+
+![Sample image, for turbine 1](1_turbine_fft.png)
+
+## Title: "1" is the turbine index
+
+## Panel 1: the raw data curve
+
+## Panel 2: the raw curve converted from the time domain to the frequency domain by FFT
+
+## Panel 3: the frequency-domain curve after noise filtering
+
+## Panel 4: the denoised curve converted back to the time domain
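
The four panels are produced by the (currently commented-out) plotting block in `DataAnalysis.turbine_fft`. A self-contained sketch that reproduces the same kind of figure is shown below; the 98th-percentile filter matches `data_analysis.py`, while the synthetic signal is invented purely for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic signal for illustration: two sinusoids plus noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1024)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t) + 0.3 * rng.standard_normal(t.size)

# Keep only the strongest 2% of frequency components, as in turbine_fft
freq = np.fft.fftfreq(t.size, t[1] - t[0])
spectrum = np.fft.fft(signal)
keep = np.abs(spectrum) > np.percentile(np.abs(spectrum), 98)
signal_clean = np.fft.ifft(keep * spectrum).real

fig, axes = plt.subplots(4, 1, figsize=(12, 8))
axes[0].plot(t, signal); axes[0].set_title('raw (time domain)')
axes[1].plot(freq, np.abs(spectrum)); axes[1].set_title('raw spectrum')
axes[2].plot(freq, np.abs(keep * spectrum)); axes[2].set_title('filtered spectrum')
axes[3].plot(t, signal_clean); axes[3].set_title('filtered (time domain)')
fig.tight_layout()
plt.show()
```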

BIN=BIN
wind-fft-clustering/聚类结果说明/turbine_cluster.png


BIN=BIN
wind-fft-clustering/聚类结果说明/风机标签与风机名称对应表.xlsx


Some files were not shown because too many files changed in this diff