Grouping with bottom-up (agglomerative) clustering

Machine Learning Python3

Creating the data: build a random matrix with shape (5, 3). import pandas as pd import numpy as np np.random.seed(123) variables = ['X', 'Y', 'Z'] labels = ['ID_0', 'ID_1', 'ID_2', 'ID_3', 'ID_4'] X = np.random.random_sample([5, 3])*10 ## pandas data…
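The excerpt above stops right after the data is created, so here is a minimal sketch of how the grouping step could look. It assumes SciPy's `scipy.cluster.hierarchy` with complete linkage and Euclidean distance, and cuts the tree into two groups. The linkage method, the metric, the cluster count, and the names `df`, `row_clusters`, and `cluster_ids` are my assumptions, not taken from the original post.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Same data setup as in the excerpt.
np.random.seed(123)
variables = ['X', 'Y', 'Z']
labels = ['ID_0', 'ID_1', 'ID_2', 'ID_3', 'ID_4']
X = np.random.random_sample([5, 3]) * 10
df = pd.DataFrame(X, columns=variables, index=labels)

# Bottom-up clustering: every sample starts as its own cluster and the two
# closest clusters are merged repeatedly (complete linkage is an assumption).
row_clusters = linkage(df.values, method='complete', metric='euclidean')

# Each row of the linkage matrix describes one merge:
# [cluster a, cluster b, distance, number of samples in the merged cluster]
merge_table = pd.DataFrame(
    row_clusters,
    columns=['cluster_a', 'cluster_b', 'distance', 'n_samples'],
)

# Cut the tree so that at most two flat clusters remain (assumed count).
cluster_ids = fcluster(row_clusters, t=2, criterion='maxclust')
print(merge_table)
print(dict(zip(labels, cluster_ids)))
```

The linkage matrix can also be passed to `scipy.cluster.hierarchy.dendrogram` to draw the merge tree.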

2018-06-13

Validating cluster analysis on unlabeled data

Machine Learning Python3

## Create sample data for clustering from sklearn.datasets import make_blobs X, y = make_blobs( n_samples=150, n_features=2, centers=3, cluster_std=0.5, shuffle=True, random_state=True ) ## Plot the clusters plt.scatter(X[:, 0], X[:, 1], c…

2018-06-09

Machine learning study code and sites

Machine Learning

Reusing the code from these sources might make implementation easier. github.com Python Data Science Handbook sebastianraschka.com https://sebastianraschka.com/pdf/books/dlb/appendix_d_calculus.pdf

2018-06-09

Implementing an ensemble classifier

Machine Learning Python3

In general, an ensemble classifier tends to perform better than any of its individual classifiers. from sklearn import datasets from sklearn.model_selection import train_test_split from sklearn.preprocessing import ( StandardScaler, LabelEncoder, ) iris = datasets.load_iris(…

2018-06-09

Precision and recall in sklearn

Python3 Machine Learning

Uses the X_train, y_train, ... loaded in the post below. kidnohr.hatenadiary.com Precision, recall, and the F1 score: the F1 score is a performance metric that combines precision (PRE) and recall (REC). PRE = TP / ( TP + FP ) REC = TP / ( TP + FN ) f1 = 2 * ( PRE *…
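As a small worked example of the metrics above, here is a sketch in plain Python that computes them from raw counts. It uses the standard definition of F1 as the harmonic mean of precision and recall. The function name `precision_recall_f1` and the counts are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    A metric is reported as 0.0 when its denominator is zero.
    """
    pre = tp / (tp + fp) if (tp + fp) else 0.0
    rec = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * pre * rec / (pre + rec) if (pre + rec) else 0.0
    return pre, rec, f1


# Hypothetical counts, chosen only for illustration.
pre, rec, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(f"PRE={pre:.3f} REC={rec:.3f} F1={f1:.3f}")
```

With real predictions, `sklearn.metrics.precision_score`, `recall_score`, and `f1_score` compute the same quantities directly from `y_true` and `y_pred`.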

2018-06-08

Tuning with grid search

Machine Learning Python3

Training a support vector machine pipeline from sklearn.preprocessing import LabelEncoder from sklearn.preprocessing import StandardScaler from sklearn.pipeline import make_pipeline from sklearn.model_selection import GridSearchCV …
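The excerpt is cut off after the imports, so the following is only a sketch of how a grid search over an SVM pipeline might look. It assumes the breast cancer dataset bundled with scikit-learn, and the parameter grid is chosen arbitrarily. The dataset, the grid values, and the name `pipe_svc` are not from the original post.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed dataset: a binary classification set shipped with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Standardize the features, then fit a support vector classifier.
pipe_svc = make_pipeline(StandardScaler(), SVC(random_state=1))

# make_pipeline names each step after its lowercased class name, so the
# SVC parameters are addressed with the 'svc__' prefix.
param_grid = [
    {'svc__kernel': ['linear'], 'svc__C': [0.1, 1.0, 10.0]},
    {'svc__kernel': ['rbf'], 'svc__C': [0.1, 1.0, 10.0],
     'svc__gamma': [0.01, 0.1]},
]

# Try every combination in the grid with 5-fold cross-validation.
gs = GridSearchCV(estimator=pipe_svc, param_grid=param_grid,
                  scoring='accuracy', cv=5)
gs.fit(X, y)

best_params = gs.best_params_
best_score = gs.best_score_
print(best_params, best_score)
```

Because scaling sits inside the pipeline, it is refit on each training fold, which keeps the validation fold out of the scaler's statistics.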

2018-06-06

How to use sklearn's pipeline

Python3 Machine Learning

make_pipeline gives you a wrapper that chains (input) => (transformers (one or more)) => (estimator) => (output). Transformers implement fit & transform; the estimator implements fit. import pandas as pd from sklearn.model_selection import train_test_split from sklearn.preprocessing import Label…
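To make the (input) => (transformers) => (estimator) => (output) flow concrete, here is a minimal sketch. It assumes the iris dataset, `StandardScaler` as the transformer, and `LogisticRegression` as the estimator. These choices are mine for illustration and may differ from what the original post used.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed dataset and split, for illustration only.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

# Transformer (fit & transform) followed by the final estimator (fit).
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# fit() calls fit_transform on the scaler and then fit on the classifier.
pipe.fit(X_train, y_train)

# predict()/score() only call transform on the scaler (no refitting),
# then hand the transformed data to the classifier.
y_pred = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

The pipeline object behaves like a single estimator, so it can be passed as-is to cross-validation or grid search utilities.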

2018-06-05

Feature selection with a random forest

Machine Learning Python3

A method, excerpted from a book, for picking out features to reduce dimensionality. df_wine = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data', header=None) from sklearn.ensemble import RandomForestClassifier feat_labels = df_wine.columns[1:] fore…
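Since the excerpt breaks off before the forest is fitted, here is a rough sketch of ranking features by random forest importance. To avoid the network download it assumes `sklearn.datasets.load_wine`, which bundles a copy of the UCI wine data, in place of the CSV URL. The number of trees, the random seed, and the name `ranking` are also my assumptions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

# Assumed stand-in for the CSV in the excerpt: the bundled wine dataset.
wine = load_wine()
X, y = wine.data, wine.target
feat_labels = wine.feature_names

# Fit a forest; the hyperparameters here are illustrative.
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X, y)

# Impurity-based importances, normalized so that they sum to 1.
importances = forest.feature_importances_

# Sort feature indices from most to least important.
indices = np.argsort(importances)[::-1]
ranking = [(feat_labels[i], float(importances[i])) for i in indices]

for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank:2d}) {name:<30} {score:.4f}")
```

To actually drop the weak features, one option is `sklearn.feature_selection.SelectFromModel` with an importance threshold.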

2018-05-31

Plotting two-dimensional classification results

Machine Learning Python3

Plots the coordinates of two features together with the classification results from a trained model as an easy-to-read figure. import numpy as np import matplotlib.pyplot as plt from matplotlib.colors import ListedColormap def plot_dicision_regions(X, y, classifier, test_idx=None, resolusions…

2018-05-27

On studying tensors

Machine Learning

The following site explains tensor products and a number of related topics. http://www.mm.civil.tohoku.ac.jp/renzokutai/0_suugaku.pdf

2018-04-26

Topic extraction with LDA (Latent Dirichlet Allocation)

Python3 Machine Learning

Reads data from sample.csv in the format below and extracts topics with sklearn's LDA. id text 1 今日は晴れ。明日は雨 2 今日はカープが優勝した。 ... ... text2topic.py #!/usr/bin/env python # coding:utf-8 from __future__ import print_function from ti…

Every day there is more that I don't understand…

φ(..)メモメモ

Machine Learning
