How to check for NaN in pandas

Python3

import numpy as np; import pandas as pd; a = np.nan; assert pd.isna(a)
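
NaN never compares equal to anything, including itself, so `a == np.nan` cannot serve as the check. A small sketch contrasting it with `pd.isna`, which also works element-wise on a Series and treats `None` as missing:

```python
import numpy as np
import pandas as pd

a = np.nan

# NaN is not equal to itself, so == always gives False
print(a == np.nan)   # False

# pd.isna is the reliable scalar check
print(pd.isna(a))    # True

# It also works element-wise; None in a float Series is stored as NaN
s = pd.Series([1.0, np.nan, None])
mask = pd.isna(s)
print(mask.tolist())  # [False, True, True]
```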

2019-05-20

A mental picture of CNN layers (notes)

Machine learning Python3

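Names follow the Keras classes. GlobalAveragePooling outputs a 1-D vector (along axis 0) holding the mean of each channel. Compared with AveragePooling, "Global" means each whole feature map is pooled down to a single value. Flatten just lines all the values up in 1-D along axis 0. SeparableConv is, per chann…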
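
The shape differences between these layers can be reproduced with plain NumPy. This is a sketch under assumptions, not Keras itself: one feature map in Keras' default channels-last layout `(height, width, channels)`, the batch axis omitted, and `AveragePooling2D` with `pool_size=(2, 2)` and its default stride (equal to the pool size).

```python
import numpy as np

# One feature map, channels-last: (height, width, channels); batch axis omitted
fmap = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)

# GlobalAveragePooling2D: one mean per channel -> shape (channels,)
gap = fmap.mean(axis=(0, 1))

# AveragePooling2D, pool_size=(2, 2): mean of each non-overlapping 2x2 block -> (2, 2, channels)
pooled = fmap.reshape(2, 2, 2, 2, 3).mean(axis=(1, 3))

# Flatten: every value laid out in 1-D -> shape (height * width * channels,)
flat = fmap.reshape(-1)

print(gap.shape, pooled.shape, flat.shape)  # (3,) (2, 2, 3) (48,)
```

Global pooling is the limiting case of local pooling: averaging the 2x2-pooled map again over its spatial axes gives the same per-channel means.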

2019-05-06

Stopping VSCode from reporting Python import errors

Python3

This can be avoided by opening the Python settings.json from Settings and adding the following: "python.linting.pylintArgs": [ "--max-line-length=80", "--disable=W0142,W0403,W0613,W0232,R0903,R0913,C0103,R0914,C0304,F0401,W0402,E1101,W0614,C0111,C0301" ] stackoverflow.com

2019-05-02

Switching Python3 on Mac (Homebrew) from Python 3.7 to Python 3.6

Python3 Mac

This was also a small lesson in using brew. brew info python3 # take Python 3.7 out of brew's management brew unlink python3 # download the latest Python 3.6 (ignoring dependencies) brew install --ignore-dependencies https://raw.githubusercontent.com/Homebrew/hom…

2019-04-19

Keras history plot and moving average

Python3 Deep learning

history plot acc = history.history['acc'] val_acc = history.history['val_acc'] loss = history.history['loss'] val_loss = history.history['val_loss'] epochs = range(1, len(acc) + 1) plt.plot(epochs, acc, 'bo', label='Training acc') plt.plot…

2019-04-18

Looking at seasonal trends with statsmodels

Python3

import warnings import itertools import numpy as np import matplotlib.pyplot as plt warnings.filterwarnings("ignore") plt.style.use('fivethirtyeight') import pandas as pd import statsmodels.api as sm import matplotlib matplotlib.rcParams['…
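
statsmodels' `seasonal_decompose` splits a series into trend, seasonal and residual parts. To show what the seasonal component means, here is a hand-rolled sketch of the classical additive idea using pandas only, on synthetic monthly data; it simplifies the trend to a plain 12-month centered rolling mean, so it is not a reimplementation of statsmodels.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series: linear trend + yearly cycle with amplitude 3, no noise
idx = pd.date_range("2015-01-01", periods=48, freq="MS")
t = np.arange(48)
y = pd.Series(10 + 0.5 * t + 3 * np.sin(2 * np.pi * t / 12), index=idx)

# Trend: 12-month centered moving average (one full cycle averages out)
trend = y.rolling(window=12, center=True).mean()

# Seasonal: mean detrended value per calendar month, shifted so it averages to 0
detrended = y - trend
seasonal = detrended.groupby(detrended.index.month).mean()
seasonal -= seasonal.mean()

print(seasonal.round(2))
```

The recovered seasonal pattern peaks at 3 in April, matching the sine term that was put in.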

2019-04-17

About t in the statsmodels summary

Python3 Statistics

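As far as I can tell, t is the test statistic for the null hypothesis coef = 0, assuming the estimate follows a t-distribution: t = coef / std err. The farther t is from 0, the stronger the evidence that coef is not 0 (the smaller P>|t|), i.e. the variable has a real association with the response. from patsy import dmatrices import statsmodels.api as sm df = sm.datasets.get_rdataset("Guerry", …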
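
A sketch that computes the summary's `coef`, `std err`, `t` and `P>|t|` columns by hand with NumPy, on hypothetical data, to make t = coef / std err concrete. It uses ordinary least squares with an intercept column (what `sm.add_constant` would add); `scipy.stats.linregress` is used only as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical data: y depends linearly on x plus noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 0.5 * x + rng.normal(size=50)

# Design matrix with an intercept column
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)            # "coef"

n, k = X.shape
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)                          # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))   # "std err"
t = beta / se                                             # "t": coef in units of its std err
p = 2 * stats.t.sf(np.abs(t), df=n - k)                   # "P>|t|": two-sided p-value

print(np.round(t, 2), np.round(p, 4))
```

For the slope, these numbers agree with `stats.linregress(x, y)` (`slope / stderr` and `pvalue`).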

2019-04-14

Training and validating binary classifiers

Machine learning Python3

Holdout evaluation of logistic regression and an XGB classifier from sklearn.linear_model import LogisticRegression from sklearn.tree import DecisionTreeClassifier from sklearn.neighbors import KNeighborsClassifier from sklearn.neural_network import MLPClas…
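
A minimal holdout sketch with scikit-learn, assuming synthetic data from `make_classification`. It compares logistic regression with a decision tree (standing in for the XGB classifier, so the example needs only scikit-learn) on accuracy and ROC AUC.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical binary data
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 25%; stratify keeps the class ratio the same in both parts
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]  # probability of the positive class
    scores[name] = (accuracy_score(y_te, model.predict(X_te)), roc_auc_score(y_te, proba))
    print(name, scores[name])
```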

2019-04-12

Dimensionality reduction for sparse data

Python3

Apparently ordinary PCA can't be used on sparse input; MF (matrix factorization) or TruncatedSVD seems to be the way to go. >>> from sklearn.decomposition import TruncatedSVD >>> from scipy import sparse as sp >>> X = sp.rand(1000, 1000, density=0.0001) >>> clf = TruncatedSVD(100) >>>…
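
One reason plain PCA is awkward here is that it centers the data first, and subtracting the column means turns a sparse matrix dense; TruncatedSVD skips the centering and works on the sparse matrix directly. A sketch on a hypothetical random sparse matrix:

```python
from scipy import sparse as sp
from sklearn.decomposition import TruncatedSVD

# Hypothetical sparse data: 1000 x 200, about 1% non-zero
X = sp.random(1000, 200, density=0.01, format="csr", random_state=0)

# No centering step, so X is never densified
svd = TruncatedSVD(n_components=20, random_state=0)
Z = svd.fit_transform(X)

print(Z.shape)                              # (1000, 20)
print(svd.explained_variance_ratio_.sum())  # share of variance kept by the 20 components
```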

2019-04-09

Viewing images in Python with keras

Python3

Using pillow, plt, and keras. import pandas as pd import numpy as np from IPython.display import display, HTML from pandas.plotting import table import matplotlib.pyplot as plt import seaborn as sns import scipy.stats as st import wa…

2019-04-03

Stacking (for classification)

Machine learning Python3

# Meta-model that predicts the class from the pattern of outputs of several models from sklearn.base import BaseEstimator, TransformerMixin, ClassifierMixin, clone from sklearn.model_selection import StratifiedKFold class StackingAveragedModels(BaseEstimator, C…
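
scikit-learn (0.22 and later) also ships a built-in `StackingClassifier` that implements the same idea: base models produce out-of-fold predictions, and a meta-model learns from them. A sketch on hypothetical data, with the choice of base models and meta-model purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=8, random_state=0)  # hypothetical data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base models whose predictions become the meta-model's input features
base_models = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=7)),
]
# cv=5: the meta-model is fitted on 5-fold out-of-fold predictions to limit leakage
stack = StackingClassifier(estimators=base_models, final_estimator=LogisticRegression(), cv=5)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))
```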

2019-03-24

Linear regression in Python (95% confidence interval)

Statistics Python3

The @ operator stands in for .dot(): it is matrix multiplication, which for two 1-D arrays gives the inner product. np.array([1,2]) @ np.array([2,3]) Simple regression from sklearn.linear_model import LinearRegression X = np.array([167, 168, 168, 183, 170, 165, 163, 173, 177, 170]) y = np.array([59, 58, 65, …
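
A sketch of both points: `@` versus `.dot()`, and a 95% confidence interval for the slope of a simple regression using `scipy.stats.linregress`. The excerpt's `y` values are cut off, so the regression data here is synthetic.

```python
import numpy as np
from scipy import stats

# @ is matrix multiplication; for two 1-D arrays it equals the inner product .dot()
a, b = np.array([1, 2]), np.array([2, 3])
print(a @ b, a.dot(b))  # 8 8

# Hypothetical data for y = intercept + slope * x + noise
rng = np.random.default_rng(1)
x = np.linspace(160, 185, 20)
y = 0.8 * x - 70 + rng.normal(scale=3, size=x.size)

res = stats.linregress(x, y)
t_crit = stats.t.ppf(0.975, df=x.size - 2)  # two-sided 95% critical value, n - 2 df
ci = (res.slope - t_crit * res.stderr, res.slope + t_crit * res.stderr)
print(res.slope, ci)
```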

2019-03-23

Chi-squared test and the test of independence

Statistics Python3

from scipy import stats, integrate l = np.array([4,4,22,26,36,45,39,21,16,4]) score_l = np.arange(5, 105, 10) # sample mean and unbiased sample variance mu = (l * score_l).sum() / l.sum() sigma2 = ((l * ((score_l-mu)**2)).sum() / (l.sum()-1)) # probability plot…
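
For the independence test in the title, `scipy.stats.chi2_contingency` does the whole calculation from a contingency table. A sketch on a hypothetical 2x2 table, with Yates' continuity correction turned off so the statistic matches the textbook sum of (observed - expected)^2 / expected:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table (rows: group, columns: outcome)
table = np.array([[10, 20],
                  [20, 10]])

# H0: rows and columns are independent
chi2, p, dof, expected = stats.chi2_contingency(table, correction=False)
print(chi2, dof, p)  # chi2 = 20/3 (about 6.67), dof = 1
print(expected)      # every cell expects 15 under independence
```

Each cell's expected count is row total times column total over the grand total (30 * 30 / 60 = 15), and the four cells each contribute 25 / 15.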

2019-03-21

Hypothesis testing in Python

Statistics Python3

a = np.array([70, 69, 72, 74, 66, 68, 69, 70, 71, 69, 73, 72, 68, 72, 67]) b = np.array([69, 72, 71, 74, 68, 67, 72, 72, 72, 70, 75, 73, 71, 72, 69]) a.mean(), b.mean() # (70.0, 71.13333333333334) sgm = (a.var() * len(a) + b.var() * len(b)…
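
The excerpt builds a pooled variance by hand. Assuming it goes on to divide by n_a + n_b - 2 (the line is cut off), the resulting Student's t statistic can be checked against `scipy.stats.ttest_ind` with `equal_var=True`:

```python
import numpy as np
from scipy import stats

a = np.array([70, 69, 72, 74, 66, 68, 69, 70, 71, 69, 73, 72, 68, 72, 67])
b = np.array([69, 72, 71, 74, 68, 67, 72, 72, 72, 70, 75, 73, 71, 72, 69])

# np.var defaults to ddof=0, so var * n is the sum of squared deviations
sgm = (a.var() * len(a) + b.var() * len(b)) / (len(a) + len(b) - 2)  # pooled variance
t_manual = (a.mean() - b.mean()) / np.sqrt(sgm * (1 / len(a) + 1 / len(b)))

# Student's two-sample t-test (equal variances assumed)
res = stats.ttest_ind(a, b, equal_var=True)
print(t_manual, res.statistic, res.pvalue)
```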

2019-03-19

Code for confidence intervals

Statistics Python3

import numpy as np from scipy import stats import statsmodels as sms # Example a = np.array([12.7,6.6,5.6,14.3,11.4,10.8,13.8,11.2,10.0,12.8,7.1,14.0]) # confidence interval for the population mean, assuming the sample comes from a normal distribution sigma = np.sqrt(dev / len(a)) stats.norm.…
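
When the population standard deviation is unknown, the usual 95% interval for the mean uses the t-distribution with n - 1 degrees of freedom and the standard error of the mean. A sketch on the same sample, showing `stats.t.interval` and the equivalent hand calculation:

```python
import numpy as np
from scipy import stats

a = np.array([12.7, 6.6, 5.6, 14.3, 11.4, 10.8, 13.8, 11.2, 10.0, 12.8, 7.1, 14.0])
n = len(a)
mean = a.mean()
sem = stats.sem(a)  # standard error: sample std (ddof=1) / sqrt(n)

# 95% CI for the population mean, sigma unknown -> t distribution with n - 1 df
lo, hi = stats.t.interval(0.95, n - 1, loc=mean, scale=sem)

# The same interval written out by hand
t_crit = stats.t.ppf(0.975, n - 1)
print((lo, hi), (mean - t_crit * sem, mean + t_crit * sem))
```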

2019-03-15

Proportions by class

Python3

For proportions by class, these seaborn plots are handy. import pandas as pd import seaborn as sns # violin plot sns.violinplot(x = 'target', y = 'rate', data = df) # box-and-whisker plot (intuitive, but hides the shape of the distribution) sns.boxplot(x = 'target', y …
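
Alongside the plots, the proportions themselves can be tabulated with `pd.crosstab(..., normalize="index")`, which makes each row (class) sum to 1. A sketch with hypothetical columns `target` and `channel`:

```python
import pandas as pd

# Hypothetical data: a target class and a categorical feature
df = pd.DataFrame({
    "target":  [0, 0, 0, 1, 1, 1, 1, 0],
    "channel": ["web", "web", "shop", "web", "shop", "shop", "shop", "web"],
})

# Share of each channel within each target class
ratio = pd.crosstab(df["target"], df["channel"], normalize="index")
print(ratio)
```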

2019-03-08

Cosine distance and similarity

Python3

To compute cosine similarity, use sklearn.metrics.pairwise.cosine_similarity; scipy.spatial.distance.cosine returns the cosine distance (1 - similarity) instead. sklearn.metrics.pairwise.cosine_similarity: scikit-learn 0.20.3 documentation docs.scipy.org
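
A sketch contrasting the two functions on two small vectors: sklearn expects 2-D inputs and returns a similarity matrix, while scipy takes 1-D vectors and returns a distance, so the two results add up to 1.

```python
import numpy as np
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_similarity

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

# sklearn: 2-D inputs, returns a (1, 1) similarity matrix here
sim = cosine_similarity(u.reshape(1, -1), v.reshape(1, -1))[0, 0]
# scipy: 1-D inputs, returns the cosine distance = 1 - similarity
dist = cosine(u, v)
print(sim, dist)  # about 0.7071 and 0.2929
```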

2019-02-24

Getting memory usage in Python

Python3

psutil is not in the standard library, so install it with pip. from psutil import virtual_memory def get_mem_available(): mem = virtual_memory() print("{} GB".format(mem.available / (1024 ** 3))) get_mem_available() stackoverflow.com

2019-02-19

Full outer join (cross join) in pandas

Python3

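Joining on a constant dummy key pairs every row with every row, so the result is effectively a cross join; the dummy column has to be dropped from the merged result. df = pd.DataFrame(np.random.randint(0,100,size=(3, 4)), columns=list('ABCD')) df['_index'] = 1 res = pd.merge(df[['_index', 'A']], df[['_index','B']], how='outer', on='_index').drop(columns=['_index']) df.drop(columns=['_index'], inplace=True) stackoverflow.com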
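
Since pandas 1.2, `merge` supports `how="cross"`, which produces the same pairing without a helper column. A sketch comparing the dummy-key approach with the built-in one (row count is 3 x 3 = 9):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(3, 4)), columns=list("ABCD"))

# Dummy-key approach: every row shares the key, so each A is paired with each B
tmp = df.assign(_index=1)
via_key = pd.merge(tmp[["_index", "A"]], tmp[["_index", "B"]], how="outer", on="_index").drop(columns=["_index"])

# Built-in cross join (pandas >= 1.2)
via_cross = pd.merge(df[["A"]], df[["B"]], how="cross")

print(len(via_key), len(via_cross))  # 9 9
```

`assign` is used so the original `df` never gets the helper column, which avoids the separate cleanup step.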

2019-02-19

How to invert get_dummies

Python3

In [1]: import pandas as pd In [2]: s = pd.Series(['a', 'b', 'a', 'c']) In [3]: s Out[3]: 0 a 1 b 2 a 3 c dtype: object In [4]: dummies = pd.get_dummies(s) In [5]: dummies Out[5]: a b c 0 1 0 0 1 0 1 0 2 1 0 0 3 0 0 1 In [6]: s2 = dummies.…
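
Because each row of a one-hot frame has exactly one 1, the column holding the row maximum is the original label. A sketch using `idxmax(axis=1)`; the `astype(int)` is a precaution because `get_dummies` returns bool columns in pandas 2.x. Newer pandas (1.5 and later) also offers `pd.from_dummies` for the same purpose.

```python
import pandas as pd

s = pd.Series(["a", "b", "a", "c"])
dummies = pd.get_dummies(s)

# Column name of the single 1 in each row = original label
restored = dummies.astype(int).idxmax(axis=1)
print(restored.tolist())  # ['a', 'b', 'a', 'c']
```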

2019-02-12

Time series analysis with pandas

Python3 Machine learning

towardsdatascience.com pandas.pydata.org stackoverflow.com The analysis tools in statsmodels also look quite useful. www.statsmodels.org
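
The basic pandas time-series operations these links cover are resampling into coarser periods, rolling windows, and differencing. A sketch on a hypothetical daily series:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 28 days with values 0, 1, 2, ...
idx = pd.date_range("2019-02-01", periods=28, freq="D")
s = pd.Series(np.arange(28, dtype=float), index=idx)

weekly = s.resample("7D").sum()        # aggregate into 7-day buckets
rolling = s.rolling(window=7).mean()   # 7-day moving average; first 6 values are NaN
diff = s.diff()                        # day-over-day change

print(weekly.tolist())  # [21.0, 70.0, 119.0, 168.0]
```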

2019-02-08

tffm looks promising for recommendation

Python3 Machine learning

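github.com $ pip install tffm I don't really understand how things like order behave, even though there is a parameter for it. This kind of model looks usable for a recommendation engine. from sklearn.model_selection import train_test_split X_tr, X_te, y_tr, y_te = train_test_split(df.values, df['tfidf…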

2019-02-04

Is StratifiedKFold pointless with xgboost?

Python3

I tried using StratifiedKFold for xgboost training. from xgboost import XGBClassifier from sklearn.model_selection import GridSearchCV from sklearn.model_selection import StratifiedKFold from sklearn.metrics import accuracy_score param_gri…
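
A possible reason it seemed to make no difference: when `GridSearchCV` gets a classifier and an integer `cv`, scikit-learn already uses `StratifiedKFold` internally, so passing one explicitly only changes things if you set options such as `shuffle`. What stratification buys is visible on imbalanced, sorted labels; a sketch with hypothetical labels:

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels, sorted: 180 negatives then 20 positives
y = np.array([0] * 180 + [1] * 20)
X = np.zeros((len(y), 1))  # features are irrelevant to how the folds are split

# Number of positives in each test fold
plain = [int(y[test].sum()) for _, test in KFold(n_splits=10).split(X, y)]
strat = [int(y[test].sum()) for _, test in StratifiedKFold(n_splits=10).split(X, y)]

print(plain)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 20]
print(strat)  # [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]
```

On shuffled, roughly balanced data the two splitters give similar folds, which is another way the difference can disappear in practice.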

2019-02-04

Feature engineering: categorical features

Python3

# split by count (quantile bins, equal number of rows) --> categorical (pd.Interval) df['price_range'] = pd.qcut(allfeat['price'], 5) # split by value (equal-width bins) --> categorical (pd.Interval) df['age_range'] = pd.cut(allfeat['age'], 5) >>> iv = pd.Interval(left=0, right=5) >>> iv Interval(0, 5, clo…
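
A sketch of the difference between `pd.qcut` (bins by quantile, so each bin gets about the same number of rows) and `pd.cut` (bins by value, so counts depend on the data), plus the default right-closed `pd.Interval`. The values and bin edges are hypothetical.

```python
import numpy as np
import pandas as pd

values = pd.Series(np.arange(100))

# qcut: quantile bins -> equal row counts
by_count = pd.qcut(values, 4)
# cut: value bins with explicit edges -> counts follow the data
by_value = pd.cut(values, bins=[-1, 9, 49, 99])

print(by_count.value_counts().sort_index().tolist())  # [25, 25, 25, 25]
print(by_value.value_counts().sort_index().tolist())  # [10, 40, 50]

# Bin labels are pd.Interval objects, closed on the right by default
iv = pd.Interval(left=0, right=5)
print(0 in iv, 5 in iv)  # False True
```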

2019-01-30

Statistical analysis of access logs

Python3 Statistics

httpd.apache.org LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" from datetime import (datetime, date, timedelta) import re import pandas as pd import numpy as np from IPython.display import display, HTML from p…
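
A sketch of parsing lines in the LogFormat above into a DataFrame with a regular expression. In Apache 2.x, `%D` is the time taken to serve the request in microseconds. The sample lines and the group names in the regex are hypothetical, and `%b` is `-` when no bytes were sent, so that field is kept as a string here.

```python
import re
from datetime import datetime

import pandas as pd

# Matches: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %D
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<duration_us>\d+)'
)

# Hypothetical log lines
lines = [
    '192.0.2.1 - - [30/Jan/2019:10:00:00 +0900] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0" 5000',
    '192.0.2.2 - - [30/Jan/2019:10:00:05 +0900] "GET /missing HTTP/1.1" 404 - "-" "curl/7.58.0" 1000',
]

rows = [m.groupdict() for m in map(LOG_RE.match, lines) if m]
df = pd.DataFrame(rows)
df["time"] = df["time"].map(lambda s: datetime.strptime(s, "%d/%b/%Y:%H:%M:%S %z"))
df["status"] = df["status"].astype(int)
df["duration_ms"] = df["duration_us"].astype(int) / 1000  # microseconds -> milliseconds

print(df.groupby("status")["duration_ms"].mean())
```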

2019-01-29

So the term for tick marks turned out to be "locator"...

Python3

In matplotlib (plt), I wanted finer-grained ticks on a time axis. It turns out tick placement is controlled by a locator, not a scale. stackoverflow.com qiita.com
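
A sketch of setting hourly ticks on a datetime axis with `matplotlib.dates.HourLocator`: the Locator decides where ticks go, and a Formatter decides how they are labelled. The data is hypothetical, and the `Agg` backend is chosen only so the example runs without a display.

```python
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical hourly data over two days
times = [datetime(2019, 1, 29) + timedelta(hours=h) for h in range(48)]
values = list(range(48))

fig, ax = plt.subplots()
ax.plot(times, values)

ax.xaxis.set_major_locator(mdates.HourLocator(interval=6))  # labelled tick every 6 hours
ax.xaxis.set_minor_locator(mdates.HourLocator())            # small tick every hour
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d %H:%M"))
fig.autofmt_xdate()  # rotate labels so they do not overlap
```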

2019-01-09

I tried writing an LSTM

Python3 Machine learning

Source: Python 機械学習プログラミング (Python Machine Learning Programming) https://www.amazon.co.jp/dp/4295003379/ Extracting the tar import tarfile with tarfile.open('aclImdb_v1.tar.gz', 'r:gz') as tar: tar.extractall() ai.stanford.edu Building the data import pandas as pd import os base_pat…

2019-01-04

str.translate() is handy

Python3

trans_dict = ('0123456789', 'abcdefghij') a = {ord(i):ord(h) for i, h in zip(*trans_dict)} '0120-333-666'.translate(a) 'abca-ddd-ggg' qiita.com
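
The built-in `str.maketrans` produces the same `{ord(src): ord(dst)}` table from two equal-length strings, and an optional third argument lists characters to delete:

```python
# Same mapping table as the dict comprehension, built by str.maketrans
table = str.maketrans("0123456789", "abcdefghij")
print("0120-333-666".translate(table))  # abca-ddd-ggg

# Third argument: characters to delete
print("0120-333-666".translate(str.maketrans("", "", "-")))  # 0120333666
```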

2019-01-04

Data analysis workflow

Machine learning Python3

Preparation: Prepare Problem a) Load libraries b) Load dataset Summarize Data a) Descriptive statistics b) Data visualizations Prepare Data a) Data Cleaning b) Feature Selection c) Data Transforms (Normalize,...) Directory layout echo '.DS_Store .ip…
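
The "Prepare Data" steps (cleaning, feature selection, transforms) map naturally onto a scikit-learn `Pipeline`, which refits every step inside each cross-validation fold. A sketch on hypothetical data; the specific imputer, selector, scaler and model are illustrative choices, not part of the original outline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data with about 5% missing values
X, y = make_classification(n_samples=300, n_features=20, n_informative=5, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

pipe = Pipeline([
    ("clean", SimpleImputer(strategy="median")),  # a) data cleaning
    ("select", SelectKBest(f_classif, k=5)),       # b) feature selection
    ("scale", StandardScaler()),                   # c) data transforms
    ("model", LogisticRegression(max_iter=1000)),
])

# Each fold refits all steps on its training part only, so nothing leaks from the validation part
scores = cross_val_score(pipe, X, y, cv=5)
print(scores.mean())
```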

2018-12-31

Using futures (concurrent.futures) in Python

Python3 Machine learning

import concurrent.futures import os score_list = [] def worker(my_random_seed): model = CatBoostClassifier( iterations=300, learning_rate=0.1, random_seed=my_random_seed ) model.fit( X_train, y_train, cat_features=cat_features, eval_set=(X…
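
A self-contained sketch of the same pattern: run one worker per random seed in parallel and collect the scores. The `worker` here is a hypothetical stand-in for training a model; it uses `ThreadPoolExecutor` so the example runs anywhere. For CPU-bound pure-Python work, `ProcessPoolExecutor` has the same interface but needs the worker at module level and the launching code under `if __name__ == "__main__":` on platforms that spawn processes.

```python
import concurrent.futures
import random

def worker(my_random_seed):
    """Hypothetical stand-in for fitting one model with this seed and returning its score."""
    rng = random.Random(my_random_seed)
    return rng.random()

seeds = [0, 1, 2, 3, 4]

# map() runs worker on every seed concurrently and returns results in input order
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    score_list = list(executor.map(worker, seeds))

print(score_list)
```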

Things I don't understand keep piling up day by day…

φ(..) taking notes
