Well-known articles on ensembling

Python3 Machine Learning

Kaggle Ensembling Guide | MLWave github.com github.com

2018-12-25

Well-known articles on t-SNE

Python3 Machine Learning

github.com Comparison of Manifold Learning methods — scikit-learn 0.20.2 documentation distill.pub lvdmaaten.github.io t-SNE: The effect of various perplexity values on the shape — scikit-learn 0.20.2 documentation Interaction Practical Le…

2018-12-25

Jupyter Notebook tips

Python3 Useful tools & apps

www.dataquest.io Bonus: kaggletils github.com

2018-12-15

Useful sites for hyperparameter tuning

Machine Learning Python3

3.2. Tuning the hyper-parameters of an estimator — scikit-learn 0.20.1 documentation fastml.com www.analyticsvidhya.com

2018-12-10

How to do mean encoding, with KFold

Machine Learning Python3

import pandas as pd import numpy as np index_cols = ['shop_id', 'item_id', 'cnt'] global_mean = 0.2 df = pd.read_csv(filename) # group by gb = df.groupby(index_cols,as_index=False).agg({'cnt':{'target':'sum'}}) #fix column names gb.col…
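
The excerpt above is cut off, so the following is only a minimal sketch of out-of-fold mean encoding, not the post's own code. The helper name `kfold_mean_encode`, the toy data, and the fold construction with NumPy (instead of scikit-learn's `KFold`) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def kfold_mean_encode(df, col, target, n_splits=5, global_mean=None, seed=0):
    """Out-of-fold mean encoding of `col` against `target` (hypothetical helper)."""
    if global_mean is None:
        global_mean = df[target].mean()
    encoded = pd.Series(np.nan, index=df.index, dtype=float)
    # Shuffle row positions and cut them into n_splits folds.
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(df)), n_splits)
    for val_pos in folds:
        val_mask = np.zeros(len(df), dtype=bool)
        val_mask[val_pos] = True
        # Category means are computed only from rows outside the current fold.
        means = df.loc[~val_mask].groupby(col)[target].mean()
        encoded.iloc[val_pos] = df[col].iloc[val_pos].map(means).to_numpy()
    # Categories never seen in the training folds fall back to the global mean.
    return encoded.fillna(global_mean)

# Toy data, assumed for illustration only.
df = pd.DataFrame({
    "item_id": [1, 1, 1, 2, 2, 2, 3, 3],
    "target": [1, 0, 1, 0, 0, 1, 1, 1],
})
df["item_target_enc"] = kfold_mean_encode(df, "item_id", "target", n_splits=4)
print(df)
```

Because each row is encoded with statistics from the other folds only, a row's own target never leaks into its feature.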

2018-12-03

Good articles on scoring for ranking and clustering

Python3 Machine Learning

rank https://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf The Lemur Project / Wiki / RankLib Learning to Rank Overview wellecks.wordpress.com…

2018-12-03

Log loss and AUC for binary classification

Python3

Metrics for binary classification. logloss l_pred = [0.5, 0.5, 0.5, 0.5] l_label = [0, 0, 0, 0] def logloss(l_pred, l_label): n = len(l_pred) score = 0 for t in range(n): i = l_pred[t] k = l_label[t] score += k * np.log(i) + (1 - k) * np.log(1 - i) return - …
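
As a rough, vectorized sketch of the two metrics named in the title: the averaging over samples, the clipping constant `eps`, and the pairwise AUC formulation are assumptions here, since the excerpt is truncated before its return statement and its AUC part is not shown.

```python
import numpy as np

def logloss(preds, labels, eps=1e-15):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    p = np.clip(np.asarray(preds, dtype=float), eps, 1 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def auc(preds, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    p = np.asarray(preds, dtype=float)
    y = np.asarray(labels)
    pos, neg = p[y == 1], p[y == 0]
    # Compare every positive score with every negative score; ties count half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

l_pred = [0.5, 0.5, 0.5, 0.5]
l_label = [0, 0, 0, 0]
print(logloss(l_pred, l_label))                   # ln(2), about 0.693
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))   # 0.75
```

The pairwise AUC is quadratic in the number of samples, so it is only meant for checking small cases by hand.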

2018-11-20

Visualization libraries for use with pandas

Machine Learning Python3

seaborn: statistical data visualization — seaborn 0.9.0 documentation plot.ly github.com ggplot | Home NetworkX — NetworkX A demo of the Spectral Biclustering algorithm — scikit-learn 0.20.0 documentation

2018-11-19

Functions for a first look at a df in pandas

Python3

df.dtypes df.info() df.value_counts() df.isnull() plt.scatter(x1, x2) pd.plotting.scatter_matrix(df) df.corr() plt.matshow(...) df.mean().sort_values().plot(style='.')
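
A small sketch of how these checks might look on a toy frame. The column names and values are made up for illustration, and the plotting calls are left out so the snippet runs without a display.

```python
import pandas as pd

# Hypothetical toy data, for illustration only.
df = pd.DataFrame({
    "price": [100.0, 250.0, None, 80.0],
    "qty": [1, 3, 2, 5],
    "shop": ["a", "b", "a", "c"],
})

dtypes = df.dtypes                       # an attribute, not a method
df.info()                                # prints its own summary
shop_counts = df["shop"].value_counts()  # frequency of each category
missing = df.isnull().sum()              # missing values per column
numeric = df.select_dtypes("number")     # keep numeric columns only
corr = numeric.corr()                    # pairwise correlations
sorted_means = numeric.mean().sort_values()

print(dtypes, shop_counts, missing, corr, sorted_means, sep="\n\n")
```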

2018-11-16

String highlighting in Python

Python3

Implementing suggestion highlighting on the Python side. import re # prefix match def hilight_apply_pre(word, _list): return [re.sub('^{}'.format(re.escape(word)), '<{0}>{1}</{0}>'.format('em', word), l, 1) for l in _list] # substring match def hilight_apply_sub(word, _…
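
The substring variant is cut off in the excerpt, so here is one possible sketch of both variants. The function names, the `tag` parameter, the empty-word guard, and the choice to highlight every occurrence in the substring case are assumptions rather than the post's implementation.

```python
import re

def _wrap(match, tag):
    # Wrap the matched text itself, so the original casing and characters are kept.
    return "<{0}>{1}</{0}>".format(tag, match.group(0))

def highlight_prefix(word, candidates, tag="em"):
    """Highlight `word` only where a candidate starts with it."""
    if not word:
        return list(candidates)
    pattern = re.compile("^" + re.escape(word))
    return [pattern.sub(lambda m: _wrap(m, tag), c, count=1) for c in candidates]

def highlight_substring(word, candidates, tag="em"):
    """Highlight every occurrence of `word` inside each candidate."""
    if not word:
        return list(candidates)
    pattern = re.compile(re.escape(word))
    return [pattern.sub(lambda m: _wrap(m, tag), c) for c in candidates]

print(highlight_prefix("py", ["python", "numpy"]))
print(highlight_substring("py", ["python", "numpy"]))
```

Passing a function as the replacement avoids surprises when the search word contains backslashes, which `re.sub` would otherwise interpret in a replacement string.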

2018-11-13

Recommended blogs on feature engineering

Machine Learning Python3

Feature extraction 4.3. Preprocessing data - scikit-learn 0.20.0 documentation Feature creation machinelearningmastery.com What are some best practices in Feature Engineering? - Quora

2018-11-11

Adjusting outliers with percentiles

Python3

Use numpy's clip to pull outliers in to the lower and upper percentile bounds. a = [1,2,3,4,1000,5,6,7,5,4] LOWER_BOUND, UPPER_BOUND = np.percentile(a, [1,99]) b = np.clip(a, LOWER_BOUND, UPPER_BOUND) print(b) [ 1.09 2. 3. 4. 910.63 5. 6. 7. 5. 4. ]
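
The same idea as a self-contained snippet, using the array from the post. Note that `np.percentile(a, [1, 99])` returns the lower bound first, and `np.clip` takes its bounds in the order (min, max).

```python
import numpy as np

a = np.array([1, 2, 3, 4, 1000, 5, 6, 7, 5, 4])

# 1st and 99th percentiles, linearly interpolated between the sorted values.
lower, upper = np.percentile(a, [1, 99])

# Values below `lower` are raised to it; values above `upper` are lowered to it.
b = np.clip(a, lower, upper)

print(lower, upper)
print(b)
```

With only ten samples the interpolated 99th percentile still sits close to the outlier, so on small data a tighter pair such as `[5, 95]` may be worth trying.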

2018-11-11

A blog about data frames

Machine Learning Python3

A recommended machine learning blog: Datas-frame tomaugspurger.github.io

2018-11-08

Splitting one column into multiple columns

Python3

Turn date into [day, month, year] columns: set expand=True and rename. date 02.01.2013 transactions[['day', 'month', 'year']] = transactions.date.str.split( '.', n=2, expand=True ).rename(columns = {0:'day', 1:'month', 2:'year'…
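
A minimal runnable sketch of the split. The sample rows beyond `02.01.2013`, the integer conversion, and the use of `join` to attach the new columns are assumptions for illustration.

```python
import pandas as pd

# The first date is from the post; the second is made up.
transactions = pd.DataFrame({"date": ["02.01.2013", "15.11.2014"]})

# expand=True returns a DataFrame with one column per piece (0, 1, 2).
parts = transactions["date"].str.split(".", n=2, expand=True)
parts = parts.rename(columns={0: "day", 1: "month", 2: "year"}).astype(int)

# Attach the new columns by index.
transactions = transactions.join(parts)
print(transactions)
```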

2018-11-08

Machine learning libraries (Python)

Python3 Machine Learning

A roundup of well-known Python machine learning libraries. Libraries scikit-learn: machine learning in Python - scikit-learn 0.20.0 documentation Overview - H2O 3.22.0.1 documentation www.tensorflow.org github.com github.com github.com github.com Sites…

2018-10-27

Sample mean, unbiased sample variance, and confidence intervals in Python

Statistics Python3

train.loc[train.paytype == 1, :].pa.sum() # pa head count # people who paid in cash (paytype=1) train_iscash = train.paytype == 1 # 99% confidence interval for the mean proportion of people who paid in cash from statsmodels.stats.proportion import proportion_confint proportion_confint…
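
For reference, the quantities in the title can be sketched with NumPy and the standard library alone. The data below is made up, and `proportion_ci_normal` is a hypothetical helper implementing the normal-approximation (Wald) interval, which to my understanding is what `proportion_confint` computes with its default method.

```python
import numpy as np
from statistics import NormalDist

def proportion_ci_normal(count, nobs, alpha=0.01):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = count / nobs
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    half = z * np.sqrt(p_hat * (1 - p_hat) / nobs)
    # Keep the interval inside [0, 1].
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

# Hypothetical flags: 1 = paid in cash, 0 = otherwise.
is_cash = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])

sample_mean = is_cash.mean()            # sample mean (here, the cash proportion)
unbiased_var = is_cash.var(ddof=1)      # ddof=1 divides by n - 1
low, high = proportion_ci_normal(is_cash.sum(), len(is_cash), alpha=0.01)

print(sample_mean, unbiased_var, (low, high))
```

`alpha=0.01` corresponds to the 99% interval mentioned in the comment above.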

2018-10-03

Rotating an ax's xticks in subplots

Python3

fig,ax_ = plt.subplots(nrows=10, ncols=2, figsize=(14, 20)) ax_ = ax_.ravel() for i in range(20): list_ = M_feature_inverse[i][:3] ax = ax_[i] for l in list_: all_df_tmp = all_df_.loc[all_df_['pk']==l, :].groupby('request_at_dt').size().re…
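
The excerpt stops before the rotation itself, so this is one possible way to do it rather than the post's code. The grid size, the date labels, and the headless `Agg` backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8, 6))
axes = axes.ravel()  # flatten the 2x2 grid so it can be indexed with one number

labels = ["2018-09-01", "2018-09-02", "2018-09-03"]  # hypothetical dates
for i, ax in enumerate(axes):
    ax.plot(range(len(labels)), [i, i + 1, i + 2])
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels)
    # Rotate only this Axes' x tick labels.
    ax.tick_params(axis="x", labelrotation=45)

fig.tight_layout()  # leave room for the slanted labels
```

`plt.xticks(rotation=45)` only affects the current Axes, which is why the per-Axes call is used inside the loop.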

2018-09-21

House Price analysis 6

Python3 Machine Learning

Getting the overall flow --> through to submission. Loading #import some necessary libraries import numpy as np # linear algebra import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv) %matplotlib inline import matplotlib.pyplot as plt # Matl…

2018-09-20

Tallying access counts from wc -l

Python3

I tallied the wc -l access counts in Python. wc -l accesslog.* a = ''' 10914 accesslog.20180828010002 8636 accesslog.20180829010001 4742 accesslog.20180830010002 6399 accesslog.20180831010001 6901 accesslog.20180901010001 5503 accesslog.2018…
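
How the post aggregates the counts is not visible in the excerpt. As one plausible sketch, the lines can be summed per month, assuming the filename suffix is a `YYYYMMDDhhmmss` timestamp; the helper name `tally_by_month` is hypothetical. Only the complete lines from the excerpt are used as input.

```python
from collections import defaultdict

wc_output = """
 10914 accesslog.20180828010002
  8636 accesslog.20180829010001
  4742 accesslog.20180830010002
  6399 accesslog.20180831010001
  6901 accesslog.20180901010001
"""

def tally_by_month(text):
    """Sum `wc -l` line counts per YYYYMM taken from the filename suffix."""
    totals = defaultdict(int)
    for line in text.strip().splitlines():
        count, name = line.split()
        if name == "total":  # wc appends a grand-total line for multiple files
            continue
        month = name.rsplit(".", 1)[-1][:6]
        totals[month] += int(count)
    return dict(totals)

print(tally_by_month(wc_output))
```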

2018-09-14

House Price analysis 5

Python3 Machine Learning

Preprocessing import pandas as pd import numpy as np import seaborn as sns import matplotlib import matplotlib.pyplot as plt from scipy.stats import skew from scipy.stats import pearsonr %config InlineBackend.figure_format = 'retina' #set '…

2018-09-07

Loading from MySQL into a pandas.DataFrame

Python3

How to read from MySQL into pandas. import pandas as pd import MySQLdb def pd_dbread(table, columns_list): """ Connection sample """ # connect con = MySQLdb.connect( user='aaa', passwd='aaa', host='127.0.0.1', db='aaa', charset='utf8' ) # cursor…
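
Since a MySQL server is needed to run the excerpt, the sketch below swaps in an in-memory `sqlite3` database purely as a stand-in to show the `pd.read_sql` step. The table, its rows, and the changed signature of `pd_dbread` (taking the connection as an argument) are assumptions.

```python
import sqlite3
import pandas as pd

def pd_dbread(con, table, columns_list):
    """Read the given columns of a table into a DataFrame."""
    # Identifiers cannot be bound as query parameters, so pass only trusted names.
    query = "SELECT {} FROM {}".format(", ".join(columns_list), table)
    return pd.read_sql(query, con)

# In-memory SQLite database standing in for the MySQL connection.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER, name TEXT, age INTEGER)")
con.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "foo", 20), (2, "bar", 30)],
)
con.commit()

df = pd_dbread(con, "users", ["id", "name"])
con.close()
print(df)
```

For MySQL itself, the pandas documentation describes passing a SQLAlchemy connectable to `pd.read_sql`, which may be preferable to a raw driver connection.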

2018-09-07

The problem of random.shuffle returning None

Python3

Apparently this is the way to do it. >>> import random >>> x = ['foo', 'bar', 'black', 'sheep'] # O(N) operation: same logic as shuffle >>> random.sample(x, len(x)) ['bar', 'sheep', 'black', 'foo'] # O(NlogN) operation >>> sorted(x, key=lambda k: …
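
A short sketch of why the `None` appears: `random.shuffle` reorders the list in place and returns nothing, while `random.sample` builds a new list. The variable names are illustrative.

```python
import random

x = ["foo", "bar", "black", "sheep"]

# shuffle mutates the list it is given and returns None.
result = random.shuffle(x)
print(result)  # None

# sample with k=len(x) returns a new shuffled list and leaves x untouched.
before = list(x)
shuffled_copy = random.sample(x, len(x))
print(shuffled_copy)

# Copying first and shuffling the copy in place is an equivalent option.
another_copy = list(x)
random.shuffle(another_copy)
```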

2018-09-07

House Price analysis 4

Python3 Machine Learning

There is a regression model called XGBRegressor, so I checked it out. Apparently xgboost is quite well known in the community to begin with. import pandas as pd from sklearn.model_selection import train_test_split from sklearn.preprocessing import Imputer data = pd.read_csv('kaggle/kagg…

2018-08-29

Drawing nice pie charts with pyplot

Machine Learning Python3

f,a = plt.subplots(nrows=5, ncols=2, figsize=(14, 20)) a = a.ravel() for idx,ax in enumerate(a): v_list = km_center[idx] df_timeband_meanrate = pd.DataFrame( { 'timeband': name_list, 'rate': v_list }, ) print(idx, np.bincount(y_km)[idx]) d…

2018-08-26

House Price analysis 2

Python3 Machine Learning

Preprocessing %matplotlib inline import numpy as np import pandas as pd import matplotlib.pyplot as plt import scipy.stats as stats import sklearn.linear_model as linear_model import seaborn as sns import xgboost as xgb # <-- for ensemble learning…

2018-08-25

House Price analysis 1

Machine Learning Python3

Task Goal It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable. Metric Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarit…
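
The metric description is cut off above. Assuming it refers to the RMSE between the logarithm of the predicted price and the logarithm of the observed price, as in the Kaggle House Prices competition, it could be sketched as follows; the function name and the sample prices are made up.

```python
import numpy as np

def rmse_log(y_pred, y_true):
    """RMSE between log(predicted) and log(observed); inputs must be positive."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))

# Hypothetical prices, for illustration only.
observed = [200000, 150000, 320000]
predicted = [210000, 140000, 300000]
print(rmse_log(predicted, observed))
```

Taking logs makes a given percentage error count the same on cheap and expensive houses.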

2018-08-03

Creating pie charts with pandas

Python3 Machine Learning

Handy that you can plot directly from pandas. defaulte_fig_size = plt.rcParams["figure.figsize"] plt.rcParams["figure.figsize"] = [12.0, 10.0] # plt.figure() # fig, axes = plt.subplots(nrows=4, ncols=1, ) fig = plt.figure() ax1 = fig.add_subplot(221) ax1.ti…

2018-08-03

Someone has put a TensorFlow implementation of FFM on git.

Python3 Machine Learning

I'd like to be able to use it soon. github.com github.com

2018-08-03

Trying a CNN with TensorFlow

Machine Learning Python3

I tried the CNN tutorial. I'd like to use it for things other than images too. import numpy as np import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) train_data = mni…

2018-08-02

How to change the figure size in matplotlib

Python3

import math import numpy as np from matplotlib import pyplot fig = pyplot.figure(figsize=(12, 4)) pi = math.pi # use π from the math module x = np.linspace(0, 2*pi, 100) # numpy array of 100 points from 0 to 2π y = np.sin(x) # adjustFigAspect(…

Every day there is more I don't understand…

φ(..) memo, memo