How to use futures in Python

Python3 Machine Learning

import concurrent.futures import os score_list = [] def worker(my_random_seed): model = CatBoostClassifier( iterations=300, learning_rate=0.1, random_seed=my_random_seed ) model.fit( X_train, y_train, cat_features=cat_features, eval_set=(X…
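The excerpt above trains one CatBoost model per random seed through `concurrent.futures` and gathers the results in `score_list`. Here is a minimal stdlib-only sketch of that seed-averaging pattern. `train_and_score` is a hypothetical stand-in for the CatBoost fit-and-evaluate step (it only returns a seeded pseudo-random score), not the post's actual `worker`.

```python
import concurrent.futures
import random
import statistics


def train_and_score(my_random_seed: int) -> float:
    """Hypothetical stand-in for fitting a model with the given seed and
    returning its validation score. Replace with real training code."""
    rng = random.Random(my_random_seed)
    return 0.80 + rng.uniform(-0.01, 0.01)


def run_seeds(seeds, max_workers=4):
    """Run one training job per seed in parallel and return the scores
    in the same order as `seeds` (executor.map preserves input order)."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(train_and_score, seeds))


if __name__ == "__main__":
    score_list = run_seeds(range(5))
    print(score_list)
    print("mean score:", statistics.mean(score_list))
```

Because `executor.map` yields results in input order, `score_list[i]` always belongs to seed `i`. A thread pool keeps the sketch portable. For CPU-bound pure-Python work, `ProcessPoolExecutor` sidesteps the GIL, but the worker must then be a picklable module-level function and the entry point should stay under `if __name__ == "__main__":`.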

2018-12-29

About the "yellow book" of machine learning (PRML)

Machine Learning

The "yellow book" of machine learning www.amazon.co.jp https://www.amazon.co.jp/dp/4621061240 www.amazon.co.jp Web PDF https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf Solutions to the exercises (www) h…

2018-12-29

Well-known articles on ensembling

Python3 Machine Learning

Kaggle Ensembling Guide | MLWave github.com github.com
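The Kaggle Ensembling Guide starts from the simplest ensembles: majority voting over predicted labels and averaging predicted probabilities. Here is a plain-Python sketch of both, assuming each model's predictions arrive as a list aligned by sample index (that input layout is an assumption for illustration).

```python
from collections import Counter


def majority_vote(predictions):
    """Majority vote over class labels.

    `predictions` is a list of per-model label lists, all the same length.
    Ties go to the label encountered first, because Counter.most_common
    keeps first-encountered order for equal counts.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


def average(predictions):
    """Unweighted mean of per-model probability predictions, per sample."""
    n_models = len(predictions)
    return [sum(col) / n_models for col in zip(*predictions)]


labels = majority_vote([[1, 0, 1], [1, 1, 0], [0, 1, 1]])  # → [1, 1, 1]
probs = average([[0.2, 0.8], [0.4, 0.6]])  # roughly [0.3, 0.7]
```

Voting suits hard labels; averaging keeps the models' confidence and is the usual choice when the metric works on probabilities.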

2018-12-25

Well-known articles on t-SNE

Python3 Machine Learning

github.com Comparison of Manifold Learning methods — scikit-learn 0.20.2 documentation distill.pub lvdmaaten.github.io t-SNE: The effect of various perplexity values on the shape — scikit-learn 0.20.2 documentation Interaction Practical Le…

2018-12-15

Useful sites for hyperparameter tuning

Machine Learning Python3

3.2. Tuning the hyper-parameters of an estimator — scikit-learn 0.20.1 documentation fastml.com www.analyticsvidhya.com
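The scikit-learn page above covers exhaustive grid search (`GridSearchCV`) and randomized search. The core of grid search fits in a few lines. `toy_objective` below is a hypothetical validation score used only so the sketch runs. In practice it would be a cross-validated score of a real model.

```python
import itertools


def grid_search(objective, param_grid):
    """Evaluate every combination in `param_grid` and return
    (best_params, best_score), where a higher score is better.

    `param_grid` maps parameter names to lists of candidate values,
    in the same spirit as GridSearchCV's `param_grid` argument.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(learning_rate, depth):
    """Hypothetical score that peaks at learning_rate=0.1, depth=6."""
    return -((learning_rate - 0.1) ** 2) - ((depth - 6) ** 2) / 100


best, score = grid_search(
    toy_objective, {"learning_rate": [0.01, 0.1, 0.3], "depth": [4, 6, 8]}
)
```

The number of evaluations is the product of the list lengths (9 here), which is why randomized search becomes attractive once the grid grows.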

2018-12-10

How to do mean encoding with KFold

Machine Learning Python3

import pandas as pd import numpy as np index_cols = ['shop_id', 'item_id', 'cnt'] global_mean = 0.2 df = pd.read_csv(filename) # after groupby gb = df.groupby(index_cols,as_index=False).agg({'cnt':{'target':'sum'}}) #fix column names gb.col…
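The idea behind KFold mean encoding is that each row is encoded with target statistics from the other folds only, so the encoding never leaks that row's own target. Here is a stdlib-only sketch. It assumes contiguous unshuffled folds, sized like scikit-learn's `KFold(shuffle=False)`, and reuses the excerpt's `global_mean = 0.2` as the fallback for categories unseen outside the fold. The function name and inputs are hypothetical.

```python
from collections import defaultdict


def kfold_mean_encode(categories, targets, n_splits=5, global_mean=None):
    """Out-of-fold mean (target) encoding.

    Each row gets the mean target of its category computed on the other
    folds only. Categories absent from the other folds fall back to
    `global_mean` (default: the overall target mean). Folds are contiguous
    blocks; the first n % n_splits folds get one extra row.
    """
    n = len(categories)
    if global_mean is None:
        global_mean = sum(targets) / n
    encoded = [global_mean] * n
    fold_sizes = [n // n_splits + (1 if i < n % n_splits else 0) for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        val_idx = range(start, start + size)
        val_set = set(val_idx)
        # Per-category target sums and counts over the *training* folds.
        sums, counts = defaultdict(float), defaultdict(int)
        for i in range(n):
            if i not in val_set:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        for i in val_idx:
            c = categories[i]
            if counts[c]:
                encoded[i] = sums[c] / counts[c]
        start += size
    return encoded


enc = kfold_mean_encode(["a", "a", "b", "b", "a", "b"], [1, 0, 1, 1, 0, 0],
                        n_splits=3, global_mean=0.2)
# → [0.0, 0.0, 0.0, 0.0, 0.5, 1.0]
```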

2018-12-03

Good articles on scoring for ranking and clustering

Python3 Machine Learning

rank https://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf The Lemur Project / Wiki / RankLib Learning to Rank Overview wellecks.wordpress.com…
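Ranking models in the references above (such as the MSR technical report) are typically evaluated with NDCG. Here is a minimal sketch that assumes the common 2^rel - 1 gain and log2(rank + 1) discount. Other gain definitions exist, so check which one a given library or competition uses.

```python
import math


def dcg_at_k(relevances, k):
    """DCG@k with gain 2^rel - 1 and discount log2(rank + 1), rank from 1."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances, k):
    """NDCG@k: DCG of the model's ranking divided by DCG of the ideal one.

    `relevances` are graded relevance labels in the order the model ranked
    the documents. Returns 0.0 when no document is relevant.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0. Putting the relevant document second in `[0, 1]` drops the score to 1 / log2(3), about 0.63.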

2018-11-26

Reference sites on validation

Machine Learning

3.1. Cross-validation: evaluating estimator performance — scikit-learn 0.20.1 documentation www.chioka.in
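The scikit-learn cross-validation guide boils down to four steps: split the data into K folds, train on K-1 of them, score on the held-out fold, and repeat. Here is a stdlib-only sketch of the split, plus a hypothetical mean-predictor baseline to score. The baseline is only an illustration and does not come from the post.

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train_idx, val_idx) lists for K-fold cross-validation.

    Folds are contiguous and unshuffled; the first n_samples % n_splits
    folds get one extra sample, like scikit-learn's KFold(shuffle=False).
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    start = 0
    for fold in range(n_splits):
        size = n_samples // n_splits + (1 if fold < n_samples % n_splits else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_val_mse_of_mean_baseline(y, n_splits=5):
    """Per-fold MSE of a baseline that predicts the training-fold mean."""
    scores = []
    for train_idx, val_idx in kfold_indices(len(y), n_splits):
        mean = sum(y[i] for i in train_idx) / len(train_idx)
        scores.append(sum((y[i] - mean) ** 2 for i in val_idx) / len(val_idx))
    return scores
```

With sorted targets such as `[0, 0, 2, 2]` and two folds, each fold trains on one end of the range and validates on the other, so both folds report an MSE of 4.0. This is one reason shuffling or stratifying before splitting is common.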

2018-11-20

Visualization libraries for pandas

Machine Learning Python3

seaborn: statistical data visualization — seaborn 0.9.0 documentation plot.ly github.com ggplot | Home NetworkX — NetworkX A demo of the Spectral Biclustering algorithm — scikit-learn 0.20.0 documentation

2018-11-13

Recommended blogs on feature engineering

Machine Learning Python3

Feature extraction: 4.3. Preprocessing data — scikit-learn 0.20.0 documentation Feature creation: machinelearningmastery.com What are some best practices in Feature Engineering? - Quora

2018-11-11

Blogs on data frames

Machine Learning Python3

Recommended machine learning blog: Datas-frame tomaugspurger.github.io

2018-11-08

Machine learning libraries (Python)

Python3 Machine Learning

A roundup of well-known machine learning libraries for Python. Libraries: scikit-learn: machine learning in Python — scikit-learn 0.20.0 documentation Overview — H2O 3.22.0.1 documentation www.tensorflow.org github.com github.com github.com github.com Sites…

2018-09-21

House Price analysis 6

Python3 Machine Learning

Getting the overall flow, through to submission. Loading: #import some necessary libraries import numpy as np # linear algebra import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv) %matplotlib inline import matplotlib.pyplot as plt # Matl…

2018-09-14

House Price analysis 5

Python3 Machine Learning

Preprocessing: import pandas as pd import numpy as np import seaborn as sns import matplotlib import matplotlib.pyplot as plt from scipy.stats import skew from scipy.stats.stats import pearsonr %config InlineBackend.figure_format = 'retina' #set '…

2018-09-07

House Price analysis 4

Python3 Machine Learning

There is a regression model called XGBRegressor, so I checked it out. Apparently xgboost is quite well known in the community to begin with. import pandas as pd from sklearn.model_selection import train_test_split from sklearn.preprocessing import Imputer data = pd.read_csv('kaggle/kagg…

2018-08-29

Drawing nice-looking pie charts with pyplot

Machine Learning Python3

f,a = plt.subplots(nrows=5, ncols=2, figsize=(14, 20)) a = a.ravel() for idx,ax in enumerate(a): v_list = km_center[idx] df_timeband_meanrate = pd.DataFrame( { 'timeband': name_list, 'rate': v_list }, ) print(idx, np.bincount(y_km)[idx]) d…

2018-08-26

House Price analysis 2

Python3 Machine Learning

Preprocessing: %matplotlib inline import numpy as np import pandas as pd import matplotlib.pyplot as plt import scipy.stats as stats import sklearn.linear_model as linear_model import seaborn as sns import xgboost as xgb # <-- used for ensemble learning…

2018-08-25

House Price analysis 1

Machine Learning Python3

Task. Goal: It is your job to predict the sales price for each house. For each Id in the test set, you must predict the value of the SalePrice variable. Metric: Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarit…
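Assuming the truncated metric is the usual House Prices one (RMSE computed on the logarithms of predicted and observed `SalePrice`), it can be sketched as follows. The natural log is an assumption here; a different base only rescales the score by a constant factor.

```python
import math


def rmse_log(y_true, y_pred):
    """RMSE between log(predicted) and log(observed); values must be > 0.

    Errors are relative: being 10% off on a cheap house costs about the
    same as being 10% off on an expensive one.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    sq = [(math.log(p) - math.log(t)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(sq) / len(sq))
```

Because the score is taken on logs, a common approach is to train on log(SalePrice) and exponentiate the predictions before submitting.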

2018-08-03

Creating pie charts with pandas

Python3 Machine Learning

Handy: pandas can plot with plt directly. defaulte_fig_size = plt.rcParams["figure.figsize"] plt.rcParams["figure.figsize"] = [12.0, 10.0] # plt.figure() # fig, axes = plt.subplots(nrows=4, ncols=1, ) fig = plt.figure() ax1 = fig.add_subplot(221) ax1.ti…

2018-08-03

Someone has also put an FFM implementation in TensorFlow on GitHub.

Python3 Machine Learning

I'd like to be able to use it soon. github.com github.com

2018-08-03

Trying out a CNN with TensorFlow

Machine Learning Python3

I worked through the CNN tutorial. I'd like to use CNNs on data other than images, too. import numpy as np import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) train_data = mni…

2018-07-11

Analyzing tf's MNIST with a neural network

Machine Learning Python3

Accuracy came out rather low, at 90%. I'll look into the cause another time. import tensorflow as tf from tensorflow.examples.tutorials.mnist import input_data mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) mnist.train.images.shape => (55000, 784) n_feat…

2018-07-05

word2vec is amazing

Machine Learning Python3

Pretty impressive. It looks like it could even absorb spelling variations. from gensim.models import word2vec ls = [] for row in df_id['review_comment'].values[:100000]: ls.append(_split_to_rawwords(row)) model = word2vec.Word2Vec(ls, size=500, window=5, min_count=5, wor…

2018-07-05

Review analysis via topic extraction with LDA (Latent Dirichlet Allocation)

Machine Learning Python3

A summary of how to analyze reviews. import os import glob import sys from datetime import (datetime, date, timedelta) import logging import re import shutil import tempfile import pandas as pd import numpy as np from scipy.sparse.csc import csc…

2018-07-04

On implementing recommendations on GCP

Machine Learning Python3

Building a Recommendation System in TensorFlow: Overview | Solutions | Google Cloud

2018-06-30

About matrix eigenvalues

Machine Learning

Here is a PDF that sums it up well; it also covers real symmetric matrices. http://www.cs.shinshu-u.ac.jp/~maruyama/lin/pdf/lin09.pdf dora.bk.tsukuba.ac.jp Bonus (an easy-to-follow linear algebra site): oguemon.com
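The 2x2 case makes the key fact about real symmetric matrices concrete: the eigenvalues are always real and the eigenvectors can be chosen orthogonal. Here is a closed-form sketch for `[[a, b], [b, d]]`; the function is hypothetical and only for illustration.

```python
import math


def symmetric_2x2_eigen(a, b, d):
    """Eigenvalues and unit eigenvectors of the real symmetric matrix
    [[a, b], [b, d]], returned as [(lam1, v1), (lam2, v2)], lam1 >= lam2.

    The discriminant ((a - d) / 2)**2 + b**2 is never negative, which is
    why a real symmetric matrix always has real eigenvalues.
    """
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    lam1, lam2 = mean + radius, mean - radius
    if b != 0:
        v1 = (b, lam1 - a)  # solves (a - lam1) x + b y = 0
    elif a >= d:
        v1 = (1.0, 0.0)  # already diagonal, lam1 == a
    else:
        v1 = (0.0, 1.0)  # already diagonal, lam1 == d
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # orthogonal to v1, hence an eigenvector for lam2
    return [(lam1, v1), (lam2, v2)]


pairs = symmetric_2x2_eigen(2, 1, 2)  # eigenvalues 3.0 and 1.0
```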

2018-06-22

Papers on FM, FFM, and similar models

Machine Learning

Papers that look useful for recommendation engines. FM https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf FFM https://www.andrew.cmu.edu/user/yongzhua/conferences/ffm.pdf For reference, an FM implementation: github.com "SLIM: Sparse Linear Methods for Top-N Rec…
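Rendle's FM paper (linked above) writes the degree-2 model as y = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j, and shows that the pairwise term can be computed in O(kn) instead of O(kn^2). Here is a plain-Python sketch of that reformulation, checked against the naive double loop. The parameter names are my own, and training is not included.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 Factorization Machine prediction.

    The pairwise term uses the O(k n) identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 1/2 * sum_f [ (sum_i V[i][f] x_i)^2 - sum_i (V[i][f] x_i)^2 ].
    `V` has one row of k latent factors per feature.
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0]) if V else 0
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model with the explicit O(k n^2) double loop, for checking."""
    n = len(x)
    y = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            y += sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
    return y
```

With one-hot user and item features, only one pair of active features remains, so the pairwise term reduces to the dot product of the user and item factor vectors, which is the link to matrix factorization.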

2018-06-22

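Explicit vs. Implicit in recommender systems (slides with recommendation examples)

Machine Learning

If users rate items explicitly, for example in reviews, that's Explicit. Inferring ratings from logs such as page views (PV) or conversions (CV) is Implicit, apparently. There are two ways to gather the data. The first method is to ask for explicit ratings from a user, typically on a concrete rating…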
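For implicit data, the Hu, Koren and Volinsky paper (the yifanhu.net cf.pdf linked in this archive) separates whether a user likes an item from how sure we are about it: preference p = 1 if the action count r > 0, and confidence c = 1 + alpha * r. Here is a sketch that turns raw PV-style logs into those pairs. The `(user, item)` event format and the alpha value are illustrative assumptions.

```python
from collections import Counter


def implicit_preferences(events, alpha=40.0):
    """Turn implicit-feedback logs into {(user, item): (p, c)}.

    `events` is an iterable of (user, item) tuples, one per logged action
    such as a page view. r is the action count per pair; p = 1 if r > 0,
    and c = 1 + alpha * r. Only observed pairs are returned; unobserved
    pairs implicitly have p = 0 and c = 1.
    """
    counts = Counter(events)
    return {pair: (1, 1.0 + alpha * r) for pair, r in counts.items()}


prefs = implicit_preferences([("u1", "i1"), ("u1", "i1"), ("u2", "i3")], alpha=1.0)
# → {('u1', 'i1'): (1, 3.0), ('u2', 'i3'): (1, 2.0)}
```

An explicit rating would be stored as-is. Here, repeated actions only raise the confidence, never the preference itself.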

2018-06-15

Sites I referred to for recommendations

Machine Learning

dsnotes.com ebaytech.berlin netflix https://beta.vu.nl/nl/Images/werkstuk-fernandez_tcm235-874624.pdf Collaborative filtering: http://yifanhu.net/PUB/cf.pdf

2018-06-14

Clustering: implementing DBSCAN

Machine Learning Python3

A clustering algorithm that assigns cluster labels without assuming that the clusters are spherical. from sklearn.datasets import make_moons X, y = make_moons( n_samples=200, noise=0.05, random_state=0 ) plt.scatter(X[:, 0 ], X[:, 1]) plt.t…
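The excerpt runs scikit-learn's DBSCAN on `make_moons` data. To show why no spherical-cluster assumption is needed, here is a dependency-free sketch of the algorithm itself, not scikit-learn's implementation. Clusters grow from dense core points through their neighbors, so they can follow any shape, and isolated points become noise (-1).

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN over a list of coordinate tuples.

    A point is a core point if at least `min_samples` points (itself
    included, as in scikit-learn) lie within distance `eps`. Returns one
    label per point: 0, 1, ... for clusters, -1 for noise.
    Neighbor search is brute force, O(n^2).
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None = not visited yet
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep growing
    return labels


# Two long, thin lines: elongated clusters that k-means would tend to split.
lines = [(float(x), 0.0) for x in range(10)] + [(float(x), 5.0) for x in range(10)]
labels = dbscan(lines, eps=1.1, min_samples=2)  # first line → 0, second → 1
```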

Every day there's more I don't understand…

φ(..)メモメモ

Machine Learning
