How to use futures in Python

Python3 Machine Learning

import concurrent.futures
import os

from catboost import CatBoostClassifier

score_list = []

def worker(my_random_seed):
    model = CatBoostClassifier(
        iterations=300,
        learning_rate=0.1,
        random_seed=my_random_seed
    )
    model.fit(
        X_train, y_train,
        cat_features=cat_features,
        eval_set=(X…
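The excerpt above is cut off before the executor is used, so the following is only a minimal sketch of the general pattern: one task is submitted per random seed and the scores are collected as the futures complete. The stand-in `worker` body, the `seeds` list, and the choice of `ThreadPoolExecutor` are assumptions for illustration, not the code from the post.

```python
import concurrent.futures


def worker(my_random_seed):
    # Hypothetical stand-in for the model training in the post:
    # returns (seed, score) so each result can be matched to its seed.
    score = (my_random_seed * 37 % 100) / 100.0
    return my_random_seed, score


seeds = [0, 1, 2, 3]  # assumed list of seeds
score_list = []

# A thread pool is an assumption; the executor used in the post is not shown.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [executor.submit(worker, seed) for seed in seeds]
    # as_completed yields futures in completion order, not submission order.
    for future in concurrent.futures.as_completed(futures):
        score_list.append(future.result())

score_list.sort()  # restore a deterministic order by seed
print(score_list)
```

Returning the seed together with the score avoids relying on completion order, which is not guaranteed to match the order in which tasks were submitted.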

2018-12-29

About the "yellow book" of machine learning

Machine Learning

The yellow book of machine learning (Bishop, Pattern Recognition and Machine Learning)
https://www.amazon.co.jp/dp/4621061240
Web PDF https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
Solutions to the exercises (www) h…

2018-12-29

Well-known articles on ensembling

Python3 Machine Learning

Kaggle Ensembling Guide | MLWave github.com github.com

2018-12-25

Well-known articles on t-SNE

Python3 Machine Learning

github.com Comparison of Manifold Learning methods — scikit-learn 0.20.2 documentation distill.pub lvdmaaten.github.io t-SNE: The effect of various perplexity values on the shape — scikit-learn 0.20.2 documentation Interaction Practical Le…

2018-12-25

Jupyter Notebook tips

Python3 Useful tools and apps

www.dataquest.io Bonus: kaggletils github.com

2018-12-21

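A thing called CoreOS

Linux

It appears to be an OS built on the assumption that applications are basically launched in containers. It looks convenient, but it is complex. https://coreos.com/ignition/docs/latest/what-is-ignition.html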

2018-12-21

How to update MySQL when --super-read-only is set

MySQL

mysql> CREATE DATABASE `test` DEFAULT CHARSET utf8mb4;
ERROR 1290 (HY000): The MySQL server is running with the --super-read-only option so it cannot execute this statement

SET GLOBAL super_read_only = 0;

2018-12-20

Garbled display in a docker shell

docker

$ docker exec -it jenkins env COLUMNS=200 LINES=50 TERM=xterm bash qiita.com

2018-12-19

An article on CentOS filesystems

Linux

Part I. File Systems - Red Hat Customer Portal

2018-12-15

Useful sites for hyperparameter tuning

Machine Learning Python3

3.2. Tuning the hyper-parameters of an estimator — scikit-learn 0.20.1 documentation fastml.com www.analyticsvidhya.com

2018-12-10

Mean encoding with KFold

Machine Learning Python3

import pandas as pd
import numpy as np

index_cols = ['shop_id', 'item_id', 'cnt']
global_mean = 0.2

df = pd.read_csv(filename)

# aggregate after grouping by the index columns
gb = df.groupby(index_cols, as_index=False).agg({'cnt': {'target': 'sum'}})
# fix column names
gb.col…
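Since the excerpt stops before the K-fold step, here is a dependency-free sketch of what out-of-fold mean encoding generally looks like: each row is encoded with the target mean of its category computed on the other folds only, and categories unseen in those folds fall back to a global mean. The function name, the round-robin fold assignment, and the toy data are assumptions for illustration; only the `global_mean = 0.2` default is taken from the excerpt.

```python
from collections import defaultdict


def kfold_mean_encode(categories, targets, n_splits=5, global_mean=0.2):
    """Encode each row with the target mean of its category from other folds."""
    n = len(categories)
    encoded = [global_mean] * n
    for fold in range(n_splits):
        # Row i belongs to fold i % n_splits (simple, unshuffled assignment).
        sums, counts = defaultdict(float), defaultdict(int)
        for i in range(n):
            if i % n_splits != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        # Encode the held-out rows using statistics from the other folds only.
        for i in range(fold, n, n_splits):
            category = categories[i]
            if counts[category]:
                encoded[i] = sums[category] / counts[category]
    return encoded


categories = ["a", "a", "b", "a", "b", "c"]
targets = [1, 0, 1, 1, 0, 1]
encoded = kfold_mean_encode(categories, targets, n_splits=2)
print(encoded)
```

Because a row never contributes to its own encoding, the target does not leak directly into the feature, which is the usual motivation for doing this per fold.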

2018-12-04

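Checking the processing status of MySQL transactions

MySQL

A story about panicking when a process in the KILLED state kept holding a transaction. Force-stopping mysqld to get rid of the KILLED process causes a deadlock, so it is better not to do that. The following message appeared and no further DELETE could be run. transaction mysql Lock wait timeout exceeded; try rest…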

2018-12-03

Good articles on scoring for ranking and clustering

Python3 Machine Learning

rank https://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/MSR-TR-2010-82.pdf The Lemur Project / Wiki / RankLib Learning to Rank Overview wellecks.wordpress.com…

2018-12-03

Binary logloss and AUC

Python3

Metrics for binary classification

logloss

import numpy as np

l_pred = [0.5, 0.5, 0.5, 0.5]
l_label = [0, 0, 0, 0]

def logloss(l_pred, l_label):
    n = len(l_pred)
    score = 0
    for t in range(n):
        i = l_pred[t]   # predicted probability of class 1
        k = l_label[t]  # true label (0 or 1)
        score += k * np.log(i) + (1 - k) * np.log(1 - i)
    return - …
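As a self-contained reference, the two metrics named in the title can be sketched with the standard library alone. This version assumes logloss is the mean negative log-likelihood and computes AUC as the fraction of (positive, negative) pairs ranked correctly, counting ties as half; the `eps` clipping and the `auc` helper are additions for illustration and are not taken from the post.

```python
import math


def logloss(l_pred, l_label, eps=1e-15):
    """Mean binary cross-entropy; predictions are clipped away from 0 and 1."""
    total = 0.0
    for p, y in zip(l_pred, l_label):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(l_pred)


def auc(l_pred, l_label):
    """Fraction of (positive, negative) pairs where the positive scores higher."""
    pos = [p for p, y in zip(l_pred, l_label) if y == 1]
    neg = [p for p, y in zip(l_pred, l_label) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct pair
    return wins / (len(pos) * len(neg))


print(logloss([0.5, 0.5, 0.5, 0.5], [0, 0, 0, 0]))
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

A constant prediction of 0.5 gives a logloss of ln 2 regardless of the labels, which makes it a handy baseline. The pairwise AUC is quadratic in the number of samples, so it is meant for checking small cases by hand.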

2018-11-26

Reference sites on validation

Machine Learning

3.1. Cross-validation: evaluating estimator performance — scikit-learn 0.20.1 documentation www.chioka.in

2018-11-22

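Checking syslog, cron logs, and the like

Linux

Check syslog when, for example, an OOM occurs.

Log file            Contents
/var/log/messages   General system-related messages
/var/log/secure     Security-related messages
/var/log/cron       Messages about the results of periodically executed jobs
/var/log/maillog    Messages related to mail…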

2018-11-21

Avoid mixing up the production server and the staging server

Linux

Adding the following setting to .bash_profile colors the prompt:
export PS1="\[\033[31m\]\u@\h\[\033[00m\]:\[\033[01m\]\w\[\033[00m\]\\$ "

2018-11-20

Visualization libraries for pandas

Machine Learning Python3

seaborn: statistical data visualization — seaborn 0.9.0 documentation plot.ly github.com ggplot | Home NetworkX — NetworkX A demo of the Spectral Biclustering algorithm — scikit-learn 0.20.0 documentation

2018-11-19

Functions for a first look at a pandas df

Python3

df.dtypes
df.info()
df.value_counts()
df.isnull()
plt.scatter(x1, x2)
pd.plotting.scatter_matrix(df)
df.corr()
plt.matshow(...)
df.mean().sort_values().plot(style='.')

2018-11-16

String highlighting in Python

Python3

Implementing highlighting for suggestions on the Python side

import re

# prefix match
def hilight_apply_pre(word, _list):
    return [re.sub('^{}'.format(re.escape(word)), '<{0}>{1}</{0}>'.format('em', word), l, 1) for l in _list]

# substring match
def hilight_apply_sub(word, _…
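The substring variant is cut off in the excerpt, so here is a minimal sketch of both cases under the same idea: escape the query, then wrap each match in a tag. The function names, the `tag` parameter, and the choice to highlight every occurrence in the substring case are assumptions. A replacement function is used instead of a replacement string so that backslashes in the matched text are never interpreted as escape sequences.

```python
import re


def _wrap(tag):
    # Build a replacement callback that wraps the matched text in <tag>...</tag>.
    def replace(match):
        return "<{0}>{1}</{0}>".format(tag, match.group(0))
    return replace


def highlight_prefix(word, candidates, tag="em"):
    """Highlight `word` only where a candidate starts with it."""
    pattern = re.compile("^" + re.escape(word))
    return [pattern.sub(_wrap(tag), c, count=1) for c in candidates]


def highlight_substring(word, candidates, tag="em"):
    """Highlight every occurrence of `word` inside each candidate."""
    pattern = re.compile(re.escape(word))
    return [pattern.sub(_wrap(tag), c) for c in candidates]


print(highlight_prefix("py", ["python", "numpy"]))
print(highlight_substring("py", ["python", "numpy"]))
```

`re.escape` matters because suggestion queries can contain characters such as `+` or `.` that would otherwise be treated as regex syntax.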

2018-11-13

Recommended blogs on feature engineering

Machine Learning Python3

Feature extraction: 4.3. Preprocessing data — scikit-learn 0.20.0 documentation
Feature creation: machinelearningmastery.com What are some best practices in Feature Engineering? - Quora

2018-11-11

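Adjusting outliers with percentiles

Python3

Use numpy's clip to pull outliers in to the lower and upper percentile bounds.

import numpy as np

a = [1, 2, 3, 4, 1000, 5, 6, 7, 5, 4]
# np.percentile returns the values in the order requested: 1st, then 99th
LOWER_BOUND, UPPER_BOUND = np.percentile(a, [1, 99])
b = np.clip(a, LOWER_BOUND, UPPER_BOUND)
print(b)

[ 1.09 2. 3. 4. 910.63 5. 6. 7. 5. 4. ]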
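To see where 1.09 and 910.63 come from, the same computation can be sketched without NumPy. This assumes percentiles are computed by linear interpolation between the two nearest sorted values, which is NumPy's default method; the `percentile` helper is written here for illustration only.

```python
import math


def percentile(values, q):
    """Percentile by linear interpolation between neighbors (q in [0, 100])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


a = [1, 2, 3, 4, 1000, 5, 6, 7, 5, 4]
lower_bound, upper_bound = percentile(a, 1), percentile(a, 99)

# Clip every value into [lower_bound, upper_bound].
clipped = [min(max(x, lower_bound), upper_bound) for x in a]
print(lower_bound, upper_bound)
print(clipped)
```

With ten values, the 99th percentile sits at fractional index 8.91, between 7 and 1000, so the upper bound is 7 + 0.91 * 993 = 910.63. On a sample this small the outlier still dominates the bound, so the clipping is only as robust as the percentile chosen.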

2018-11-11

A blog on data frames

Machine Learning Python3

A recommended machine learning blog: Datas-frame tomaugspurger.github.io

2018-11-11

AWS spot instances

AWS

Launching a spot instance on AWS for data analysis. docs.aws.amazon.com datasciencebowl.com

2018-11-08

Splitting one column into multiple columns

Python3

Convert the date column into [day, month, year] columns. Set expand=True and then rename.

date
02.01.2013

transactions[['day', 'month', 'year']] = transactions.date.str.split(
    '.', n=2, expand=True
).rename(columns={0: 'day', 1: 'month', 2: 'year'…

2018-11-08

Machine learning libraries (Python)

Python3 Machine Learning

A roundup of well-known Python machine learning libraries.
Libraries: scikit-learn: machine learning in Python — scikit-learn 0.20.0 documentation Overview — H2O 3.22.0.1 documentation www.tensorflow.org github.com github.com github.com github.com
Sites…

2018-11-06

Registering an index in kibana

Elasticsearch kibana logstash

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "id": {"type": "integer"},
        "count": {"type": "integer"}
      }
    }
  }
}

GET /_cat/indices?v

$ grep -rF 'pipeline.workers' config/logstash.yml
# pipeline.workers: 2
pipeline.workers: 1

csv_pipelin…

2018-10-31

Keeping commands out of history

Linux

Temporarily stop commands from being recorded in history.

echo 'a'
echo 'b'
echo 'c'
set +o history
echo 'd'
set -o history
history | tail -4

Output:
685 echo 'b'
686 echo 'c'
687 set +o history
688 history | tail -4

2018-10-27

Sample mean, unbiased sample variance, and confidence intervals in Python

Statistics Python3

train.loc[train.paytype == 1, :].pa.sum()  # number of pa
# people who paid in cash (paytype=1)
train_iscash = train.paytype == 1
# 99% confidence interval for the mean proportion of people who paid in cash
from statsmodels.stats.proportion import proportion_confint
proportion_confint…
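The call to `proportion_confint` is cut off above, so its arguments are not known. For reference, the normal-approximation interval (the default method of that function) can be sketched with the standard library as follows; the function name and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def normal_proportion_confint(count, nobs, alpha=0.05):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = count / nobs
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    half_width = z * math.sqrt(p * (1 - p) / nobs)
    return p - half_width, p + half_width


# Hypothetical data: 50 of 100 people paid in cash; alpha=0.01 gives a 99% interval.
low, high = normal_proportion_confint(50, 100, alpha=0.01)
print(low, high)
```

The approximation is reasonable when both the number of successes and the number of failures are fairly large; for small samples or proportions near 0 or 1, other interval methods tend to behave better.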

2018-10-26

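Embedding kibana in HTML with an iframe looks like a good way to publish it

kibana

Embedding it with an iframe should do the job. Also, for checking things in Chrome, editing Elements directly looks like a good approach. www.elastic.co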

The things I don't understand keep growing day by day…

φ(..) memo memo

Articles for the year starting 2018-01-01
