2019-10-23

Automatic feature selection

Machine learning

For feature selection with xgboost, extract the feature importances using total_gain.

programmer.ink

datanerd.hateblo.jp

2019-10-23

seaborn distplot with a log scale

Python3

Log scale on the y-axis of distplot:

sns.distplot(..., hist_kws={'log':True})

github.com

2019-10-20

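About sneakers

Sneakers

I only just now learned that the maker of my favorite sneakers is called SPINGLE. I was never really that interested in brands, though.

Similar-looking shoes with different model numbers must differ in some way, but I can't tell what the difference is.

I think I used to wear the SPM-356, and now I wear the SPM-443. Both took a while to break in.

www.spingle.jp

This is the first time I've written about something other than IT on this blog, but I could really get into sneakers.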

2019-10-16

Loading images from a ZIP file with skimage and PIL

Python3

I want to process the images without extracting the zip (extracting would eat up disk space).

import io
import posixpath
from zipfile import ZipFile

import skimage.io
from PIL import Image

# files_in_zip_dic is assumed to be defined elsewhere: an indexable collection of
# (image_id, {'images': [file_name, ...]}) pairs describing the archive contents.


def get_pil_inzip(_zip_path, image_idx):
    image_id = files_in_zip_dic[image_idx][0]
    # Member names inside a ZIP always use '/', so join with posixpath.
    image_path = posixpath.join(image_id, 'images', files_in_zip_dic[image_idx][1]['images'][0])
    with ZipFile(_zip_path) as zf:
        with zf.open(image_path) as f:
            image = Image.open(f)
            image.load()  # PIL decodes lazily; read the pixels before the archive closes
    return image


# Read the member into a BytesIO buffer first, then decode it as a binary file.
def get_skimage_inzip(_zip_path, image_idx):
    image_id = files_in_zip_dic[image_idx][0]
    image_path = posixpath.join(image_id, 'images', files_in_zip_dic[image_idx][1]['images'][0])
    with ZipFile(_zip_path) as zf:
        with zf.open(image_path) as f:
            img_bin = io.BytesIO(f.read())
            image = skimage.io.imread(img_bin)
    return image
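The core of both functions is "read one archive member into memory without extracting it". A minimal standard-library sketch of just that step might look like the following; read_member_as_buffer and list_members are hypothetical helper names introduced here, and the ".png" suffix is only an example.

```python
import io
from zipfile import ZipFile


def read_member_as_buffer(zip_path, member_name):
    """Return one archive member as an in-memory, seekable BytesIO buffer."""
    with ZipFile(zip_path) as zf:
        return io.BytesIO(zf.read(member_name))


def list_members(zip_path, suffix=".png"):
    """List member names ending with `suffix` without extracting anything."""
    with ZipFile(zip_path) as zf:
        return [name for name in zf.namelist() if name.lower().endswith(suffix)]
```

The returned buffer can be handed to Image.open or skimage.io.imread, and since the bytes are already in memory it stays usable after the archive is closed.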

2019-10-16

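Understanding gather and scatter in pytorch

gather and scatter were hard for me to understand, so I wrote up a summary.

The idea: gather picks values out of input along dim, at the positions given by index.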

# gather 
# torch.gather(input, dim, index, out=None, sparse_grad=False) → Tensor
out[i][j][k] = input[index[i][j][k]][j][k]  # if dim == 0
out[i][j][k] = input[i][index[i][j][k]][k]  # if dim == 1
out[i][j][k] = input[i][j][index[i][j][k]]  # if dim == 2
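To make the index rule concrete, here is a minimal pure-Python sketch for the 2-D case. gather_2d is a hypothetical helper written only for illustration; it is not how torch implements gather, it just replays the rule above on nested lists.

```python
def gather_2d(inp, dim, index):
    """Pure-Python reference for the gather index rule on 2-D nested lists."""
    if dim not in (0, 1):
        raise ValueError("dim must be 0 or 1 for a 2-D input")
    out = []
    for i, row in enumerate(index):
        out_row = []
        for j, idx in enumerate(row):
            if dim == 0:
                out_row.append(inp[idx][j])  # out[i][j] = input[index[i][j]][j]
            else:
                out_row.append(inp[i][idx])  # out[i][j] = input[i][index[i][j]]
        out.append(out_row)
    return out


t = [[1, 2], [3, 4]]
index = [[0, 0], [1, 0]]
picked = gather_2d(t, 1, index)
print(picked)  # [[1, 1], [4, 3]]
```

torch.gather(torch.tensor(t), 1, torch.tensor(index)) returns the same values. Note that the output always has the shape of index, not of input.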

stackoverflow.com

scatter_ is the reverse: along dim, it writes values into self at the positions given by index (this is the "scattering" picture).

self, index and src (if it is a Tensor) should have same number of dimensions. It is also required that index.size(d) <= src.size(d) for all dimensions d, and that index.size(d) <= self.size(d) for all dimensions d != dim.

# scatter
# scatter_(dim, index, src) → Tensor
self[index[i][j][k]][j][k] = src[i][j][k]  # if dim == 0
self[i][index[i][j][k]][k] = src[i][j][k]  # if dim == 1
self[i][j][index[i][j][k]] = src[i][j][k]  # if dim == 2

>>> x = torch.rand(2, 5)
>>> x
tensor([[ 0.3992,  0.2908,  0.9044,  0.4850,  0.6004],
        [ 0.5735,  0.9006,  0.6797,  0.4152,  0.1732]])
>>> torch.zeros(3, 5).scatter_(0, torch.tensor([[0, 1, 2, 0, 0], [2, 0, 0, 1, 2]]), x)
tensor([[ 0.3992,  0.9006,  0.6797,  0.4850,  0.6004],
        [ 0.0000,  0.2908,  0.0000,  0.4152,  0.0000],
        [ 0.5735,  0.0000,  0.9044,  0.0000,  0.1732]])

discuss.pytorch.org

Example: implementing one-hot encoding with scatter_

The idea is to one-hot encode image_tensor (the labels) along dim=0.

If image_tensor is something like a mask, the labels in the mask get one-hot encoded.

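def onehot(image_tensor, n_classes):
    # image_tensor: (h, w) int64 label map with values in [0, n_classes)
    h, w = image_tensor.size()
    out = torch.zeros(n_classes, h, w, dtype=torch.long)
    # unsqueeze (not the in-place unsqueeze_) so the caller's tensor keeps its shape
    index = image_tensor.unsqueeze(0)
    out.scatter_(0, index, 1)
    return out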

onehot(torch.tensor([[1, 0], [0, 2]]), 3)

==> 

tensor([[[0, 1],
         [1, 0]],

        [[1, 0],
         [0, 0]],

        [[0, 0],
         [0, 1]]])

discuss.pytorch.org

If src is not given as a tensor, a scalar value can be used instead.
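The scalar case can be replayed by hand with nested lists. The sketch below is a pure-Python reference for dim == 0 only; scatter_dim0 is a hypothetical helper for illustration, not torch's implementation. It accepts either a nested list shaped like index or a single scalar as src.

```python
def scatter_dim0(target, index, src):
    """Replay target[index[i][j][k]][j][k] = src[i][j][k] on nested lists.

    src may be a nested list shaped like index, or a single scalar value.
    """
    is_scalar = not isinstance(src, list)
    for i, plane in enumerate(index):
        for j, row in enumerate(plane):
            for k, idx in enumerate(row):
                target[idx][j][k] = src if is_scalar else src[i][j][k]
    return target


labels = [[1, 0], [0, 2]]  # same label map as the one-hot example
n_classes = 3
zeros = [[[0] * 2 for _ in range(2)] for _ in range(n_classes)]

# [labels] adds the leading axis, like unsqueeze(0); the scalar 1 plays the role of value.
one_hot = scatter_dim0(zeros, [labels], 1)
print(one_hot)
# [[[0, 1], [1, 0]], [[1, 0], [0, 0]], [[0, 0], [0, 1]]]
```

This reproduces the one-hot output shown earlier: every label value selects which class plane receives the 1 at that pixel.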

2019-10-08

Columns after a pandas groupby

Python3

After groupby + agg, flatten the resulting multi-level columns into a single level.

tmp_kigo2.columns = ['_'.join(col) if col[1]!='' else col[0] for col in tmp_kigo2.columns]
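For context, here is a small end-to-end sketch of where such two-level columns come from. The frame and the column names key and amount are made up for illustration, and it assumes reset_index() was called after agg, which is what produces the empty second level that the col[1] != '' check handles.

```python
import pandas as pd

# Hypothetical data; "key" and "amount" are illustrative names.
df = pd.DataFrame({"key": ["a", "a", "b"], "amount": [1, 2, 3]})

# Passing a list of functions to agg yields two-level columns; reset_index()
# turns the group key into a column whose second level is the empty string.
agg = df.groupby("key").agg({"amount": ["sum", "mean"]}).reset_index()
print(list(agg.columns))
# [('key', ''), ('amount', 'sum'), ('amount', 'mean')]

# Join the two levels with '_' unless the second level is empty.
agg.columns = ["_".join(col) if col[1] != "" else col[0] for col in agg.columns]
print(list(agg.columns))
# ['key', 'amount_sum', 'amount_mean']
```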

2019-10-08

seaborn figsize

Python3

To change the figure size in seaborn, the following method is easy to use.

g = sns.catplot(
    x='year',
    y='金額',
    hue='フラグ',
    data=df,
    kind='bar'
)
g.fig.set_size_inches(15, 4)
plt.show()

Every day there's more and more I don't understand…

φ(..) memo memo
