Adding regex validation to a CharField in Django models

Added via an alphanumeric RegexValidator.

from django.core.validators import RegexValidator
from django.db import models

alphanumeric = RegexValidator(r'^[0-9a-zA-Z]*$', 'Only alphanumeric characters are allowed.')

name = models.CharField(max_length=50, blank=True, null=True, validators=[alphanumeric])
# note: this validator rejects '@' and '.', so it will fail for any real email address
email = models.EmailField(max_length=50, unique=True, validators=[alphanumeric])
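
Validators are plain callables, so the pattern can be sanity-checked without touching a model (a minimal sketch, not from the original post):

from django.core.exceptions import ValidationError

alphanumeric('abc123')       # matches: returns silently
try:
    alphanumeric('abc 123')  # the space fails the regex
except ValidationError as e:
    print(e.messages)        # ['Only alphanumeric characters are allowed.']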

stackoverflow.com

How to grep MySQL for the tables corresponding to Django models

Extract them with the following command.

mysql -uroot -N information_schema -e "select table_name from tables where table_schema = 'dbname' and table_name like 'prefix_%'" > table.txt
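
If staying inside Django is easier, the same list can be pulled via the introspection API (a minimal sketch to run in python manage.py shell; 'prefix_' stands in for your app's table prefix):

from django.db import connection

tables = connection.introspection.table_names()
print([t for t in tables if t.startswith('prefix_')])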

Analyzing MNIST with a neural network in TensorFlow

Accuracy came out on the low side at about 90%. I'll look into the cause another time.

import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
mnist.train.images.shape

=>
(55000, 784)

n_features = mnist.train.images.shape[1]  # 784 = 28x28 pixels
y_onehot = mnist.train.labels
n_classes = 10
random_seed = 123
np.random.seed(random_seed)

g = tf.Graph()
with g.as_default():
    tf.set_random_seed(random_seed)
    tf_x = tf.placeholder(
        dtype=tf.float32,
        shape=(None, n_features),
        name='tf_x'
    )
    tf_y = tf.placeholder(
        dtype=tf.int32,
        shape=(None, n_classes),
        name='tf_y'
    )
    # two hidden tanh layers (50 units each); the output layer below produces 10 logits
    h1 = tf.layers.dense(
        inputs=tf_x,
        units=50,
        activation=tf.tanh,
        name='layer1',
    )
    h2 = tf.layers.dense(
        inputs=h1,
        units=50,
        activation=tf.tanh,
        name='layer2',
    )
    logits = tf.layers.dense(
        inputs=h2,
        units=10,
        activation=None,
        name='layer3'
    )
    predictions = {
        'classes': tf.argmax(logits, axis=1, name='predicted_classes'),
        'probabilities': tf.nn.softmax(logits, name='softmax_tensor')
    }

with g.as_default():
    cost = tf.losses.softmax_cross_entropy(
        onehot_labels=tf_y,
        logits=logits
    )
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(loss=cost)
    init_op = tf.global_variables_initializer()

sess = tf.Session(graph=g)
sess.run(init_op)

training_costs = []
batch_size = 128

for epoch in range(50):
    for i in range((mnist.train.images.shape[0] // batch_size) + 1):
        batch_xs, batch_ys = mnist.train.next_batch(batch_size, shuffle=True)
        feed = {tf_x: batch_xs, tf_y: batch_ys}
        _, batch_cost = sess.run([train_op, cost], feed_dict=feed)
        training_costs.append(batch_cost)
    # note: this is the running average over all batches so far, not the per-epoch mean
    print(' -- Epoch %2d   Avg. Training Loss: %.4f' % (epoch + 1, np.mean(training_costs)))
 -- Epoch  1   Avg. Training Loss: 2.2021
 -- Epoch  2   Avg. Training Loss: 2.0037
 -- Epoch  3   Avg. Training Loss: 1.8452
 -- Epoch  4   Avg. Training Loss: 1.7153
 -- Epoch  5   Avg. Training Loss: 1.6072
 -- Epoch  6   Avg. Training Loss: 1.5165
 -- Epoch  7   Avg. Training Loss: 1.4384
 -- Epoch  8   Avg. Training Loss: 1.3709
 -- Epoch  9   Avg. Training Loss: 1.3118
 -- Epoch 10   Avg. Training Loss: 1.2595
 -- Epoch 11   Avg. Training Loss: 1.2130
 -- Epoch 12   Avg. Training Loss: 1.1712
 -- Epoch 13   Avg. Training Loss: 1.1335
 -- Epoch 14   Avg. Training Loss: 1.0990
 -- Epoch 15   Avg. Training Loss: 1.0678
 -- Epoch 16   Avg. Training Loss: 1.0392
 -- Epoch 17   Avg. Training Loss: 1.0125
 -- Epoch 18   Avg. Training Loss: 0.9880
 -- Epoch 19   Avg. Training Loss: 0.9652
 -- Epoch 20   Avg. Training Loss: 0.9441
 -- Epoch 21   Avg. Training Loss: 0.9241
 -- Epoch 22   Avg. Training Loss: 0.9055
 -- Epoch 23   Avg. Training Loss: 0.8881
 -- Epoch 24   Avg. Training Loss: 0.8716
 -- Epoch 25   Avg. Training Loss: 0.8562
 -- Epoch 26   Avg. Training Loss: 0.8416
 -- Epoch 27   Avg. Training Loss: 0.8278
 -- Epoch 28   Avg. Training Loss: 0.8146
 -- Epoch 29   Avg. Training Loss: 0.8021
 -- Epoch 30   Avg. Training Loss: 0.7902
 -- Epoch 31   Avg. Training Loss: 0.7789
 -- Epoch 32   Avg. Training Loss: 0.7683
 -- Epoch 33   Avg. Training Loss: 0.7579
 -- Epoch 34   Avg. Training Loss: 0.7480
 -- Epoch 35   Avg. Training Loss: 0.7385
 -- Epoch 36   Avg. Training Loss: 0.7295
 -- Epoch 37   Avg. Training Loss: 0.7208
 -- Epoch 38   Avg. Training Loss: 0.7125
 -- Epoch 39   Avg. Training Loss: 0.7046
 -- Epoch 40   Avg. Training Loss: 0.6969
 -- Epoch 41   Avg. Training Loss: 0.6894
 -- Epoch 42   Avg. Training Loss: 0.6823
 -- Epoch 43   Avg. Training Loss: 0.6754
 -- Epoch 44   Avg. Training Loss: 0.6687
 -- Epoch 45   Avg. Training Loss: 0.6623
 -- Epoch 46   Avg. Training Loss: 0.6560
 -- Epoch 47   Avg. Training Loss: 0.6500
 -- Epoch 48   Avg. Training Loss: 0.6442
 -- Epoch 49   Avg. Training Loss: 0.6386
feed = {tf_x: mnist.test.images}
y_pred = sess.run(predictions['classes'], feed_dict=feed)

y_pred
=>
array([7, 2, 1, ..., 4, 5, 6])

y_test = np.argmax(mnist.test.labels, axis=1)

100 * np.sum(y_pred == y_test) / y_test.shape[0]
=>
90.77
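
A plausible reason for the modest score (not verified here): plain gradient descent with learning_rate=0.001 converges slowly, and the loss above is still falling steadily at epoch 49. As a hedged sketch, one change worth trying is swapping the optimizer for Adam; AdamOptimizer is a standard TF 1.x API, but the gain on this exact setup is an assumption:

with g.as_default():
    # assumption: Adam typically converges much faster than plain SGD on MNIST
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(loss=cost)
    # Adam adds slot variables, so rebuild and re-run the initializer before training
    init_op = tf.global_variables_initializer()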

Solr performance tuning

JVM Settings | Apache Solr Reference Guide 6.6

SolrPerformanceFactors - Solr Wiki

ShawnHeisey - Solr Wiki

word2vec is impressive

Quite impressive. It seems like it can even absorb spelling variations.

from gensim.models import word2vec

# df_id and _split_to_rawwords are defined elsewhere: a DataFrame holding the
# reviews and a user-defined tokenizer that splits a comment into raw words
ls = []
for row in df_id['review_comment'].values[:100000]:
    ls.append(_split_to_rawwords(row))
# note: `size` is the gensim < 4 name; newer versions call it `vector_size`
model = word2vec.Word2Vec(ls, size=500, window=5, min_count=5, workers=4)

model.wv.most_similar(positive=['エアコン'])
...

model.save("./review.model")
model = word2vec.Word2Vec.load("./review.model")
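
For the spelling-variation point above, the similarity between two variants can be checked directly. A minimal sketch; 'クーラー' is an assumed in-vocabulary word, not from the original run:

# cosine similarity between two in-vocabulary words
print(model.wv.similarity('エアコン', 'クーラー'))
# top-3 neighbours instead of the default 10
print(model.wv.most_similar(positive=['エアコン'], topn=3))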

deepage.net

radimrehurek.com

towardsdatascience.com