Notes on a site implementing the Transformer in PyTorch - The more days pass, the more I don't understand…

nlp.seas.harvard.edu

Label smoothing and temperature scaling are two techniques for suppressing overconfidence in a model's predictions.
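Temperature scaling is not implemented on the site, so the following is only a generic, stdlib-only sketch. The function name softmax_with_temperature and the example logits are made up for illustration. It shows how dividing the logits by a temperature T > 1 before the softmax lowers the top probability without changing which class wins.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by T before the softmax: T > 1 flattens, T < 1 sharpens
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.5]  # hypothetical logits for a 3-class prediction
for t in (1.0, 2.0, 5.0):
    print(t, max(softmax_with_temperature(logits, t)))
```

In practice T is usually fitted on a held-out set after training; the larger T is, the less confident the calibrated prediction becomes.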

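.unsqueeze(1) turns the target into a column vector of shape (n, 1). scatter_ then writes self.confidence into each row at the column given by that row's target index, i.e. it fills in the one-hot positions.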

# true_dist.shape == (n, d)
# target.shape == (n, )

true_dist.scatter_(1, target.data.unsqueeze(1), self.confidence)
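To see what unsqueeze(1) and scatter_ actually produce, here is a small sketch that builds the smoothed target distribution the way the site's LabelSmoothing appears to: smoothing / (size - 2) everywhere, confidence = 1 - smoothing at the target column, zero in the padding column, and all-zero rows where the target itself is padding. The function name smoothed_targets is hypothetical, and the sketch assumes a recent PyTorch where Variable is no longer needed.

```python
import torch

def smoothed_targets(target: torch.Tensor, size: int, smoothing: float,
                     padding_idx: int) -> torch.Tensor:
    """Hypothetical helper: label-smoothed target distribution of shape (n, size)."""
    confidence = 1.0 - smoothing
    n = target.size(0)
    # Spread the smoothing mass over size - 2 classes (not the true class, not padding)
    true_dist = torch.full((n, size), smoothing / (size - 2))
    # target: (n,) -> (n, 1); row i gets `confidence` at column target[i]
    true_dist.scatter_(1, target.unsqueeze(1), confidence)
    # The padding class never receives probability mass
    true_dist[:, padding_idx] = 0.0
    # Rows whose target is the padding token are zeroed out entirely
    true_dist[target == padding_idx] = 0.0
    return true_dist

dist = smoothed_targets(torch.tensor([1, 2]), size=5, smoothing=0.1, padding_idx=0)
print(dist)
```

Each non-padding row sums to 1: 0.9 at the target column plus three columns of 0.1 / 3.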

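[0, x / d, 1 / d, 1 / d, 1 / d] models an overconfident multiclass prediction: with d = x + 3, class 1 gets probability x / d and each of the other three classes gets 1 / d, so the larger x becomes, the more overconfident the prediction is.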

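import numpy as np
import matplotlib.pyplot as plt
import torch

# LabelSmoothing is the class defined on the site: size=5, padding_idx=0, smoothing=0.1
crit = LabelSmoothing(5, 0, 0.1)

def loss(x):
    # Class 1 gets probability x / d, classes 2-4 get 1 / d each; class 0 is padding
    d = x + 3 * 1
    predict = torch.FloatTensor([[0, x / d, 1 / d, 1 / d, 1 / d]])
    # Variable is unnecessary since PyTorch 0.4, and .data[0] on a 0-dim loss
    # raises an error there; .item() returns the Python float instead
    return crit(predict.log(), torch.LongTensor([1])).item()

plt.plot(np.arange(1, 100), [loss(x) for x in range(1, 100)])
plt.show()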
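The shape of that curve can be checked without PyTorch. This stdlib-only sketch assumes the site's criterion is a KL divergence summed over classes, with the target [0, 0.9, 0.1/3, 0.1/3, 0.1/3] that LabelSmoothing(5, 0, 0.1) builds for true class 1; terms with target probability 0 contribute nothing. The helper names smoothed_target and kl_loss are hypothetical.

```python
import math

def smoothed_target(size=5, padding_idx=0, true_class=1, smoothing=0.1):
    # Hypothetical helper mirroring LabelSmoothing: spread `smoothing` over size - 2 classes
    dist = [smoothing / (size - 2)] * size
    dist[true_class] = 1.0 - smoothing
    dist[padding_idx] = 0.0
    return dist

def kl_loss(x, target):
    # Same prediction as loss(x): class 1 gets x / d, classes 2-4 get 1 / d
    d = x + 3
    predict = [0.0, x / d, 1 / d, 1 / d, 1 / d]
    # KL(target || predict); classes with target probability 0 are skipped
    return sum(t * math.log(t / p) for t, p in zip(target, predict) if t > 0)

target = smoothed_target()
losses = {x: kl_loss(x, target) for x in range(1, 100)}
best_x = min(losses, key=losses.get)
print(best_x)  # 27
```

The minimum sits at x = 27 because there x / d = 27 / 30 = 0.9, exactly the target's confidence 1 - smoothing. For larger x the loss rises again, which is the penalty label smoothing puts on overconfident predictions.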

Beam search implementation

github.com
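The linked repository's code is not reproduced here. As a rough idea of what beam search does, here is a minimal, generic stdlib-only sketch: at each step every kept hypothesis is extended by every possible next token, and only the beam_size highest cumulative log-probabilities survive. The function beam_search, its next-token callback and the toy model's probabilities are all hypothetical, not taken from the site or the repository.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothesis = (token sequence, cumulative log-probability)
Hypothesis = Tuple[List[int], float]

def beam_search(next_log_probs: Callable[[List[int]], Dict[int, float]],
                start: int, end: int, beam_size: int, max_len: int) -> List[Hypothesis]:
    beams: List[Hypothesis] = [([start], 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        # Extend every live hypothesis by every token the model allows next
        candidates = [(seq + [tok], score + lp)
                      for seq, score in beams
                      for tok, lp in next_log_probs(seq).items()]
        # Keep only the beam_size best extensions by total log-probability
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == end:
                finished.append((seq, score))  # hypothesis is complete
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Hypotheses still open at max_len are returned as well
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished

def toy_model(seq: List[int]) -> Dict[int, float]:
    # Hypothetical next-token probabilities; vocabulary: 0=<s>, 1=</s>, 2="a", 3="b"
    table = {(0,): {2: 0.6, 3: 0.4},
             (0, 2): {2: 0.55, 3: 0.45},
             (0, 3): {2: 0.9, 3: 0.1}}
    probs = table.get(tuple(seq), {1: 1.0})  # any other prefix must end
    return {tok: math.log(p) for tok, p in probs.items()}

best_seq, best_score = beam_search(toy_model, start=0, end=1, beam_size=2, max_len=5)[0]
print(best_seq, math.exp(best_score))
```

With beam_size=2 the search finds <s> b a </s> (probability 0.4 × 0.9 = 0.36), while beam_size=1, i.e. greedy decoding, commits to the locally better first token a and ends with 0.6 × 0.55 = 0.33.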