2019/1/27

Recurrent Neural Networks (RNN) Implementation


This unit walks through two RNN implementation examples.

► RNN for the MNIST dataset
RNNs are suited to datasets whose samples are sequentially correlated. The MNIST samples are independent of one another, but because the dataset is simple it still makes a good exercise for practicing RNNs. Following the example in the reference, we treat one dimension of the features as the time step and the other as the input size; the 28 features at each step are mapped into 150 cells, and a final dense layer maps the 150 values down to 10, which are compared against the target labels to compute the loss. A diagram of the system:
Figure: an RNN processing the MNIST dataset
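To make the data flow concrete, here is a minimal NumPy sketch of the shapes involved (randomly initialized weights, purely illustrative, not the trained network): each of the 28 image rows is fed as one time step of 28 inputs, the hidden state has 150 cells, and a dense layer maps the final state to 10 logits.

```python
import numpy as np

# Hypothetical dimensions matching the text: 28 time steps of 28 features,
# 150 hidden cells, 10 output classes; batch of 4 for illustration.
batch, n_steps, n_inputs, n_neurons, n_outputs = 4, 28, 28, 150, 10

rng = np.random.default_rng(0)
X = rng.standard_normal((batch, n_steps, n_inputs)).astype(np.float32)

# Randomly initialized weights (illustration only, not trained values)
Wx = rng.standard_normal((n_inputs, n_neurons)) * 0.01
Wh = rng.standard_normal((n_neurons, n_neurons)) * 0.01
b = np.zeros(n_neurons)
Wy = rng.standard_normal((n_neurons, n_outputs)) * 0.01

h = np.zeros((batch, n_neurons))
for t in range(n_steps):            # unroll over the 28 "time steps"
    h = np.tanh(X[:, t, :] @ Wx + h @ Wh + b)

logits = h @ Wy                     # dense layer on the final state
print(h.shape, logits.shape)        # (4, 150) (4, 10)
```

This is the plain-tanh recurrence; a GRU cell as used below adds gating, but the shapes flowing through the graph are the same.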

First, read in the dataset using the method discussed previously:
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from keras.datasets import mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255.0
X_test /= 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
In this example the labels we need are integers, so there is no need for one-hot encoding; casting to int32 is enough. The features, however, still need to be converted to floats.
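Why integer labels suffice: the sparse cross-entropy used below simply indexes the log-probabilities by each label, which gives exactly the same loss as multiplying by a one-hot vector. A NumPy sketch (toy logits, illustrative only):

```python
import numpy as np

# Toy logits for 3 samples over 10 classes, plus integer labels.
logits = np.array([[2.0, 0.5] + [0.0] * 8,
                   [0.1, 3.0] + [0.0] * 8,
                   [0.0] * 9 + [1.5]])
labels = np.array([0, 1, 9])          # integer class ids, no one-hot needed

# softmax cross-entropy computed directly from integer labels ("sparse")
shifted = logits - logits.max(axis=1, keepdims=True)
log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
sparse_ce = -log_probs[np.arange(len(labels)), labels]

# the same loss via explicit one-hot encoding
one_hot = np.eye(10)[labels]
dense_ce = -(one_hot * log_probs).sum(axis=1)

assert np.allclose(sparse_ce, dense_ce)   # identical results
```

Skipping the one-hot expansion is what makes the "sparse" variant cheaper when there are many classes.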
tf.reset_default_graph()
n_steps = 28
n_inputs = 28
n_neurons = 150
n_outputs = 10
learning_rate = 0.001
n_epochs = 10  # try increasing this to see how performance improves
batch_size = 150

train_batches = int(X_train.shape[0]/batch_size)
test_batches = int(X_test.shape[0]/batch_size)
x, y = tf.placeholder(tf.float32, shape=
                      [None,28,28]), tf.placeholder(tf.int32, shape=[None])
# shuffle before batching so individual samples (not whole batches) are shuffled
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(
    10000, seed=np.random.randint(0, 1024)).batch(batch_size).repeat()
iter = dataset.make_initializable_iterator()
feature, label = iter.get_next()
Here we use a Dataset to draw samples; the usage is the same as discussed in an earlier unit.
#basic_cell = tf.keras.layers.SimpleRNNCell(units=n_neurons, activation=None)
# basic_cell = tf.nn.rnn_cell.BasicRNNCell(num_units=n_neurons)
basic_cell = tf.nn.rnn_cell.GRUCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, feature, dtype=tf.float32)
logits = tf.layers.dense(states, n_outputs)
xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label,
                                                          logits=logits)
loss = tf.reduce_mean(xentropy)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
correct = tf.nn.in_top_k(logits, label, 1)
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
In the code above, if you uncomment the SimpleRNNCell line and comment out the GRUCell line instead, you get the Keras SimpleRNNCell discussed earlier; I tried that cell and its performance was not very good. Similarly, the second commented line uses the deprecated BasicRNNCell, which performed reasonably well in my tests. In sparse_softmax_cross_entropy_with_logits, the labels argument takes integers and logits is the dense-layer output computed from the cells' final state; because the labels need no one-hot encoding, this function is more efficient when there are especially many output classes, hence the "sparse" in its name.

Another important piece is in_top_k(logits, label, 1). label is an integer from 0 to 9; for each record, this function checks whether label is among the indices of the k largest values in logits (here the third argument is 1, meaning the single largest; 2 would mean the top two) and returns True if so, False otherwise. In this example logits has 10 values, so if the index that label points to holds the largest of the 10, the prediction is counted as correct. The next line casts the boolean results to floats and averages them to get the accuracy.
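The semantics of in_top_k can be mimicked in a few lines of NumPy (a re-implementation for illustration only; the real tf.nn.in_top_k may break ties differently):

```python
import numpy as np

def in_top_k(predictions, targets, k):
    # True where targets[i] is among the k largest entries of predictions[i]
    # (NumPy mimic of tf.nn.in_top_k, for illustration)
    topk = np.argsort(predictions, axis=1)[:, -k:]
    return np.array([t in row for t, row in zip(targets, topk)])

logits = np.array([[0.1, 0.9, 0.0],     # largest at index 1
                   [0.8, 0.1, 0.1],     # largest at index 0
                   [0.2, 0.3, 0.5]])    # largest at index 2
labels = np.array([1, 2, 2])

correct = in_top_k(logits, labels, 1)       # [True, False, True]
accuracy = correct.astype(np.float32).mean()
print(correct, accuracy)                    # accuracy = 2/3
```

With k=1 this reduces to checking whether the argmax equals the label, which is exactly the accuracy computed in the graph above.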
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(n_epochs):
        sess.run(iter.initializer, feed_dict={ x: X_train, y: y_train})
        acc_total = 0.0
        for batch in range(train_batches):
            # run the train op and measure accuracy on the same batch;
            # a separate accuracy.eval() would pull a fresh batch from the iterator
            _, acc_batch = sess.run([training_op, accuracy])
            acc_total += acc_batch
        acc_epoch = acc_total/train_batches
        print("Epoch: {0:04d}   train accuracy = {1:0.6f}".format(epoch,acc_epoch))
    sess.run(iter.initializer, feed_dict={ x: X_test, y: y_test})    
    acc_total = 0.0
    for batch in range(test_batches):                                                                                      
        acc_batch = accuracy.eval()
        acc_total  += acc_batch
    acc_test = acc_total/test_batches
    print("test accuracy = {0:0.6f}".format(acc_test))
Epoch: 0000   train accuracy = 0.826133
Epoch: 0001   train accuracy = 0.959433
Epoch: 0002   train accuracy = 0.973417
Epoch: 0003   train accuracy = 0.979283
Epoch: 0004   train accuracy = 0.982917
Epoch: 0005   train accuracy = 0.985217
Epoch: 0006   train accuracy = 0.986600
Epoch: 0007   train accuracy = 0.987300
Epoch: 0008   train accuracy = 0.988233
Epoch: 0009   train accuracy = 0.988983
test accuracy = 0.987273
Trying GRUCell, the results show good performance after just 5 epochs; by the 10th epoch training accuracy is 0.9889, approaching 0.99, while accuracy on the test set is 0.987. With BasicRNNCell performance is slightly worse but can still reach about 0.98, though it needs more epochs, perhaps 50 to 60. For reasons I don't understand, SimpleRNNCell performs poorly, only about 0.9, and increasing the epochs does not help. I did not try LSTMCell in this example; to use it, the states argument passed to the dense layer must be changed to states[1], because an LSTM cell has two states: dynamic_rnn returns states as a tuple containing c_state and h_state, and [1] picks out h_state.
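For reference, the LSTMCell variant just described would look roughly like this (a sketch against the TF 1.x API used in this post, replacing the corresponding three lines of the graph above; not tested here):

```python
# Sketch: swapping in an LSTM cell (TF 1.x API, as used in this post).
# dynamic_rnn then returns states as an LSTMStateTuple(c, h), so the
# dense layer must take the hidden state states[1] (i.e. states.h).
basic_cell = tf.nn.rnn_cell.LSTMCell(num_units=n_neurons)
outputs, states = tf.nn.dynamic_rnn(basic_cell, feature, dtype=tf.float32)
logits = tf.layers.dense(states[1], n_outputs)   # states[1] is h_state
```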

► RNN for Time Series
At the end of the previous unit we generated a time-series function; in this unit we use that function to predict the value at the next time step. From the whole series we repeatedly draw 21 consecutive points at random, using the first 20 as the training input and the last 20 as the target. In other words, given 20 consecutive values, the RNN should predict the value at the next time step.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
# %matplotlib inline
t_min, t_max = 0, 30
resolution = 0.1
def time_series(t):
    return t * np.sin(t) / 3 + 2 * np.sin(t*5)

def next_batch(batch_size, n_steps):
    n_pitch = n_steps + 1
    t = np.linspace(t_min, t_max, int((t_max - t_min) / resolution))
    series = time_series(t)
    for i in range(batch_size):
        ti = np.random.randint(300-n_pitch)
        tt = np.arange(ti,ti+n_pitch)
        series_i =series[tt]
        series_i = series_i.reshape([n_pitch,1])
        if i == 0:
            t_data = series_i
        else:
            t_data = np.concatenate((t_data,series_i),axis = 1)
    t_data = t_data.T
    x_data = t_data[:,:-1]
    y_data = t_data[:,1:]
    x_data = x_data.reshape([batch_size, n_steps, 1])
    y_data = y_data.reshape([batch_size, n_steps, 1])
    return (x_data, y_data)
We define two functions: time_series, which returns the value of the series, and next_batch, which returns x_data and y_data, where y_data is x_data shifted forward by one time step, i.e. the values to be predicted. We first generate 21 consecutive points, then take the first 20 and the last 20 as the input and the target respectively.
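The windowing can be checked on a trivial series (a deterministic stand-in for time_series, with a fixed start index for the demo):

```python
import numpy as np

# Deterministic illustration of the windowing: from 21 consecutive points,
# the input is points 0..19 and the target is points 1..20, i.e. the
# target sequence is the input shifted forward by one time step.
series = np.arange(100, dtype=np.float32)   # stand-in for time_series(t)
n_steps = 20
ti = 7                                      # a fixed start index for the demo
window = series[ti : ti + n_steps + 1]      # 21 consecutive values

x = window[:-1]                             # first 20 -> input
y = window[1:]                              # last 20  -> target

assert np.array_equal(y[:-1], x[1:])        # y is x shifted by one step
print(x[0], y[0], x[-1], y[-1])             # 7.0 8.0 26.0 27.0
```

next_batch does exactly this, just with random start indices and the results stacked into shape (batch_size, n_steps, 1).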
tf.reset_default_graph()
n_steps = 20
n_inputs = 1
n_neurons = 100
n_outputs = 1
n_iterations = 1000
batch_size = 50
X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])
cell = tf.contrib.rnn.OutputProjectionWrapper(
    tf.nn.rnn_cell.GRUCell(num_units=n_neurons, activation=tf.nn.relu),
    output_size=n_outputs)
outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)
learning_rate = 0.001

loss = tf.reduce_mean(tf.square(outputs - y)) # MSE
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
    init.run()
    for iteration in range(n_iterations):
        X_batch, y_batch = next_batch(batch_size, n_steps)
        sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        if iteration % 100 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE:", mse)
    saver.save(sess, "./time_series_model") 
0  MSE: 17.73803
100  MSE: 1.107656
200  MSE: 0.51062953
300  MSE: 0.3362114
400  MSE: 0.198908
500  MSE: 0.08236814
600  MSE: 0.056760896
700  MSE: 0.041651376
800  MSE: 0.046033528
900  MSE: 0.047433507
As the code shows, we again use a GRUCell here. After initialization the variables are registered with a Saver, and at the end of training they are saved to time_series_model in the current directory, so that after the session ends we can restore them to run prediction tests.
batch_size = 1
X_batch, y_batch = next_batch(batch_size, n_steps)
with tf.Session() as sess:                          
    saver.restore(sess, "./time_series_model")   
    y_pred = sess.run(outputs, feed_dict={X: X_batch})
print("Predicted:", y_pred )
print("True Values:", y_batch )
INFO:tensorflow:Restoring parameters from ./time_series_model
Predicted: [[[ 4.360943  ]
  [ 4.5571156 ]
  [ 3.5922024 ]
  [ 2.104814  ]
  [ 0.35122305]
  [-1.3605688 ]
  [-2.8866918 ]
  [-3.9836164 ]
  [-4.6160665 ]
  [-4.7818694 ]
  [-4.5697036 ]
  [-4.1823564 ]
  [-3.925632  ]
  [-3.9739323 ]
  [-4.4331884 ]
  [-5.2809978 ]
  [-6.3555207 ]
  [-7.7230587 ]
  [-8.839774  ]
  [-9.604588  ]]]
True Values: [[[ 4.65731254]
  [ 4.26439926]
  [ 3.34959886]
  [ 1.97477496]
  [ 0.30993767]
  [-1.40698252]
  [-2.92704771]
  [-4.0499039 ]
  [-4.67138264]
  [-4.80740013]
  [-4.58828222]
  [-4.22493202]
  [-3.95519571]
  [-3.98366173]
  [-4.42974611]
  [-5.29687408]
  [-6.47036788]
  [-7.74457543]
  [-8.87256697]
  [-9.62616383]]]
Here batch_size is set back to 1, since we only want to test on a single segment of 20 samples. saver.restore loads the saved variables, sess.run computes y_pred, and the predictions are printed alongside the true y values for comparison; the predicted values are quite close to the actual ones.

References
Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, O'Reilly, 2017
