ESTsoft AI Plus Lab
 * Computer vision
 * Natural language processing
 * Malware detection
 * Financial market prediction

[[TableOfContents]]

== Principles of Deep Learning ==

=== Understanding Deep Learning ===
 * Artificial intelligence
  * Machines that are as smart as, or smarter than, people
 * Machine learning
  * A machine can process more data faster than a person, but the processing algorithm still has to be written by a person
  * To build machines smarter than people, model the way people (living things) learn
  * Inference is performed by algorithms built empirically and inductively from data
  * Features and labels
   * If better features can be extracted, classification becomes much easier
  * Feature engineering
   * A step that had to be carried out in machine learning before deep learning
   * A person discovers the characteristics that make the answer easier to find (-> domain expert)
   * A person designs the algorithm that identifies and extracts those characteristics from the data (-> algorithm)
  * If a single layer cannot separate the classes, keep transforming the space until they become separable
   * In other words, use multiple layers
 * Deep learning
  * Training several layers together as a whole (so the dimensionality grows)
  * can discover features that are better than anything a person could find, and classify with them
  * those features may be hard for a person to interpret
  * harder and more complex problems can be solved; this ability is called capacity
  * Deep learning is a learning method with a larger capacity than classical machine learning
  * Enough training data is required before more accurate and more complex features can be extracted
 * What deep learning requires
  * Linear algebra
  * Probability and statistics
  * Multivariable calculus
  * Algorithms, optimization
  * And more
   * Information theory (entropy, ...)
   * Other mathematical theory

=== Principles of Deep Learning 1 - Similarity ===
 * Because deep learning finds good features on its own, it pays to feed it data with as little information loss as possible
 * Ways to express the similarity of vectors (a small sketch follows at the end of this section)
  * Euclidean distance (the difference between the two vectors)
  * Cosine similarity (the dot product of the two vectors)
 * Similar data, however, does not automatically produce similar vectors
 * Making that happen is a core goal of machine learning
  * Make the input-layer vectors similar -> modeling the input data
  * Make the last hidden-layer vectors similar -> modeling the network
 * Supervised learning
  * The labels (the correct function values) tell the model which data points must fall into the same class and which must fall into different classes
  * In other words, they teach the decision criterion
 * What it means to train a neural network
  * Training means adjusting the model parameters in the direction that reduces the error against the correct answer
  * Once training has gone well, a unit that responds strongly to a particular input has weights that have become similar to that input vector
 * Modeling
  * Modeling is what makes it possible to learn well from less data
  * e.g. CNN
 * Applications
  * Detection and classification of malware variants
  * Malware similarity
   * similar metadata
   * similar file structure and content
   * similar code patterns and behavior patterns
   * similar APIs used
   * similar developer habits
   * similar targets or exploited vulnerabilities
   * similar communication endpoints or packets
   * similar compression, obfuscation, encryption, or packing methods
  * Maliciousness?
   * Is all malware really similar to one another?
   * Is all benign code really similar to one another?
   * Are benign code and malware really dissimilar?
  * Detection names
   * Security analysts have been classifying malware for a long time
   * Do they classify similar malware under the same detection name?
   * Is malware classified under the same detection name actually similar?
   * Can a single malware sample be classified under only one detection name?
   * These questions remain even if detection names are used as the training labels
  * Detection and classification of brand-new malware
   * there is no similar malware in the training data
   * there is no training sample that was assigned to the same class
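As a concrete illustration of the two similarity measures mentioned at the top of this section, here is a minimal NumPy sketch; the feature vectors are made-up numbers, not taken from any real malware or image data.

```
import numpy as np

def euclidean_distance(a, b):
    # Length of the difference of the two vectors (smaller = more similar).
    return np.linalg.norm(a - b)

def cosine_similarity(a, b):
    # Dot product of the two vectors, normalized by their lengths
    # (closer to 1.0 = more similar direction).
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 4-dimensional feature vectors (illustrative values only).
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.1, 1.9, 3.2, 3.8])   # close to x1
x3 = np.array([9.0, 0.5, 0.1, 7.0])   # far from x1

print(euclidean_distance(x1, x2), euclidean_distance(x1, x3))  # small vs. large
print(cosine_similarity(x1, x2), cosine_similarity(x1, x3))    # near 1 vs. smaller
```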
=== Principles of Deep Learning 2 - Probability ===
 * Discriminative models
  * classification, regression
  * Given input data x, decide a simple response y
  * They use the conditional probability P(y|x)
 * The pitfall of conditional probability
  * Inverting a conditional probability (getting P(x|y) from P(y|x), or the other way around)
  * is extremely hard to compute directly, because the number of possible cases is enormous
 * Joint probability
  * The probability that x and y occur together
  * The inverted conditional probability can instead be computed with Bayes' theorem,
  * by going through the joint probability (a small numeric sketch follows at the end of this section)
 * Generative models
  * Generate a plausible x that has the attribute y
  * With deep learning, hidden features z can be discovered not only from y but also from large amounts of x
  * This makes it possible to generate x that was never shown during training
  * Deep-learning-based generative models: VAEs, GANs, auto-regressive models, DBNs, RBMs
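To make the Bayes' theorem step above concrete, here is a tiny sketch in plain Python that inverts a conditional probability by going through the joint probability; the prior and likelihood values are invented purely for illustration.

```
# Inverting a conditional probability with Bayes' theorem (toy numbers only).
# y is a class label, x is some observed evidence.

p_y = {'malicious': 0.1, 'benign': 0.9}            # prior P(y), made up
p_x_given_y = {'malicious': 0.7, 'benign': 0.05}   # likelihood P(x|y), made up

# Joint probability P(x, y) = P(x|y) * P(y)
p_joint = {y: p_x_given_y[y] * p_y[y] for y in p_y}

# Marginal P(x) = sum over y of P(x, y)
p_x = sum(p_joint.values())

# Bayes' theorem: P(y|x) = P(x, y) / P(x)
p_y_given_x = {y: p_joint[y] / p_x for y in p_joint}
print(p_y_given_x)   # {'malicious': ~0.61, 'benign': ~0.39}
```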
== ~~(Deep)~~ Learning ==
[http://cs231n.stanford.edu/]

Image classification
 * Image data is represented as RGB values for every pixel
 * How can the same object, seen from a different viewpoint and with different properties, be assigned the same label?
 * It is impossible to build such a classifier out of hard-coded rules

=== K Nearest Neighbor Classifier ===
 * "Training" simply stores the images
 * A new image is classified by finding the stored image with the smallest difference
 * How the comparison works
  * Compute the absolute differences between pixel values
  * Use their sum as the distance
 * Classification time grows linearly with the number of stored images
 * A CNN model, by contrast, takes longer to train but classifies quickly
 * Hyperparameters
  * L1 distance vs. L2 distance
  * What value of K?
  * Hyperparameters are chosen experimentally
  * Use part of the training data as validation data
  * cross-validation

=== Linear Classification ===
 * Classification by drawing a separating line
 * non-parametric approach
  * KNN and the like
 * parametric approach
  * uses the slope of the line, i.e. the weights
  * f(x, W) -> class
  * score function f = W*x + b = a vector of class scores
 * Loss function
  * A function that measures unhappiness with the scores on the training data
  * The process of making the loss function small -> optimization
  * e.g. the SVM loss function
  * Softmax classifier
   * score = the unnormalized log probability of each class
   * loss function = -log(normalized probability of the correct class)
 * Optimization (a minimal sketch follows at the end of this section)
  * The process of finding the W that minimizes the loss function
  * Random search
   * Generate Ws at random and keep the one with the lowest loss
   * Accuracy is low
  * Gradient descent
   * Estimate the gradient numerically using as small an h as possible
   * Step gradually in the direction that makes the loss smaller
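The following is a minimal NumPy sketch (not the cs231n course code) that ties this section together: a score function f = W*x + b with the bias folded into W, the softmax loss, and gradient descent driven by a numerical gradient with a small h. The toy data, learning rate, and step count are made up for illustration.

```
import numpy as np

def softmax_loss(W, X, y):
    # Scores: one row of class scores per example.
    scores = X.dot(W)
    # Shift for numerical stability, then normalize into probabilities.
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    # Loss = average of -log(probability assigned to the correct class).
    return -np.log(probs[np.arange(len(y)), y]).mean()

def numerical_gradient(loss_fn, W, h=1e-5):
    # Estimate dL/dW with centered finite differences using a small h.
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + h; loss_plus = loss_fn(W)
        W[idx] = old - h; loss_minus = loss_fn(W)
        W[idx] = old
        grad[idx] = (loss_plus - loss_minus) / (2 * h)
    return grad

# Made-up toy data: 6 examples, 4 features, 3 classes.
# The bias is folded into W by appending a constant 1 feature.
rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(6, 4)), np.ones((6, 1))])
y = np.array([0, 1, 2, 0, 1, 2])
W = 0.01 * rng.normal(size=(5, 3))

for step in range(100):
    grad = numerical_gradient(lambda W_: softmax_loss(W_, X, y), W)
    W -= 0.5 * grad            # step in the direction that lowers the loss
    if step % 20 == 0:
        print(step, softmax_loss(W, X, y))
```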
== TensorFlow on the Back of a Whale (Docker) ==
https://goo.gl/M1zJVP

 1. sudo docker rm -f tensorflow
 1. sudo docker run -d --net=host --name tensorflow gcr.io/tensorflow/tensorflow
 1. connect to http://127.0.0.1:8888/
 1. try to log in
 1. sudo docker logs tensorflow
 1. http://localhost:8888/?token=00bca78fdb2b3ab7b23eab7105dc639dab72d021ecd5c54f <- copy a link like this from the log output into the browser

```
# MNIST training script (the TensorFlow "fully connected feed" tutorial code):
# trains and evaluates a fully connected network using a feed dictionary.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import sys
import time

from six.moves import xrange
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.examples.tutorials.mnist import mnist

FLAGS = None


def placeholder_inputs(batch_size):
  # Placeholders that match the shape of one input batch.
  images_placeholder = tf.placeholder(tf.float32,
                                      shape=(batch_size, mnist.IMAGE_PIXELS))
  labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))
  return images_placeholder, labels_placeholder


def fill_feed_dict(data_set, images_pl, labels_pl):
  # Fill a feed_dict with the next batch of images and labels.
  images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size,
                                                 FLAGS.fake_data)
  feed_dict = {
      images_pl: images_feed,
      labels_pl: labels_feed,
  }
  return feed_dict


def do_eval(sess, eval_correct, images_placeholder, labels_placeholder,
            data_set):
  # Run one evaluation pass over data_set and print precision @ 1.
  true_count = 0
  steps_per_epoch = data_set.num_examples // FLAGS.batch_size
  num_examples = steps_per_epoch * FLAGS.batch_size
  for step in xrange(steps_per_epoch):
    feed_dict = fill_feed_dict(data_set, images_placeholder,
                               labels_placeholder)
    true_count += sess.run(eval_correct, feed_dict=feed_dict)
  precision = float(true_count) / num_examples
  print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' %
        (num_examples, true_count, precision))


def run_training():
  data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)
  with tf.Graph().as_default():
    # Build the graph: inference, loss, training and evaluation ops.
    images_placeholder, labels_placeholder = placeholder_inputs(
        FLAGS.batch_size)
    logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)
    loss = mnist.loss(logits, labels_placeholder)
    train_op = mnist.training(loss, FLAGS.learning_rate)
    eval_correct = mnist.evaluation(logits, labels_placeholder)
    summary = tf.summary.merge_all()
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    sess = tf.Session()
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)
    sess.run(init)
    for step in xrange(FLAGS.max_steps):
      start_time = time.time()
      feed_dict = fill_feed_dict(data_sets.train, images_placeholder,
                                 labels_placeholder)
      _, loss_value = sess.run([train_op, loss], feed_dict=feed_dict)
      duration = time.time() - start_time
      if step % 100 == 0:
        # Print progress and write summaries for TensorBoard.
        print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
        summary_str = sess.run(summary, feed_dict=feed_dict)
        summary_writer.add_summary(summary_str, step)
        summary_writer.flush()
      if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
        # Save a checkpoint and evaluate on the train/validation/test sets.
        checkpoint_file = os.path.join(FLAGS.log_dir, 'model.ckpt')
        saver.save(sess, checkpoint_file, global_step=step)
        print('Training Data Eval:')
        do_eval(sess, eval_correct, images_placeholder, labels_placeholder,
                data_sets.train)
        print('Validation Data Eval:')
        do_eval(sess, eval_correct, images_placeholder, labels_placeholder,
                data_sets.validation)
        print('Test Data Eval:')
        do_eval(sess, eval_correct, images_placeholder, labels_placeholder,
                data_sets.test)


def main(_):
  if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)
  tf.gfile.MakeDirs(FLAGS.log_dir)
  run_training()


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument('--learning_rate', type=float, default=0.01,
                      help='Initial learning rate.')
  parser.add_argument('--max_steps', type=int, default=2000,
                      help='Number of steps to run trainer.')
  parser.add_argument('--hidden1', type=int, default=128,
                      help='Number of units in hidden layer 1.')
  parser.add_argument('--hidden2', type=int, default=32,
                      help='Number of units in hidden layer 2.')
  parser.add_argument('--batch_size', type=int, default=100,
                      help='Batch size. Must divide evenly into the dataset sizes.')
  parser.add_argument('--input_data_dir', type=str,
                      default=os.path.join(os.getenv('TEST_TMPDIR', '/tmp'),
                                           'tensorflow/mnist/input_data'),
                      help='Directory to put the input data.')
  parser.add_argument('--log_dir', type=str,
                      default=os.path.join(os.getenv('TEST_TMPDIR', '/tmp'),
                                           'tensorflow/mnist/logs/fully_connected_feed'),
                      help='Directory to put the log data.')
  parser.add_argument('--fake_data', default=False,
                      help='If true, uses fake data for unit testing.',
                      action='store_true')
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
```

```
# The companion "mnist" module used above: builds the MNIST network
# (inference) and defines the loss, training and evaluation ops.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

import tensorflow as tf

NUM_CLASSES = 10
IMAGE_SIZE = 28
IMAGE_PIXELS = IMAGE_SIZE * IMAGE_SIZE


def inference(images, hidden1_units, hidden2_units):
  # Two ReLU hidden layers followed by a linear layer producing class logits.
  with tf.name_scope('hidden1'):
    weights = tf.Variable(
        tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                            stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden1_units]), name='biases')
    hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
  with tf.name_scope('hidden2'):
    weights = tf.Variable(
        tf.truncated_normal([hidden1_units, hidden2_units],
                            stddev=1.0 / math.sqrt(float(hidden1_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([hidden2_units]), name='biases')
    hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
  with tf.name_scope('softmax_linear'):
    weights = tf.Variable(
        tf.truncated_normal([hidden2_units, NUM_CLASSES],
                            stddev=1.0 / math.sqrt(float(hidden2_units))),
        name='weights')
    biases = tf.Variable(tf.zeros([NUM_CLASSES]), name='biases')
    logits = tf.matmul(hidden2, weights) + biases
  return logits


def loss(logits, labels):
  # Mean softmax cross-entropy between the logits and the integer labels.
  labels = tf.to_int64(labels)
  cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
      labels=labels, logits=logits, name='xentropy')
  return tf.reduce_mean(cross_entropy, name='xentropy_mean')


def training(loss, learning_rate):
  # One gradient descent step; also increments the global step counter.
  tf.summary.scalar('loss', loss)
  optimizer = tf.train.GradientDescentOptimizer(learning_rate)
  global_step = tf.Variable(0, name='global_step', trainable=False)
  train_op = optimizer.minimize(loss, global_step=global_step)
  return train_op


def evaluation(logits, labels):
  # Number of examples whose top-1 prediction matches the label.
  correct = tf.nn.in_top_k(logits, labels, 1)
  return tf.reduce_sum(tf.cast(correct, tf.int32))
```

== Deep Learning and Natural Language Processing ==

== The Future of Artificial Intelligence ==