[MLOps] Getting Started with MLflow & Key Concepts

2022. 4. 14. 22:17 · 🛠 Data Engineering/MLOps

 

 

 


Recently, the term MLOps has been showing up everywhere.

MLOps: the set of practices for running machine learning reliably in a production environment

MLOps = DevOps + Machine Learning

I have mostly studied data preprocessing, analysis, and modeling, but lately I have felt the need for this kind of system.

Going through preprocessing, EDA, FE, modeling, and validation from scratch for every analysis takes a lot of time.

Saving a model and serving it again later also needs to be simpler.

That train of thought led me to study MLflow, and I plan to write up what I learn on this blog.

 

 

 

 

[References]

 

https://github.com/mlflow/mlflow/

 

GitHub - mlflow/mlflow: Open source platform for the machine learning lifecycle


 

https://mlflow.org/docs/latest/quickstart.html

 

Quickstart — MLflow 1.25.1 documentation


 


 

 

 

 

1. MLflow Concepts


A Machine Learning Lifecycle Platform

An open-source tool that makes it easy to manage machine learning experiments and deployment

* Supports both a GUI and a CLI

 

 

 

MLflow Main Functions

1) Tracking

Stores the results of each experiment so that parameters and results can be compared afterwards.

2) Projects

Packages machine learning code in a reusable and reproducible form.

3) Models

Manages models and handles their deployment, serving, and inference.

4) Model Registry

A centralized model store and UI for managing the full lifecycle of MLflow models.

It provides a way to control the entire MLflow workflow easily.

 

 

 

All of these features are accessible through the REST API and the CLI. APIs are available for Python, R, and Java.
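As a rough illustration of the REST side, the sketch below only builds a request for the runs search endpoint and does not send it. The base URL `http://localhost:5000` and the helper name `build_search_runs_request` are assumptions made for this example; the experiment ID "0" is the Default experiment.

```python
import json
import urllib.request


def build_search_runs_request(base_url, experiment_id="0"):
    """Build (but do not send) a POST request that searches runs in one experiment.

    Hypothetical helper for illustration; base_url is assumed to point at a
    running MLflow tracking server.
    """
    payload = json.dumps({"experiment_ids": [experiment_id]}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/2.0/mlflow/runs/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_runs_request("http://localhost:5000")
print(request.get_method(), request.full_url)

# Sending it requires a running tracking server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     runs = json.loads(response.read())
```

In everyday use the Python API is usually more convenient; the raw REST call is mainly useful from languages or tools without an MLflow client.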

Anyone who has trained machine learning models will recognize the problem. No matter how well the pipeline is built, experiments become hard to manage once the number of models and parameters grows. On top of that, models are practically thrown away after they make their predictions, and saving each one by hand and keeping them organized is a headache. MLflow makes these chores much easier.

 

 

So let's get started.

 

 

 

 

 

2. Getting Started with MLflow & Cloning the Repo


First, install mlflow.

 

pip install mlflow

 

 

For the hands-on part, clone the MLflow GitHub repository.

 

git clone https://github.com/mlflow/mlflow
cd mlflow/examples/quickstart

 

 

mlflow_tracking.py looks like this.

 

import os
from random import random, randint

from mlflow import log_metric, log_param, log_artifacts

if __name__ == "__main__":
    print("Running mlflow_tracking.py")

    # Log a single parameter (key-value pair)
    log_param("param1", randint(0, 100))

    # Log a metric; logging the same key again adds to its history
    log_metric("foo", random())
    log_metric("foo", random() + 1)
    log_metric("foo", random() + 2)

    # Write a sample file into a local "outputs" directory
    if not os.path.exists("outputs"):
        os.makedirs("outputs")
    with open("outputs/test.txt", "w") as f:
        f.write("hello world!")

    # Log every file in "outputs" as an artifact of the run
    log_artifacts("outputs")

 

The thing to look at here is how the script talks to MLflow.

 

1) log_param()

Logs a parameter (a key-value pair) to the MLflow tracking store.

2) log_metric()

Logs a model evaluation score to the MLflow tracking store.

3) log_artifacts()

Logs machine learning outputs, such as models, to the MLflow tracking store. It takes a local directory and logs every file in it.

 

 

 

 

Now run mlflow_tracking.py.

 

python mlflow_tracking.py
> Running mlflow_tracking.py

 

Listing the directory afterwards shows the following.

 

 

1) mlruns: the folder where mlflow stores experiments by default.

2) The number 0: the Default experiment. mlflow groups individual runs under an experiment, so you can create your own experiment folder and carry out specific runs inside it. How to create an experiment folder is covered in a later post.

3) Serial IDs: the records saved each time an experiment is run. Each ID corresponds to a different run of an experimental model.

 

 

 

After a run, the corresponding data is stored in the params, artifacts, and metrics folders.
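Because the default store is just files on disk, these folders can be inspected directly. The sketch below reads one run folder using only the standard library. It assumes that each file under params holds the value as plain text and that each line of a metrics file looks like "timestamp value step"; this layout is an assumption about the local file store and may differ between MLflow versions. The helper `read_run` and the demo folder are hypothetical.

```python
import tempfile
from pathlib import Path


def read_run(run_dir):
    """Read params and metric values from one run folder (hypothetical helper)."""
    run_dir = Path(run_dir)
    params = {}
    metrics = {}
    if (run_dir / "params").is_dir():
        for param_file in sorted((run_dir / "params").iterdir()):
            # File name is the parameter key, file content is its value.
            params[param_file.name] = param_file.read_text().strip()
    if (run_dir / "metrics").is_dir():
        for metric_file in sorted((run_dir / "metrics").iterdir()):
            values = []
            for line in metric_file.read_text().splitlines():
                fields = line.split()
                # Assumed line layout: "<timestamp> <value> [<step>]"
                values.append(float(fields[1]))
            metrics[metric_file.name] = values
    return params, metrics


# Build a small fake run folder that mimics the assumed layout.
demo_run = Path(tempfile.mkdtemp()) / "mlruns" / "0" / "example_run_id"
(demo_run / "params").mkdir(parents=True)
(demo_run / "metrics").mkdir()
(demo_run / "params" / "param1").write_text("42")
(demo_run / "metrics" / "foo").write_text(
    "1650000000000 0.5 0\n1650000000001 1.5 0\n"
)

params, metrics = read_run(demo_run)
print(params)
print(metrics)
```

For real work the MLflow client API is the safer way to read runs, since the on-disk format is an implementation detail.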

 

 

 

 

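Web Dashboard

 

The web UI gives an at-a-glance view of everything MLflow has recorded. Run the command below from the directory that contains mlruns; by default the UI is served at http://localhost:5000.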

 

mlflow ui

 

 

 

Clicking a specific run shows its parameters, metrics, and artifacts in detail.

 

 

 

 

 

 

 

That's it for this post. Time for a break.

 

 

 

 

 
