2024. 7. 3. 17:51 · 🧪 Data Science/Paper review
The paper reviewed in this post is 'Towards maximizing expected possession outcome in soccer' (2023) by Pegah Rahimian:
https://journals.sagepub.com/doi/10.1177/17479541231154494?icid=int.sj-full-text.similar-articles.4
It uses a deep neural network to learn a team's overall strategy, and then applies reinforcement learning on top of that to find an optimal policy. The keywords are RL and soccer.
** All figures and tables in this post are taken from the reviewed paper.
1. Important Question
- How to split the whole game into different game phases and expound the short/long-term objectives accordingly to address the non-stationary nature of decision making in soccer?
- How to analyze the current strategy of a team, and obtain the optimal strategy?
- How to evaluate the obtained optimal strategy without the cumbersome and expensive process of applying it in a real match?
The paper divides a soccer game into four broad phases: Transition, Build-up play, Established possession, and Attack. After running the RL optimization, short passes were suggested in every phase, and short shots were favored.
The paper's work:
- analyze a team's strategy with deep learning, from the perspective of the team's tendency when selecting the ball's destination, and
- on that same model, use RL with the given rewards to discover optimal tactics that account for the positions of the ball and the players.
Dataset
- (x, y) of the 22 players and the ball at 25 observations per second
- event data: on-ball action types (passes, dribbles, ...) with additional features
- merged so that complete data exist at every 0.04 s
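As a quick illustration of that merge step, here is a minimal sketch assuming pandas, with hypothetical `tracking`/`events` frames (the column names are made up):

```python
import pandas as pd

# Hypothetical frames: `tracking` has one row per 0.04 s (25 Hz) with
# coordinates; `events` has irregular timestamps with on-ball actions.
tracking = pd.DataFrame({
    "t": [0.00, 0.04, 0.08, 0.12],
    "ball_x": [52.1, 52.9, 53.6, 54.4],
    "ball_y": [34.0, 34.2, 34.5, 34.9],
})
events = pd.DataFrame({
    "t": [0.05, 0.11],
    "action": ["short_pass", "dribble"],
})

# Attach each event to the nearest preceding tracking frame, so every
# 0.04 s row carries coordinates plus (possibly missing) event features.
merged = pd.merge_asof(tracking, events, on="t",
                       direction="backward", tolerance=0.04)
print(merged)
```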
Team’s behavior prediction method : In order to represent these propensities for any situation of the game and use them for further decision making, we need to estimate two probabilities
- The selection probability surface
- The success probability surface
2. State Representation
The state consists of 11 input channels in total. Physical data such as the players' x and y coordinates and their distance and angle to the goal are stacked on top of each other as 11 channels.
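A minimal sketch of what such a stacked state tensor could look like (the grid resolution and channel assignment are my assumptions, not the paper's exact layout):

```python
import numpy as np

H, W = 68, 104                       # assumed pitch grid: 1 m cells
state = np.zeros((11, H, W), dtype=np.float32)

def to_cell(x, y, length=104.0, width=68.0):
    """Map pitch coordinates (meters) to a (row, col) grid cell."""
    return min(int(y / width * H), H - 1), min(int(x / length * W), W - 1)

# Channels 0/1: occupancy maps for attackers and defenders.
for x, y in [(30.0, 20.0), (55.0, 40.0)]:     # attacker positions (made up)
    r, c = to_cell(x, y); state[0, r, c] = 1.0
for x, y in [(60.0, 30.0)]:                   # defender positions (made up)
    r, c = to_cell(x, y); state[1, r, c] = 1.0

# Channel 2: distance from every cell to the goal center (a dense map).
rows, cols = np.mgrid[0:H, 0:W]
goal_r, goal_c = to_cell(104.0, 34.0)
state[2] = np.hypot(rows - goal_r, cols - goal_c)
# The remaining channels would hold velocities, angles to goal, the ball, etc.
```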
3. Policy network architecture
: deep learning is used to infer the probability surfaces and to identify the policy.
policy network = a neural network that takes a huge number of game states as input and produces the probability surfaces as outputs.
The two output probability surfaces:
- Selection surface as policy
- Success surface
The surfaces are inferred with deep learning ⇒ the CNN-GRU showed the best performance: "we use the trained CNN-GRU model as our policy network in the rest of the paper"
Because the selection surface itself acts as the policy, the CNN-GRU is used as the policy network.
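For intuition, here is a minimal PyTorch sketch of a CNN-GRU that maps a sequence of stacked state tensors to a selection-style probability surface; the layer sizes and output grid are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class CNNGRUPolicy(nn.Module):
    """Sketch: per-frame CNN encoder + GRU over time + softmax surface."""
    def __init__(self, channels=11, hidden=256, grid=(17, 26)):
        super().__init__()
        self.grid = grid
        self.cnn = nn.Sequential(                    # spatial encoder per frame
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
        )
        self.gru = nn.GRU(64 * 16, hidden, batch_first=True)  # temporal context
        self.head = nn.Linear(hidden, grid[0] * grid[1])      # one logit per cell

    def forward(self, frames):                      # frames: (B, T, 11, H, W)
        B, T = frames.shape[:2]
        z = self.cnn(frames.flatten(0, 1)).view(B, T, -1)
        out, _ = self.gru(z)
        logits = self.head(out[:, -1])              # last timestep summarizes
        # softmax over all cells -> a probability surface over the pitch
        return torch.softmax(logits, dim=-1).view(B, *self.grid)

surface = CNNGRUPolicy()(torch.randn(2, 5, 11, 68, 104))
print(surface.shape, surface.sum(dim=(1, 2)))       # (2, 17, 26), sums to 1
```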
Question 1) Isn't a policy the component that chooses actions..?
Answer 1) The selection probability itself expresses the directional tendency of a player's (or team's) choices!
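To make that concrete: sampling a destination from the surface is exactly acting under π(a|s). A tiny sketch with an arbitrary normalized surface:

```python
import numpy as np

rng = np.random.default_rng(0)

# Any normalized surface works; this one is random just for illustration.
surface = rng.random((17, 26))
surface /= surface.sum()

# Acting with the "policy": sample a destination cell in proportion to its
# selection probability, i.e., draw a ~ π(a|s) over target locations.
flat_idx = rng.choice(surface.size, p=surface.ravel())
row, col = np.unravel_index(flat_idx, surface.shape)
print(f"move the ball toward grid cell ({row}, {col})")
```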
4. Optimization method
: selection/success probability surfaces have already been prepared for each game state.
The selection surface obtained from analyzing a team cannot be called optimal, and the success surface acts as a short-term reward, since success itself is defined around keeping possession in the first place.
"we elaborate on our proposed optimization algorithm that can estimate the optimal full probability surfaces"
5. Markov decision process (MDP): (S, A, R, π)
→ models the probability with which the ball carrier selects the ball's final location.
- Episode(τ): lasts from the first event until our team loses control of the ball (a goal, a dispossession, losing possession, etc.).
- State(s): the input channels: positions of the players and the ball, velocities, distances, angle to the goal, etc. There are two absorbing states (goal scoring and loss of possession).
- Action space(a): two types of action space: a continuous target location selected by the ball carrier, plus a specific action type such as a short pass, long ball, or shot.
- Policy(π): the probability with which the ball carrier selects any specific location on the field.
- Reward signal(R(s,a)): rewarding only goal situations is too sparse, so the game is divided into the 4 phases and a different reward function is applied to each. The reward also differs by success (kept possession) vs failure (loss of possession): on failure, the negative of the expected-goals value (xG) of the opponent's shot; on success, one of the four phase-specific reward functions below (see the sketch after the phase list).
Reward-1) Transition: From the start of the possession until the player completes the first pass or loses the ball.
objective: move the ball away from contact and change the horizontal channel
reward: the opposing team is grouped into 3 pressure clusters, and the distance between the destination and each cluster's centroid is reflected. P(s) denotes the success probability.
Reward-2) Build-up: Starts from playing in their own half until the ball reaches the opposition half.
objective: Looking for opportunities to break through the midfield line of the opponent team
reward: applied differently depending on whether the attacking team's players are positioned higher or lower than the defending team's players.
Reward-3) Established possession: From the first pass in the opposition's half until the ball reaches the final third of the pitch, with more than two consecutive actions.
objective: Retain possession
reward: a larger reward for actions that move the ball to the location on the pitch with the highest success probability
Reward-4) Attack: Having controlled possession in the attacking third.
objective: Create chances and score goals
reward: a larger reward for actions that move the ball to locations with a higher success probability and a higher expected-goals value
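Putting the reward description together, here is a schematic of the phase-switched reward. The branching follows the text above, but the helper terms are placeholder stand-ins, not the paper's formulas:

```python
# Placeholder scoring helpers -- stand-ins for the paper's actual terms.
def success_probability(state, action): return 0.7   # P(s) from the model
def expected_goals(state, action): return 0.12       # xG at the destination
def pressure_distance(state, action): return 0.4     # dist. to pressure centroids
def midfield_break(state, action): return 0.5        # attackers-vs-defenders term

def reward(state, action, phase, kept_possession, opponent_xg=0.0):
    """Schematic phase-switched reward; only the branching follows the paper."""
    if not kept_possession:
        return -opponent_xg                 # failure: minus the opponent's shot xG
    if phase == "transition":
        return pressure_distance(state, action)
    if phase == "build_up":
        return midfield_break(state, action)
    if phase == "established":
        return success_probability(state, action)
    if phase == "attack":
        return success_probability(state, action) * expected_goals(state, action)
    raise ValueError(f"unknown phase: {phase}")

print(reward(None, None, "attack", kept_possession=True))   # 0.7 * 0.12
```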
6. Objective function: expected possession outcome
: EPO is a real-valued number in the range (-1, 1). The value can be interpreted as the likelihood of the respective possession ending in a goal for the attacking team (1) or a goal for the opposing team (-1). In short, it indicates the probability that the possession ultimately ends in a goal, so it can be regarded as the objective function of our optimization framework, to be maximized.
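In symbols, one standard way to write this objective (my notation, not copied from the paper):

$$
J(\pi) = \mathbb{E}_{\tau \sim \pi}\big[\mathrm{EPO}(\tau)\big], \qquad \mathrm{EPO}(\tau) \in (-1, 1), \qquad \pi^{*} = \arg\max_{\pi} J(\pi)
$$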
To find the optimal solution, RL is applied using a Policy Gradient (PG) algorithm. The PG setup here:
- offline RL workflow without online interaction with the environment.
- The policy network presented in the previous section was used to derive the players' existing behavioral policy ↔ now it is optimized from that starting point.
actual possession policy ⇒ optimal possession policy
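A minimal offline REINFORCE-style update over logged possessions, with a toy stand-in policy (this shows the generic PG step, not the paper's exact optimization algorithm):

```python
import torch
import torch.nn as nn

# Toy stand-in policy over 17*26 destination cells; in practice the CNN-GRU
# sketch above would play this role.
policy = nn.Sequential(nn.Flatten(1), nn.Linear(11 * 68 * 104, 17 * 26),
                       nn.Softmax(dim=-1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def pg_step(states, actions, returns):
    """One policy-gradient step on a batch of logged possessions."""
    probs = policy(states)                                    # (B, cells)
    log_pi = torch.log(probs.gather(1, actions[:, None]).squeeze(1) + 1e-8)
    loss = -(log_pi * returns).mean()                         # maximize E[G log pi]
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

states = torch.randn(8, 11, 68, 104)        # batch of stacked state tensors
actions = torch.randint(0, 17 * 26, (8,))   # logged destination cells
returns = torch.randn(8)                    # phase-based returns (EPO-style)
print(pg_step(states, actions, returns))
```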
7. Result
: the behavioral policy vs the optimal policy ⇒ compared via OPE (off-policy policy evaluation; a minimal OPE sketch follows the results below).
When the optimal policy is applied, the EPO score rises substantially.
Under the optimal policy, short passes increase and long balls decrease; in particular, in the Attack zone, the share of moving the ball with short passes and then shooting increases.
Analyzing the existing strategies of several clubs and applying the optimal policy showed that every team's EPO could be increased.
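For reference, a minimal sketch of one standard OPE estimator (per-episode importance sampling over logged possessions; the paper's exact estimator may differ, and the numbers here are made up):

```python
import numpy as np

def importance_sampling_ope(episodes):
    """Estimate a target policy's value from data logged under another policy.
    Each episode: a list of (target_prob, behavior_prob) pairs per action,
    plus the possession's final EPO-style outcome in (-1, 1)."""
    values = []
    for probs, outcome in episodes:
        ratio = np.prod([pt / pb for pt, pb in probs])   # likelihood ratio
        values.append(ratio * outcome)
    return float(np.mean(values))

# Hypothetical logged possessions: action probabilities under the optimal
# (target) and behavioral (logging) policies, plus the outcome.
episodes = [
    ([(0.30, 0.20), (0.50, 0.40)], +0.6),   # possession that ended well
    ([(0.10, 0.25)], -0.2),                 # possession that ended badly
]
print(importance_sampling_ope(episodes))
```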
Future research directions
- train the opponent team toward its own optimal strategy as well, via a multi-agent (competitive) setup.
- extend the framework to capture individual players' contributions.