[Paper review] Towards maximizing expected possession outcome in soccer


 

 

์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ ๋ฆฌ๋ทฐํ•  ๋…ผ๋ฌธ์€

'Towards maximizing expected possession outcome in soccer(2023)' by Pegah Rahimian

 

https://journals.sagepub.com/doi/10.1177/17479541231154494?icid=int.sj-full-text.similar-articles.4

 

 

๋”ฅ๋Ÿฌ๋‹ ์‹ ๊ฒฝ๋ง์„ ์ด์šฉํ•˜์—ฌ ํŒ€์˜ ๋Œ€๋žต์ ์ธ strategy๋ฅผ ํŒŒ์•…ํ•˜๊ณ , ๊ฑฐ๊ธฐ ์œ„์— ๊ฐ•ํ™”ํ•™์Šต์„ ํ•˜์—ฌ Optimal policy๋ฅผ ์ฐพ์•„๋‚ด๋Š” ๋…ผ๋ฌธ์ด๋‹ค. ํ‚ค์›Œ๋“œ๋Š” RL๊ณผ ์ถ•๊ตฌ๊ฐ€ ๋˜๊ฒ ๋‹ค.

** ์ด ํฌ์ŠคํŒ…์˜ ๋ชจ๋“  figure๊ณผ ํ‘œ๋Š” ๋ฆฌ๋ทฐ ๋…ผ๋ฌธ์—์„œ ๊ฐ€์ ธ์˜จ ๊ฒƒ์ž„์„ ๋ฐํžŒ๋‹ค.

 

 

1. Important Question

  • How to split the whole game into different game phases and expound the short/long-term objectives accordingly to address the non-stationary nature of decision making in soccer?
  • How to analyze the current strategy of a team, and obtain the optimal strategy?
  • How to evaluate the obtained optimal strategy without the cumbersome and expensive process of applying it in a real match?

 

 

 

 

์ถ•๊ตฌ์—์„œ phases๊ฐ€ ํฌ๊ฒŒ 4๊ฐœ๋กœ ๋‚˜๋‰œ๋‹ค๊ณ  ๋ณด์•˜์Œ. Transition, Build-up play, Established possession, Attack. ์ถ•๊ตฌ ๊ฐ•ํ™”ํ•™์Šต์„ ์ง„ํ–‰ํ•œ ๊ฒฐ๊ณผ, ๋ชจ๋“  ํŽ˜์ด์ฆˆ์—์„œ short passes๊ฐ€ ์ œ์•ˆ๋˜์—ˆ๊ณ , short shots๊ฐ€ ์ถ”๊ตฌ๋˜์—ˆ๋‹ค.

 

 

๋…ผ๋ฌธ์˜ work

  1. ๋”ฅ๋Ÿฌ๋‹์œผ๋กœ ball์˜ ๋ชฉํ‘œ ์ง€์ ์„ ์„ ํƒํ•˜๋Š” ํŒ€์˜ ๊ฒฝํ–ฅ ๊ด€์ ์—์„œ ํŒ€ ์ „๋žต์„ ๋ถ„์„ํ•˜๊ณ 
  2. ๋˜‘๊ฐ™์€ ๋ชจ๋ธ์— ๋Œ€ํ•ด ์ฃผ์–ด์ง„ ๋ณด์ƒ์„ ๊ธฐ๋ฐ˜์œผ๋กœ RL์„ ํ™œ์šฉํ•˜์—ฌ ๋ณผ๊ณผ ์„ ์ˆ˜๋“ค์˜ ์œ„์น˜๋ฅผ ๊ณ ๋ คํ•œ optimal tactics๋ฅผ ๋ฐœ๊ฒฌํ•œ๋‹ค.

 

Dataset

  1. tracking data: (x, y) of all 22 players and the ball at 25 observations per second
  2. event data: on-ball action types (passes, dribbles, ...) with additional features
  3. the two sources are merged so that every 0.04-second frame carries the full set of features
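
For intuition, here is a minimal sketch of what such a merge could look like, assuming pandas DataFrames named tracking and events sharing a timestamp column; the column names and values are hypothetical, not from the paper:

```python
import pandas as pd

# Hypothetical frames: tracking at 25 Hz (every 0.04 s), sparser event data.
tracking = pd.DataFrame({
    "timestamp": [0.00, 0.04, 0.08, 0.12],
    "ball_x": [52.0, 52.6, 53.1, 53.9],
    "ball_y": [34.0, 34.2, 34.5, 35.0],
})
events = pd.DataFrame({
    "timestamp": [0.05],
    "event_type": ["pass"],
})

# Attach the most recent on-ball event to every 0.04 s tracking frame.
merged = pd.merge_asof(
    tracking.sort_values("timestamp"),
    events.sort_values("timestamp"),
    on="timestamp",
    direction="backward",
)
print(merged)
```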

 

Team’s behavior prediction method: In order to represent these propensities for any situation of the game and use them for further decision making, we need to estimate two probabilities:

  1. The selection probability surface
  2. The success probability surface

 

 

2. State Representation

 

์ด 11๊ฐœ์˜ Input Channels๋กœ ์ด๋ค„์ ธ ์žˆ๋‹ค. ์„ ์ˆ˜๋“ค์˜ x, y ์ขŒํ‘œ, ๊ณจ๋Œ€์™€์˜ ๊ฑฐ๋ฆฌ ๋ฐ ๊ฐ๋„ ๋“ฑ ๋ฌผ๋ฆฌ์  ๋ฐ์ดํ„ฐ๋“ค์ด 11๊ฐœ์˜ channels๋กœ ๊ฒน๊ฒน์ด ์Œ“์—ฌ์žˆ๋‹ค๊ณ  ๋ณด๋ฉด ๋œ๋‹ค.

 

 

 

3. Policy network architecture

: Deep learning techniques are used to infer the probability surfaces and to formalize the policy.
policy network = a neural network that takes a huge number of game states as input and produces the probability surfaces as outputs.


2๊ฐœ์˜ probability surface์„ ์•„์›ƒํ’‹์œผ๋กœ.

  1. Selection surface as policy
  2. Success surface

 

The surfaces are inferred with deep learning ⇒ the CNN-GRU showed the best performance: "we use the trained CNN-GRU model as our policy network in the rest of the paper"

Selection surface ์ž์ฒด๊ฐ€ policy ๋กœ ์ž‘๋™ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, CNN-GRU๋ฅผ policy๋กœ ์‚ฌ์šฉํ•œ๋‹ค.
์˜๋ฌธ 1) policy๋Š” ๋Œ€์ƒ์˜ ํ–‰๋™ ์–‘์‹์ธ๋ฐ..?
๋Œ€๋‹ต 1) selection probability ์ž์ฒด๊ฐ€ ์„ ์ˆ˜ ํ˜น์€ ํŒ€์˜ ์„ ํƒ ๋ฐฉํ–ฅ์„ฑ์„ ๋‚˜ํƒ€๋‚ธ๋‹ค!
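
For a concrete picture, here is a minimal PyTorch-style sketch of what a CNN-GRU policy network with two surface heads could look like; the layer sizes, grid resolution, and pooling are my own assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class CNNGRUPolicy(nn.Module):
    """Encodes a sequence of 11-channel game states and outputs a
    selection surface (the policy) and a success surface."""
    def __init__(self, channels=11, hidden=128, grid=(68, 104)):
        super().__init__()
        self.grid = grid
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 13)),
        )
        self.gru = nn.GRU(32 * 8 * 13, hidden, batch_first=True)
        out_dim = grid[0] * grid[1]
        self.selection_head = nn.Linear(hidden, out_dim)  # policy surface
        self.success_head = nn.Linear(hidden, out_dim)    # success surface

    def forward(self, x):                      # x: (B, T, 11, H, W)
        B, T = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1))      # (B*T, 32, 8, 13)
        feats = feats.flatten(1).view(B, T, -1)
        _, h = self.gru(feats)                 # last hidden state: (1, B, hidden)
        h = h[-1]
        sel = torch.softmax(self.selection_head(h), dim=-1).view(B, *self.grid)
        suc = torch.sigmoid(self.success_head(h)).view(B, *self.grid)
        return sel, suc

model = CNNGRUPolicy()
sel, suc = model(torch.randn(2, 5, 11, 68, 104))
print(sel.shape, suc.shape)  # two (2, 68, 104) surfaces
```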

 

 

 

 

 

4. Optimization method

: We already have the selection/success probability surfaces for any game state.

ํŒ€ ๋ถ„์„์„ ํ†ตํ•ด ๋‚˜์˜จ selection surface๋Š” optimal ํ•˜๋‹ค๊ณ  ํ•  ์ˆ˜ ์—†์œผ๋ฉฐ, success surface๋Š” short-term ํ•œ ๋ณด์ƒ์ด ๋  ๊ฒƒ์ž„. ์• ์ดˆ์— success ์ž์ฒด๊ฐ€ possession์— ์ค‘์ ์„ ๋‘” ๊ฒƒ์ด๊ธฐ ๋•Œ๋ฌธ์ž„.

"we elaborate on our proposed optimization algorithm that can estimate the optimal full probability surfaces"

 

 

5. Markov decision process (MDP): (S, A, R, π)

→ ๋ณผ ์†Œ์œ ์ž๊ฐ€ ๋ณผ ์ตœ์ข… ์œ„์น˜ ์„ ํƒํ•˜๋Š” ํ™•๋ฅ ์„ ๋ชจ๋ธ๋งํ•œ๋‹ค.

  1. Episode (τ): lasts from the first event until our team loses control of the ball (goal, tackle, loss of possession, etc.).
  2. State (s): the input channels such as player and ball positions, velocities, distances, and angle to the goal. There are two absorbing states (goal scoring and loss of possession).
  3. Action space (a): two types of action space: a continuous action space (the destination location selected by the ball carrier) plus a specific action type such as a short pass, long ball, or shot.
  4. Policy (π): the probability with which the ball carrier selects any specific location on the field.
  5. Reward signal (R(s,a)): rewarding only goals would be too sparse, so the game is split into the four phases and a different reward function is used for each. The reward also depends on whether the action succeeds (kept possession) or fails (loss of possession): on failure, the reward is the negative of the expected-goals (xG) value of the opponent's ensuing shot; on success, one of the four phase-specific reward functions below is applied (a dispatch sketch follows this list).
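
To make the structure concrete, here is a schematic sketch of how such a phase-dependent reward could be dispatched; the phase names follow the paper, but the placeholder scorer functions are mine, not the paper's formulas:

```python
# Placeholder phase-specific scorers; the real per-phase formulas are
# described below (this sketch only shows the dispatch structure).
def transition_reward(state, action):
    return 0.1

def build_up_reward(state, action):
    return 0.1

def retention_reward(state, action):
    return 0.1

def attack_reward(state, action):
    return 0.1

PHASE_REWARDS = {
    "transition": transition_reward,
    "build_up": build_up_reward,
    "established_possession": retention_reward,
    "attack": attack_reward,
}

def reward(state, action, outcome, phase, opponent_xg=0.0):
    """Phase-dependent reward: a lost possession is penalised by the
    opponent's expected-goals value; a kept possession is scored by the
    reward function of the current game phase."""
    if outcome == "loss_of_possession":
        return -opponent_xg
    return PHASE_REWARDS[phase](state, action)

print(reward(None, None, "kept_possession", "attack"))                      # 0.1
print(reward(None, None, "loss_of_possession", "attack", opponent_xg=0.3))  # -0.3
```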

 

 

Reward-1) Transition: From start of the possession until the player completes the first pass or loses the ball.

objective: move the ball away from contact and change the horizontal channel

reward: ์ƒ๋Œ€ํŒ€์„ 3๊ฐœ์˜ pressure clusters๋กœ ๋‚˜๋ˆ„๊ณ , destination๊ณผ ๊ฐ cluster์˜ centroid์™€์˜ ๊ฑฐ๋ฆฌ๋ฅผ ๋ฐ˜์˜ํ•จ. P(s)๋Š” success probability๋ฅผ ์˜๋ฏธ.

 

 

Reward-2) Build-up: Start from playing in their own half until the ball reaches the opposition half.

objective: Looking for opportunities to break through the midfield line of the opponent team

reward: applied differently depending on whether the attacking team's score is higher or lower than the defending team's score.

 

 

 

Reward-3) Established possession: From the first pass in the opposition’s half until the final third of the pitch with over two consecutive actions.

objective: Retain possession

reward: larger reward to the actions moving the ball to the location on the pitch with the most success probability

 

 

 

Reward-4) Attack: Having controlled possession in the attacking third.

objective: Create chances and score goals

reward: larger reward to the actions moving the ball to the location on the pitch with a higher success probability and a higher expected goals value
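
As a hedged sketch, one simple way to combine "higher success probability" and "higher expected-goals value" at the destination is to multiply the two; the paper's exact combination may differ:

```python
def attack_reward(success_prob, xg_at_destination):
    """Reward moving the ball to destinations that are both reachable
    (high success probability) and dangerous (high xG). The product is
    just one plausible combination, assumed for illustration."""
    return success_prob * xg_at_destination

print(attack_reward(0.6, 0.25))  # 0.15
```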

 

 

 

 

 

 

6. Objective function: expected possession outcome

: EPO is a real-valued number in the range (-1, 1). The value can be interpreted as the likelihood of the respective possession ending in a goal for the attacking team (1) or a goal for the opposing team (-1). In short, it is the probability that the possession ultimately ends in a goal, so it can be regarded as the objective function of our optimization framework to be maximized.

์ตœ์ ์˜ solutions๋ฅผ ์ฐพ๊ธฐ ์œ„ํ•ด RL์„ ์ ์šฉ, Policy Gradient(PG) algorithm์„ ํ™œ์šฉํ–ˆ๋‹ค. PG algorithm

 

  • offline RL workflow without online interaction with the environment.
  • The policy network presented in the previous section was used to extract the players' existing behavioral policy ↔ here it is optimized further on top of that.

actual possession policy ⇒ optimal possession policy
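
Below is a toy sketch of an offline policy-gradient update on logged possessions. It is the generic REINFORCE-style update with an EPO-style return as the weight, applied to a dummy policy network; it is not the paper's exact algorithm, and the grid size, learning rate, and dummy data are assumptions:

```python
import torch
import torch.nn as nn

def offline_pg_step(policy_net, optimizer, states, chosen_cells, returns):
    """One offline REINFORCE-style update on logged possessions: raise the
    log-probability of the destination cells that were actually chosen,
    weighted by the possession's return (an EPO-style value in [-1, 1])."""
    surface = policy_net(states)                     # (B, num_cells) selection probs
    logp = torch.log(surface[torch.arange(len(returns)), chosen_cells] + 1e-8)
    loss = -(logp * returns).mean()                  # gradient ascent on E[return]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Tiny dummy policy over a coarse 17x26 grid, just to show the call shapes.
num_cells = 17 * 26
dummy = nn.Sequential(nn.Flatten(), nn.Linear(11 * 17 * 26, num_cells), nn.Softmax(dim=-1))
opt = torch.optim.Adam(dummy.parameters(), lr=1e-3)
states = torch.randn(4, 11, 17, 26)
chosen = torch.randint(0, num_cells, (4,))
returns = torch.tensor([0.8, -0.2, 0.1, -1.0])
print(offline_pg_step(dummy, opt, states, chosen, returns))
```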

 

 

 

 

 

7. Result

: the behavioral policy vs. the optimal policy ⇒ compared via OPE (off-policy policy evaluation).
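
For intuition, here is a hedged sketch of the simplest OPE estimator, per-trajectory importance sampling; the paper may use a different OPE method, and the logged data below are hypothetical:

```python
import numpy as np

def importance_sampling_ope(episodes, target_probs, behavior_probs):
    """Estimate the value of the target (optimal) policy from episodes
    collected under the behavioral policy, by re-weighting each episode
    with the product of per-step probability ratios."""
    values = []
    for ep, pi_t, pi_b in zip(episodes, target_probs, behavior_probs):
        ratio = np.prod(np.asarray(pi_t) / np.asarray(pi_b))
        values.append(ratio * ep["return"])
    return float(np.mean(values))

# Hypothetical logged possessions: final return plus per-action probabilities.
episodes = [{"return": 1.0}, {"return": -0.3}]
behavior = [[0.2, 0.5], [0.4, 0.1]]   # probs under the logged (behavioral) policy
target   = [[0.3, 0.6], [0.2, 0.1]]   # probs under the candidate optimal policy
print(importance_sampling_ope(episodes, target, behavior))
```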

 

 

 

optimal policy ์ ์šฉํ–ˆ์„ ๋•Œ, EPO ์ ์ˆ˜๊ฐ€ ํฌ๊ฒŒ ๋‚˜์˜จ๋‹ค.

optimal ํ–ˆ์„ ๋•Œ, short pass ์ฆ๊ฐ€ํ•˜๊ณ  long ball ์ค„์–ด๋“ ๋‹ค. ํŠนํžˆ Attack ์ง€์ ์—์„œ short pass๋กœ ๊ณต์„ ์šด์˜ํ•˜๋‹ค๊ฐ€ shot ํ•˜๋Š” ๋น„์œจ ์ฆ๊ฐ€.

 

 

 

Analyzing the existing strategies of several clubs and applying the optimal policy shows that the EPO of every team can be increased.

 

 

 

 

 

Future work

  1. Train the opponent team's optimal strategy as well, via a multi-agent (competitive) setup.
  2. Extend the framework to capture the contributions of individual players.