This project is maintained by vishaltiwari
We aimed to use traditional computer vision methods to extract high-level features from broadcast tennis video, and then use these features to analyze players' play-style patterns. The features we extracted include 2D ball positions, based on the data association method of F. Yan; the 3D ball trajectory, obtained by numerical optimization; and player locations.
We used matches from the Australian Open 2017, whose videos were publicly available on the web. These are complete broadcast videos that contain gameplay, commercials, crowd shots, etc.
Before extracting features, we need to keep only the gameplay segments and discard everything else, such as crowd shots and commercials. We did this pre-processing by computing SIFT features of an initial court template and matching them against each frame's SIFT features to identify the gameplay segments. The videos also need to be stabilized. We compute a geometric transformation matrix from the matched SIFT features and then apply the transformation, with interpolation, to each frame.
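Both steps can be sketched in NumPy under a few assumptions. SIFT descriptors are taken as already computed. A frame is labelled gameplay when at least a hypothetical `min_matches` template descriptors pass Lowe's ratio test. The transformation is a homography estimated by the direct linear transform (DLT). The function names and thresholds are illustrative and do not come from the original MATLAB code.

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match rows of desc_a (template) to rows of desc_b (frame) with Lowe's ratio test.

    desc_a: (Na, D) descriptors; desc_b: (Nb, D) descriptors with Nb >= 2.
    Returns a list of (i, j) index pairs.
    """
    # Pairwise Euclidean distances, shape (Na, Nb).
    d = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    matches = []
    for i in range(d.shape[0]):
        best, second = order[i, 0], order[i, 1]
        # Keep the match only if it is clearly better than the runner-up.
        if d[i, best] < ratio * d[i, second]:
            matches.append((i, int(best)))
    return matches

def is_gameplay(desc_template, desc_frame, min_matches=30, ratio=0.75):
    """Hypothetical rule: a frame shows the court if enough template features match."""
    if len(desc_frame) < 2:
        return False
    return len(ratio_test_matches(desc_template, desc_frame, ratio)) >= min_matches

def homography_dlt(src, dst):
    """Estimate the 3x3 homography mapping src -> dst ((N, 2) arrays, N >= 4) via DLT."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]
```

With real, noisy matches one would normalize the point coordinates before the DLT and reject outliers with RANSAC. OpenCV's `cv2.findHomography` (with `cv2.RANSAC`) and `cv2.warpPerspective`, or MATLAB's `estimateGeometricTransform` and `imwarp`, handle estimation and the interpolated warp robustly.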
The method starts by taking temporal differences across frames to find ball candidates. These candidates are then refined using the size and shape of each blob. We threshold the number of pixels in each blob, fit an ellipse to each blob, and discard high-eccentricity ellipses and non-elliptical blobs. The surviving blobs are linked across frames into tracklets by fitting a constant-acceleration model. These tracklets are then optimized and grown to obtain a better set of tracklets. Finally, the tracklets are grouped to recover the ball trajectory in the video sequence. For more details, see F. Yan's thesis and my implementation of it on this page.
Results: We annotated a total of 1338 frames, and every frame contains a ball. A detected ball position counts as a detection if it lies within a 5-pixel radius of the annotated position. Accuracy is the fraction of frames with a detection, and we obtain an accuracy of 0.9006. In the images below, yellow circles are ball detections from the ball candidates and black circles are detections from the tracklets.
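The sketch below illustrates three pieces of the pipeline and the evaluation metric. The first is a blob filter that fits an ellipse through second-order moments and checks eccentricity and fill ratio. The second is a least-squares constant-acceleration fit, p(t) = p0 + v t + a t²/2, used to grow a tracklet by collecting nearby candidates. The third is the 5-pixel accuracy measure. It is a minimal illustration under stated assumptions, not F. Yan's full data-association algorithm. All thresholds (`min_area`, `max_ecc`, `min_fill`, `tol`) are hypothetical.

```python
import numpy as np

def blob_is_ball_like(pixels, min_area=4, max_area=60, max_ecc=0.8, min_fill=0.6):
    """Hypothetical size/shape filter for one candidate blob.

    pixels: (N, 2) array of (row, col) coordinates belonging to the blob.
    """
    n = len(pixels)
    if n < min_area or n > max_area:
        return False                      # too small or too large
    cov = np.cov(np.asarray(pixels, dtype=float).T)
    # Eigenvalues of the covariance are proportional to the squared semi-axes.
    minor, major = np.sort(np.linalg.eigvalsh(cov))
    if major <= 0 or minor <= 0:
        return False                      # degenerate blob (a point or a line)
    ecc = np.sqrt(1.0 - minor / major)
    if ecc > max_ecc:
        return False                      # too elongated
    # A filled ellipse has variance a^2/4 along each axis, so its area is
    # pi*a*b = 4*pi*sqrt(minor*major); a low fill ratio means a non-elliptical blob.
    fill = n / (4.0 * np.pi * np.sqrt(minor * major))
    return fill >= min_fill

def _design(t):
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), t, 0.5 * t**2])

def fit_constant_acceleration(t, xy):
    """Least-squares fit of p(t) = p0 + v*t + 0.5*a*t^2; returns rows [p0, v, a]."""
    coef, *_ = np.linalg.lstsq(_design(t), np.asarray(xy, dtype=float), rcond=None)
    return coef

def grow_tracklet(t, xy, coef, tol=3.0):
    """Mask of candidates lying within `tol` pixels of the fitted motion model."""
    residual = np.linalg.norm(_design(t) @ coef - np.asarray(xy, dtype=float), axis=1)
    return residual <= tol

def detection_accuracy(pred, gt, radius=5.0):
    """Fraction of frames whose prediction lies within `radius` px of ground truth.

    Missing detections can be passed as NaN; they count as misses.
    """
    d = np.linalg.norm(np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float), axis=1)
    return float(np.mean(d <= radius))
```

In a full implementation the fit would be seeded from candidate triplets in consecutive frames and then refit as inliers are added. The best-supported tracklets would be kept.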
Because the videos are monocular, the 3D trajectory of the ball cannot be recovered unless we incorporate ball physics. Applying the momentum equation yields three coupled ordinary differential equations (shown below). Their numerical solution for a given set of translational velocity, angular velocity, coefficient of restitution, and coefficient of friction gives a 3D trajectory. Boundary conditions have to be handled. For example, when the ball hits the court, its angular and translational velocities change depending on the coefficients of friction and restitution.
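The equations themselves are in the original figure. As a stand-in, the sketch below assumes a common simplified flight model with gravity, quadratic drag and a Magnus term: dv/dt = g − k_d |v| v + k_m (ω × v). The bounce is modelled as an impulse with restitution e and Coulomb friction μ, and it updates both v and ω (sliding, or stopping at rolling). The mass and radius are typical tennis-ball values. The lumped constants `K_DRAG` and `K_MAGNUS` are rough assumptions, as is the thin-shell moment of inertia. For simplicity, contact is detected when the center height z reaches 0.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])   # gravity (m/s^2), z axis pointing up
M = 0.057                          # ball mass (kg), typical value
R = 0.033                          # ball radius (m), typical value
I = (2.0 / 3.0) * M * R**2         # assumed thin spherical shell inertia
K_DRAG = 0.02                      # assumed lumped drag constant (1/m)
K_MAGNUS = 0.0012                  # assumed lumped Magnus constant

def acceleration(v, w, k_drag, k_magnus):
    """Gravity + quadratic drag + Magnus term for velocity v and spin w."""
    return G - k_drag * np.linalg.norm(v) * v + k_magnus * np.cross(w, v)

def bounce(v, w, e, mu):
    """Impulsive bounce on z = 0 with restitution e and Coulomb friction mu."""
    v, w = v.copy(), w.copy()
    jn = M * (1.0 + e) * abs(v[2])           # normal impulse
    v[2] = -e * v[2]
    # Contact-point velocity: v + w x r with r = (0, 0, -R).
    slip = np.array([v[0] - R * w[1], v[1] + R * w[0]])
    slip_speed = np.linalg.norm(slip)
    if slip_speed > 0.0:
        k = 1.0 / M + R**2 / I               # slip change per unit tangential impulse
        jt_mag = min(mu * jn, slip_speed / k) # slide, or stop once rolling
        jt = -jt_mag * slip / slip_speed
        v[0] += jt[0] / M
        v[1] += jt[1] / M
        w[0] += R * jt[1] / I                 # torque from the friction impulse
        w[1] -= R * jt[0] / I
    return v, w

def simulate(p0, v0, w0, e=0.75, mu=0.6, dt=1 / 240, n_steps=240,
             k_drag=K_DRAG, k_magnus=K_MAGNUS):
    """Integrate the flight with semi-implicit Euler; returns (n_steps + 1, 3) positions."""
    p, v, w = (np.asarray(a, dtype=float).copy() for a in (p0, v0, w0))
    traj = [p.copy()]
    for _ in range(n_steps):
        v = v + dt * acceleration(v, w, k_drag, k_magnus)
        p = p + dt * v
        if p[2] < 0.0 and v[2] < 0.0:        # crossed the court plane: bounce
            p[2] = 0.0
            v, w = bounce(v, w, e, mu)
        traj.append(p.copy())
    return np.array(traj)
```

The time step `dt` plays the role of the time discretization discussed below. A coarser step gives a lower-resolution trajectory and a less precisely located bounce.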
To find the 3D trajectory corresponding to a 2D path, we define a least-squares loss between the corresponding observed 2D path and the 2D projection of an initialized 3D trajectory, where the projection uses a camera calibration matrix. The parameters that minimize this loss give the 3D trajectory of the ball. We minimize the objective with the interior-point method in MATLAB. The figure below shows one such trajectory. The first pair shows the initial trajectory projected to 2D (red) together with the actual 2D path (green), and the second row shows the optimized trajectory. Note the bump at the bottom. It is caused by time discretization, which sets the resolution of the trajectory. Since we had no ground-truth 3D ball trajectories, the results were evaluated qualitatively based on the player's height and where he hits the ball.
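The loss can be sketched as follows. It assumes a 3×4 camera matrix `P` (intrinsics times extrinsics) and a 3D trajectory sampled at the same frames as the observed 2D track. The trajectory model is passed in as a callable (`simulate_fn`, a hypothetical name), so any simulator that maps parameters to 3D points can be plugged in.

```python
import numpy as np

def project(P, X):
    """Project (N, 3) world points with a 3x4 camera matrix P to (N, 2) pixels."""
    Xh = np.hstack([X, np.ones((len(X), 1))])   # homogeneous coordinates
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]

def reprojection_loss(P, X_traj, uv_obs):
    """Sum of squared pixel distances between the projected trajectory and the 2D path.

    Assumes X_traj and uv_obs are aligned frame by frame.
    """
    r = project(P, X_traj) - uv_obs
    return float(np.sum(r**2))

def make_objective(P, uv_obs, simulate_fn):
    """Build f(theta) for an optimizer; simulate_fn maps parameters to (N, 3) points."""
    def f(theta):
        return reprojection_loss(P, simulate_fn(theta), uv_obs)
    return f
```

In MATLAB, a function like `f` can be handed to `fmincon` with the `'interior-point'` algorithm and bounds on the physical parameters. In Python, `scipy.optimize.minimize(f, theta0, method='trust-constr', bounds=...)` is one comparable constrained option.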
One feature of interest is the players' locations in the video sequences. Our dataset is a singles tournament, so only two players, the near player and the far player, need to be detected. We start by extracting candidate bounding boxes by running a pre-trained Faster R-CNN for the person class on each frame. These boxes contain the players along with some of the side umpires, the match referee, etc. Assuming that only the players move during a game sequence, non-player bounding boxes can be filtered out. When Faster R-CNN fails to detect a player in a frame, we locate that player by combining the previously detected bounding box, expanded by some margin, with the foreground blobs that represent players. This works because a player cannot move a large distance within a single frame. The foreground blobs are also used to refine the Faster R-CNN results by taking the minimum bounding box of the player's foreground pixels.
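The motion-based filtering and the foreground fallback can be sketched as below. The sketch assumes boxes in `(x0, y0, x1, y1)` pixel format with exclusive right and bottom edges, and a per-frame boolean foreground mask. It also assumes detections have already been linked into per-object tracks; the linking step is not shown. `keep_moving`, `refine_with_foreground`, and the thresholds `min_path` and `frac` are hypothetical.

```python
import numpy as np

def box_center(box):
    """Center (x, y) of a box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])

def path_length(boxes):
    """Total distance, in pixels, travelled by a track's box center."""
    centers = np.array([box_center(b) for b in boxes])
    return float(np.sum(np.linalg.norm(np.diff(centers, axis=0), axis=1)))

def keep_moving(tracks, min_path=50.0):
    """Drop near-static tracks (umpires, referee), assuming only players move."""
    return {k: boxes for k, boxes in tracks.items() if path_length(boxes) >= min_path}

def expand_box(box, frac, shape):
    """Grow a box by `frac` of its size on each side, clipped to the image."""
    x0, y0, x1, y1 = box
    dx, dy = frac * (x1 - x0), frac * (y1 - y0)
    h, w = shape
    return (max(0, int(np.floor(x0 - dx))), max(0, int(np.floor(y0 - dy))),
            min(w, int(np.ceil(x1 + dx))), min(h, int(np.ceil(y1 + dy))))

def refine_with_foreground(prev_box, fg_mask, frac=0.2):
    """Tightest box around foreground pixels inside the expanded previous box.

    Returns None if that region contains no foreground.
    """
    x0, y0, x1, y1 = expand_box(prev_box, frac, fg_mask.shape)
    rows, cols = np.nonzero(fg_mask[y0:y1, x0:x1])
    if rows.size == 0:
        return None
    return (int(x0 + cols.min()), int(y0 + rows.min()),
            int(x0 + cols.max() + 1), int(y0 + rows.max() + 1))
```

The same `refine_with_foreground` call can be applied to a Faster R-CNN box from the current frame to tighten it to the player's foreground pixels.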
Because these results were not satisfactory, we ultimately fine-tuned a YOLO network to detect the near and far players (carried out by Anurag).
Each ball position (x, y, frame) needs to be classified as a bounce, a hit, or a net event based on its y coordinate. This naive approach uses graphs of y over time, as shown below.
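One way to turn such y-over-time graphs into candidate events is to flag frames where the ball's vertical image velocity reverses sign or changes abruptly. The sketch below is an illustration under assumptions, not the rules used in this project. The thresholds `min_turn` and `min_kink` are hypothetical. Mapping each candidate to a bounce, hit, or net label would need further rules, for example the ball's position relative to the players or the net.

```python
import numpy as np

def candidate_events(y, min_turn=1.0, min_kink=3.0):
    """Flag candidate ball events from per-frame y coordinates (image rows).

    Returns (frame, kind) pairs where kind is "reversal" (dy changes sign,
    with at least `min_turn` px/frame on both sides) or "kink" (dy changes
    by at least `min_kink` px/frame without reversing).
    """
    dy = np.diff(np.asarray(y, dtype=float))   # dy[i] = y[i+1] - y[i]
    events = []
    for i in range(1, len(dy)):
        before, after = dy[i - 1], dy[i]        # motion into and out of frame i
        if before * after < 0 and min(abs(before), abs(after)) >= min_turn:
            events.append((i, "reversal"))
        elif abs(after - before) >= min_kink:
            events.append((i, "kink"))
    return events
```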
In the rally shown in the player-detection and ball-tracking results, the upper player controls the rally and makes the lower player run around the court to win the point, which he does.