PhD video research: From the ATHENA lab to Bitmovin products

Originally published at: ATHENA Lab and Bitmovin: PhD Video Research and Applications


Introduction

The story of Bitmovin began with video research and innovation back in 2012, when our co-founders Stefan Lederer and Christopher Mueller were students at Alpen-Adria-Universität (AAU) Klagenfurt. Together with their professor Dr. Christian Timmerer, the three co-founded Bitmovin in 2013, with their research providing the foundation for Bitmovin’s groundbreaking MPEG-DASH player and Per-Title Encoding. Five years later, in 2018, Bitmovin and AAU formed a joint project called ATHENA, with a new laboratory and research program led by Dr. Timmerer. The aim of ATHENA was to research and develop new approaches, tools and evaluations for all areas of HTTP adaptive streaming, including encoding, delivery, playback and end-to-end quality of experience (QoE). Bitmovin could then take advantage of the knowledge gained to further innovate and enhance its products and services. In the late spring and summer of 2023, the first cohort of ATHENA PhD students completed their projects and successfully defended their dissertations. This post highlights their work and its potential applications.


Video Research Projects

Optimizing QoE and Latency of Live Video Streaming Using Edge Computing and In-Network Intelligence

Dr. Alireza Erfanian

Dr. Erfanian’s work focused on leveraging edge computing and in-network intelligence to enhance QoE and reduce end-to-end latency in live adaptive bitrate (ABR) streaming. The research also addresses improving transcoding performance, as well as optimizing the cost of running live streaming services and the utilization of the network backhaul.

  1. Optimizing resource utilization – Two new methods, ORAVA and OSCAR, use edge computing, network function virtualization, and software-defined networking (SDN). At the network’s edge, virtual reverse proxies collect clients’ requests and send them to an SDN controller, which builds a multicast tree to deliver the highest requested bitrate efficiently. This approach minimizes streaming cost and resource utilization while respecting delay constraints. Together, ORAVA (a cost-aware approach) and OSCAR (an SDN-based live video streaming method) save up to 65% bandwidth compared to state-of-the-art approaches, and they reduce OpenFlow commands by up to 78% and 82%, respectively.
  2. Light-Weight Transcoding – These three new approaches use edge computing and network function virtualization to significantly improve transcoding efficiency. LwTE is a novel light-weight transcoding approach at the edge that saves time and computational resources by storing optimal results as metadata during the encoding process. It employs popularity-based store and transcode policies, caching popular segments at the edge. CD-LwTE (Cost- and Delay-aware Light-weight Transcoding at the Edge) extends LwTE by accounting for resource constraints, introducing a fetch policy, and minimizing the total cost and serving delay for each segment/bitrate. LwTE-Live investigates the cost efficiency of LwTE in live streaming, using the approach to save bandwidth in the backhaul network. Evaluation results show that LwTE performs transcoding at least 80% faster, while CD-LwTE reduces transcoding time by up to 97%, streaming costs by up to 75%, and delay by up to 48% compared to state-of-the-art approaches.
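To make the popularity-based store-or-transcode idea behind LwTE and CD-LwTE more concrete, here is a minimal Python sketch of a per-segment decision at an edge node. It assumes a deliberately simple cost model: caching the lower-bitrate renditions costs storage per hour, while not caching them means paying a (metadata-accelerated) transcoding cost on every request. The `SegmentStats` fields, the prices, and `choose_policy` are hypothetical illustrations, not the optimization models published in the papers.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    """Per-segment statistics an edge node might track (hypothetical fields)."""
    segment_id: str
    requests_per_hour: float  # observed popularity of the segment's lower renditions
    extra_storage_mb: float   # size of the lower-bitrate renditions if cached


def choose_policy(stats: SegmentStats,
                  storage_cost_per_mb_hour: float,
                  transcode_cost_per_request: float) -> str:
    """Return "store" to cache the transcoded renditions at the edge, or
    "transcode" to keep only the top bitrate plus encoding metadata and
    transcode on demand. A simplified cost comparison, not LwTE's actual model."""
    # Cost of keeping the lower renditions on disk for one hour.
    store_cost = stats.extra_storage_mb * storage_cost_per_mb_hour
    # Expected cost of metadata-assisted transcoding for one hour of requests.
    transcode_cost = stats.requests_per_hour * transcode_cost_per_request
    return "store" if store_cost <= transcode_cost else "transcode"


if __name__ == "__main__":
    popular = SegmentStats("seg-0001", requests_per_hour=500, extra_storage_mb=20)
    niche = SegmentStats("seg-0420", requests_per_hour=2, extra_storage_mb=20)
    for seg in (popular, niche):
        print(seg.segment_id, choose_policy(seg, 0.001, 0.0005))
```

With these made-up prices, the popular segment is worth caching (0.02 for storage vs. 0.25 for transcoding per hour), while the rarely requested one is cheaper to transcode on demand. A real system would also weigh serving delay and edge capacity, which CD-LwTE explicitly considers.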

Slides and more detail


Video Coding Enhancements for HTTP Adaptive Streaming using Machine Learning

Dr. Ekrem Çetinkaya

Dr. Çetinkaya’s research applied machine learning techniques to improve the video coding process across four categories:

  1. Fast Multi-Rate Encoding with Machine Learning – These two techniques address the challenge of encoding multiple representations of a video for ABR streaming. FaME-ML utilizes convolutional neural networks to guide encoding decisions, reducing parallel encoding time by 41%. FaRes-ML extends this approach to multi-resolution scenarios, achieving a 46% reduction in overall encoding time while preserving visual quality.
  2. Enhancing Visual Quality on Mobile Devices – These three methods focus on improving visual quality on mobile devices with limited hardware. SR-ABR integrates super-resolution into adaptive bitrate selection, saving up to 43% bandwidth. LiDeR addresses computational complexity, achieving a 428% increase in execution speed while maintaining visual quality. MoViDNN facilitates the evaluation of machine learning solutions for enhanced visual quality on mobile devices.
  3. Light-Field Image Coding with Super-Resolution – This new approach addresses the data size challenge of light field images in emerging media formats. LFC-SASR utilizes super-resolution to reduce data size by 54%, ensuring a more immersive experience while preserving visual quality.
  4. Blind Visual Quality Assessment Using Vision Transformers – A new technique, BQ-ViT, tackles the blind visual quality assessment problem for videos. Leveraging the vision transformer architecture, BQ-ViT achieves a high correlation (0.895 PCC) in predicting video visual quality using only the encoded frames.
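The core intuition behind SR-ABR, that a device able to upscale can reach the same visual quality from a lower bitrate, can be illustrated with a small selection rule. This Python sketch is our own simplification, not the published algorithm: it assumes each rendition has a known quality score both as delivered and after client-side super-resolution (the VMAF-like numbers in `LADDER` are made up), and it picks the cheapest affordable rendition that meets a target quality.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rendition:
    bitrate_kbps: int
    quality: float     # quality score as delivered (hypothetical, VMAF-like)
    quality_sr: float  # quality after client-side super-resolution (hypothetical)


def select_rendition(ladder: List[Rendition], throughput_kbps: float,
                     target_quality: float, sr_available: bool) -> Rendition:
    """Pick the lowest-bitrate affordable rendition that reaches target_quality."""
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    affordable = [r for r in ordered if r.bitrate_kbps <= throughput_kbps]
    if not affordable:
        return ordered[0]  # nothing fits: fall back to the lowest rendition
    for r in affordable:
        q = r.quality_sr if sr_available else r.quality
        if q >= target_quality:
            return r
    return affordable[-1]  # target unreachable: take the best affordable rendition


LADDER = [
    Rendition(1000, 70, 80),
    Rendition(2500, 82, 90),
    Rendition(5000, 92, 94),
    Rendition(8000, 96, 97),
]

if __name__ == "__main__":
    print(select_rendition(LADDER, 9000, 90, sr_available=True).bitrate_kbps)   # 2500
    print(select_rendition(LADDER, 9000, 90, sr_available=False).bitrate_kbps)  # 5000
```

With super-resolution available, the client in this example requests 2500 kbps instead of 5000 kbps for the same quality target, which illustrates where bandwidth savings of the kind reported for SR-ABR can come from.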

Slides and more detail


Policy-driven Dynamic HTTP Adaptive Streaming Player Environment

Dr. Minh Nguyen

Dr. Nguyen’s work addressed critical issues affecting QoE in adaptive bitrate (ABR) streaming, with four main contributions:

  1. Days of Future Past Plus (DoFP+) – This approach uses HTTP/3 features to enhance QoE by upgrading low-quality segments during streaming sessions, resulting in a 33% QoE improvement and a 16% reduction in downloaded data.
  2. WISH ABR – This is a weighted sum model that allows users to customize their ABR switching algorithm by specifying preferences for parameters like data usage, stall events, and video quality. WISH considers throughput, buffer, and quality costs, enhancing QoE by up to 17.6% and reducing data usage by 36.4%.
  3. WISH-SR – This is an ABR scheme that extends WISH by incorporating a lightweight Convolutional Neural Network (CNN) to improve video quality on high-end mobile devices. It can reduce downloaded data by up to 43% and enhance visual quality with client-side Super Resolution upscaling. 
  4. New CMCD Approach – This new method for determining Common Media Client Data (CMCD) parameters enables the server to generate suitable bitrate ladders based on clients’ device types and network conditions. This approach reduces downloaded data while improving QoE by up to 2.6 times.
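WISH is described as a weighted sum of throughput, buffer, and quality costs, with weights that reflect the user’s preferences. The Python sketch below shows the general shape of such a weighted-sum selection. The individual cost terms are illustrative stand-ins chosen for readability, not the formulas from the WISH paper, and the function name `wish_like_select` is hypothetical.

```python
from typing import Sequence


def wish_like_select(bitrates_kbps: Sequence[float], qualities: Sequence[float],
                     throughput_kbps: float, buffer_s: float, segment_s: float,
                     w_throughput: float, w_buffer: float, w_quality: float) -> int:
    """Return the index of the rendition with the lowest weighted cost.

    The three cost terms are illustrative stand-ins, not WISH's published formulas.
    """
    best_q = max(qualities)
    best_idx, best_cost = 0, float("inf")
    for i, (br, q) in enumerate(zip(bitrates_kbps, qualities)):
        throughput_cost = br / throughput_kbps         # share of bandwidth used (data usage)
        download_s = br * segment_s / throughput_kbps  # estimated time to fetch one segment
        buffer_cost = download_s / max(buffer_s, 1e-6)  # grows as the buffer runs low (stall risk)
        quality_cost = (best_q - q) / best_q           # normalized gap to the best quality
        cost = (w_throughput * throughput_cost
                + w_buffer * buffer_cost
                + w_quality * quality_cost)
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx


BITRATES = [1000, 2500, 5000, 8000]
QUALITIES = [70, 82, 92, 96]

if __name__ == "__main__":
    # Quality-focused weights with a healthy 20 s buffer.
    print(wish_like_select(BITRATES, QUALITIES, 6000, 20, 4, 0.2, 0.2, 2.0))  # 2
    # Same weights, but only 2 s of buffer left: the buffer term pushes the choice down.
    print(wish_like_select(BITRATES, QUALITIES, 6000, 2, 4, 0.2, 0.2, 2.0))   # 1
    # Data-saving weights.
    print(wish_like_select(BITRATES, QUALITIES, 6000, 20, 4, 1.0, 0.5, 0.5))  # 0
```

The appeal of a weighted sum is that changing three numbers, rather than rewriting the algorithm, shifts the player between favoring quality, avoiding stalls, or saving data.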

Slides and more detail  


Multi-access Edge Computing for Adaptive Video Streaming

Dr. Jesús Aguilar Armijo

The network plays a crucial role in video streaming QoE, and one of the key network-side technologies is Multi-access Edge Computing (MEC). MEC has several key characteristics (computing power, storage, proximity to clients, and access to network and player metrics) that make it possible to deploy mechanisms at the MEC node to assist video streaming.

Dr. Aguilar Armijo’s thesis investigates how MEC capabilities can be leveraged to support video streaming delivery, specifically to improve QoE, reduce latency, or increase savings on storage and bandwidth.

  1. ANGELA Simulator – A new simulator is designed to test mechanisms supporting video streaming at the edge node. ANGELA addresses issues in state-of-the-art simulators by providing access to radio and player metrics, various multimedia content configurations, Adaptive Bitrate (ABR) algorithms at different network locations, and a range of evaluation metrics. Real 4G/5G network traces are used for radio layer simulation, offering realistic results. ANGELA demonstrates a significant simulation time reduction of 99.76% compared to the ns-3 simulator in a simple MEC mechanism scenario.
  2. Dynamic Segment Repackaging at the Edge – The proposal suggests using the Common Media Application Format (CMAF) in the network’s backhaul, performing dynamic repackaging of content at the MEC node to match clients’ requested delivery formats. This approach aims to achieve bandwidth savings in the network’s backhaul and reduce storage costs at the server and edge side. Measurements indicate potential reductions in delivery latency under certain expected conditions.
  3. Edge-Assisted Adaptation Schemes – Leveraging radio network and player metrics at the MEC node, two edge-assisted adaptation schemes are proposed. EADAS improves ABR decisions on the fly to enhance clients’ Quality of Experience (QoE) and fairness. ECAS-ML shifts the entire ABR algorithm logic to the edge, managing the tradeoff among bitrate, segment switches, and stalls through machine learning techniques. Evaluations show significant improvements in QoE and fairness for both schemes compared to various ABR algorithms.
  4. Segment Prefetching and Caching at the Edge – Segment prefetching, a technique that transmits future video segments closer to the client before they are requested, is explored at the MEC node. Different prefetching policies, using resources and techniques such as Markov prediction, machine learning, transrating, and super-resolution, are proposed and evaluated. Results indicate that machine learning-based prefetching increases average bitrate while reducing stalls and extra bandwidth consumption, making it a promising approach for improving overall performance.
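Of the prefetching policies listed above, Markov prediction is the simplest to illustrate. The Python sketch below assumes a first-order Markov chain over the representations clients request: the edge node counts observed transitions and, when a client fetches a segment at one representation, could prefetch the next segment at the most likely following representation. The `MarkovPrefetcher` class and its methods are hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Optional


class MarkovPrefetcher:
    """First-order Markov model over requested representations (hypothetical sketch)."""

    def __init__(self) -> None:
        # transitions[current][next] = how often `next` followed `current`
        self.transitions: Dict[Hashable, Counter] = defaultdict(Counter)

    def observe(self, session: Iterable[Hashable]) -> None:
        """Record transitions from one client's sequence of representation requests."""
        seq = list(session)
        for prev, nxt in zip(seq, seq[1:]):
            self.transitions[prev][nxt] += 1

    def predict(self, current: Hashable) -> Optional[Hashable]:
        """Most likely next representation, or None if `current` was never seen."""
        counts = self.transitions.get(current)  # .get avoids creating empty entries
        if not counts:
            return None
        return counts.most_common(1)[0][0]


if __name__ == "__main__":
    prefetcher = MarkovPrefetcher()
    prefetcher.observe(["480p", "720p", "720p", "1080p", "1080p"])
    prefetcher.observe(["480p", "720p", "1080p", "1080p"])
    print(prefetcher.predict("720p"))   # 1080p
    print(prefetcher.predict("2160p"))  # None
```

In practice an edge node could combine this prediction with the radio and player metrics it already has, and fall back to a default policy (for example, the client's current representation) when `predict` returns `None`.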

Slides and more detail


Potential applications for Bitmovin products

The WISH ABR algorithm presented by Dr. Nguyen is already available in the Bitmovin Web Player SDK as of version 8.136.0, which was released in early October 2023. It can be enabled via AdaptationConfig.logic. Use of CMCD metadata is still gaining momentum throughout the industry, but Bitmovin and Akamai have already demonstrated a joint solution and the research above will help improve our implementation.
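For readers unfamiliar with CMCD, here is a minimal Python sketch of how a player could attach a few CMCD keys to a segment request as a query argument, following our reading of the CTA-5004 specification: keys sorted alphabetically, string values in double quotes, tokens unquoted, and the payload URL-encoded after `CMCD=`. Only a handful of keys are shown (`bl` buffer length in ms, `br` encoded bitrate in kbps, `mtp` measured throughput in kbps, `ot` object type, `sid` session ID). The function name `cmcd_query` and the example URL are hypothetical, and the code is not taken from the Bitmovin Player or the Bitmovin/Akamai demo.

```python
from urllib.parse import quote


def cmcd_query(buffer_ms: int, bitrate_kbps: int, throughput_kbps: int,
               object_type: str, session_id: str) -> str:
    """Build a CMCD query argument from a small subset of CTA-5004 keys."""
    fields = {
        "bl": (buffer_ms + 50) // 100 * 100,         # buffer length, rounded to 100 ms
        "br": bitrate_kbps,                          # encoded bitrate of the object, kbps
        "mtp": (throughput_kbps + 50) // 100 * 100,  # measured throughput, rounded to 100 kbps
        "ot": object_type,                           # token such as "v" (video), unquoted
        "sid": f'"{session_id}"',                    # string values are double-quoted
    }
    payload = ",".join(f"{key}={value}" for key, value in sorted(fields.items()))
    return "CMCD=" + quote(payload, safe="")


if __name__ == "__main__":
    url = "https://cdn.example.com/video/720p/segment_42.m4s"
    print(url + "?" + cmcd_query(21349, 3200, 25430, "v",
                                 "6e2fb550-c457-11e9-bb97-0800200c9a66"))
```

A CMCD-aware server or CDN could read values such as `br` and `mtp` from these requests to reason about each client's network conditions, which is the kind of signal the CMCD research above builds on.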

Bitmovin has experimented with server-side Super Resolution upscaling with some customers, mainly focusing on upscaling SD content to HD for viewing on TVs and larger monitors, but the techniques investigated by Dr. Çetinkaya take advantage of newer models that can extend Super Resolution to the client side on mobile devices. These have the potential to reduce data usage, which is especially important to users with limited data plans and bandwidth. They can also improve QoE and visual quality while reducing delivery costs for service providers.

Controlling costs has been at or near the top of the list of challenges video developers and streaming service providers have faced over the past couple of years according to Bitmovin’s annual Video Developer Report. This trend will likely continue into 2024 and the resource management and transcoding efficiency improvements developed by Dr. Erfanian will help optimize and reduce operational costs for Bitmovin and its services. 

Edge computing is becoming more mainstream, with companies such as Videon and Edgio, both Bitmovin partners, delivering new applications that take advantage of available compute resources closer to the end user. The contributions developed by Dr. Aguilar Armijo address different facets of content delivery and provide a comprehensive approach to optimizing video streaming in edge computing environments. This has the potential to provide more actionable analytics data and enable more intelligent and robust adaptation during challenging network conditions.

Conclusion

Bitmovin was born from research and innovation and 10 years later is still breaking new ground. We were honored to receive a Technology & Engineering Emmy Award for our efforts and remain committed to improving every part of the streaming experience. Whether it’s taking advantage of the latest machine learning capabilities or developing novel approaches for controlling costs, we’re excited for what the future holds. We’re also grateful for all of the researchers, engineers, technology partners and customers who have contributed along the way and look forward to the next 10 years of progress and innovation.