bloggeek

The leading authority on WebRTC

With WebRTC, don’t expect Google to be your personal outsourcing vendor

Mon, 03/27/2023 - 13:00

Understanding how WebRTC is governed in reality will enable you to make better decisions in your development strategy.

Whether you are correct or not is something we can argue about. What we can’t argue about is that a company maintaining an open source library doesn’t owe you anything.

Free is worth exactly what you pay for it. 0⃣

And there lies the whole issue – if you aren’t paying for WebRTC, then what gives you the right to complain? (btw – this is different from the other side of it – could Google do a better job of maintaining WebRTC for everyone at the same or lower effort, while increasing external contributions to it?)

Table of contents

Why this article?
Who “owns” WebRTC?
A few words about libwebrtc
Vendors in the WebRTC ecosystem
Putting your money where your mouth is
Please Google add a feature for me!
How to think about support in WebRTC?

Why this article?

Too. Many. Times. People. Complain. About. Google.

I do that as well

If you are complaining, at least know that you’re complaining about something that is reasonable…

One of the more recent cases comes from Twilio (or more accurately a customer of theirs):

There was a minor change in Google’s implementation of WebRTC. For some reason, they decided to be less lenient with how they parse iceServers in peer connections to be more “spec compliant”.

Yes. It is nitpicking.

Yes. It is a useless change.

Yes. They could have decided not to do it.

But they did. And in a weird way, it makes sense to do so.

And there’s a process in place already for dealing with that – Canary and Beta versions of Chrome that vendors (like Twilio) can use to catch and handle these things beforehand. Or they can… well… subscribe to WebRTC Insights.

Twilio had to fix their code (and they did by the way), and yet there are those who blame Google here for making changes in Chrome. Changes that one can say are needed.

I’d add a few more thoughts here before I continue to dive into this topic properly:

When you make an omelet you break a few eggs. Every change done in Chrome is going to break someone’s code
Chrome is used by billions of users, on countless different devices, using implementations of an endless stream of companies and developers. If YOU think that you can create code that is flawless that won’t break for someone in the next upgrade, then do let me know – I am not hiring, but for you, I’ll definitely make an exception

Who “owns” WebRTC?

WebRTC is an open standard governed by the W3C and an open source library which confusingly is also named “webrtc”. I prefer to call it libwebrtc.

The WebRTC open standard is somewhat split in “ownership” between the W3C and the IETF. The W3C is in charge of the API surface we use in the browser for WebRTC, and the IETF is in charge of the network protocols – what gets sent over the network.

WebRTC as an open source library is… well… it depends. Google develops and maintains libwebrtc – that’s the source code that goes into Chrome. And Edge. And Firefox. And Safari. Yes – all of them. And then there are other alternative libraries you can use.

The thing is this – you can’t really use a different WebRTC implementation in the browser, because browsers come with libwebrtc “built-in”. And in many cases, if you don’t need a browser, you may still want to use libwebrtc just to be as close as possible to the browser implementation.

Does that mean that Google owns the WebRTC implementation? To some degree it does – while there are alternatives, none of them are truly usable for many of the use cases.

That said, anyone can fork the Google WebRTC implementation and create their own project – open source or otherwise – and continue from there. Apple could do it. So could Microsoft and Mozilla. And yet they all decided to stick with libwebrtc as is.

Why is that?

I can think of two main reasons:

Why “waste” resources (engineers, time, money, etc) when you can get it for free and have Google develop it for you?
If you need to end up interoperating or having your application run on Chrome (=the Internet for non-iPhone users), then your best bet is to stick as close as possible to the source – which is libwebrtc

So in a way, Google owns WebRTC without really owning it. At least as long as Chrome is the undisputed and dominant form in which we consume the internet (are you reading this on a Chrome browser?)

I usually place a global market share graph at this stage. This time, I’ll share this website’s visitors distribution:

A few words about libwebrtc

libwebrtc is maintained by Google for Google. It is open sourced and you can use it. You can even contribute back, which isn’t a simple process.

By Google for Google means that prioritization of features, testing and bug fixes is done based on Google’s needs. These needs include Google Meet, a few other Google services and the need to support and maintain the larger ecosystem.

Who sets the tone here? What decides if your bug is more important to deal with than Google Meet or another vendor’s problems?

Put yourself in the shoes of the Google product manager for WebRTC and you’ll know the answer – it would be Google Meet first. The others later.

This also sets the tone as to the build system and code structure of libwebrtc. It is highly geared towards its use inside Chrome. Less elsewhere. And this in turn means that adopting it as a library inside your own application means dealing with code that isn’t meant to be a classic general purpose SDK – you’ll need to figure your way through it (and with a bit less documentation than you’d like).

Vendors in the WebRTC ecosystem

There are now hundreds if not thousands of vendors using WebRTC in the ecosystem. They do it directly or indirectly via CPaaS vendors and other tooling and solutions. You can find many of them in my WebRTC Developer Tools Landscape. Most of them view WebRTC as free. Not only that, it seems like many treat WebRTC as a human right – it needs to be there for them, it must be perfect, and if there’s something “wrong” with it, then humanity has the obligation to fix it for them.

So… WebRTC is free. But what does that mean exactly? What is the SLA associated with it? What can you expect of it and come back to complain if it isn’t met?

Here are a few additional interesting questions to ask yourself if WebRTC is cardinal and strategic to your application:

Have you invested anything in returning back to the community around WebRTC? Should you?
Do you have someone working part time or full time on the libwebrtc codebase itself? Is that work done in the public library or in your proprietary in-house fork?
If you run into an issue, can you ask Google to help you out and will they spend the time and resources to do so?
Can you pay Google (or anyone else) a support fee for solving your specific issues? (no)
Are you here only to take or also to give?

To be clear – there are no right or wrong answers here – just make sure you position your expectations based on your answers as well

Putting your money where your mouth is

Philipp Hancke has been doing WebRTC for a long time and is renowned for his bug reports. He even got Google to fix quite a few of them. Some bugs stayed open for years however, like this bug about TURN relay servers being used sometimes in cases where using STUN will be just fine. A bug here has an impact on the percentage of calls that get relayed via TURN servers which has a negative impact on call quality (at times) but also increases the cost to run those.

This bug has been open since 2016. Quite a few Googlers took a look but without finding anything that stood out. The crucial hint of what goes wrong came in 2021 in another bug report. In the end, Philipp had to acquire the skills necessary to fix the bug (which will hopefully happen before the end of 2023).

This takes time and time is not cheap – especially that of engineers. Microsoft as his employer apparently decided it was important enough for him to spend time on fixing this and other issues.

Please Google add a feature for me!

HEVC encoding and decoding in WebRTC seems to be a topic some folks get excited about. It would be great to know why..

There is a bug report about it in the WebRTC issue tracker which gets fairly frequent updates. And yet… Google does nothing! How can that be?

One would say that’s because it is out of the requirements of what Google needs for Google. There are other contributing factors as well here:

It is also not simple to implement and maintain
Testing this is a headache, especially considering all potential edge cases, hardware, devices, …
Patents. HEVC is a legal minefield. Chrome supports HEVC only when the underlying hardware does. Why would Google go further into that minefield for you?
This isn’t a feature in WebRTC. Not a mandatory one. Even as an optional one you can argue that it is somewhat controversial

How to think about support in WebRTC?

There’s this modern concept of zero trust in cloud computing these days.

Here’s my suggestion to you wrt WebRTC and your stance:

Zero expectations.

Don’t expect – and you won’t be disappointed.

But more importantly – understand how this game is played:

Use WebRTC. Take what you are given and make the most of it
If you need to modify the source code:
- Be sure to invest time and thought into how to do it in a way that will let you upgrade to later releases of libwebrtc
- Upgrade frequently. 4 times a year is great. Less is going to be an issue
- Follow up on security issues to patch in-between releases if needed. We keep track of these in WebRTC Insights
Test frequently
- Test against the beta and canary releases
- If things break – report back. Make sure to add as much useful information as possible (follow these suggestions for submitting a WebRTC bug in Chrome)
- If things break – don’t wait for Google to fix it. See if there’s something on your end you can do to fix things and work around the issue
Have means to update your application
- If you end up with an incompatibility with Chrome, you need a way to upgrade your application. Which will take time. You are in a race against the Chrome release train here
- A way to release a hotfix to whatever it is your customers are using. Something that can be deployed within hours or days
Browsers have a release cadence of a version per month. Think about that. And then plan accordingly
Assume things will break. It is not a matter of if – just of when and how
Things can be handled and managed better by the Google team for WebRTC. But they aren’t. Nothing you can really do about it

And yes – we’re here to help – you can use WebRTC Insights to get ahead of these issues in many ways.

The post With WebRTC, don’t expect Google to be your personal outsourcing vendor appeared first on BlogGeek.me.

Different WebRTC server allocation schemes for scaling group calling

Mon, 03/13/2023 - 13:00

In group calls there are different ways to decide on WebRTC server allocation. Here are some of them, along with recommendations of when to use what.

In WebRTC group calling, media server scaling is one of the biggest challenges. There are multiple scaling architectures that are used, and most likely, you will be aiming at a routing alternative, where media servers are used to route media streams around between the various participants of a session.

As your service grows, you will need to deal with scale:

Due to an increase in the number of users in a single session
Because there’s a need to cater for a lot more sessions concurrently
Simply due to the need to support users in different geographical locations

In all these instances, you will have to deal with the following challenge: How do you decide on which server to allocate a new user? There are various allocation schemes to choose from for WebRTC group calling. Each with its own advantages and challenges. Below, I’ll highlight a few such schemes to help you with implementing the WebRTC allocation scheme that is most suitable for your application.

Table of contents

Single data center allocation techniques
Region selection techniques
A word about allocation metrics
Final words

Single data center allocation techniques

First things first. Media servers in WebRTC don’t scale well. For most use cases, a single server will be able to support 200-500 users. When more than these numbers are supported, it will usually be because the server sends lower bitrates by design, supports only voice, or is built to handle only one way live streaming scenarios.

This can be viewed as a bad thing, but in some ways, it isn’t all bad – with cloud architectures, it is preferable to keep the blast radius of failures smaller, so that an erroneous machine ends up affecting fewer users and sessions. WebRTC media servers force developers to handle scaling earlier in their development.

Our first order of the day is usually going to be deciding how to deal with more than a single media server in the same data center location. We are likely to load-balance these media servers through our signaling server policy, effectively associating a media server to a user or a media stream when the user joins a session. Here are a few alternatives to making this decision.

Server packing

This one is rather straightforward. We fill out a media server to capacity before moving on to fill out the next one.

Advantages:

Easy to implement
Simple to maintain

Challenges:

Increases blast radius by design
Makes little use of other server resources that are idle
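
Here is a minimal sketch of server packing, assuming the signaling server keeps an ordered list of media servers with a known capacity and a running user count. The MediaServer class, its fields and the pick_packed name are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaServer:
    name: str
    capacity: int   # maximum users this server should host (assumed known)
    users: int = 0  # users currently allocated to it


def pick_packed(servers: List[MediaServer]) -> Optional[MediaServer]:
    """Return the first server in list order that still has room."""
    for server in servers:
        if server.users < server.capacity:
            return server
    return None  # every server is full: time to add another one


servers = [MediaServer("ms-1", capacity=2), MediaServer("ms-2", capacity=2)]
placements = []
for _ in range(3):
    chosen = pick_packed(servers)
    chosen.users += 1
    placements.append(chosen.name)

print(placements)  # ['ms-1', 'ms-1', 'ms-2']
```

The second server only sees traffic once the first one is full, which is exactly where both the simplicity and the larger blast radius come from.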

Least used

In this technique, we look for the media server that has the most free capacity on it and place the new user or session on it.

Advantages:

Automatically balances resources across servers

Challenges:

Requires the allocation policy to know every server’s capacity at all times
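
A small sketch of least used, assuming the allocation policy already holds an up to date map of free slots per server. The server names and numbers are invented, and pick_least_used is a hypothetical helper.

```python
from typing import Dict


def pick_least_used(free_capacity: Dict[str, int]) -> str:
    """Return the server with the most free capacity.

    Ties go to the server that appears first in the dictionary.
    """
    return max(free_capacity, key=lambda name: free_capacity[name])


# Free slots per server as last reported to the policy (made-up numbers).
free = {"ms-1": 40, "ms-2": 120, "ms-3": 75}
chosen = pick_least_used(free)
free[chosen] -= 1  # the policy must keep this view current after each allocation

print(chosen)  # ms-2
```

The one-liner is the easy part. Keeping the free-capacity map accurate across all servers is the challenge listed above.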

Round robin

Our “don’t think too much” approach. Allocate the next user or session to a server and move on to the next one in the list of servers for the next allocation.

Advantages:

Easy to implement

Challenges:

Feels arbitrary
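
Round robin fits in a few lines. This sketch assumes a fixed list of servers with invented names and ignores capacity altogether.

```python
from itertools import cycle

servers = ["ms-1", "ms-2", "ms-3"]
next_server = cycle(servers)  # endless iterator that walks the list in order

allocations = [next(next_server) for _ in range(5)]
print(allocations)  # ['ms-1', 'ms-2', 'ms-3', 'ms-1', 'ms-2']
```

Note that cycle keeps its own copy of the items it has seen, so if servers are added or removed the iterator needs to be rebuilt.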

Random

Then there’s the approach of picking a server at random. It sounds reckless, but in many cases, it can be just as useful as least used or round robin.

Advantages:

Easy to implement

Challenges:

Feels really arbitrary
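
To see why random is less reckless than it sounds, here is a quick simulation, assuming three equally sized servers with made-up names. Over a few thousand allocations the load spreads out fairly evenly without the policy tracking anything.

```python
import random
from collections import Counter

servers = ["ms-1", "ms-2", "ms-3"]
rng = random.Random(7)  # seeded only so the sketch is repeatable

# Allocate 3000 users, each to a uniformly random server.
allocations = Counter(rng.choice(servers) for _ in range(3000))
print(allocations)
```

Each server should end up with roughly a third of the users. The spread gets less even when the number of allocations is small, which is when least used starts to look better.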

Region selection techniques

The second part is determining which region to send a session or a user in a session to.

If you plan on designing your service around a single media server handling the whole session, then the challenge is going to be where to open a brand new session (adding more users takes place on that same server anyway). Today, many services are moving away from the single server approach to a more distributed architecture.

Let’s see what our options are here in general.

First in room

The first user in a session decides in which region and data center it gets created. If there is more than a single media server in that data center, then we go with our single data center allocation techniques to determine which one to use.

This is the most straightforward and naive approach, making it almost the default solution many start with.

Advantages:

Easy to implement

Challenges:

Group sizes are limited by a single machine size and scale
If the first user to join is located far from all the rest of the users, then the media quality will be degraded for all the rest of the participants
It makes deciding capacities and availability of resources on servers more challenging due to the need to reserve capacity for potential additional users

Note that everything has a solution. The solutions, though, make this harder to implement and may degrade the user experience in the edge cases they deal with.
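
A rough sketch of first in room, assuming each joining client reports a round trip time per region to the signaling server. The in-memory dictionary, the region names and region_for are all stand-ins I made up for illustration.

```python
from typing import Dict

# session id -> region hosting it (stand-in for a real shared store)
session_region: Dict[str, str] = {}


def region_for(session_id: str, rtt_ms: Dict[str, float]) -> str:
    """Return the session's region, deciding it only for the first joiner.

    rtt_ms maps region name -> round trip time measured by the joining user.
    """
    if session_id not in session_region:
        session_region[session_id] = min(rtt_ms, key=lambda region: rtt_ms[region])
    return session_region[session_id]


first = region_for("lesson-42", {"eu-west": 20.0, "ap-south": 140.0})
second = region_for("lesson-42", {"eu-west": 150.0, "ap-south": 15.0})
print(first, second)  # eu-west eu-west
```

The second user is much closer to ap-south but still lands in eu-west, which is the media quality challenge listed above.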

Application specific

You can let the first user that joins the room make the decision of geolocation, or you can use other means to do that. Here, the intent is to use something you know in your application in advance to make the decision.

For example, if this is a course lesson with the teacher joining from India and all the students are joining from the UK, it might be beneficial to connect everyone to a media server in the UK or vice versa – depending on where you want to put the focus.

A similar approach is to have the session determine the location by the host (similar to first in room) or by the configuration of the host – at account creation or at session creation.

Advantages:

Usually easy to implement

Challenges:

Group sizes are limited by a single machine size and scale
It makes deciding capacities and availability of resources on servers more challenging due to the need to reserve capacity for potential additional users
Not exactly a challenge, but mostly an observation – to some applications, the user base is such that creating such optimizations makes little sense. An example can be a country-specific service
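
One way to sketch the application specific approach, assuming the application knows the expected participants and their regions before the session starts. The pick_region helper, the host_preference parameter and the region codes are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def pick_region(expected_regions: List[str], host_preference: Optional[str] = None) -> str:
    """Pick a region from what the application knows before anyone joins.

    A region configured by the host wins. Otherwise use the region where
    most of the expected participants are located.
    """
    if host_preference is not None:
        return host_preference
    return Counter(expected_regions).most_common(1)[0][0]


# One teacher in India and five students in the UK.
roster = ["in"] + ["uk"] * 5
print(pick_region(roster))                        # uk
print(pick_region(roster, host_preference="in"))  # in
```

Which of the two rules should win depends on where you want to put the focus, as in the teacher and students example.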

Cascading

Cascading is also viewed as distributed/mesh media servers architecture – pick the name you want for it.

With cascading, we let media servers communicate with each other to cater for a single session together. This approach is how modern services scale or increase media quality – in many ways, many of the other schemes here are “baked” into this one. Here are a few techniques that are applicable here:

Always connect a new user to the closest media server available. If this media server isn’t already part of the session, it will be added to the session by meshing it with the other media servers that cater for this session
When capacity in a media server is depleted, add a new user to a session by scaling it horizontally in the same data center with one of the techniques described in single data center allocation at the beginning of this article
In truly large scale sessions (think 10,000 users or more), you may want to entertain the option of creating a hierarchy of media servers where some don’t even interact with end users but rather serve as relay of media between media servers

Advantages:

Can achieve the highest media quality per individual user

Challenges:

Hard to implement
Usually requires more server resources
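
The first technique in the list above can be sketched as bookkeeping on the signaling side. This assumes something else already works out which media server is closest to the user. The join function, the server names and the full mesh between servers are my own simplifications.

```python
from typing import Dict, List, Set, Tuple

# session id -> media servers currently serving that session
session_servers: Dict[str, Set[str]] = {}


def join(session_id: str, closest_server: str) -> List[Tuple[str, str]]:
    """Attach a user to their closest server.

    Returns the server-to-server links that must be opened when that
    server is new to the session (full mesh between the session's servers).
    """
    servers = session_servers.setdefault(session_id, set())
    if closest_server in servers:
        return []  # server already part of the session, nothing to mesh
    new_links = [(closest_server, other) for other in sorted(servers)]
    servers.add(closest_server)
    return new_links


print(join("s1", "ms-london"))  # []
print(join("s1", "ms-mumbai"))  # [('ms-mumbai', 'ms-london')]
print(join("s1", "ms-london"))  # []
print(join("s1", "ms-nyc"))     # [('ms-nyc', 'ms-london'), ('ms-nyc', 'ms-mumbai')]
```

The number of links grows quickly with the number of servers in a session, which is one reason truly large sessions move to a hierarchy of media servers instead.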

Sender decides

This one surprised me the first time I saw it. In this approach, we “disconnect” all incoming traffic from outgoing and treat each of them separately as if it were an independent live stream.

What does that mean? When a user joins, they will always connect to the media server closest to them in order to send their media. For the incoming media from other users, they will subscribe to those streams directly on the media servers of those users.

Advantages:

Rather simple to implement

Challenges:

Doesn’t use good inter-data center links between the servers
Doesn’t “feel” right. Something about the fact that not a single media server knows the state of the user’s device bothers me in how you’d optimize things like bandwidth estimation in this architecture
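
A sketch of the connection plan in sender decides, assuming we already know the closest media server for every user. plan_connections and all the names in the example are invented.

```python
from typing import Dict, List, Tuple


def plan_connections(publish_server: Dict[str, str], user: str) -> Tuple[str, Dict[str, List[str]]]:
    """Work out which servers `user` connects to.

    publish_server maps each user to the media server closest to them,
    which is where they send their own media.
    Returns (server to send to, {server: users whose streams to pull from it}).
    """
    send_to = publish_server[user]
    receive_from: Dict[str, List[str]] = {}
    for other, server in publish_server.items():
        if other != user:
            # Each remote stream is pulled from the server its sender picked.
            receive_from.setdefault(server, []).append(other)
    return send_to, receive_from


closest = {"alice": "ms-london", "bob": "ms-mumbai", "carol": "ms-mumbai"}
send_to, receive_from = plan_connections(closest, "alice")
print(send_to)       # ms-london
print(receive_from)  # {'ms-mumbai': ['bob', 'carol']}
```

Alice never receives anything from the server she sends to, so no single server sees both directions of her traffic.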

A word about allocation metrics

One thing I ignored in all this is how do you know when a server is “full”. This decision can be done in multiple ways, and I’ve seen different vendors take different approaches here. There are two competing aspects here to deal with:

Utilization – we want our servers to be utilized to their fullest. Resources we pay for and not use are wasted resources
Fragmentation – if we cram more users on servers, we may have a problem when a new user joins a session but there is no room on the media server hosting that session. So at times, we’d like to keep some slack for such users. The only question is how much slack

Here are a few examples, so you can make an informed decision on your end:

Number of sessions. Limit the number of sessions on a server, no matter the number of users each session has. Good for services with rather small and predictable session sizes. Makes it easier to handle resource allocations in cases of server fragmentation
Number of users. Limit the number of users a single server can handle
CPU. Put a CPU threshold. Once that threshold is breached, mark the media server as full. You can use two thresholds here – one for not allowing new sessions on the server and one for not allowing any more users on the server
Network. Put a network threshold, in a similar way to what we did above for CPU

Sometimes, we will use multiple metrics to make our allocation decision.
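
Here is how combining a user limit with the two CPU thresholds might look. The limits are placeholders I picked for the example, not recommendations, and ServerLoad is a hypothetical structure.

```python
from dataclasses import dataclass

# Hypothetical limits; real values depend on your hardware and media profile.
MAX_USERS = 300
CPU_STOP_NEW_SESSIONS = 0.60  # above this, no new sessions start here
CPU_STOP_NEW_USERS = 0.80     # above this, nobody else is added at all


@dataclass
class ServerLoad:
    users: int
    cpu: float  # CPU utilization, 0.0 to 1.0


def accepts_new_session(load: ServerLoad) -> bool:
    return load.users < MAX_USERS and load.cpu < CPU_STOP_NEW_SESSIONS


def accepts_new_user(load: ServerLoad) -> bool:
    """For users joining a session that already lives on this server."""
    return load.users < MAX_USERS and load.cpu < CPU_STOP_NEW_USERS


busy = ServerLoad(users=120, cpu=0.70)
print(accepts_new_session(busy))  # False
print(accepts_new_user(busy))     # True
```

The gap between the two CPU thresholds is the slack kept for users joining existing sessions. Widening it reduces fragmentation problems at the cost of utilization.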

Final words

Scaling group calls isn’t simple once you dive into the details. There are quite a few WebRTC allocation schemes that you can use to decide where to place new users joining group sessions. There are various techniques to implement allocation of users in group calling, each with its own advantages and challenges.

Pick your poison

One last word – this article was written based on a new lesson that was just added to the Advanced WebRTC Architecture course. If you are looking for the best WebRTC training, then check out my WebRTC Courses.

The post Different WebRTC server allocation schemes for scaling group calling appeared first on BlogGeek.me.

Can I trust WebRTC getStats accuracy?

Mon, 02/27/2023 - 12:30

Yes and no. WebRTC getStats is what we have to work with, so we have to make do with it. That said, your real problems may lie elsewhere altogether.

Philipp Hancke assisted in writing this article and Midjourney helped with most of the visuals

This is the question I was posed in a meeting last week:

Can I trust WebRTC getStats?

As the Jewish person that I am, I immediately answered with a question of my own:

Assume the answer is “No”. What are you going to do now?

I thought the conversation merits a bit more discussion and some public sharing, which led to this article being written.

Table of contents

TL;DR
A short history of WebRTC getStats
Google WebRTC housecleaning project
Firefox & Safari
Keeping up the pace with WebRTC getStats changes
This is a good thing
Chrome’s WebRTC getStats implementation might not be the reason for bad metric values
What can you do about WebRTC getStats changes?

TL;DR

Yes. You can and should trust the accuracy of WebRTC getStats, but like with everything else, you should also keep a dose of happy suspicion around you.

Like any piece of software, libwebrtc and its getStats implementation by extension, has bugs. These bugs get fixed over time. The priority given to fixing them relates mostly to how much Google’s own services suffer from them, with a seemingly arbitrary prioritization for the rest of the issues.

See below to learn more on why we have a problem and what you can do about it.

A short history of WebRTC getStats

[Image: Midjourney, envisioning the history of WebRTC getStats]

WebRTC was announced somewhere in 2011 and the initial public code in Chrome was released in 2012. The specification itself was stabilized and officially published by the W3C in January 2021. Just… 10 years later.

In between these 10 years a lot of discussions took place and the actual API surface of the WebRTC standard specification was modified to fit the feedback provided and to encompass additional use cases and requirements.

We’ve had these discussions taking place in parallel to WebRTC being implemented in web browsers and shipped out so developers can make use of them. Years before WebRTC was officially “standardized” we had hundreds if not thousands of applications in production using WebRTC, oftentimes with paying customers.

At some point, the getStats implementation in the standard specification diverged from that implemented by Google in Chrome, ending with two main alternatives:

Spec-compliant getStats – the new API that adheres to the standard specification. Given that this specification is authored by Googlers it is not surprising that it ended up being a description of what Chrome implemented, whether it made much sense or not. This was added in Chrome 58 back in January 2017
Legacy getStats – the original implementation in Chrome

This made switching from one to the other a challenge:

Google could just implement the new stats, but that would break applications that used legacy getStats implementation
Developers wanted to use the spec compliant stats, but needed a browser that supports them

The decision was made that the distinction between the two would be how getStats() is called. Callback-based invocation returned the legacy stats while using a promise returned spec-compliant getStats. The logic behind this was that promises were a new construct introduced to JavaScript at the time, so developers who used the legacy getStats didn’t use promises (yet).

This approach worked rather well for the last 6 years, with many (most?) applications adopting the use of the spec-compliant getStats:

We observed a steep drop in usage when Google Meet stopped using the legacy API (that’s the blue line going down). That said, a few outliers still remain who use the old getStats. They will not be able to do so in 2024.

Google WebRTC housecleaning project

Fast forward to today (or last year).

WebRTC is a solid standard and implementation used by many. It got us through the pandemic in many ways and aspects.

All the bigger requirements from WebRTC are behind us. There aren’t that many innovations or new features that get introduced to it.

Which is leading Google in recent months to house cleaning tasks:

Figuring out where they can squeeze the lemon a bit more for performance reasons
Where they can get rid of deadweight by deprecating and killing unnecessary code
Following the WebRTC specification even more closely
Beefing up best practices in security even more

This house cleaning work has reached getStats, and with it, 4 main areas:

Deprecating and later killing legacy getStats (after waiting for Google Meet to stop using it and migrate to the spec compliant variation)
Trimming down the results object for performance reasons
“Randomizing” the object identifiers in the returned getStats structure for both performance and best practices reasons. This is still planned so it is best to prepare for it and not to interpret the “id” attribute in any way
Making sure all stats in the specifications are reflected in the getStats implementation itself

Such changes are great when viewed in the long term. But in the short term they are a huge headache.
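
To prepare for the randomized identifiers, treat every “id” as an opaque lookup key and navigate by “type” and by the reference fields. The sketch below assumes the getStats report was serialized in the browser into a JSON list of stats objects and is analyzed offline. The dump and its ids are invented, and selected_pair is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional

Stat = Dict[str, Any]


def selected_pair(stats: List[Stat]) -> Optional[Stat]:
    """Find the active candidate pair by following id references only."""
    by_id = {stat["id"]: stat for stat in stats}  # ids used purely as keys
    for stat in stats:
        if stat["type"] == "transport" and "selectedCandidatePairId" in stat:
            return by_id.get(stat["selectedCandidatePairId"])
    return None


# Made-up dump with deliberately meaningless ids.
dump = [
    {"id": "a91f", "type": "transport", "selectedCandidatePairId": "77c2"},
    {"id": "77c2", "type": "candidate-pair", "localCandidateId": "0be4",
     "currentRoundTripTime": 0.045},
    {"id": "0be4", "type": "local-candidate", "candidateType": "relay"},
]

pair = selected_pair(dump)
print(pair["currentRoundTripTime"])  # 0.045
```

Nothing here parses or pattern-matches an id string, so the code keeps working whatever the ids look like in a future release.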

Firefox & Safari

Since Safari uses libwebrtc, it will get most statistics out of the box. However, the binding at the WebKit layer needs some code to be written, which creates gaps when libWebRTC changes in ways that Safari does not pick up. We observed this with the “trackIdentifier” property recently but there may be others. Apple seems rather reactive here.

Firefox used to spearhead the “spec” getStats implementation but has fallen behind and lacks several stats types (such as candidate-pair stats). This means workarounds like shown by this WebRTC sample are still required for very basic functionality. Statistics related to media quality are lacking even more.

Keeping up the pace with WebRTC getStats changes

At testRTC, we’re offering tools for the full lifecycle of WebRTC applications. These include testing and monitoring services. As such, we rely heavily on getStats.

Years ago, we had to implement the migration from legacy stats to spec compliant stats.

Then came 2022 and with it the housekeeping changes by Google to the statistics found in getStats. It started with Chrome 107 and continues even today. With each such release, we need to get an experienced WebRTC developer to check, test and fix our code to make sure our services collect the statistics properly. All that is on top of the need to support more metrics that Google adds to Chrome in WebRTC getStats from time to time.

Our job is harder than most in this simply because we need to collect and support all the stats – the customer base we have is varied and we never really know which metrics they’d be interested in.

This task of keeping up with getStats has been a bit of a challenge in the last few months. That’s because in each release something else changes. Each step is reasonable. Needed. Minor. But it brings with it changes we need to do in our own planning and roadmap.

To others, such changes have brought with them breakages as well. At times, this meant the need to update and upgrade open source components or to fix their own code.

This is a good thing

It is important to state – the changes and work conducted here by Google are for the better.

Going for a spec compliant WebRTC getStats implementation means we have actual documentation that we expect to work. It also means interoperability with other browsers and components (assuming they strive to spec compliance as well).

Improvements in performance and polishing out best practices means better performance and code for WebRTC applications in general.

Removing deadweight and deprecated/unused statistics and similar components means a smaller codebase with fewer edge cases and “things” to test.

This is what we want our WebRTC implementation to be and look like.

The fact that we need to undergo this ordeal is the price we need to pay for it. It would have been a wee bit nicer if Google laid out their plans for such changes well in advance (not through sporadic PSAs but rather as a kind of a public roadmap). This would enable better planning for those running such applications. But it is what it is. And frankly – we get what we pay for (=free).

Chrome’s WebRTC getStats implementation might not be the reason for bad metric values

Then there are bugs. Metrics you obtain from getStats that don’t seem to reflect reality.

There are usually 3 reasons for that to happen:

Chrome. There’s a Chrome bug that leads to bad metrics results via getStats. As I stated earlier, these get fixed based on the priority and backlog of Google when it comes to their libwebrtc library
You. The value is correct. You just don’t understand what it means or how it gets calculated. Since there’s little in the way of documenting each and every metric in getStats, this is quite common
The other side. When your browser interacts with a non-browser device, a native mobile application or a media server, it gets a lot of the data used to report specific metrics via WebRTC getStats from RTCP reports that are calculated, generated and sent by the other device. That side may also have bugs in it (highly likely, and probably even more so)
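
When a value looks wrong, a rough first split is to check whether it was computed by the browser or handed to it by the other side. In the statistics specification, the “remote-inbound-rtp” and “remote-outbound-rtp” types are derived from RTCP reports sent by the remote endpoint. The sketch below assumes a stats dump serialized as a list of objects. The dump itself and split_by_source are invented for illustration.

```python
from typing import Any, Dict, List

# Stats types derived from RTCP reports sent by the remote endpoint.
REMOTE_SOURCED_TYPES = {"remote-inbound-rtp", "remote-outbound-rtp"}


def split_by_source(stats: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group stats ids by who produced the underlying numbers."""
    groups: Dict[str, List[str]] = {"local": [], "remote": []}
    for stat in stats:
        source = "remote" if stat["type"] in REMOTE_SOURCED_TYPES else "local"
        groups[source].append(stat["id"])
    return groups


dump = [
    {"id": "s1", "type": "outbound-rtp", "packetsSent": 1200},
    {"id": "s2", "type": "remote-inbound-rtp", "fractionLost": 0.02},
    {"id": "s3", "type": "inbound-rtp", "packetsReceived": 1180},
]
print(split_by_source(dump))  # {'local': ['s1', 's3'], 'remote': ['s2']}
```

If the odd values all sit in the remote group, the media server or native client on the other end is the first place to look.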

A few things to remember here:

WebRTC is used by MANY inside browsers. Think billion(s) of people

It is adopted by thousands of applications developing directly and indirectly on top of it

Using statistics is standard practice for optimizing media quality and most of the large WebRTC applications rely on it heavily already

Why should your application and use case be any different in trusting WebRTC getStats?

What can you do about WebRTC getStats changes?

Nothing.

That said, I do have a few suggestions for you:

Understand and assume that things will change, bugs will be found (and fixed), and that for the most part, getStats is a really powerful and useful tool
Test your application (and its stats) against the latest browser builds. This should include the upcoming beta and even the nightly builds if you’re up for it
Make sure your media servers and other components are up to date. Especially in the RTCP reports they spew. When in doubt, question their behavior before libwebrtc (remember that they also need to run after Google’s implementation of WebRTC in Chrome)
Subscribe and follow the WebRTC Insights. That’s where we flag such upcoming issues, among other things we cater for
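
For the testing suggestion, one cheap check is to diff the shape of the stats between the stable build and a beta or nightly build. This sketch assumes you captured one getStats dump from each build as a list of objects. The field names “oldField” and “newField” are placeholders and not real stats.

```python
from typing import Any, Dict, List, Set, Tuple

Stat = Dict[str, Any]


def schema(stats: List[Stat]) -> Set[Tuple[str, str]]:
    """Collect every (stats type, field name) pair present in a dump."""
    return {(stat["type"], field) for stat in stats for field in stat}


def diff_schema(old: List[Stat], new: List[Stat]) -> Tuple[list, list]:
    """Return (fields that disappeared, fields that appeared)."""
    before, after = schema(old), schema(new)
    return sorted(before - after), sorted(after - before)


# Invented dumps from a stable and a beta build.
stable = [{"id": "x", "type": "inbound-rtp", "packetsReceived": 10, "oldField": 1}]
beta = [{"id": "y", "type": "inbound-rtp", "packetsReceived": 12, "newField": 2}]

removed, added = diff_schema(stable, beta)
print(removed)  # [('inbound-rtp', 'oldField')]
print(added)    # [('inbound-rtp', 'newField')]
```

A non-empty removed list for a field your application reads is the signal to act before that build reaches your users. Fields that only show up under certain conditions will cause noise, so compare dumps taken from the same scenario.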

The post Can I trust WebRTC getStats accuracy? appeared first on BlogGeek.me.

Can a native media engine beat WebRTC’s performance?

Mon, 02/06/2023 - 13:00

WebRTC is the best media engine out there. And it has nothing to do with its performance…

I’ve been part of the video conferencing industry throughout the first decade of the 21st century and a bit of the 2nd decade as well. The driving force at the time was resolution and frame rate. There was an arms race among vendors as to who provides higher resolutions and frame rates in their room system. A lot of the ethos at the time was the implementation of proprietary media engines that were built for the task at hand. Optimizing and fine tuning them for media quality was considered a core competency.

Fast forward to 2023, what should be the mindset and ethos today?

This is a kind of a continuation to my article on the WebRTC predictions for 2023

Table of contents

What is a media engine?
WebRTC (and libWebRTC) as a media engine
Can a media engine other than WebRTC perform better?
Advantages of native (and proprietary) media engines
Challenges of native (and proprietary) media engines
WebTransport, WebCodecs, WebAssembly
Why would I choose WebRTC as my media engine every day of the week?

What is a media engine?

In the context of VoIP and WebRTC, a media engine is a component that takes care of media processing. Simplifying it, a media engine implementation does something like this:

Capturing the raw data from the input devices (camera and microphone, but also the display)
Encoding that media and then sending it over the network (with WebRTC, that’s using SRTP)
Receiving the media from the network and then decoding it
Playing it back to the speakers and the display
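
The four steps above can be sketched as a toy pipeline. This is only an illustration of the data flow, not how libWebRTC is built: the types and function names are hypothetical, the "codec" just serializes numbers to a string, and the "network" is an in-memory array.

```typescript
// Toy stand-ins for media data; a real engine works on audio and video frames.
type RawFrame = { timestampMs: number; samples: number[] };
type EncodedPacket = { timestampMs: number; payload: string };

// "Encode": a real codec compresses, here we only serialize.
function encode(frame: RawFrame): EncodedPacket {
  return { timestampMs: frame.timestampMs, payload: frame.samples.join(",") };
}

// "Decode": the inverse of the toy encoder above.
function decode(packet: EncodedPacket): RawFrame {
  const samples = packet.payload === "" ? [] : packet.payload.split(",").map(Number);
  return { timestampMs: packet.timestampMs, samples };
}

// Send side: captured frames are encoded and put on the network.
function sendSide(captured: RawFrame[], network: EncodedPacket[]): void {
  for (const frame of captured) network.push(encode(frame));
}

// Receive side: packets are taken from the network, decoded and played back.
function receiveSide(network: EncodedPacket[], playback: RawFrame[]): void {
  for (const packet of network) playback.push(decode(packet));
}
```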

The media engine also deals with improving voice and video – things such as echo cancellation, noise suppression, packet loss concealment, background blurring, etc.

WebRTC (and libWebRTC) as a media engine

One of the descriptions of WebRTC that I love is that WebRTC is a media engine with a JavaScript API on top.

Google’s implementation of WebRTC is libWebRTC. Originally, it came from its acquisition of GIPS (Global IP Solutions) – a company that licensed their proprietary media engine to VoIP developers. Google took that library, sprinkled the WebRTC API definition on top of it and integrated it with their Chrome browser.

10 years ago, there were other media engines as well. Most large vendors built and maintained their own media engine – especially if their market was video conferencing.

WebRTC, being a standard on both the network and interface layers, with libWebRTC being an open source implementation of it (that is maintained by Google AND integrated inside the most popular web browser) – became the best media engine out there practically overnight (or at least within 10 years and through a pandemic).

Joining a video call in your browser? Great! If you aren’t using Zoom, then 99.99% chance that what you are using is WebRTC, with the libWebRTC implementation.

Can a media engine other than WebRTC perform better?

Yes.

But what does that even mean?

What does performing better than WebRTC mean exactly?

If it supports HEVC. Is it better?
Let’s say it uses 10% less CPU. Is it better? How about 30% less memory consumption. That’s definitely much better
The video encoder compresses the same video input at 5% less bits with similar video quality. Is it better now?
It has more resilience to packet losses. It must be better!
Offering more voice codecs makes it better. Obviously…
…

libWebRTC isn’t the best media engine out there. At least not on the one parameter (or more) that you’ve picked to compare it against your own proprietary alternative. But does it even matter?

Advantages of native (and proprietary) media engines

Building and maintaining your own native and proprietary media engine? Good for you! Let’s see what advantages you gain by doing that:

You own and control your destiny
- The code is yours
- Along with it, the ability to modify it at will
Your application, your behavior
- libWebRTC is optimized for… well… nothing. Almost – it is optimized for Google’s own needs
- Your implementation of a media engine can be optimized to the exact needs, architecture, hardware and software that you use
Easy to differentiate
- You own the code. You modify it to your heart’s content
- This means that media specific capabilities can be unique and differentiated

Challenges of native (and proprietary) media engines

Now that we’re happy with building our own native and proprietary media engines, let’s see what our challenges are:

Resources
- Developing and maintaining media engines is ridiculously expensive and time consuming
- There aren’t a lot of experienced media engine engineers out there waiting in line to be hired
Availability
- Where exactly is your media engine running? Windows?
  - Now we need it for Mac
  - Next week on iOS and Android
  - And on a gazillion of devices and chipsets
- Every new device permutation you need to support is a new headache to deal with and optimize for
- Did I mention it takes time and money to do that?
Browsers
- You’ve got your super perfect solution, but what happens the moment your customers want to be able to use it in a browser?
- That’s when you need WebRTC…
- And for that, you need to gateway and interoperate between your own media engine and the WebRTC implementation found in browsers
- In most cases, doing that will degrade the media experience AND remove most of your proprietary differentiated features

WebTransport, WebCodecs, WebAssembly

We’re in the 3rd year of the WebRTC unbundling trend. This is still early days.

WebAssembly is here. It is powerful. And it is used more and more, with ever increasing usefulness.

WebTransport and WebCodecs are still great experiments – usable mostly for proof of concepts or early implementations. Using these to power a full fledged media engine that doesn’t make use of WebRTC is still a challenge.

Not all browsers support these interfaces, and those that do still have instabilities and need a lot of optimization work poured into them.
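
In practice this means feature detection before relying on any of them. Here is a minimal sketch, assuming detection by checking for the global constructors these technologies expose (RTCPeerConnection, VideoEncoder and friends, WebTransport, WebAssembly). FEATURES and detectFeatures are hypothetical names, and the presence of a global says nothing about how stable or fast the implementation is.

```typescript
// Globals each technology exposes in browsers that support it.
const FEATURES: Array<{ feature: string; globals: string[] }> = [
  { feature: "webrtc", globals: ["RTCPeerConnection"] },
  { feature: "webcodecs", globals: ["VideoEncoder", "VideoDecoder", "AudioEncoder", "AudioDecoder"] },
  { feature: "webtransport", globals: ["WebTransport"] },
  { feature: "webassembly", globals: ["WebAssembly"] },
];

// Report which features are present on a global-like object.
// In a browser, pass globalThis (or window).
function detectFeatures(scope: object): { [feature: string]: boolean } {
  const result: { [feature: string]: boolean } = {};
  for (const entry of FEATURES) {
    // A feature counts as present only if all of its globals exist.
    result[entry.feature] = entry.globals.every((name) => name in scope);
  }
  return result;
}
```

Taking the scope as a parameter keeps the function testable outside a browser and makes the fallback decision (use WebRTC when the others are missing) explicit in the calling code.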

Using these is a long term investment that won’t offer a usable solution for 2023.

Why would I choose WebRTC as my media engine every day of the week?

Going to use your own native and proprietary media engine implementation? Good for you!

But do you need browser support in your application? Are these 5% of the user base or interactions or is it more like 50% or more?

Are you looking to make use of open source media servers and components? If so, then are these available for your proprietary implementation or will it be easier to just use ones that support… WebRTC!

Assuming you need browser support for your application and that said browser support isn’t there just as another unused feature to win a customer deal (and then lay forgotten somewhere), then you should just use WebRTC.

Why?

Because at the end of the day, that’s what browsers have available for you.

The post Can a native media engine beat WebRTC’s performance? appeared first on BlogGeek.me.

WebRTC predictions for 2023

Mon, 01/23/2023 - 13:00

Here are the WebRTC predictions and trends you should expect in 2023. It is more of the same, but with nuanced differences.

As we’re starting 2023, it is time to look back and then into the future, to understand where we are and where we are headed with WebRTC. This year, things are getting somewhat trickier here:

WebRTC is a done deal. It is here to stay and there are no questions about the need to use it
We’re in a global recession (or about to be in one)
The pandemic is over, but rearing its ugly head in China, just when the Chinese government decided to open up everything
A new toy just came out (generative AI) with a technology paradigm shift that will affect everyone and everything

Oh, and did I mention that I changed a lot in my own work-life? I am now Chief Product Officer at Spearline, dealing with the larger picture of testing and monitoring communication networks. Life is full of surprises

There’s lots to cover, so let’s start.

Table of contents

Our WebRTC map
The state of WebRTC open source
CPaaS and WebRTC
How did I do with my 2022 WebRTC predictions?
- Hitting the nail
- Missing miserably
WebRTC predictions for 2023
Preparing for a rocky year

Our WebRTC map

Before I dive into the predictions, it is important to know where we stand. We’ll do this by looking at 3 different layers:

WebRTC the technology
Open source in WebRTC
CPaaS and WebRTC

Let’s start with the technology itself

The era of differentiation

We are well into the era of differentiation:

This started with Google unbundling WebRTC in the browser, starting to offer pieces of it as separate future W3C standards as well as opening up more access to lower levels of the stack. In the past year we’ve seen growing use of these capabilities outside of Google, both in experimentation and in production.

2021 brought with it background blurring and replacement in the browser to the masses.

In 2022 we’ve seen proprietary codecs and noise suppression finding a solid home in WebRTC applications and technologies using these capabilities. Representative commercial examples of this are Dolby Voice proprietary codec and Twilio’s Krisp partnership on noise cancellation.

If this is hinting at anything, it is that we’re going to see more of these moving forward, as vendors try to differentiate further. The only thing slowing this trend down is the current market recession.

Peak WebRTC

The pandemic that has raised all boats is all but over.

China is opening up, with or without another COVID wave. Many have shifted to hybrid work. Others are now communicating via video sessions a lot more than they used to.

Zoom is seen as the poster child of the pandemic. If you overlay its stock price with WebRTC usage in Chrome, you get this interesting chart:

WebRTC is still 3-4 times bigger in use than it used to be prior to the pandemic. That said, throughout 2022 we’ve seen consistent decrease in use of WebRTC. This is likely to continue into 2023.

My guess/prediction is that we will stay at around 3 times the use we had at the beginning of 2020.

libWebRTC dominance

libWebRTC is still king of the hill when it comes to WebRTC client-side implementations.

Nothing comes close to it.

libWebRTC is Google’s implementation of WebRTC, and the one used across all browsers today. A monoculture.

For most projects, using libWebRTC as a starting point for a non-browser implementation is the way to go. In some niche use cases, other solutions can and should be considered. The main alternative in such cases is probably Pion today.

2022 has been mostly a year of optimizations and polishing for the libWebRTC implementation, continuing on Google’s focus in 2021. 2023 will look no different.

WebRTC Insights clients received an analysis of the contributors to the libWebRTC project throughout history as part of a recent issue tracker sent to them.

Let’s try a quick Q&A here on libWebRTC:

Is there a competitive alternative to libWebRTC in WebRTC?

The most popular WebRTC implementation out there is libWebRTC.

It is also the most dominant since it got embedded in all modern browsers.

libWebRTC is well maintained and is undergoing consistent improvements and optimizations. No other WebRTC stack is getting the same level of investment.

This is not expected to change in the foreseeable future.

Why is Google investing in libWebRTC?

This isn’t about Google Meet. Google is monetizing the web via ads delivered on search conducted in browsers and smartphones. By placing more of our activities in browsers and on the web, Google can monetize more interactions – indirectly.

Then there’s Google Meet/Workspace, competing with Microsoft Office on enterprise productivity.

Commoditizing communications is Google’s way of managing complementary technologies. Ben Thompson, in his latest analysis of AI and the Big Five, refers to Joel Spolsky’s Strategy Letter V, which offers a great explanation of Google’s approach and is a good segue to our next section on open source:

Open source is not exempt from the laws of gravity or economics. […] something is still going on which very few people in the open source world really understand: a lot of very large public companies, with responsibilities to maximize shareholder value, are investing a lot of money in supporting open source software, usually by paying large teams of programmers to work on it. And that’s what the principle of complements explains.

Once again: demand for a product increases when the price of its complements decreases. In general, a company’s strategic interest is going to be to get the price of their complements as low as possible. The lowest theoretically sustainable price would be the “commodity price” — the price that arises when you have a bunch of competitors offering indistinguishable goods. So:

Smart companies try to commoditize their products’ complements.

The state of WebRTC open source

Not much has changed since my analysis a year ago on WebRTC trends in 2022, where I looked at WebRTC open source projects.

Kurento is still dead
Janus is great, in the same way it was a year ago
Jitsi is still pushing on group meeting features
mediasoup is a solid alternative. Its founders and lead developers who worked at Around now work at Miro, who acquired Around
Pion is still growing in adoption and use

Unsurprisingly, Janus, Jitsi, mediasoup and Pion still retain most of their founders and key figures. These are teams/individuals who are personally and emotionally invested in these projects, which is a good thing.

The challenge is that besides Janus, none of them offers official support or custom development. For the rest, companies need to rely on in-house development or external outsourcing vendors and freelancers.

As this state hasn’t changed for a good few years, not much is expected to change in 2023.

The main difference or question mark can be put on the projects that are now indirectly owned by a business whose focus might be elsewhere:

Jitsi – Jitsi was acquired by Atlassian and then 8×8. 8×8 has its focus in UCaaS, CCaaS and CPaaS. Jitsi as a Service has been released and is promoted by 8×8. But what about its open source project? How much would 8×8 be willing to invest in the open source project in 2023?
mediasoup – the mediasoup founders are used to having a “day job”. Yesterday it was Around. Today it is Miro. Tomorrow – who knows? Is that going to affect the mediasoup project in 2023? Probably not, but the recession might have different plans for this project
Pion – Pion was created by Sean DuBois, who has an infectious enthusiasm towards it and towards easy accessibility of the WebRTC technology. This will probably continue moving forward
Janus – Janus is maintained by Meetecho, a company embedded in open source and providing services around them. The current state of the market is unlikely to change their focus and trajectory

CPaaS and WebRTC

The CPaaS landscape is changing and shifting where it comes to WebRTC.

We started seeing these shifts a couple of years ago, but it seems that change is accelerating in this space – something that is different from what is happening with WebRTC open source.

The perceived leaders in WebRTC CPaaS are still Twilio, Vonage and Agora. I have a feeling that by the end of 2023 this will change.

Let’s review the who’s who of WebRTC in CPaaS.

Twilio

No CPaaS list is complete without Twilio. I’ll obviously start with them.

Twilio is continuing their trend from last year of going after the Customer Experience Platform market.

There was one big change that took place in 2022, where Twilio announced focusing on 4 pillars, instead of spreading all over. This was conveyed in Jeff Lawson’s open letter laying off 11% of their workforce. These focus areas are:

“Investing in our platform reliability and trust” scale, security, optimization, …
Increasing the profitability of messaging SMS and social messaging
Accelerating Segment adoption CDP (Customer Data Platform)
Scaling the Flex customer base CCaaS (contact centers)

No word about WebRTC. Definitely no video in here.

The opposite has happened – Twilio Live, announced in 2021, is being shut down:

Interestingly, its migration guide is recommending Mux, a vendor that just launched a WebRTC video offering as well. Should Twilio customers using Programmable Video also migrate that part to Mux? One wonders

Vonage

Vonage has its hands full with Ericsson who acquired them.

Not much has changed on their platform besides the introduction of background blurring and replacement.

As the honeymoon between Vonage and Ericsson dissipates, along with the realization of a recession, it will be interesting to see what will happen to the Vonage Video APIs – will the level of investment there remain high or will it shrink?

Agora

Agora’s stock has tanked since its peak:

Our information there is more limited than that of Zoom simply because the Agora IPO took place only in 2020.

It got into a recent mud fight with Zoom over the quality of experience that their respective platforms offer.

Zoom

Zoom opted to go with the unbundled approach, using WebRTC only sparingly. For video, they are especially focused on building their own media stack replacing most of what WebRTC does. In the short term, such an approach isn’t too productive. Longer run, who knows?

Zoom and APIs and CPaaS is a long affair by now. One which hasn’t worked out well enough for Zoom. Their browser story wasn’t tight enough until recently. This got them to go head to head with competition and commission a performance report pitting their Zoom Video SDK versus Vonage Video API, Agora, Twilio Programmable Video and Amazon Chime SDK.

This specific post is telling:

Zoom is looking to publicize its existence as a video CPaaS vendor. Their market penetration here is smaller than the bigger video CPaaS vendors at the moment. This performance report is their assurance to potential customers that they are competitive in this market
Amazon is gaining ground. Zoom decided to add them in because they are now competitive and relevant in this market. The Amazon Chime SDK has penetrated the mindshare of developers and competitors (like Zoom) are noticing

Microsoft

IaaS gone video CPaaS. That was in 2020. Both Microsoft Azure and Amazon AWS introduced their own video APIs.

Microsoft had the better story: Azure Communication Services. Uses the same infrastructure as Microsoft Teams. Being able (in the longer run) to connect directly to Microsoft Teams calls.

The network effect and infrastructure were always in their favor. That said, it doesn’t appear enough in discussions I have with developers building WebRTC applications.

There’s a lot of untapped potential here.

Amazon

I am starting to see the Amazon Chime SDK in more places. It seems that like Amazon Connect, after 3 years of being out there, it is getting the critical mass it needs to become “a thing” in the industry.

This is one to watch closely, especially if you are a video API vendor yourself…

Cloudflare (new entrants)

There’s another IaaS vendor who is joining the party of Video APIs – Cloudflare.

Cloudflare started in 2021 with a managed TURN service. One that is still in private beta.

But in September 2022 they announced and launched two additional services:

Cloudflare Stream – WebRTC-based live streaming
Cloudflare Calls – WebRTC video group calls

Both are API offerings that are well-defined these days in the Video API or WebRTC CPaaS space.

Hopefully, they’ll move faster with these two than they have with their managed TURN service.

Mux (new entrants)

Mux, a vendor who focused on video delivery via APIs, has joined the WebRTC market as well, offering their own Video APIs – Mux Real-Time Video. This is an interesting take, especially since their target audience is slightly different than that of developers who end up with CPaaS. It brings a fresh look and interpretation of the problem – just like the IaaS vendors and Zoom do.

The interesting part is that Twilio decided to refer their Twilio Live customers to Mux. If I were Mux, I’d mark every customer coming in from Twilio Live, making sure they get the best experience and support so that 6 months from now I can start talking to them about migrating away from Twilio Programmable Video.

SaaS as CPaaS, Embeddable & Prebuilt

Then there’s the lowcode/nocode trend and how it manifests itself in CPaaS. I’ve written an ebook about it – Lowcode & Nocode in Communication APIs (sponsored by Daily, a known CPaaS vendor). In the past two years we’ve seen more and more CPaaS vendors offering lowcode and nocode solutions on top of their video APIs.

To that specific market/solution, we are seeing SaaS vendors heading as well – for some reason, everyone thinks that CPaaS is a great business.

The notable examples here are Whereby, a meetings platform that started offering Whereby Embedded, and Digital Samba, who started from a webinars platform and is now offering Digital Samba Embedded.

This part of the market will continue to evolve, with CPaaS vendors and others offering ever higher layers of abstraction.

How did I do with my 2022 WebRTC predictions?

We’re done with the market overview. Time to move on to predictions.

I’ll start by looking at how I fared with my 2022 predictions of the upcoming trends…

This was a hit and miss thing (obviously).

Hitting the nail

There were three trends where I was spot-on.

#1 – Scale & performance

My bet at the time was that we would continue to see improvements in the scale and performance of WebRTC. This was definitely the case for 2022.

At the Kranky Geek event in November 2022, Google in their WebRTC annual update spent the time on quite a few items, but the first one of them was performance optimizations:

We will review this slide a few more times later on.

#2 – #newtech

This is the new technology trend, which was split a bit internally:

WebAssembly – WebAssembly is now part and parcel of most dominant WebRTC applications out there. This is achieved today by background blurring/replacement and noise suppression.
WebTransport, WebCodecs – we’ve seen more of this, but mostly in the experimentation phase. Not much going in actual production (besides maybe Zoom)
AV1 – still an ongoing effort. We’re not there yet, but getting closer

#4 – Live streaming

Live streaming continued to evolve in 2022:

Cloudflare joining the fray of vendors offering solutions to it
Daily scaled up their live streaming to support 15,000 viewers
WHIP and WHEP standardization for… live streaming with WebRTC. A thing with a growing ecosystem. More on that in this Kranky Geek session on WHIP & WHEP

Missing miserably

This is where I got it wrong.

#3 – WebRTC infrastructure, hyperscaling and SD-WAN

Here, I thought we would still be pondering whether Anycast and SD-WAN are important to WebRTC.

And then Subspace got shut down, and with it, a lot of the effort to push this story forward. It is sad, because I do think that striving for lower latencies and cleaner networks is the way to go. This setback will delay such attempts by a few years.

#5 – 2D to Metaverse

Extremes and experiments to counter Zoom fatigue. I don’t think that that many new alternatives and suggestions were made in 2022 that we haven’t seen before.

Cloud media processing

This is something I haven’t seen coming. It can’t be considered a trend yet, but it is something to keep a close eye on.

The whole point of using SFUs in WebRTC is in order to reduce infrastructure costs in compute.

BUT…

Google started with doing noise suppression in the cloud for Google Meet a few years back. This means decoding and encoding audio in the cloud in an SFU architecture.

And now Google is doing the same for background replacement on low-end devices

Is that a one-time transitional thing, or will others follow suit?
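
To see why this matters for cost, here is a rough counting model. It is a sketch under assumptions I am making up for illustration, not a description of Google Meet’s architecture: every participant sends one audio stream, the server forwards each stream to all other participants, and cloud processing decodes and re-encodes each incoming stream exactly once. ServerWork and sfuWork are hypothetical names.

```typescript
// Per-call server workload under the simplified model described above.
type ServerWork = { forwardedStreams: number; decodes: number; encodes: number };

function sfuWork(participants: number, processInCloud: boolean): ServerWork {
  // Each participant's stream is forwarded to every other participant.
  const forwardedStreams = participants * Math.max(participants - 1, 0);
  // A plain SFU never touches the media; cloud processing decodes and
  // re-encodes each incoming stream once before forwarding it.
  const transcodes = processInCloud ? participants : 0;
  return { forwardedStreams, decodes: transcodes, encodes: transcodes };
}
```

For a 10-person call the forwarding load is the same either way (90 streams), but the server goes from zero codec operations to 10 decodes and 10 encodes, which is exactly the compute an SFU is designed to avoid.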

WebRTC predictions for 2023

Time to look at my predictions for 2023. This is where I think we will see the most focus in WebRTC this year, and how it will shape up.

#1 – libWebRTC (and the future of WebRTC)

In libWebRTC we will see more of the same, with a few nuances.

Google’s WebRTC library is mature. It has all the bells and whistles expected of it. Here’s where we will see Google taking libWebRTC:

House cleaning. Cleaning up unused code (we’ve seen this with the recent and ongoing changes to the stats objects). Getting it ever closer to be spec-compliant. These are all things you do when you have time and no large fires to quell
Squeezing the optimization lemon. Doing more with less. Improving performance in CPU and memory use. Improving the algorithms used for bandwidth estimation, echo cancellation, etc.
Polishing collaboration. We’ve seen this take place in 2022. It will continue into 2023. Google will look for opportunities to introduce additional APIs and configurations to make collaboration easier and polished in WebRTC. Check out how you can share a Google Doc in a Google Meet or a Google Meet in a Google Doc for examples of where and why this is taking place

libWebRTC will maintain its leading and dominant position as the WebRTC stack of choice for client-side development. And Google will take it wherever THEY need it.

#2 – Machine learning and media processing

WebAssembly will continue to be a driving force in 2023 when it comes to WebRTC.

It will be used for media processing and in relatively the same places we see it used and experimented today – background replacement, noise suppression and proprietary codecs implementations.

We will also see it enabling more vendors to leave the peer connection implementations in WebRTC and play around with media engines developed using WebAssembly and running on top of WebRTC data channels or WebTransport.
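
To make that concrete: once encoded frames come out of a WebAssembly codec, the application has to carry them itself, which on a data channel usually means splitting each frame into message-sized chunks and reassembling them on the other side. Here is a minimal sketch, assuming a reliable, ordered channel with no duplicated chunks. The 4-byte header layout and the names splitFrame and joinFrame are hypothetical, invented for this example.

```typescript
// Hypothetical chunk header: frame id (2 bytes, big-endian), chunk index, chunk count.
const HEADER_BYTES = 4;

// Split one encoded frame into chunks that each carry at most maxPayload bytes.
function splitFrame(frameId: number, frame: Uint8Array, maxPayload: number): Uint8Array[] {
  const count = Math.max(1, Math.ceil(frame.length / maxPayload));
  if (count > 255) throw new Error("frame needs more than 255 chunks");
  const chunks: Uint8Array[] = [];
  for (let index = 0; index < count; index++) {
    const payload = frame.subarray(index * maxPayload, (index + 1) * maxPayload);
    const chunk = new Uint8Array(HEADER_BYTES + payload.length);
    chunk[0] = (frameId >> 8) & 0xff;
    chunk[1] = frameId & 0xff;
    chunk[2] = index;
    chunk[3] = count;
    chunk.set(payload, HEADER_BYTES);
    chunks.push(chunk);
  }
  return chunks;
}

// Reassemble the chunks of a single frame; undefined if the set is incomplete.
function joinFrame(chunks: Uint8Array[]): Uint8Array | undefined {
  if (chunks.length === 0 || chunks.length !== chunks[0][3]) return undefined;
  const ordered = chunks.slice().sort((a, b) => a[2] - b[2]);
  const total = ordered.reduce((sum, chunk) => sum + chunk.length - HEADER_BYTES, 0);
  const frame = new Uint8Array(total);
  let offset = 0;
  for (const chunk of ordered) {
    frame.set(chunk.subarray(HEADER_BYTES), offset);
    offset += chunk.length - HEADER_BYTES;
  }
  return frame;
}
```

Everything WebRTC’s peer connection normally does for you (jitter buffering, loss recovery, bandwidth estimation) is missing here, which is the real cost of leaving it.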

#3 – Voice before video (Lyra first, AV1 later)

This one is a bit of an overreach, but one I am willing to make.

Lyra, Google’s ML-based voice codec, will find its way into WebRTC before AV1 will. This isn’t in terms of availability, but in terms of adoption and popularity of use.

AV1 takes up too much CPU power and memory. This makes it usable only in high-end devices or devices with newer hardware (which is almost non-existent still). We have a way to go until AV1 can become a reality. Probably one or two more years.
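
The gap between availability and adoption shows up in code too. Checking whether a browser offers AV1 is easy; knowing whether a device can afford to run it is not. Here is a minimal sketch of the easy part, assuming the codec list has been obtained in a browser from RTCRtpSender.getCapabilities("video"). CodecInfo and supportsCodec are hypothetical names.

```typescript
// The relevant part of a codec capability entry.
type CodecInfo = { mimeType: string };

// Case-insensitive check for a codec by MIME type, for example "video/AV1".
function supportsCodec(codecs: CodecInfo[], mimeType: string): boolean {
  const wanted = mimeType.toLowerCase();
  return codecs.some((codec) => codec.mimeType.toLowerCase() === wanted);
}
```

A true result here only says the codec can be negotiated. It says nothing about the CPU and memory cost, which is the part that holds AV1 back.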

Lyra is here. And it is improving in performance and quality. Microsoft’s Satin is breathing down Google’s neck. Something will have to happen here. And my bet is that this will happen in 2023.

The technology is most probably ready. The market is ready.

You can learn more about it from Philipp Hancke’s session about voice codecs in WebRTC at the recent Kranky Geek event.

#4 – Observability

You can say I am biased. So be it.

Observability was always a real challenge with WebRTC applications. Its nature, due to many reasons (one of them being encryption), makes it hard to monitor using legacy tools and methodologies.

What we will see in 2023 is more interest in observability. We have more products in the market that use WebRTC. Contact centers are moving to the cloud. Many of the bigger vendors are in the process of shifting focus from SIP to WebRTC in their current deployments, and not just as a feature in their checklist.

This will bring with it the need for better tools to understand and figure out how WebRTC sessions behave – both in pre-production and in production.
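
A lot of that tooling starts from the same basic move: take two getStats snapshots and turn cumulative counters into rates. Here is a minimal sketch of that step, assuming the timestamp (in milliseconds) and bytesReceived values were read from an inbound-rtp stats entry at two points in time. InboundSnapshot and bitrateKbps are hypothetical names.

```typescript
// Two fields read from an inbound-rtp stats entry; timestamp is in milliseconds.
type InboundSnapshot = { timestamp: number; bytesReceived: number };

// Average incoming bitrate between two snapshots, in kilobits per second.
// Returns undefined when the snapshots cannot be compared.
function bitrateKbps(previous: InboundSnapshot, current: InboundSnapshot): number | undefined {
  const elapsedMs = current.timestamp - previous.timestamp;
  const bytes = current.bytesReceived - previous.bytesReceived;
  // A non-positive interval or a shrinking counter means a reset or bad data.
  if (elapsedMs <= 0 || bytes < 0) return undefined;
  return (bytes * 8) / elapsedMs; // bits per millisecond equals kilobits per second
}
```

Collecting numbers like this is the easy part. Figuring out why a specific session went bad from them is where things get hard.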

And now it is time for some shameless self-promotion here –

Watch my session from Kranky Geek, where I discuss where observability of WebRTC statistics falls short (hint: troubleshooting)

Don’t forget to check out the WebRTC products we have at Spearline

#5 – M&As and shutdowns

This is an easy one to make in 2023.

We’re in a recession. Maybe it will get better by December. Maybe it will get worse and stay with us. Whoever is correct in their estimate of what will happen a year from now, one thing is quite apparent:

Companies are closing their pockets, downsizing and keeping to their core focus.

WebRTC is part of it, and as a relatively new technology, it might be hurt more than others. I don’t think this will be the case, simply because we’re also in transition towards hybrid work due to the pandemic we faced. These two will negate each other a bit.

The end though will be house cleaning of the industry itself:

Some vendors will not weather this well and will shut down this year. Their technology might even be solid, but not reaching product-market-fit or just failing to execute on a solid business plan will get them there faster
Others will find their solution by being acquired. We’ve seen quite a few acquisitions in 2022. We will see more in 2023

This in itself puts a strain on developers who need to choose which CPaaS vendor to use – picking the wrong one may leave them stranded with the need to switch (think Twilio Live). They will go to the bigger, more known vendors. Which will lead to a vicious cycle since the smaller vendors may not have the time to grow quickly enough – potential customers will be less willing to risk using them.

Preparing for a rocky year

Interesting times ahead.

2023 will shape up to be challenging.

On one hand, we have more of the same in a lot of areas. On the other hand, the current market state is causing a lot of instabilities that will cause some shifts in the market.

And that, without saying a word about generative AI and what that might mean to the market of WebRTC and communications moving forward.

The post WebRTC predictions for 2023 appeared first on BlogGeek.me.

WebRTC course home assignments are here

Mon, 01/09/2023 - 13:00

Home assignments are coming to the next round of office hours for my WebRTC training courses for developers.

Around 6 years ago I launched the first WebRTC course here. Since then, that grew into its own separate website and multiple courses and bundles.

Next month, another round of office hours is about to begin. In each such round, there are live sessions where I teach something about WebRTC and then open the floor for general questions. That’s on top of all the recorded lessons, the chat widget and slack channel that are available.

In this round (starting February 6), I am experimenting with something new. This time, I will be adding home assignments…

The dynamics of office hours

The office hours are 10-12 lessons that take place on a weekly cadence at two separate time zones, to fit everyone.

In each I pick and choose a topic that is commonly discussed and try to untangle it from a slightly different angle than what you’ll be finding in the course itself. I then let people ask questions.

The office hours are semi-private. Usually with 2-6 participants each time. This gives the ability to really ask the questions you care about and need to deal with in your own WebRTC application.

Why home assignments?

As part of my new role as the Chief Product Officer at Spearline, I asked to enroll in a course – CPO Bootcamp (the best one if you’re in Israel). It is grueling as hell but more importantly – highly useful and actionable.

One of the components in that bootcamp is home assignments. They are given every week, then they get checked and feedback is given. They make me think about the things I am doing at Spearline and how to improve and finetune our roadmap and strategy. I even share them with my own team – being able to delegate is great, but it is more about the shared brainpower.

As with anything else, when I see something that is so good, I try to figure out if and where I can make use of that idea.

Which brings me to the WebRTC courses home assignments.

Home assignments = implementation AND feedback

For me, home assignments fit the best as part of the office hours.

Here’s what we’re going to do:

You come to the office hours
I share a topic related to WebRTC. In this round, the focus will be on requirements and architecture and design – and the planning of it all
Then, I will present the home assignment for the given round
You will have time until the following office hour to write down the assignment and submit it – in Google Docs or a Microsoft Word file
Once submitted, I’ll be reviewing and writing my feedback

The assignments relate to, and are focused on, your own WebRTC application, not something unrelated. Their purpose is to make you think, revisit and evaluate the things you’ve done and decided.

They are also building upon one another, each touching a different aspect of the design and architecture.

In a way, this is a unique opportunity to get another pair of eyes (mine) looking at your set of requirements, architecture and decisions and offering a different viewpoint.

Getting the most out of the WebRTC courses

If you are planning to learn WebRTC, then now is the best time possible.

Those who have enrolled in the course in the last 12 months or have renewed their course subscription can join the office hours and take part in the home assignments.

Office hours will start February 6.

If you haven’t enrolled yet, then you should. More information on how to enroll can be found on the WebRTC courses site.

The post WebRTC course home assignments are here appeared first on BlogGeek.me.

Kranky Geek WebRTC event summary 2022

Wed, 12/14/2022 - 08:53

Kranky Geek 2022 follows our tradition of great curated content on WebRTC that is both timely and timeless. Here’s what we had this year.

Kranky Geek is the main event focusing on WebRTC. I’ve been doing it with Chris Koehncke and Chad Hart for many years now, with the help and assistance of Google along with various sponsors each time.

Like many, we’ve switched to an all virtual event since the pandemic started, and decided at least for this year to continue in the same format. This turned out well, since I had to go on a business trip to Ireland on the date of the event, and virtual meant I was still able to both host and speak at the event.

Kranky Geek is quite a grueling experience for the hosts. We curate the sessions, at times approaching those we want to speak, at other times telling the speakers what topics we think will fit best. We go over the draft slide decks and comment on them. We do dry runs during the week of the event with all speakers to make sure each session is top notch.

You won’t find much commercial content in a Kranky Geek event. What you will find is lots of best practices and suggestions based on the experience and the path taken by our great speakers.

For this year’s summary, Philipp Hancke did the commentary about the sessions themselves. If you are a WebRTC Insights subscriber, and would like to discuss the content and how it fits in your company, feel free to reach out to me to schedule a meeting.

If you are looking for the whole playlist, you can find it here. The videos have been embedded below to make it easier for you to watch.

Roundtable: The state of Open Source in WebRTC

Jitsi, Janus, mediasoup and Pion
Watch if you are using any of these projects

AI in Google Meet / Dan Gunnarsson, Google

Background blurring and light adjustment using MediaPipe.

ML powered background blur and light adjustment using MediaPipe
Performance in the browser is a challenge, as is model size
A lot of data and practical examples. Not an introduction to MediaPipe though
Watch after you built a background blur pipeline yourself and want to learn how to improve it

Performant Real Time Audio ML in the Browser / Arman Jivanyan, Krisp

Krisp SDK on the Web: noise suppression.

ML powered audio improvements on the Web
128 samples per frame suggest using WebAudio / Audio Worklets. Note that there’s a Chrome “L16 hack” which can deliver 10ms frames of “raw” audio
Watch when you consider building an audio processing pipeline yourself as well as after your first attempt
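The frame size mismatch mentioned above is worth a quick illustration. An AudioWorklet hands you audio in blocks of 128 samples, while many audio models want longer fixed-size frames. Here is a minimal sketch of the buffering that bridges the two, assuming a model that wants 10ms frames at 48 kHz (480 samples). The class name and the frame size are my assumptions for illustration; this is not how Krisp implements it.

```typescript
// Hypothetical helper: collects the 128-sample blocks an AudioWorklet
// processor receives into fixed-size frames for a downstream model.
class FrameAccumulator {
  private readonly frameSize: number;
  private readonly buffer: Float32Array;
  private filled = 0;

  constructor(frameSize: number) {
    this.frameSize = frameSize;
    this.buffer = new Float32Array(frameSize);
  }

  // Feed one block of samples; returns zero or more complete frames.
  push(block: Float32Array): Float32Array[] {
    const frames: Float32Array[] = [];
    let offset = 0;
    while (offset < block.length) {
      // Copy as much as fits into the frame currently being filled.
      const n = Math.min(this.frameSize - this.filled, block.length - offset);
      this.buffer.set(block.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.frameSize) {
        frames.push(this.buffer.slice()); // hand out a copy, then reuse the buffer
        this.filled = 0;
      }
    }
    return frames;
  }
}
```

With a frame size of 480, the first complete frame only shows up on the fourth block, and 32 samples carry over into the next frame. That carry-over is added latency you need to account for in your pipeline.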

Making sense of WebRTC statistics / Tsahi Levent-Levi, Spearline/testRTC

Where I speak and Philipp comments (I am kinda subjective on this session).

WebRTC’s getStats API is tremendously powerful but making sense of the numbers is quite a challenge. Turning those numbers into something actionable is quite tricky
There are a lot of things that can go wrong and are actionable. The tooling testRTC has is quite valuable answering the common support questions. Troubleshooting is important but may depend on how much time you can spend on debugging a customer’s environment
Watch when you are stuck trying to troubleshoot a problem with just WebRTC statistics and need to take a look beyond them
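To make the "turning numbers into something actionable" part concrete: most getStats values are cumulative counters, so the useful numbers come from the difference between two snapshots. Below is a minimal sketch that derives bitrate and packet loss for an interval. The field names follow the inbound-rtp stats entries; the interface and function names are mine, and this is not the logic testRTC uses.

```typescript
// Subset of the fields found on an "inbound-rtp" entry returned by getStats().
interface InboundSnapshot {
  timestamp: number; // milliseconds
  bytesReceived: number;
  packetsReceived: number;
  packetsLost: number;
}

interface IntervalMetrics {
  bitrateKbps: number;
  lossPercent: number;
}

// Hypothetical helper: compares two snapshots taken a few seconds apart.
function intervalMetrics(prev: InboundSnapshot, curr: InboundSnapshot): IntervalMetrics {
  const seconds = (curr.timestamp - prev.timestamp) / 1000;
  const bytes = curr.bytesReceived - prev.bytesReceived;
  const received = curr.packetsReceived - prev.packetsReceived;
  // packetsLost can go down when late or duplicate packets arrive, so clamp at zero.
  const lost = Math.max(0, curr.packetsLost - prev.packetsLost);
  const expected = received + lost;
  return {
    bitrateKbps: seconds > 0 ? (bytes * 8) / seconds / 1000 : 0,
    lossPercent: expected > 0 ? (lost * 100) / expected : 0,
  };
}
```

The numbers on their own still don't tell you why quality dropped, which is the point of the session, but per-interval values are a far better starting point than the raw counters.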

WebRTC annual update 2022 / Google

A billion minutes every day. That sounds impressive. There is a “but” here, however. In 2018 the number Google gave was 2.5 billion per week, or around 400 million per day (on weekdays and half on weekends?). That means a 2.5x growth in four years, with a pandemic in between. Meta said Whatsapp is doing 15 billion minutes per day… WebRTC in the browser remains small. One wonders what happened to their 100x usage (which was received minutes). We’re past peak usage
Some Insights into their roadmap. We have been tracking most of this in WebRTC Insights over the past year so no big surprises if you are a subscriber
Elad Alon provides a good overview of the improvements he did to screen sharing such as preferring tab sharing in the getDisplayMedia picker
Markus Handell talks about a lot of things like Metronome which, as a reader of WebRTC Insights, may mean something to you. Great slide that shows the pipeline and where the improvements to individual components affect it
Harald Alvestrand gives a great summary of what new APIs are coming to WebRTC in general. Control over ICE candidates seems super interesting but we have not yet figured out what the field trials are actually enabling

Compositing in the cloud with native pipelines / Pauli Ojala, Daily

Recording and compositing video sessions.

Great overview of the considerations you need to make when doing WebRTC recording
The goal is to get a view like in the browser but you cannot record in the browser, not even on the server
Watch if you are looking to build a recording feature and are interested in the requirements

WHIP and WHEP: Standardized Live Streaming with WebRTC / Sergio Garcia Murillo, Dolby

A great introduction to why streaming wants to have “standardized signaling”
WebRTC was built with JavaScript and the flexibility in mind but native applications need a bit more standardization. And let us not talk about hardware encoders
In many ways, WebRTC breaks the model of vendor’s ecosystem where media servers and device manufacturers of the past had to interoperate by having a single vendor take care of it all. For live streaming and broadcasting, this interoperability model is still very important
Watch if you are interested in streaming use-cases
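For those who haven't looked at WHIP yet, the "standardized signaling" boils down to a single HTTP exchange: the client POSTs its SDP offer, and the server replies with 201 Created, the SDP answer in the body and a Location header pointing at the session resource (which you later DELETE to end the session). Here is a rough sketch of both sides of that exchange. The function names and the example URL are assumptions for illustration, and it leaves out things like trickle ICE.

```typescript
interface WhipRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: describes the HTTP request that carries the SDP offer.
function buildWhipOffer(endpoint: string, sdpOffer: string, bearerToken?: string): WhipRequest {
  const headers: Record<string, string> = { "Content-Type": "application/sdp" };
  if (bearerToken) headers["Authorization"] = `Bearer ${bearerToken}`;
  return { url: endpoint, init: { method: "POST", headers, body: sdpOffer } };
}

// Hypothetical helper: interprets the server's response to that request.
function parseWhipAnswer(
  status: number,
  location: string | null,
  body: string,
  endpoint: string,
): { sdpAnswer: string; resourceUrl: string } {
  if (status !== 201) throw new Error(`Unexpected WHIP response status ${status}`);
  if (!location) throw new Error("WHIP response is missing the Location header");
  // The Location header may be relative to the endpoint URL.
  return { sdpAnswer: body, resourceUrl: new URL(location, endpoint).toString() };
}
```

You would pass the result of buildWhipOffer to fetch(), feed the answer to setRemoteDescription, and hold on to resourceUrl for the DELETE when the stream ends.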

Using Video Forward Error Correction to improve game streaming quality / Harsh Maniar, NVIDIA

FlexFEC and video.

NVIDIA uses video forward error correction for GeforceNOW, their game streaming service. The requirements for such a service are quite different from WebRTC’s “talking heads”
Video forward error correction is a surprisingly obscure topic in WebRTC
Watch if you are interested in learning how FEC works and how to measure the improvements
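If FEC is new to you, the core idea fits in a few lines: send an extra parity packet that is the XOR of a group of media packets, and any single lost packet in that group can be rebuilt from the rest. The sketch below is a deliberately simplified illustration that assumes equal-length packets. It is not the FlexFEC packet format and not NVIDIA's implementation, and the function names are mine.

```typescript
// XOR all packets together, byte by byte (shorter packets are treated as zero padded).
function xorParity(packets: Uint8Array[]): Uint8Array {
  if (packets.length === 0) throw new Error("need at least one packet");
  const length = Math.max(...packets.map((p) => p.length));
  const parity = new Uint8Array(length);
  for (const p of packets) {
    for (let i = 0; i < p.length; i++) parity[i] ^= p[i];
  }
  return parity;
}

// Rebuild the single missing packet of a group from the survivors and the parity packet.
function recoverMissing(received: Uint8Array[], parity: Uint8Array): Uint8Array {
  // XORing the parity with everything that did arrive cancels those packets out,
  // leaving only the bytes of the one that was lost.
  return xorParity([...received, parity]);
}
```

The tradeoff is bandwidth: one parity packet per group of three means a third more packets on the wire, and it only helps when at most one packet in the group goes missing.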

Advances in audio codecs / Philipp Hancke

Audio remains the most important thing in conferencing – you need to understand what the other side is saying. This talk explains the history of Opus and how Lyra and Satin fit into the picture as well as how forward error correction and redundancy work
The famous “Opus comparison” picture is quite problematic when looked at in detail
Watch if you are interested in going beyond the usual Opus forward error correction mechanism available in WebRTC by default

Our Kranky Geek sponsors

It should be noted that without our sponsors, doing the Kranky Geek event would be impossible. When we set out to run these events, we had this in mind:

Content should be free for all
Participating in the event should be free or have a token cost associated with it
Content must be top notch. Timeless. With little to no sales pitches

This requires sponsors to help with funding it. Each year we search for sponsors and end up with a few that are willing and happy to participate in this project of ours.

This year?

Google, and especially the team working tirelessly on WebRTC
Daily, who operates a video API (CPaaS) platform with a strong lowcode/nocode focus
Krisp, with their AI solution to handle background voices, noises and echo
Spearline, and their testRTC products for WebRTC testing and monitoring

Check them out

A Kranky Geek 2023?

When we’re planning and preparing for the event, it feels like this is going to be our last event. It isn’t easy, and none of us in the Kranky Geek team are event planners by profession. The question arises after each such event – will we be doing another one?

Once this event was over, we started working on wrapping the event. Part of it was editing the content and uploading it to YouTube (which takes time).

Will there be another Kranky Geek event next year? Maybe

Will it be in person or virtual? Maybe

Until then, go check out our growing library of great WebRTC content: https://www.youtube.com/krankygeek

The post Kranky Geek WebRTC event summary 2022 appeared first on BlogGeek.me.

WebRTC: Privacy or Privacy? Which one shall it be?

Tue, 11/29/2022 - 12:30

WebRTC comes with mandatory encryption, which enables privacy, but which type of privacy are you really looking for?

DALL-E: a broken lock on a chest

In the past, all the great stuff started in the enterprise and then trickled down to consumers. Now it is the other way around – first features come to consumers and from there find their way to enterprises.

Privacy is no different, but in enterprises it needs to be defined quite differently, making it a totally different kind of a feature.

This is where privacy vs privacy comes to play.

Table of contents

Privacy: The consumer version
The enterprise version of privacy
WebRTC and privacy
Who cares?
CPaaS, Video API and… privacy
What next?

Privacy: The consumer version

As a user, what do you mean when you say privacy?

That the data you generate is yours. Be it sensor related data (think GPS or heart rate). The conversations you have with people are not accessible to anyone else. The same for the photos you take.

Practically, you want no one other than you and those you explicitly share data with to have any access to that data. And that includes the services you use to generate and share that data.

Sending messages over Whatsapp or any other social media service? You probably want these messages to be encrypted on the go, so no one can sniff the network and read your messages. You also don’t want Whatsapp’s employees reading what you wrote.

Essentially, what you are looking for is E2EE – End-to-End Encryption. This means that any intermediary along the route of your communications, including the communication provider himself who is facilitating the session, won’t have the ability to read the content. Simply because it is encrypted using some encryption key that is known only to those on the session.

The enterprise version of privacy

Life for a consumer is simple. At least when compared to an enterprise.

In the enterprise you want this privacy thingy, but somehow you also want governance and the creation of some corporate knowledge base.

When a meeting takes place. Should only the people in the meeting have access? Think about it. Should the people involved in that aspect of the business have access?

Let’s say we’re on a sales call with a customer. And then the sales rep on that call leaves and gets replaced with another one. Should the new sales rep have access to that call that took place and the decisions made in it?

Today, our CRM systems can connect directly to the corporate email and siphon any emails sent or received with certain customers into their account for recording and safekeeping. So we stay in sync with all conversations with that customer.

We may need to store certain conversations due to regulatory reasons. Or we might just want to transcribe them for later search – that internal company knowledge base repository.

There are also times when we’d like to use these conversations we’re having to improve performance. Similar to what Gong does to sales teams.

BUT

We don’t want others to have access to these meetings. In some cases, we don’t want the theoretical ability of the provider of the service to access these conversations – think of a Microsoft Teams session, Google Meet or a Zoom call that gets listened to by the employees of these companies.

Privacy in an enterprise looks different than for consumers. It is more granular and more structured, with different rules and permissions at different levels and layers.

WebRTC and privacy

Privacy is king in WebRTC, with a few caveats:

Only if you let it
Assuming you don’t screw it up
When it is of interest to you

Why these caveats?

Because WebRTC is just a building block – the actual solution is of your making. Which means you can screw it up by architecting or implementing it wrong
It also means that you want to have privacy as part of your service

And why is privacy king in WebRTC? Because security is ingrained in WebRTC, which means you can use it to provide privacy conscious services.

Let’s go over what privacy in WebRTC actually means:

WebRTC mandatory encryption (and security)

In WebRTC, all media is encrypted. You can’t decide to send media “in the clear”. And then the signaling itself is also encouraged to be encrypted, and for all intents and purposes – it is encrypted as well.

This means that if you send audio or video via WebRTC from one user to another or from one user to a media server – then that media is encrypted and can be played only by the recipient.

Someone looking at the bitstream “over the line” won’t be able to play it back or tamper with the content.

Note that a media server terminates the conversation and is privy to what is being sent – it has access to the encryption keys. TURN servers don’t have such access.

This mechanism of encryption isn’t optional – it is just there.

E2EE in WebRTC

If we increase the scope to group conversations, then we need E2EE – End-to-End Encryption.

This can be achieved on top of WebRTC using a mechanism known as insertable streams, which ends up as double encryption – one between the sender and the media server. And one between the sender and the receivers on the other end. That second layer of encryption is part of the application. WebRTC doesn’t mandate it or even encourage it – it just enables you to implement it.
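To give a feel for what that second layer looks like, here is a minimal sketch of encrypting and decrypting a single frame payload with a key that only the participants hold. It uses Node's crypto module so it can run on its own. In a browser the same idea would be applied to the data of each encoded frame inside the insertable streams transform, with WebCrypto doing the work. The function names, the choice of AES-128-GCM and the IV + ciphertext + tag layout are my assumptions for illustration, and key distribution (the hard part) is left out entirely.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

const IV_LEN = 12; // bytes, the usual nonce size for GCM
const TAG_LEN = 16; // bytes, GCM authentication tag

// Hypothetical helper: wraps one encoded frame as IV || ciphertext || tag.
function encryptFrame(payload: Uint8Array, key: Uint8Array): Uint8Array {
  const iv = randomBytes(IV_LEN);
  const cipher = createCipheriv("aes-128-gcm", key, iv);
  const body = Buffer.concat([cipher.update(payload), cipher.final()]);
  return new Uint8Array(Buffer.concat([iv, body, cipher.getAuthTag()]));
}

// Hypothetical helper: reverses encryptFrame; throws if the key is wrong
// or the frame was modified along the way.
function decryptFrame(frame: Uint8Array, key: Uint8Array): Uint8Array {
  const iv = frame.subarray(0, IV_LEN);
  const body = frame.subarray(IV_LEN, frame.length - TAG_LEN);
  const tag = frame.subarray(frame.length - TAG_LEN);
  const decipher = createDecipheriv("aes-128-gcm", key, iv);
  decipher.setAuthTag(tag);
  return new Uint8Array(Buffer.concat([decipher.update(body), decipher.final()]));
}
```

The media server in the middle still forwards these frames, and WebRTC still encrypts them on each hop. The server just ends up holding payloads it has no key for.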

Deniability vs governance of communications in WebRTC

Here’s where things can get tricky with WebRTC – it can be used to cater for both ends of the equation.

You can use WebRTC to obtain deniability.

WebRTC has a data channel that runs peer to peer. Using signaling servers to open up such connections to create a loose mesh network of peers means you can send private, encrypted messages from one user to another on that network without having any easy way to trace the communications – let alone to trace its metadata. That’s on the extreme scale of what can be achieved with WebRTC – a TOR/bittorrent-like network.

With the same methodology, I can get two users or even small groups to communicate directly, so that their media travels between them and them alone. Or I can employ E2EE on media servers and get privacy of the content of the communications from the infrastructure used to facilitate it.

You can use WebRTC to handle governance.

On the other side of the equation, you can use WebRTC and force all communications to go through media servers. Media servers which can then enforce policy, record media and provide governance. For some industries and verticals – that’s a mandatory requirement.

And you get these capabilities while keeping the communication encrypted over the internet.

Who cares?

With privacy that’s the biggest question. Who cares?

No one and everyone at the same time.

If you ask a person if he wants privacy the immediate answer is – yes!

And yet… Twitter still doesn’t offer E2EE on DM messages. And people use it.

Whatsapp added E2EE in 2016, when it already had a billion monthly active users. It added E2EE backups in 2021. It seems people wanted it, but not badly enough to switch to a more secure and private messaging system.

Here’s a screenshot from my own Whatsapp in one of the groups I have:

That weird message is an indication that a friend of mine has changed his security code. This usually means he re-installed Whatsapp or switched a phone I presume. I ignore these messages altogether, and I am assuming most people ignore these messages.

In the same way, companies want and look and strive for privacy and want the services they use to be private. But most of them want it up to a point.

Does that mean privacy isn’t needed? No.

Does it mean we shouldn’t strive for privacy? No.

It just means that people value other things just as much or even more.

CPaaS, Video API and… privacy

When it comes to video APIs and CPaaS platforms, it feels that privacy is somewhat lagging behind.

Messaging platforms today mostly offer E2EE. UCaaS are and have been introducing E2EE to their chat services and video calls. Some are offering integration with third party KMS (Key Management Systems) so they don’t have access to the decryption keys to begin with.

CCaaS relies heavily on the telephony network, where, well, what privacy exactly? And they also like to record calls for “quality and training purposes” – which translates to using machine learning and providing governance.

Video CPaaS is somewhere in-between these days – it offers encryption on sessions because it uses WebRTC, which is encrypted by default. But anything going through the media server can usually be accessed by the Video APIs vendor itself. Very few have gone ahead and added E2EE capabilities as part of their solution.

The reasons for that? It is hard to offer E2EE, but it is even harder to offer it in a generic manner to fit multiple use cases. And on top of that, customers don’t necessarily care about it or aren’t willing to pay for it, while they are willing to pay for features such as recording.

What next?

Here’s the thing:

Everybody talks about privacy but nobody does anything about it

In the consumer space, we are moving to an E2EE world.

The enterprise space is glacially pacing towards that same goal.

Parallel to that though, machine learning and cloud media processing are shifting the balance back towards less privacy – at least less privacy from the vendor hosting the service.

Which is more important to the buyers of services? Privacy or governance? Deniability or machine learning?

The post WebRTC: Privacy or Privacy? Which one shall it be? appeared first on BlogGeek.me.

Two years of WebRTC Insights

Mon, 11/07/2022 - 12:30

It is time to stop for a second and review what we’ve accomplished here with our WebRTC Insights in the past two years.

There are a few pet projects that I am doing with partners, and one of the prime partners in crime for me is Philipp Hancke. We’ve launched our successful WebRTC codelab and are now in the process of finalizing our second course together – Low-level WebRTC protocols.

Two years ago, we decided to start a service – WebRTC Insights – where we send out an email every two weeks about everything and anything that WebRTC developers need to be aware of. This includes bug reports, upcoming features, Chrome experiments, security issues and market trends.

All of this with the intent of empowering you and letting you focus on what is really important – your application. We take care of giving you the information you need quicker and in a form that is already processed.

Now, two years in, it is safe to say that this is a VERY useful tool for our subscribers.

“WebRTC insights might be the most important email you read every fortnight as a RTC / video engineer. It’s hard to keep tabs on what Google et al are doing with WebRTC while working on your product and the WebRTC Insights provides very specific and actionable items that help tremendously. We have been ahead countless times because of it. If you are serious about WebRTC you should definitely subscribe 100% worth it.”

— Saúl Ibarra Corretgé, Principal Software Engineer @ 8×8 (Jitsi)

How do we keep track of all the WebRTC changes?

Keeping track of all the changes in WebRTC is a pretty daunting task. Tsahi started WebRTC Weekly almost nine years ago and it has been the source of high-level information ever since. Philipp has closely worked with WebRTC at a more technical level for a decade too. We both had our routines for keeping notes and transforming them into something informative for our audience but joining forces (which we never expected after having strong arguments about whether XMPP was a great signaling protocol in the early days!) has yielded a surprising amount of synergy effects.

We start doing Insights with a template. Whenever we find something that we think is interesting we add a link and maybe a very brief comment to that template. Usually we chat about those too (as we have done for… almost a decade now). Then we move on because both of us have day jobs that keep us busy.

Every two weeks we spend a couple of hours turning the “brain dump” into something that our audience understands. Philipp focuses on the technical bits while Tsahi focuses on the market. Then we review each other’s section, improve and exchange thoughts.

We did this before Insights already but putting a structure and a biweekly cadence to it has “professionalized” it. While it remains a side project for us, we now have the process in place.

WebRTC Insights by the numbers

We’re not new to this. As this is our second year, we might as well compare the numbers today with those we had in year one of WebRTC Insights:

26 Insights issued this year with 447 issues & bugs, 151 PSAs, 11 security vulnerabilities, 146 market insights all totalling 239 pages. We’ve grown on all metrics besides security vulnerabilities.

WebRTC is still ever changing, but at least there are fewer security threats in it

Activity on libWebRTC has cooled down a bit in the last two years when it comes to the number of commits and people working on it:

After more than a decade that is a sign of maturity: the easy changes have already been done and all that is left is optimizations. The numbers we see for Insights roughly correlate with the amount of energy Google puts into the project. We are just glad we did not start it during the “hot phase” of 2016-2019.

Let’s dive into the categories, along with a few new initiatives we’ve taken this year as part of our WebRTC Insights service.

Bugs

Among the really useful feedback we have received was the suggestion to add a “component” or area the issue is in. This is useful for larger teams where one person may be digesting the biweekly email and routing items to a subteam with a particular focus such as audio, video or networking.

The other improvement is a visual hint whether a particular item is a bug, a regression, a feature or just something that is generally good to know:

In addition to that we classify it as “read, plan or act”. Of course we hope our subscribers read all the issues but some are more important than others.

PSAs & resources worth reading

Public service announcements or PSAs are the main method Google’s WebRTC team uses to announce important changes on the discuss-webrtc mailing list. We track them and give some context why they are important or whether they are safe to ignore (which can happen for API changes where a PSA may be required by the release process).

We also look at important W3C changes in this section as well as other content that is too technical for the “market watch” section.

Experiments in WebRTC

Chrome’s field trials for WebRTC are a good indicator of what large changes are rolling out which either carry some risk of subtle breaks or need A/B experimentation. Sometimes, those trials may explain behavior that only reproduces on some machines but not on others. We track the information from the chrome://version page over time which gives us a pretty good picture on what is going on:

In this example we saw the AV1 decoder switch from libaom to libdav1d over the course of several weeks.

WebRTC security alerts

This year we continued keeping track of WebRTC related CVEs in Chrome (totaling 11 new ones in the past year). For each one, we determine whether it only affects Chromium or whether it also affects native WebRTC and needs to be cherry-picked to your own fork of libwebrtc when you use it that way.

To make it easier to track, we now keep a separate Security Tracker file that gets updated with new issues as they are found. This makes it easier to glance at all the security issues we’ve collected.

On top of that, when there’s a popular open source component that has its own security issues published, we tend to also indicate these, though not add them to the Security Tracker, so they aren’t even counted in our statistics.

WebRTC market guidance

Information overload. That’s what all of us face these days with so much material that is out there on the Internet. On our end, we read a lot and try to make sense of it.

Part of that is taking what feels relevant to WebRTC and sharing it with our WebRTC Insights subscribers. It includes the reference to the article, along with our thoughts about it.

For product managers, this is their bread and butter in gleaning the bits and pieces of information they need to make educated decisions about roadmap and priorities.

For developers, this brings a bit more context than they are used to in their daily work – and is often outside of their immediate work and expertise.

Our purpose? Enrich your world about WebRTC and express some of the power plays and the shifts in the market that are taking place. So you know them well ahead of them happening in force.

Covering important events

We really enjoyed Meta’s RTC@scale event. In terms of quality and technical depth it set a bar for the upcoming KrankyGeek event which had been the gold standard so far.

However, the technical depth of the event was too intense for it to be digested in real-time. This meant Philipp sat down on a rainy Saturday and started rewatching the videos while keeping notes. And ended up watching each session multiple times since there were so many great points that needed or even demanded a bit more explanation. This turned into a nine page summary of the event, annotated with the timestamps in the video.

We decided to make this summary public because, while we thought it provided a ton of valuable lessons to our subscribers, Meta made the content freely available and so should we. And hey, we keep referencing this every other week.

This may have been a one-off but we still genuinely enjoyed it so we might repeat the exercise… on a rainy Saturday!

WebRTC release notes interpretation

We started playing around with video release notes at the end of our first year, and quickly made it a part of the WebRTC Insights service.

Whenever Google publishes release notes for WebRTC, we publish our own video with a quick analysis of the release notes (and the release itself) for our Insights clients.

We go over the release answering 4 main questions:

Is the new release more about features or stability?
What are the things developers should investigate in the new release?
Which bugs and features in the new release should developers beware of?
What can be disregarded and ignored in the release?

Our intent here, as with anything else, is to reduce the amount of work our clients have to do figuring out WebRTC details.

We are also making these release notes videos publicly available, 3-4 versions back, so you can derive value from them. You can find them on YouTube:

https://www.youtube.com/watch?v=DQt_OQT4ZAo&list=PL7fuFATIj-PUtMVTQKpW_odTCO0_CPfXV

Be sure to subscribe to receive them once they get published freely to everyone.

Join the WebRTC experts

We are now headed into our third year of WebRTC Insights.

Our number of subscribers is growing. If you’ve got to this point, then the only question to ask is why aren’t you already subscribed to WebRTC Insights if WebRTC interests you so much?

You can read more about the available plans for WebRTC Insights and if you have any questions – just contact Tsahi.

Oh – and you shouldn’t take only our word for how great WebRTC Insights is – just see what our readers have to say about it:

“For any Service Provider or Apps who heavily relies on WebRTC, the WebRTC Insights offers great value. […] What I like most about the Insights is its bi-weekly cadence, which fits the rapid Chrome/WebRTC release cycle, and most of the mentions are actionable for us. With the recent Safari audio breakage, the Insights highlighted the problem timely and saved us a lot of troubleshooting effort.”

— Jim Fan, Engineering Director @ Dolby Laboratories

“As a service company specialized in WebRTC I think WebRTC Insights is really useful. It keeps us up to date about what is coming next, giving good ideas for projects and research. Also, receiving periodic insights is always a good excuse to stop what I am doing and find some time to go over the latest WebRTC updates in more detail. It is much easier to do when you get all summarized in a single document than on your own just googling and going through an overwhelming list of webrtc news, updates and bugs.”

— Alberto Gonzalez Trastoy, CTO @ WebRTC.ventures

Here’s the summary of the first year of Insights if you’re interested

The post Two years of WebRTC Insights appeared first on BlogGeek.me.

The lead actors in WebRTC are outside of your control

Mon, 10/31/2022 - 12:30

When developing with WebRTC, make sure you address the fact that many aspects are out of your control.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

When you develop a WebRTC application, you need to take into consideration the sad truth that most of the things that are going to affect the media quality (and by extension the user experience) are out of your control.

To understand this, we first need to define who the lead actors are:

The main entities in WebRTC applications, taken from my presentation on testRTC

Your application

This is probably the only piece you do control in a WebRTC application.

The code and logic you write in the application has immediate effect over the media quality and connectivity.

Deciding on how group calls are architected for example –

Do you create a mesh network where everyone talks to everyone directly?
Do you use a central mixer (MCU) to mix all media content to generate a single stream for each participant?
Do you route the media using an SFU?
What configuration limits do you impose on the streams from the get go?
What kind of layout do you use to display the participants?

All these are going to greatly change the experience and it is all up to you to decide.
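To see how much the architecture choice alone changes things, count the media streams each participant has to send and receive. The sketch below does just that for the three topologies above. It is a back-of-the-envelope model that ignores simulcast, audio-only streams and screen sharing, and the type and function names are mine.

```typescript
type Topology = "mesh" | "sfu" | "mcu";

interface StreamLoad {
  uplinkStreams: number; // streams each participant encodes and sends
  downlinkStreams: number; // streams each participant receives and decodes
}

// Rough per-participant stream count for a group call of a given size.
function perParticipantLoad(topology: Topology, participants: number): StreamLoad {
  if (participants < 2) return { uplinkStreams: 0, downlinkStreams: 0 };
  const others = participants - 1;
  switch (topology) {
    case "mesh":
      // Everyone sends to, and receives from, everyone else directly.
      return { uplinkStreams: others, downlinkStreams: others };
    case "sfu":
      // One stream up to the server, which routes everyone else's streams back.
      return { uplinkStreams: 1, downlinkStreams: others };
    case "mcu":
      // One stream up, one mixed stream down.
      return { uplinkStreams: 1, downlinkStreams: 1 };
  }
}
```

For a 5-person call that is 4 up and 4 down per user in a mesh, 1 up and 4 down with an SFU, and 1 up and 1 down with an MCU, where the price is paid in server-side CPU instead.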

The browsers

Web browsers are out of your control.

I’ll repeat that for effect:

Web browsers are out of your control.

You can’t call Google asking them to delay their Chrome release by a week so you can solve a critical bug you saw cropping up in their upcoming release. It. doesn’t. work. this. way.

Browsers have their own release cadence, and it is brutal. In many cases, it is way faster than what you are going to be able to manage – a release every month.

The problem isn’t with this fast pace. It is with the fact that now, even after over 10 years since its announcement, WebRTC is still getting changed and improved quite frequently:

Some of these changes are optimizations
Others are bug fixes
A few are behavioral changes
Then there are the API changes to make the browser work closer to how the WebRTC specification states

All of these changes mean that your application might break when a new browser version goes out to your users. And as we said, you don’t control the roadmap or release schedule of the browser vendors.

The network

You decide where to place your servers. But you don’t get to decide what networks your users will be on.

I often get into talks with vendors who explain to me the weird places where they find their end users:

In an elevator
Sitting in basements
Driving a car on the highway
At the beach
In the library
…

I am writing this article while sitting in the lobby of a dance studio on my laptop, tethered via WiFi to my smartphone’s cellular network (a long story). Users can be found in the most unexpected places and still want to get decent user experience.

WebRTC being so sensitive to the network connection (think latency, jitter, packet loss and bandwidth), these are things you’ll need to come to terms with.

In some cases, you can instruct your users to improve their connection. In others you can only guide them. In others still your best bet is to make do with what the user has.

Oh – and did I mention that the network’s conditions are… dynamic? They tend to change throughout the duration of the session the users are on, so whatever you decide to do needs to accommodate such changes.
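One way to think about "dynamic" is to keep a smoothed view of the network and re-evaluate it on every new measurement. The sketch below is only an illustration: the thresholds, the smoothing factor and the names `NetworkSample`, `NetworkTracker` and `classify` are all assumptions, not values taken from WebRTC or any browser.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NetworkSample:
    rtt_ms: float          # round trip time
    jitter_ms: float
    loss_pct: float        # packet loss, 0-100
    bandwidth_kbps: float  # estimated available bandwidth


class NetworkTracker:
    """Exponentially smoothed view of the most recent samples."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha  # weight of the newest sample (assumed value)
        self.smoothed: Optional[NetworkSample] = None

    def update(self, sample: NetworkSample) -> NetworkSample:
        if self.smoothed is None:
            self.smoothed = sample
        else:
            a, old = self.alpha, self.smoothed
            self.smoothed = NetworkSample(
                rtt_ms=a * sample.rtt_ms + (1 - a) * old.rtt_ms,
                jitter_ms=a * sample.jitter_ms + (1 - a) * old.jitter_ms,
                loss_pct=a * sample.loss_pct + (1 - a) * old.loss_pct,
                bandwidth_kbps=a * sample.bandwidth_kbps
                + (1 - a) * old.bandwidth_kbps,
            )
        return self.smoothed


def classify(s: NetworkSample) -> str:
    """Rough quality bucket; every threshold here is illustrative."""
    if s.loss_pct > 5 or s.rtt_ms > 400 or s.bandwidth_kbps < 300:
        return "poor"
    if (s.loss_pct > 1 or s.rtt_ms > 150 or s.jitter_ms > 30
            or s.bandwidth_kbps < 1000):
        return "fair"
    return "good"


tracker = NetworkTracker()
office_wifi = NetworkSample(50, 5, 0, 3000)
elevator = NetworkSample(600, 80, 10, 200)
for sample in (office_wifi, elevator, elevator):
    print(classify(tracker.update(sample)))
```

Smoothing keeps a single bad measurement from flipping the application's behavior, while a sustained change still gets through after a couple of samples.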

The user’s device

Is your user running on a supercomputer? Or a 2010 smartphone? Do you think that’s going to make a difference in how they experience your WebRTC sessions?

WebRTC is a resource hog. It requires lots of CPU power to encode and decode media. Memory for the same purpose. It takes up bandwidth.

Your users don’t care about all that. They just want to have a decent experience. Which means you will need to accommodate a vastly different range of devices. This leads to different application logic that gets selected based not only on the network conditions, but also on the performance of each and every user’s device – without sacrificing the experience for others.
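A rough sketch of what "logic selected by both network and device" could look like: rank the network and the device separately, then let the weaker of the two decide. The profile table, the CPU thresholds and the function names are hypothetical and would need tuning against real users.

```python
# Ordered from lightest to heaviest. All values are illustrative.
PROFILES = [
    {"name": "audio-only", "height": 0, "kbps": 40},
    {"name": "low", "height": 180, "kbps": 150},
    {"name": "medium", "height": 360, "kbps": 500},
    {"name": "high", "height": 720, "kbps": 1500},
]


def network_tier(available_kbps: float) -> int:
    """Index of the heaviest profile that fits the available bandwidth."""
    tier = 0
    for index, profile in enumerate(PROFILES):
        if profile["kbps"] <= available_kbps:
            tier = index
    return tier


def device_tier(cpu_cores: int, cpu_load: float) -> int:
    """Index of the heaviest profile the device is assumed to handle."""
    if cpu_load > 0.9:
        return 0
    if cpu_cores <= 2 or cpu_load > 0.7:
        return 1
    if cpu_cores <= 4:
        return 2
    return 3


def pick_profile(available_kbps: float, cpu_cores: int, cpu_load: float) -> dict:
    """The weaker of network and device decides what this user gets."""
    tier = min(network_tier(available_kbps), device_tier(cpu_cores, cpu_load))
    return PROFILES[tier]


# A fast network does not help an old dual-core phone.
print(pick_profile(available_kbps=2000, cpu_cores=2, cpu_load=0.3)["name"])
```

Because the choice is made per user, one weak device lowers only its own profile and not the experience of everyone else in the session.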

Sounds simple? It is. Until you need to implement it.

How do you take back control of WebRTC?

The first step to gaining control of your WebRTC application and its lead actors is letting go.

Understand that you are not in control. And then embrace it and figure out how to make that into an advantage – after all – everyone is feeling these same pains.

Embracing them means for example:

Testing on beta and dev releases of browsers, so that you’re more prepared for what’s to come
Making sure you can upgrade your infrastructure and application at a moment’s notice with urgent patches
Monitoring everything so you can understand user experience and behavior
Placing your servers closer to your users
Optimizing your application to work with different devices and networks within the same session
Checking for dynamic network changes and how they affect your service
…

Need help?

The WebRTC Developer training courses touch a lot of these issues while teaching you about WebRTC

My WebRTC Scaling eBooks Bundle can assist you in figuring out some of the tools available to you when dealing with networks and devices

This blog is chock-full of resources and articles that deal with these things. You just need to search for them and read

The post The lead actors in WebRTC are outside of your control appeared first on BlogGeek.me.

WebRTC turns services into features

Mon, 10/17/2022 - 12:30

Telephony and communications used to be services, but WebRTC has turned them into features inside other services.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

Telephony and communication used to be services.

Need a phone system for your company? Go to your carrier, and they’ll set you up with a solution. Maybe install a PBX or even host one “in the cloud” for you.

The thing is, what you got was a full fledged service from a communication vendor. You had your own service, and your phone service. They were unlikely to be really connected to each other.

Then came along CPaaS vendors and communication APIs. You could purchase phone numbers and automate and route them any way you wanted programmatically. In some ways, you could connect it to your own service logic, but only to a certain degree. If you had a call center agent who needed to answer a phone, you had to give him a physical phone (or install a softphone application for him), as well as the CRM application he interacted with in order to assist people calling in.

Two separate services.

Communications and Customer Relationship Management (CRM).

WebRTC changes all that

Since WebRTC runs inside a browser, it lets you place the communication part right there in the browser, where all your other services live already.

It means that now that “telephony service” you had can be added as a feature inside your CRM.

But it gets better.

It doesn’t need to be a CRM you integrate it with. It can be a dating service. A gaming experience where you need to communicate with others during the game. A doctor visit at the virtual clinic. Remotely driving a car. Gambling online. The list goes on.

The main attraction isn’t the communication, but rather the service you are there to use. It so happens to need the ability to communicate using voice or video in real time, but that’s just a detail – a feature – no longer the service itself.

And that’s the real paradigm shift that WebRTC has brought with it.

My free WebRTC for Business People report is a great place to dive deeper into what WebRTC is and the changes it brings with it – the ecosystem around it and what companies are doing with it. Check it out.

The post WebRTC turns services into features appeared first on BlogGeek.me.

WebRTC is the most secure VoIP protocol

Mon, 10/03/2022 - 12:30

WebRTC security and privacy are top of mind. You won’t find any other open standard VoIP protocol as secure as WebRTC.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

Time for a quick security check…

Here are some concepts that are true when it comes to security, privacy and WebRTC:

Security often requires sacrificing privacy
Privacy often requires sacrificing security
WebRTC is an attempt to balance the two, and let the application developers figure out which one their focus is going to be on – without sacrificing either security or privacy more than is needed in the process

But what does that exactly mean?

You remember that WebRTC is only a building block. Right? This means that it can’t offer full privacy or full security, since there’s an application developer on top, who can… well… screw things up.

If your developers don’t think about the security and privacy necessary, then your WebRTC application will look like this:

But if they do think about it (and they should, no matter what they are developing), then you should have security and privacy nailed down properly.

What does WebRTC give you when it comes to security and privacy?

Encryption in transit
- Traffic is always encrypted between one WebRTC entity and another
- It is up to you to figure out how to maintain it if you need to – for example, using media servers likely means media is available in the clear on the media server
Short development cycles
- WebRTC has a new version released every month – because that’s the release cadence of Chrome
- It means the client code on the browser can be refreshed and updated frequently, which makes patching up security issues easier on that front
- You will need to figure out your own release cadence for your native clients and your server infrastructure, especially when it comes to security patches
Open implementation
- This means people can scrutinize the actual protocol and its implementation
- Over time, this leads to a more secure solution, as more eyeballs can review what’s going on
- You can learn more about open vs closed security here
Shoulder of giants
- Google Chrome, Apple Safari, Microsoft Edge, Mozilla Firefox
- Together they power most of our browsable internet
- They do it at scale, and securely (for the most part)
- All of them integrated WebRTC into their browsers, and they adhere to high security standards
- Would you rather trust a proprietary solution from an unknown/smaller third party instead?
Modern
- Other VoIP standards are older
- As such, they were conceived and written before our modern era of cloud and smartphones
- This means they were designed for different threat surfaces than what is needed today

Security

Need security?

WebRTC has the mechanisms available for you

It is encrypted
Requires signaling to be encrypted
Enables end-to-end encryption via media servers by using Insertable Streams

Privacy

Need privacy?

WebRTC has the mechanisms available for you here as well

It is encrypted
Can run peer-to-peer, without any media servers touching the media itself
You can use the data channel to “hide” data from passing through servers altogether (once a connection is established between the peers)
Your decision on where you install and manage your infrastructure can add to the privacy you offer

Care about security? WebRTC is your best choice moving forward. But it won’t take the responsibility off your back.

Two pointers for you before you go:

Everything you need to know about WebRTC security

Zoom’s past security issues and why WebRTC is different

The post WebRTC is the most secure VoIP protocol appeared first on BlogGeek.me.

Video API, CPaaS, programmability and WebRTC

Mon, 09/26/2022 - 12:30

A common solution for real time video apps is to rely on Video API, a niche in CPaaS, which makes use of WebRTC.

WebRTC has been around for enough time now to garner the creation of an ecosystem around it – both commercial and open source. I’ve recently covered the state of open source WebRTC solutions. It is time to look at the commercial solutions, and in them, to focus on the managed Video API – also known as CPaaS or “programmable video”.

The need for managed video API in WebRTC

WebRTC isn’t simple. It is simpler than building the whole thing by yourself, but getting it deployed and building and managing all the backend side for it is a grind. It isn’t just the initial implementation and setup, but rather the ongoing updates and maintenance – WebRTC is still a living thing that requires care and attention.

Some vendors go for open source solutions. Others will aim for commercial infrastructure that they host on their own. Most would simply rely on the cloud, using a video API.

At the end of the day, the concept is rather simple – a third party CPaaS / video API vendor puts up the infrastructure, maintains it and scales it. Adds a public API on top. Sprinkles helpful documentation. And gets developers to pay for the use of this infrastructure.

It is a win-win for everyone:

The video API vendor has paying customers, while focusing on delivering a high quality real time video communication solution
Developers can use the video API to get faster time to market, while enjoying the video API vendor’s economies of scale and expertise. At the same time, they can focus on building their own application, where video is just one part of the whole experience

Video API, CPaaS or programmable video?

One thing to note is that there’s no specific definition or term to use here.

Some use video API, which I decided to use for this article.

Others will simply say CPaaS – Communication Platform as a Service, but refer to the video feature/product within that market. CPaaS does a lot more and usually focuses on voice and SMS.

There are those who use VPaaS – Video Platform as a Service, to try and explain that this is still CPaaS but for video. Or still UCaaS (Unified Communications as a Service) but for video. I never did relate to this one.

Others still use Programmable Video.

Then there’s RTC (Real Time Communication) and RTE (Real Time Experience). Trying to broaden the scope beyond the mere use of APIs.

I’ve used WebRTC API Vendor or WebRTC PaaS in the past. Today I am just trying to use CPaaS or video API. Mostly.

Video Call, Video Chat or Group Video?

In the same manner that we have multiple names to describe video API vendors we have multiple phrases to describe what it is we are doing with real time video communications.

The usual names are video call, video chat, group video, video conferencing.

And then there’s also live streaming and broadcast – when a single or a small number of users broadcast their video in real time to a potentially very large audience.

Who are the video API vendors and what do they offer?

Assuming you are looking for a video API vendor, who are the candidates we have in the market? Here are a few of them.

Twilio video API

You can’t start any discussion about CPaaS without looking at Twilio. Twilio is the uncontested leader in CPaaS and communication APIs. It has grown up in that space and is expanding it beyond the initial developers and API focus it once had.

When it comes to video API, Twilio’s main offering is Twilio Programmable Video. It wasn’t the first to come to market, but it can’t be ignored simply because it comes from Twilio.

Sadly though, the recent downsizing at Twilio was accompanied by an email and a blog post shared by Jeff Lawson, CEO of Twilio:

As we’ve discussed frequently, we have four priorities for reaching profitability and leading in customer engagement: Investing in our platform reliability and trust, increasing the profitability of messaging, accelerating Segment adoption, and scaling the Flex customer base.

Twilio’s priorities are:

Platform reliability
Profitability of messaging
Segment adoption
Flex adoption

None of it is really related to video API or to Twilio Programmable Video…

This doesn’t say that Twilio Programmable Video is good or bad. Just that it isn’t the main focus for Twilio.

Vonage Video API

The Vonage Video API is another popular choice. It came to Vonage through its acquisition of TokBox from Telefonica. At the time, the TokBox API was one of the most widely known and used alternatives out there.

Today, the Vonage Video API is still going strong.

Vonage was acquired by Ericsson this year, which can be seen as either a good thing or a bad thing for the Vonage Video API.

On one hand, Ericsson doesn’t cater to developers, the long tail or video calling use cases, so what do they have to contribute here? This is again just going to be a distraction for them.

On the other hand, Ericsson just acquired Vonage. They are unlikely to make big sweeping changes. As the world goes into recession, this may mean that the Vonage Video API internal resources will be left untouched a while longer compared to other vendors in this space.

The rest of the video API pack

There are many other alternatives in the video API space.

In 2020 I shared my viewpoint as to the entrance of Amazon’s Chime SDK and Microsoft’s Azure Communication Services into this market of video API.

Other vendors include Agora, Daily, Dolby and many others.

Each with its own focus areas, strengths and limitations.

What about Zoom video conferencing API? Is Zoom the exception to prove the rule?

No list of video vendors will be complete without mentioning the elephant in the room – Zoom.

Zoom offers a set of APIs and integrations in various levels and uses:

Zoom has an SDK to develop Zoom Apps – applications that live inside Zoom and interact with it
Zoom Meeting SDK – letting you embed the Zoom experience inside your own application. Practically whitelabeling the Zoom interface
Zoom Video SDK – a video API that is comparable and competitive with the rest of the vendors mentioned here

From APIs to lowcode/nocode Prebuilt solutions

The biggest notable trend in the video API domain is the introduction of Prebuilt solutions.

I’ve been waiting for this to take shape for many years now, and it finally is happening.

Prebuilt are lowcode/nocode solutions that enable developers to write less code in order to embed the video API experience into their application. Here, the vendor offers more than an API layer, and instead of letting the developers using its platform figure out how to implement the UX/UI layer, it is given as a prebuilt component – usually with some level of configuration.

Most video API vendors today offer this in some form or shape or another – from official and unofficial reference applications, to iframe solutions, UIKits or application builders.

Daily, who has kindly sponsored my free ebook on nocode/lowcode in CPaaS, is one such vendor. Their Prebuilt solution is quite comprehensive.

Is there a perfect video API?

No.

Video communication is varied and flexible. It has many different scenarios and use cases. Today, there is no vendor that can cover all these use cases well. It means that there is no single vendor that offers a video API that can be automatically recommended for use.

The answer is more complex than that and boils down to “it depends”. It depends what it is you want to develop and what are your exact requirements and limitations.

There’s an (updated) report for that

To understand which video API vendor offers the best fit to your needs, you can look at my Choosing a WebRTC API Platform report. It just got a fresh update.

25 vendors are covered, looking at them from various aspects. You’ll be learning:

The various strategies developers tend to take in building their WebRTC applications
What makes developers go for a video API solution, and what type
Which KPIs you should be measuring in video API products
What each of the video API vendors has to offer you

This can greatly reduce the time it will take you to make your selection, as well as lower the risks of making the wrong decision.

Check my report today

Did I mention there’s a discount until the end of the month?

The post Video API, CPaaS, programmability and WebRTC appeared first on BlogGeek.me.

With WebRTC, better stick as close as possible to the requirements, architecture and implementation of Google Meet

Mon, 09/19/2022 - 12:30

When developing with WebRTC, try to stick as close as possible to how Google Meet is designed and architected. That’s where love and attention is given to the source code.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

Video is a resource hog. Some say that WebRTC is a great solution for 1:1 calls, but is lacking when it comes to group calling. To them I’d say that WebRTC is a technology and not a solution. In this case, it simply means that you need to invest some effort in getting group video calling to work well.

What does that mean exactly? That you need to think about bandwidth management first and foremost.

Why?

Let’s assume a 25-participant video call. And we’re modest – we just want each participant to encode video at 500kbps – reasonable if we plan on having everyone at a mere VGA resolution (640×480 pixels).

Want to do the math together?

We end up with 12.5Mbps. That’s only for the video, without the overhead of headers or audio. Since we only need to receive media from 24 participants, we can “round” this down to 12Mbps.
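The same arithmetic, written out as a small sketch so you can plug in your own numbers. The function names are made up for this example, and the model ignores audio and packet header overhead, just like the paragraph above.

```python
def total_encoded_kbps(participants: int, kbps_per_stream: int) -> int:
    """Sum of all the video that all participants encode."""
    return participants * kbps_per_stream


def downlink_kbps(participants: int, kbps_per_stream: int) -> int:
    """Video one participant receives: every stream except their own."""
    return (participants - 1) * kbps_per_stream


print(total_encoded_kbps(25, 500) / 1000)  # 12.5 (Mbps)
print(downlink_kbps(25, 500) / 1000)       # 12.0 (Mbps)
```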

I am sure you have a downlink higher than 12Mbps, but let me tell you a few things you might not be aware of:

A downlink of 100Mbps doesn’t mean you can really get a sustained 12Mbps for a long period of time
It also doesn’t mean you can get 12Mbps of incoming UDP traffic (and you prefer UDP since it is better for sending real-time media)
Most likely, your device won’t be able to decode 12Mbps of video content at reasonable CPU use
And if you have hardware acceleration for video decoding, it usually is limited to 3 or 4 media streams, so handling 24 such streams means software decoding – again running against the CPU processing limit
The larger the group the more diverse the devices and network connections. So you’ll be having people joining on old devices and smartphones, or with poor network connections. For them, 12Mbps will be science fiction at best
As a rule of thumb, I’d look at any service that uses over 3-4Mbps of downlink video traffic for video group calls as something that wasn’t properly optimized

You can get better at it, trying to figure out lower bitrates, limit how much you send and receive and do so individually per participant in the video group meeting. You can take into consideration the display layout, the dominant speaker and contributing participants, etc.

That’s exactly what 90% of your battle here is going to be – effectively managing bandwidth.
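As a minimal sketch of that kind of per-participant bandwidth management: give the dominant speaker a larger share, give the visible tiles a thumbnail share, and send no video for anyone off screen. The bitrates, the greedy order and the name `allocate_downlink` are assumptions made for this example, not how any specific SFU or Google Meet does it.

```python
def allocate_downlink(
    budget_kbps: int,
    participant_ids: list,
    dominant_id: int,
    visible_ids: list,
    speaker_kbps: int = 1200,  # assumed bitrate for the large speaker tile
    tile_kbps: int = 150,      # assumed bitrate for a thumbnail tile
) -> dict:
    """Split one receiver's video budget by layout priority.

    Returns a mapping of participant id to the video bitrate requested
    for that participant (0 means audio only).
    """
    allocation = {pid: 0 for pid in participant_ids}
    remaining = budget_kbps

    # The dominant speaker is served first.
    if dominant_id in allocation:
        allocation[dominant_id] = min(speaker_kbps, remaining)
        remaining -= allocation[dominant_id]

    # Visible tiles get a thumbnail while the budget lasts.
    for pid in visible_ids:
        if pid == dominant_id or pid not in allocation:
            continue
        if remaining < tile_kbps:
            break
        allocation[pid] = tile_kbps
        remaining -= tile_kbps

    return allocation


remote = list(range(1, 25))  # the 24 other participants
plan = allocate_downlink(3000, remote, dominant_id=1,
                         visible_ids=list(range(1, 10)))
print(sum(plan.values()))  # 2400 (kbps), instead of 12000
```

With 9 tiles on screen, the receiver asks for 2.4Mbps of video in total, which lands inside the 3-4Mbps rule of thumb mentioned above.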

Going for a group video calling route? Be sure to save considerable time and resources for optimization work on bandwidth estimation and management. Oh – and you are going to need to do that continuously. Because WebRTC is a marathon, not a sprint.

Scaling WebRTC is no simple task. There are a lot of best practices, tips and tricks that you should be aware of. My WebRTC Scaling eBooks Bundle can assist you in figuring out what more you can do to improve the quality and stability of your group video calling service.

The post With WebRTC, better stick as close as possible to the requirements, architecture and implementation of Google Meet appeared first on BlogGeek.me.

CPO at Spearline and what it means to BlogGeek.me

Tue, 09/13/2022 - 12:30

I am now CPO (Chief Product Officer) at Spearline. This means that there are going to be some changes here at BlogGeek.me. Here’s what you can expect

Me, somewhere in Ireland, 3 weeks ago

Almost a year ago, testRTC, the company I co-founded, got acquired by Spearline. During that time, I got to know the great team there and the huge opportunity that Spearline has.

Since the above feels corny and a cliché to me as I write it, I’ll stop here.

To make a long story short:

Spearline acquired testRTC (Spearline has its HQ in Ireland)
Now they had 2 separate product lines: Voice Assure and testRTC
As time went by, it was apparent that 2 is just a beginning
And also that someone needs to manage product management as a whole
Which is where I came in – they asked, and I said yes
So now I am CPO at Spearline

What does this mean?

First off, I am excited. Very.

It has been some time since I had a team to work with as their direct manager. It will also be the first time I get to manage product managers.

It also means that I am going to be investing a lot more of my time and attention at Spearline. Which is great, as I really love interacting with the people there already (I wouldn’t have accepted the role otherwise).

For my consulting business, it means that I will be shrinking it down considerably. I won’t be doing much consulting moving forward. It is somewhat sad, as I really loved helping people and hearing their stories and challenges. Hopefully, I will still get to do it in other ways.

What is going to stay, are all the initiatives that have taken place around BlogGeek.me over the years:

My writing here on this blog will continue, though probably at a lower frequency
The courses and reports will continue to be supported and updated. Philipp Hancke and I are working to complete the new Low-level Protocols Course and we have plans for a few other courses after this one
By the same token, WebRTC Insights is going to continue as a service
And so will WebRTC Weekly and the Kranky Geek events
From time to time, I’ll probably run an initiative or two here. Because I just can’t stop myself

All in all, it is time to continue and grow, and in a direction I never expected to find myself in again.

The post CPO at Spearline and what it means to BlogGeek.me appeared first on BlogGeek.me.

The WebRTC Developer Tools Landscape 2022 (+report)

Thu, 09/08/2022 - 12:30

An updated infographic of the WebRTC Developer Tools Landscape for 2022, along with my Choosing a WebRTC API Platform report.

This week I took the time to update my WebRTC Developer Tools Landscape. I do this every time I update my report, just to make sure it is all aligned and… up to date.

A few quick thoughts I had while doing this:

Vendors come and go
- We see this all the time
- At the time of writing, I am aware of 2-3 additional changes that couldn’t fit to this update simply because of timing
Testing & Monitoring is becoming more important
- There are more vendors there than there used to be
- With my testRTC hat on, I can say this is a good thing
- Especially since we’re the best game in town
CPaaS is crowded
- And becoming more so
- Is there room for everyone there?
- What will this market look like moving forward?
- Who should you be selecting for your next project?
- All these questions are what I am covering in the WebRTC API report

Why is your company not there?

The WebRTC Developer Tools Landscape will never be complete. People always get pissed off at me when I publish it, not understanding why their company isn’t there. My answer to this is a simple one – because I don’t know what it is that you are doing.

They then get even angrier. What they should do at that point is ask themselves why I don’t know them enough. I have lived and breathed WebRTC since it was first announced. So if I don’t know their company and product, how do they expect others to learn about them?

I don’t think I am unique or special. Just that if you want to be in a landscape infographic that covers WebRTC, you might as well want to make sure people who deal with WebRTC and help others figure out what tools to use will know what it is that you’re doing.

What about that report?

The report has been going strong for some 8 years now, with an update taking place every 8-12 months. It has been 12 months, so it definitely needed an update.

2 vendors were removed from the report and 3 new vendors added.

I’ve also decided to “upgrade” the term Embed/Embeddable/Embedded to Prebuilt. The reason behind it is the progress and popularity of these types of solutions in the video API space. Most CPaaS vendors today that offer a video API are also offering some form of higher level abstraction in the form of a ready made application – be it a full reference app, a UIKit, or a Prebuilt component.

The report will be published on 22 September. If you want to purchase it, there’s a 20% discount available at the moment – from now and until its publication.

Check out more about my Choosing a WebRTC API Platform report.

The post The WebRTC Developer Tools Landscape 2022 (+report) appeared first on BlogGeek.me.

Media compression is all about purposefully losing what people won’t be missing

Mon, 09/05/2022 - 12:30

With WebRTC, we focus on lossy media compression codecs. These won’t maintain all the data they compress, simply because we won’t notice it either.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

The purpose of codecs – voice and video – is to compress and decompress the media that needs to be sent over the network. This was true before WebRTC and will stay true after WebRTC.

Generally speaking, there are two types of compression:

The two types of codecs

Lossless compression – these are codecs where whatever goes into the encoder comes out exactly the same at the other end of the decoder. Nothing gets lost along the way. Think of it as a .zip file – it stores files and requires a perfect match on both ends of the compression
Lossy compression – these are codecs that don’t maintain an exact match between what goes into the encoder and what comes out of the decoder. These types of codecs are quite common with audio and video processing
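A toy illustration of the difference, using Python's standard `zlib` for the lossless part and a crude quantization step for the lossy part. Real audio and video codecs are far more sophisticated than this. The synthetic signal and the function names are assumptions for the example only.

```python
import math
import zlib

# A synthetic 8-bit signal: a sine wave plus a small repeating ripple.
signal = bytes(
    int(127.5 + 100 * math.sin(i / 10) + ((i * 7) % 5)) for i in range(2000)
)


def lossless_roundtrip(data: bytes) -> bytes:
    """Compress and decompress; the output matches the input exactly."""
    return zlib.decompress(zlib.compress(data))


def lossy_roundtrip(data: bytes, step: int) -> bytes:
    """Throw away detail first by snapping samples down to multiples of step."""
    quantized = bytes((b // step) * step for b in data)
    return zlib.decompress(zlib.compress(quantized))


exact = lossless_roundtrip(signal)
approx = lossy_roundtrip(signal, step=16)

print(exact == signal)   # True: nothing was lost
print(approx == signal)  # False: detail was discarded on purpose
print(max(a - b for a, b in zip(signal, approx)))  # error stays below the step
```

The lossy version never gets the original back. What it gains is data that is more repetitive and therefore usually compresses into fewer bytes.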

Audio and video tend to hold a lot of data. And since we want to send it over the network, we’d rather not waste network resources. So what do these codecs do? They try to remove anything and everything that they can which our eyes and ears won’t notice much.

On a conceptual level, lossy compression has this virtual dial. You move the dial to decide how much you are willing to lose out of the data. The encoder will do its best to lose things you wouldn’t notice, but at some point, you’ll notice.

This flexibility in setting the compression level is also used to manage the bitrate. By estimating the bandwidth, the encoder can be instructed to turn the dial up or down, generating higher or lower compression to meet the estimated available bandwidth.
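Here is a hedged sketch of that dial: try coarser and coarser quantization until the encoded size fits a byte budget derived from the bandwidth estimate. Real encoders use rate control algorithms and quantization parameters of their own. The list of steps, the `encode` helper and `pick_step` are stand-ins invented for this example.

```python
import math
import zlib

# The same kind of synthetic 8-bit signal as a stand-in for one media frame.
signal = bytes(
    int(127.5 + 100 * math.sin(i / 10) + ((i * 7) % 5)) for i in range(2000)
)


def encode(data: bytes, step: int) -> bytes:
    """Toy lossy encoder: quantize, then compress the result losslessly."""
    return zlib.compress(bytes((b // step) * step for b in data))


def pick_step(data: bytes, budget_bytes: int,
              steps: tuple = (1, 2, 4, 8, 16, 32, 64)) -> tuple:
    """Turn the dial: return the first (finest) step that fits the budget.

    Falls back to the coarsest step when nothing fits.
    """
    for step in steps:
        size = len(encode(data, step))
        if size <= budget_bytes:
            return step, size
    coarsest = steps[-1]
    return coarsest, len(encode(data, coarsest))


# A tighter bandwidth estimate pushes the dial towards coarser quantization.
for budget in (4000, 1000, 300):
    step, size = pick_step(signal, budget)
    print(f"budget={budget}B -> step={step}, encoded={size}B")
```

The budget would be recomputed every time the bandwidth estimate changes, which is why the dial keeps moving throughout a session.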

Looking to learn more about video codecs? Go ahead and read my WebRTC video basics article

The post Media compression is all about purposefully losing what people won’t be missing appeared first on BlogGeek.me.

The state of WebRTC open source projects

Mon, 08/29/2022 - 12:30

WebRTC open source is a mess. It needs to grow out of its youth and become serious business – or gain serious backing.

This article has been written along with Philipp Hancke. We cooperate on many things – WebRTC courses (new one coming up soon) and WebRTC Insights to name a few.

—

WebRTC is free. Every modern browser incorporates WebRTC today. And the base code that runs in these browsers is open sourced and under a permissive BSD license. In some ways, free and open source were mixed in a slightly toxic combination. One in which developers assume that everything WebRTC should be free.

The end result? The sorry state in which we find ourselves today, 11 years after the announcement of WebRTC. What we’re going to do in this article, is detail the state of the WebRTC open source ecosystem, and why we feel a change is necessary to ensure the healthy growth of WebRTC for years to come.

Table of contents

Your open source Cliffs Notes
The WebRTC open source landscape
WebRTC open source client libraries
Open source TURN server(s)
Open source signaling servers for WebRTC
Open source SFUs and media servers in WebRTC
Other, less popular open source alternatives for WebRTC
Is it time for WebRTC open source to grow up?

Your open source Cliffs Notes

We’ll start with the most important thing you need to know:

Open Source != Free

Let’s take a quick step back before we dive into it though.

What’s open source exactly?

An open source project is a piece of source code that is publicly available for anyone under one of the many open source licenses out there. Someone, or a group of people from the same company or from disparate places, have “banded together” and created a piece of software that does something. They put the code of that software out in the open and slap a license on top of it. That ends up being an open source project.

Open source isn’t free. There’s a legal binding associated with using open source, but it isn’t what we’re interested in here. It is the fact that if you use open source, it doesn’t mean that you pay nothing to no one. It just means that you get *something* with no strings attached.

Why would anyone end up doing this for free? Well… that brings us to business models.

Open source business models

There are different types of open source licenses. Each with its own set of rules, and some more permissive than others, making them business-friendly. Sometimes the license type itself is used as a business model, simply by offering a dual license mode where a non-permissive open source license is available freely and a commercial one is available in parallel.

In other cases, the business model of the open source project revolves around offering support, maintenance and customization of that project. You get the code for free, but if you want help with it – you can pay!

Sometimes, the business model is around additional components (this is where you will see things like community edition and enterprise edition popping up as options in the project’s website). Things such as scripts for scaling the system, monitoring modules or other pieces of operational and functional components are protected as commercial products. The open source part brings companies to use it and raise popularity and awareness to the project, while the commercial one is the reason for doing it all. How the developers behind the project bring food to the table and become rich.

In recent years, you see business models revolving around managed services. The database is open source and free, but if you let us host it for you and pay for it, we’ll take care of all your maintenance and scaling headaches.

And some believe it is really and truly free. Troy Hunt wrote about it recently (it is a really good post – go read it):

“… there is a suggestion that those of us who create software and services must somehow be in it for the money”

To that I say – yes!

At the end of the day, delving into open source is all about the money.

Why?

If you do this to create a popular project, then your aim is almost always to figure out how to monetize it. Directly (see above examples) or indirectly, by increasing your chances of getting hired for higher paying jobs or into more interesting projects
Sometimes, you do this because you care deeply about a topic. But the end result is similar. You either have the time to deal with it because you make money elsewhere and this is a hobby – or because the company hiring you is HAPPY that you are doing it (which means you are doing it to some extent for the intrinsic value it gives you at that company)
You might be doing it to hone your skills. But then again, the reason for all this is to become a better programmer and… get hired

The moment the open source project you are developing is meaningful to two more people, or even a single company, there are monetary benefits to be gleaned. We’d venture that if you aren’t making anything from these benefits (even minor ones), then the open source project has no real future. It gets to a point where it should either grow up or wither and die.

A few more words about open source projects

Just a few things before we start our journey to the WebRTC open source realm:

Most open source projects are just an API abstracting out a certain activity or capability that you need for your own application development. In the case of WebRTC, we will be focusing on such abstractions that implement specific network entities – more on that later
When using open source, you usually have a bit more control over your application. That’s because you can modify the source code of the open source components you use, as opposed to asking a vendor to do that when you use a precompiled library
Many open source projects will have poor documentation. That will be doubly true when they are lacking a solid business model – hobbyist developers are more into writing code than into explaining how to use that code
Documentation is an important aspect for commercial use of open source projects. So is the project’s ability to provide a clear API facade and code samples that make it easy to start using

The WebRTC open source landscape

A common mistake by “noobs” is thinking that WebRTC is a solution that requires no coding. Since browsers already implement it, there’s nothing left to do. This couldn’t be further from the truth.

WebRTC as a protocol requires a set of moving parts, clients and servers, that together enable the rich set of communication solutions we’re seeing out there.

The diagram above, taken from the Advanced WebRTC Architecture course, shows the various components necessary in a typical WebRTC application:

Clients, web-based or otherwise
- The web browser ones are the ones you get for “free” as part of the browser
- Anything else you need to figure on your own
Application server, which we’re not going to touch in this article. The reason being that this is a generic component needed in any type of application and isn’t specific to WebRTC
Signaling server, taking care of setting up and negotiating the WebRTC sessions themselves
STUN/TURN server, which deals with NAT traversal. Needed in almost every deployment
Media server, for media processing heavy lifting. Be it group calling, recording, video rendering, etc – a media server is more than likely to make that happen

For each and every component here, you can find one or more open source projects that you can use to implement it. Some are better than others. Many are long forgotten and decaying. A few are pure gold.

Let’s dive into each of these components to see what’s available and in what state we find the open source community for them.

WebRTC open source client libraries

First and foremost, we have the WebRTC open source client libraries. These are implementations of the WebRTC protocol from a user/device/client perspective. Consider these your low level API for WebRTC.

There used to be only a single one – libwebrtc – but with time, more were introduced and took their place in the ecosystem. Which is why we will start with libwebrtc:

libwebrtc

THE main open source project of WebRTC is libwebrtc.

Why?

It is the first one to be introduced
Chrome uses it for its WebRTC implementation
The same goes for Safari, Edge and Firefox – each with a varying degree of integration and use
Many of the native mobile apps use libwebrtc internally

Practically speaking – libwebrtc is everywhere WebRTC is.

Here are a few things you need to know about this library:

libwebrtc is maintained and controlled solely by Google. Every change needs to be signed off by a Googler.
It gets integrated into Chromium and Chrome, which means it reaches billions of devices
That means that Google is quite protective about it. Getting a contribution into libwebrtc is no easy feat
While there are others who contribute, external contributions to libwebrtc are few and far between
Remember also that the team at Google doing this isn’t philanthropic. It does that for Google’s own needs, which mostly means Google Meet these days. This means that use cases, scenarios, APIs and code flows that are used by Google Meet are likely to be more secure, stable and far more optimized than anything else in libwebrtc’s codebase
Did we mention the whole build system of libwebrtc is geared towards compiling it into Chromium as opposed to other projects (like the one you’re building)? See Philipp’s Fosdem talk from 2021.
Or that some of its interfaces (like device acquisition) are less tested simply because Chrome overrides them, so Google’s focus is on the Chrome interfaces and not the ones implemented in libwebrtc?

Looking at the contributions over time, Google is doing more than 90% of the work:

The amount of changes has been decreasing year-over-year after peaking in early 2016. During the pandemic we even reached a low point with less than 200 commits per month on average. Even with these reduced numbers libwebrtc is the largest and most frequently updated project in the open source WebRTC ecosystem.

The number of external contributions is fairly low, below 10%. This doesn’t bode well for the future of libwebrtc as the industry’s standard library of WebRTC. It would be better if Google opened up a bit more for contributions that improve WebRTC or those that make it easier to use by others.

This leads us to the business model aspect of libwebrtc

Money time

What if one decides to use libwebrtc and integrate it directly in his own application?

There’s no option for paid support
No real alternative to pay for custom development
Maintaining your own fork and keeping it in sync with the upstream one is a lot of effort

That said, for the most part, and in most situations, libwebrtc is the best alternative – that’s because it follows the exact implementations you will be bumping into in web browsers. It will always be the most up to date one available.

A side note – libwebrtc is implemented in C++. Why is this relevant? Pion

Pion

Pion is a Go implementation of the WebRTC APIs. Sean DuBois is the heart and soul behind the Pion project and his enthusiasm about it is infectious.

Putting on Tsahi’s cynic hat, Pion’s success can be attributed a lot to it being written in Go. And that’s simply because many developers would rather use Go (modern, new, hip) and not touch C++.

Whatever the reason is, Pion has grown quite nicely since its inception and is now quite a popular WebRTC open source project. It is used in embedded devices, cloud based video rendering and recently even SFU and other media server implementations.

Money time

What if one decides to use Pion and integrate it directly in his own application?

There’s no option for paid support
No official alternative to pay for custom development
There are a handful of contributors to Pion who are doing contracting work

Python, Rust, et al

There are other implementations of WebRTC in other languages.

The most notable ones:

aiortc – a Python implementation of WebRTC
WebRTC.rs – a Rust implementation of WebRTC, created as a rewrite of Pion

There are probably others, less known.

We won’t be doing any Money time section here. These projects are still too small. We haven’t seen too many services using them in production and at scale.

GStreamer

GStreamer is an open source media framework that is older than WebRTC. It is used in many applications and services that use WebRTC, even without using its WebRTC capabilities (mainly since these were added later to GStreamer).

We see GStreamer used by vendors when they need to transform video content in real-time. Things like:

Taking machine rendering (3D, screen casting or other) and passing them to a browser via WebRTC
Mixing inputs and combining them into a single recording or a single livestream
Collecting media input on embedded platforms and preparing it for a WebRTC session

Since WebRTC was added as another output type in GStreamer, developers can use it directly as a broadcasting entity – one that doesn’t consume data but rather generates it.

GStreamer is a community effort and written in C. While it is used in many applications (commercial and otherwise), it lacks a robust commercial model. What does that mean?

Money time

What if one decides to use GStreamer and integrate it directly in his own application?

There’s no official option for paid support
No official alternative to pay for custom development
The ecosystem is large enough to allow finding people with GStreamer knowledge

Open source TURN server(s)

[Image: Connecting WebRTC by using TURN to relay the media]

Next we have open source TURN servers. And here, life is “simple”. We’re mostly talking about coturn. There are a few other alternatives, but coturn is by far the most popular TURN server today (open source or otherwise).

In many ways, we don’t need more than that, because TURN is simple and a commodity when it comes to the code implementation itself (up to a point, as Cloudflare is or was trying to change that with their managed service).

But, and there’s always a but in these things, coturn needs to get updated and improved as well. Here’s a recent discussion posted as an issue on coturn’s github repo:

Is the project dead?

Read the whole thread there. It is interesting.

The maintainers of coturn are burned out, or just don’t have time for it (=they have a day job). For such a popular project, the end result was a volunteer or two from the industry picking up the torch and doing this in parallel to their own day job.

Which leads us to:

Money time

What if one decides to use coturn and integrate it directly in his own application?

There’s no official option for paid support
No official alternative to pay for custom development
The ecosystem is large enough to allow finding people with coturn knowledge

Open source signaling servers for WebRTC

Signaling servers are a different beast. WebRTC doesn’t define them exactly, but they are needed to pass the SDP messages and other signals between participants. There are several alternatives here when it comes to open source signaling solutions for WebRTC.

It should be noted that many of the signaling server alternatives in WebRTC offer purely peer communication capabilities, without the ability to interact with media servers. Some signaling servers will also process audio and video streams. How much they focus on the media side versus the signaling side will decide if we will be treating them here as signaling servers or media servers – it all boils down to their own focus and to the functions they end up offering.

Signaling requires two components – a signaling server and a client side library (usually lightweight, but not always).
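To make the server half of that concrete, here is a minimal sketch of what a signaling server does at its core: forward offers, answers and ICE candidates between peers in a room. It deliberately leaves out the transport (WebSocket or otherwise), and the message shapes and the `SignalingRoom` name are hypothetical, since WebRTC doesn’t define a signaling format.

```typescript
// Hypothetical message shapes; WebRTC itself leaves the signaling format up to you.
type SignalMessage =
  | { kind: "offer" | "answer"; sdp: string }
  | { kind: "candidate"; candidate: string };

// Callback used to push a message to a connected peer (e.g. over a WebSocket).
type Deliver = (from: string, message: SignalMessage) => void;

class SignalingRoom {
  private peers = new Map<string, Deliver>();

  join(peerId: string, deliver: Deliver): void {
    this.peers.set(peerId, deliver);
  }

  leave(peerId: string): void {
    this.peers.delete(peerId);
  }

  // Forward a message to a single peer.
  // Returns false when either side is not currently in the room.
  send(from: string, to: string, message: SignalMessage): boolean {
    const deliver = this.peers.get(to);
    if (!deliver || !this.peers.has(from)) {
      return false;
    }
    deliver(from, message);
    return true;
  }
}
```

Everything a real project adds on top of this (authentication, reconnection, scaling across machines) is where the open source alternatives below differ from each other.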

We will start with the standardized ones – SIP & XMPP.

SIP and XMPP

SIP and XMPP preceded WebRTC by a decade or so. They have their own ecosystem of open source projects, vendors and developers. They act as mature and scalable signaling servers, sometimes with extensions to support WebRTC-specific use-cases like creating authentication tokens for TURN servers.

We will not spend time explaining the alternatives here because of this.

Here, it is worthwhile mentioning MQTT as well. Facebook is known to be using it (at least in the past – not sure about today) in their Facebook Messenger for signaling
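As for the TURN authentication tokens mentioned above, a common approach is the time-limited, shared-secret scheme that TURN servers such as coturn support: the username carries an expiry timestamp and the password is an HMAC of that username. The sketch below assumes that scheme and runs on Node.js; the function name and parameters are made up for illustration, so check your TURN server’s documentation before relying on it.

```typescript
import { createHmac } from "node:crypto";

interface TurnCredentials {
  username: string;
  credential: string;
}

// Assumed scheme: username = "<unix expiry>:<user id>",
// credential = base64(HMAC-SHA1(shared secret, username)).
function makeTurnCredentials(
  sharedSecret: string,
  userId: string,
  ttlSeconds: number,
  nowMs: number = Date.now(),
): TurnCredentials {
  const expiry = Math.floor(nowMs / 1000) + ttlSeconds;
  const username = `${expiry}:${userId}`;
  const credential = createHmac("sha1", sharedSecret)
    .update(username)
    .digest("base64");
  return { username, credential };
}
```

The signaling server hands these values to the client, which places them in its TURN configuration. The shared secret never leaves the backend, and the credentials stop working once the expiry passes.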

PeerJS

PeerJS has been around for almost as long as WebRTC itself. For an extended period of that time, the codebase has not been maintained or updated to fit what browsers supported. Today, it seems to be kept.

The project seems to focus on a monolithic single server deployment, without any thought about horizontal scaling. For most, this should be enough.

Throughout the years, PeerJS has changed hands and maintainers, including earlier this year:

Without much ado, let’s move to the beef of it:

Money time

What if one decides to use PeerJS and integrate it directly in his own application?

There’s no official option for paid support
No official alternative to pay for custom development
The codebase is small, so if you know WebRTC, these challenges shouldn’t pose any real issue

simple-peer

Simple-Peer has been driven by Feross and his name in the early days. It is another one of those “pure WebRTC” libraries that focuses solely on peer-to-peer. If that fits your use-case, great, it is mature and “done”. Most of the time your use-case will evolve over time though.

It has received only a few maintenance commits in 2022 and not many more in 2021. The same considerations as for PeerJS apply for simple-peer. If you need to pick between the two… go for simple-peer, the code is a bit more idiomatic JavaScript.

Money time

Just go read PeerJS – same rules apply here as well.

Matrix

Matrix is “an open network for secure, decentralized communication”. There’s also an open standard to it as well as a commercial vendor behind it (Element).

Matrix is trying to fix SIP and XMPP by being newer and more modern. But the main benefit of Matrix is that it comes as client and server along with implementations that are close to what Slack does – network and UI included. It is also built with scale in mind, with a decentralized architecture and implementation.

Here we’re a bit unaligned… Tsahi thinks Matrix is a good alternative and choice while Philipp is… less thrilled. Their WebRTC story is a bit convoluted for some, meandering from full mesh to Jitsi to a “native SFU” only recently.

So… Matrix has a company behind it. But they have their own focus (messaging service competing with Slack with privacy in mind).

Money time

What if one decides to use Matrix and integrate it directly in his own application?

There’s no official option for paid support
No official alternative to pay for custom development
That said, Matrix does have a jobs room on Matrix where you can search for paid help

Everything else in the github jungle

At the time of writing, there are 26,121 repositories on github mentioning WebRTC. By the time you’ll be reading it, that number will grow some.

Not many are sticking out too much, and in that jumble, it is hard to figure out which projects are right for you. Especially if what you need needs to last. And doubly so if you’re looking for something that has decent enough support and a thriving community around it.

Open source SFUs and media servers in WebRTC

Another set of important open source WebRTC components are media servers and SFUs.

While signaling servers deal with peer communication of setting up the actual sessions, media servers are focused on the channels – the actual data that we want to be sending – audio and video streams, offering realtime video streaming and processing. Whenever you need group sessions, broadcasts or recordings (and you will, assuming you’d like video calls or video conferences incorporated in your application), you will end up with media servers.

Here’s where we are marketwise

Janus, Jitsi, mediasoup & Pion

I’ve written about these projects at length in my 2022 WebRTC trends article. Here’s a visual refresher of the relevant part of it:

Janus, Jitsi, mediasoup and Pion are all useful and popular in commercial solutions. Let’s try to analyze them with the same prism we did for the other WebRTC open source projects here.

Janus

There’s official paid support available from meetecho
You can pay meetecho for consulting and paid development. From experience, they are mostly busy which means they are picky with who they end up working with
The Janus ecosystem is large enough and there are others who offer development services for it as well

Jitsi

Jitsi can be considered a platform of its own:

At the heart of Jitsi is the Jitsi Videobridge, with additional components around it, composing together the Jitsi Meet video chat app
There’s also a managed CPaaS service offering as part of it – 8×8 JaaS

Money time

Jitsi was acquired a few years ago by 8×8. Which means that there’s no official option for paid support
Similarly, custom development isn’t available
The Jitsi ecosystem is large enough and there are others who offer development services for it as well
Oh, and like Matrix (where Element offers paid hosting), 8×8 JaaS offers paid hosting for Jitsi (=CPaaS). There’s also Jitsi Meet which is essentially a free managed service built on top of Jitsi itself

Mediasoup

mediasoup is maintained by 2 developers who have a day job at Around. Which means that there’s no official option for paid support
Similarly, custom development isn’t available
The ecosystem around mediasoup means you can get developers for it as well

Pion

We’ve already discussed Pion when we looked at WebRTC clients
Assume the same is true for media servers
Only you have the headache of choosing which media server written on top of Pion to use

–

To be clear – in all cases above, getting vendors to help you out who aren’t maintaining the specific media server codebase means results are going to be variable when it comes to the quality of the implementation. In other words, it is hard to figure out who to work with.

The demise of Kurento

The Kurento Media Server is dead. So much so that even the guys behind it went to build OpenVidu (below) and then made OpenVidu work on top of mediasoup.

Don’t touch it with a long stick.

It has been dead for years and from time to time people still try using it. Go figure.

Higher layers of abstraction

Higher-layer abstraction open source projects strive to become platforms of sorts. Their main focus in the WebRTC ecosystem is to offer a layer of tooling on top of open source media servers. The two most notable ones are probably OpenVidu and LiveKit.

OpenVidu

OpenVidu is a kind of an abstraction layer to implement a room service, UI included.

It originates from the team left behind from the Kurento acquisition. With time, they even adopted mediasoup as the media server they are using, putting Kurento aside for the most part.

Money time

Unlike many of the open source solutions we’ve seen so far, OpenVidu actually seems to have a business model:

There’s official commercial support available
There are hosted commercial plans available as well as consulting and development work

LiveKit

LiveKit offers an “open source WebRTC infrastructure” – the management layer above Pion SFU.

For the life of me though, I don’t understand what the business model is for LiveKit. They are a company – not just an open source project, and as such, they need to have revenue to survive.

Most probably they get some support and development money from enterprises adopting LiveKit, but that isn’t easily apparent from their website.

Other, less popular open source alternatives for WebRTC

There are other companies who offer commercial solutions that are proprietary in nature. Some do it as on premise alternatives, where they provide the software and the support, while you need to deploy and maintain.

These can either be suitable solutions or disasters waiting to happen. Especially when such a vendor decides to pivot or leave the market.

Tread carefully here.

Is it time for WebRTC open source to grow up?

This has been a long overview, but I think we can all agree.

The current state of WebRTC open source is abysmal:

We are more than 10 years in
There are thriving open source projects for WebRTC out there
These projects are used by many – hobbyists and professionals alike
They are found inside commercial applications serving millions of users
But they offer little in the way of support or paid help
Somehow, the market hasn’t grown commercially

If it were up to us, and it isn’t, we’d like to see a more sophisticated market out there. One that gives more and better commercial solutions for enterprises and entrepreneurs alike.

The post The state of WebRTC open source projects appeared first on BlogGeek.me.

Be very clear to yourself why you manage your own TURN servers

Mon, 08/22/2022 - 12:30

Running your own TURN servers for your WebRTC application is not necessarily the best decision. Make sure you know why you’re doing it.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

Are you running your own TURN server? Great!

Now, are you crystal clear and honest with yourself about why you’re doing that exactly?

WebRTC has lots of moving parts you need to take care of. Lots of WebRTC servers: The application. Signaling servers. Media servers. And yes – TURN servers.

I already covered a few aspects of TURN in this WebRTC quote – We TURNed to see a STUNning view of the ICE. It is now time to review the build vs buy decision around TURN.

You see, NAT traversal in WebRTC is done by using two different servers: STUN and TURN. STUN is practically free and it can also be wrapped right into the TURN server.

TURN servers are easy to interface with, but not as easy to install, configure and maintain properly. Which is why my suggestion more often than not is to use a third party managed TURN service instead of putting up your own. Economies of scale along with focus and core competencies come to mind here with this decision.

Why buy your WebRTC TURN servers?

Buying a TURN server should be your default decision. It is simple. It isn’t too expensive (for the most part) and it will reduce a lot of your headaches.

Most of the companies that approach me with connectivity issues of their WebRTC application end up in that state simply because they decided to figure out NAT traversal in WebRTC on their own.

Here are a few really good reasons why you should buy your TURN service:

The best practices of TURN (and STUN) configuration aren’t the defaults of open source TURN servers or of the standard specification itself. So if you don’t have someone in-house who has already done it at scale, then don’t start now
Using a third party managed TURN server is simple. Onboarding and integration should be a breeze (a few hours at most)
There’s no real vendor lock-in. Switching to your own TURN servers will cost you the same as it would to start with your own TURN servers, so you can delay that decision for later. And switching to another managed TURN server is just as simple as it is to start using one for the first time
Testing for edge cases and figuring out issues with WebRTC connectivity is hard. It takes a lot of time, and requires patience, understanding and visibility when connections fail. None of this is something you’ll have in the first months of running your own service
It is cheap. Twilio has it at $0.4/gigabyte of data. And not all of your traffic will go through TURN anyways. When you’ll start paying too much to your taste, you will be able to put up your own infrastructure. But why invest in that effort before it is time to do so?
Someone else will take care of scaling. TURN needs to be as close as possible to the end users. Installing a single server won’t be enough. Installing a single region won’t be enough. Why deal with that headache?
Firewall friendliness. Using your own servers means opening them up in firewall configurations of your customers. There’s a small likelihood that these firewalls are already configured to support the managed TURN service you are using for other tools
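To put the “it is cheap” argument into numbers, here is a back-of-the-envelope sketch. The $0.4 per gigabyte price is the one quoted above; the call minutes, the bitrate and the share of traffic that ends up relayed are hypothetical inputs you would replace with your own measurements.

```typescript
// Price quoted in the text; check the vendor's current pricing before budgeting.
const PRICE_PER_GB_USD = 0.4;

// Gigabytes relayed through TURN for a given volume of calls.
// relayedShare is the fraction of traffic that goes through TURN (0..1),
// which is an assumption here, not a measured number.
function relayedGigabytes(
  callMinutes: number,
  kbpsPerCall: number,
  relayedShare: number,
): number {
  const bits = callMinutes * 60 * kbpsPerCall * 1000 * relayedShare;
  return bits / 8 / 1e9;
}

function monthlyTurnCostUsd(gigabytes: number): number {
  return gigabytes * PRICE_PER_GB_USD;
}

// Example: 1,000,000 call minutes a month at 1,000 kbps with 20% relayed.
const gb = relayedGigabytes(1000000, 1000, 0.2);
console.log(`${gb.toFixed(0)} GB relayed, about $${monthlyTurnCostUsd(gb).toFixed(0)} per month`);
```

With these made-up inputs, a million call minutes a month works out to roughly 1,500 GB relayed and about $600 in TURN fees, which is a small bill compared to the time it takes to run your own servers across regions.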

Why build your WebRTC TURN servers?

We are all builders. And we love building. So adding TURN to our belt of things we built makes sense. It also plays well into the vertical integration we have come to appreciate, given how successful Apple has been with it in its services.

But frankly, it is mostly about control. The ability to control your own destiny without relying on others.

I still think you should buy your TURN servers from a reputable managed service provider. That said, here are some good reasons why to build and deploy your own:

Data sovereignty and other regulatory reasons. In some industries, for some customers, the fact that you host and run your own servers is critical. In such a case, using a managed third party TURN service is simply impossible. In the same domain, privacy and data processing requirements may make using a third party harder than setting up your own
You already have a large traffic footprint. With economies of scale this starts becoming interesting and important. If you have the sheer size that makes it worthwhile running your own then do it. I wouldn’t start below $10,000 or even $50,000 in monthly expenses for your managed TURN service, which is a lot of traffic. Why? Because you’ll need a full time ops person on the job for at least half a year if not longer. And you’ll need to deploy servers in many regions from the get go, so better start when you’re big enough
Firewall configurations can be a mess. Sometimes, your customers may want to validate the IP addresses they configure are yours, or want to limit the IP address ranges they configure, or limit the services they expose themselves to. In such cases, they might not look at it nicely when you use a third party
Existing customer installations might already be configured to your IP address ranges, and just placing your TURN servers within those ranges will be easier than asking them to change firewall configurations to incorporate a third party vendor
Traffic control is another reason. Using your own SDN network configuration or packet acceleration may benefit from having your own TURN servers in-house, alongside the rest of your infrastructure, as opposed to being hosted elsewhere where connectivity to your backend servers might be questionable

–

Build? Buy? Which one is the path you’ll be taking?

Trying to get more of your calls connected in WebRTC? Check out this free video mini course on effectively connecting WebRTC sessions

The post Be very clear to yourself why you manage your own TURN servers appeared first on BlogGeek.me.

We TURNed to see a STUNning view of the ICE

Mon, 08/08/2022 - 11:30

Every time you look at NAT Traversal in WebRTC, you end up learning something new about STUN, TURN and/or ICE.

[In this list of short articles, I’ll be going over some WebRTC related quotes and try to explain them]

STUN, TURN and ICE. The most misunderstood aspects of WebRTC, and the most important ones to get more calls connected. It is no wonder that the most viewed and starred lesson in my WebRTC training courses is the one about NAT traversal.

Let’s take this opportunity to go over a few aspects of NAT traversal in WebRTC:

STUN is great (and mostly free). It doesn’t route media, it just punches holes in firewalls and NATs
TURN means relaying your media. It isn’t used for all sessions, but when it is used, it is a life saver for that session. You can keep the TURN servers on all connections, since it will be used only when needed
While STUN and TURN are servers, ICE isn’t. ICE is a protocol. It is how WebRTC decides if it is going to use TURN or not in a session
No matter how you connect your session, it may happen on either UDP or TCP. UDP will be a better alternative (and WebRTC will prioritize it and try to connect it “first”)
TURN servers are expensive. Don’t use free TURN servers – they aren’t worth the money you aren’t paying for them. Use your own or go for a paid, managed TURN service
Put TURN servers as close as possible to your users. They’ll thank you for that
In the peer connection’s iceServers configuration – don’t put more than 3-4 servers (that means 1 STUN, 1 TURN/UDP, 1 TURN/TCP, 1 TURN/TLS). More servers means more connectivity checks and more time until you get things connected – it doesn’t mean better connectivity
Geolocation with TURN should be done either before you place your TURN servers in the configuration or via the DNS requests for the TURN servers themselves
You don’t always need TURN servers. Read more about when you need and don’t need TURN
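The iceServers advice above (1 STUN, 1 TURN/UDP, 1 TURN/TCP, 1 TURN/TLS) can be sketched as a small helper. The hostname, the ports (3478 and 443 are common choices, not a requirement) and the `buildIceServers` name are assumptions for illustration; the `IceServerEntry` type simply mirrors the shape the browser expects in its iceServers list.

```typescript
// Mirrors the shape of an entry in the peer connection's iceServers list.
interface IceServerEntry {
  urls: string;
  username?: string;
  credential?: string;
}

// Builds the short list recommended above: one STUN entry and
// one TURN entry each for UDP, TCP and TLS.
function buildIceServers(
  host: string,
  username: string,
  credential: string,
): IceServerEntry[] {
  const auth = { username, credential };
  return [
    { urls: `stun:${host}:3478` },
    { urls: `turn:${host}:3478?transport=udp`, ...auth },
    { urls: `turn:${host}:3478?transport=tcp`, ...auth },
    { urls: `turns:${host}:443?transport=tcp`, ...auth },
  ];
}

// In a browser this would be passed along as:
//   new RTCPeerConnection({ iceServers: buildIceServers(host, user, pass) })
```

Keeping the list this short means ICE has fewer candidates to gather and check, which is the point of the 3-4 servers guideline.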

This covers the basics. There’s a ton more to learn and understand about NAT traversal in WebRTC. I’d also suggest not installing and deploying your own TURN servers but rather using a third party paid managed service. The worst that can happen is that you’ll install and run your own later on – there’s almost no vendor lock-in for such a service anyway.

Trying to get more of your calls connected in WebRTC? Check out this free video mini course on effectively connecting WebRTC sessions

The post We TURNed to see a STUNning view of the ICE appeared first on BlogGeek.me.
