Decoding YouTube filters

Introduction

Recently, I was searching for some videos on YouTube and noticed that the search filters aren't passed to the server as a series of key-value pairs in the URL, as one might expect, e.g.:

https://www.youtube.com/results?search_query=cassini&type=video&features=4k

Instead, all the active filters are somehow combined and encoded into the sp query parameter. So the valid version of the URL above is:

https://www.youtube.com/results?search_query=cassini&sp=EgQQAXAB

There is some function on YouTube that encodes the Type: Video and Features: 4K filters to EgQQAXAB. I can only guess at the reasoning behind such an opaque approach. More importantly, it made me curious about how the encoding actually works. So, as a little mental exercise, I set out to decode it.

Reading the data

EgQQAXAB looks like base64 encoded data, so let's fire up ptpython and try to decode it:

>>> import base64
>>>
>>> filter = 'EgQQAXAB'
>>> bytes = base64.b64decode(filter)
>>> bytes
b'\x12\x04\x10\x01p\x01'
>>> bytes.decode('utf-8')
'\x12\x04\x10\x01p\x01'

The bytes happen to decode as UTF-8, but the result is mostly control characters, so we're dealing with binary data rather than readable text. Converting the bytes to a more explorable form is the logical next step:

>>> [format(byte, '08b') for byte in bytes]
['00010010', '00000100', '00010000', '00000001', '01110000', '00000001']

Ok, we're in. What now?

There's not much else to do besides looking at various combinations of filters and trying to spot pattern(s).

What about EgQQASAB, which encodes the Type: Video and Features: HD filters?

>>> filter = 'EgQQASAB'
>>> bytes = base64.b64decode(filter)
>>> [format(byte, '08b') for byte in bytes]
['00010010', '00000100', '00010000', '00000001', '00100000', '00000001']

What can we observe? The first 4 bytes are the same. The fifth is different, so it probably encodes the feature filter:

  • 01110000 is 4K feature filter
  • 00100000 is HD feature filter
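To avoid eyeballing the bit strings, the comparison above can also be automated. The following is a minimal sketch; `diff_positions` is a hypothetical helper name introduced here for illustration and is not part of the original session.

```python
import base64


def diff_positions(sp_a, sp_b):
    """Return the byte indexes at which two decoded sp values differ."""
    a = base64.b64decode(sp_a)
    b = base64.b64decode(sp_b)
    # zip stops at the shorter input, so only the common prefix is compared.
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(diff_positions('EgQQAXAB', 'EgQQASAB'))  # [4]
```

Index 4 is the fifth byte, which matches the observation above.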

Before validating the assumptions, it would be useful to reduce repetitive code, print bytes in more readable form, and be able to compare different sps. We can use the following little piece of code to achieve that:

import base64
from tabulate import tabulate


def decode_sp(sp):
    sp = base64.b64decode(sp)

    return [format(byte, '08b') for byte in sp]


def build_header(labeled_sps):
    # Column headers: "label" followed by one index per byte of the longest sp.
    longest = max(len(decoded) for _, decoded in labeled_sps)

    return ["label"] + list(range(longest))


def print_sps(sps):
    sps = [(label, decode_sp(sp)) for label, sp in sps]
    header = build_header(sps)
    sps = [[label, *sp] for label, sp in sps]
    table = tabulate(sps, headers=header)

    print(table)

That allows us to stay organized and compare various sps:

>>> sps = [('type=video, features=4k', 'EgQQAXAB'), ('type=video, features=hd', 'EgQQASAB')]
>>> print_sps(sps)
label                           0         1         2         3         4         5
-----------------------  --------  --------  --------  --------  --------  --------
type=video, features=4k  00010010  00000100  00010000  00000001  01110000  00000001
type=video, features=hd  00010010  00000100  00010000  00000001  00100000  00000001

Moving forward, I'm going to format the output as a markdown table to further improve readability:

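| Label | SP | B0 | B1 | B2 | B3 | B4 | B5 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| type=video, features=4k | EgQQAXAB | 00010010 | 00000100 | 00010000 | 00000001 | 01110000 | 00000001 |
| type=video, features=hd | EgQQASAB | 00010010 | 00000100 | 00010000 | 00000001 | 00100000 | 00000001 |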

Making sense of the data

What about multiple active features filters?

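| Label | SP | B0 | B1 | B2 | B3 | B4 | B5 | B6 | B7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| type=video, features=4k | EgQQAXAB | 00010010 | 00000100 | 00010000 | 00000001 | 01110000 | 00000001 | | |
| type=video, features=hd | EgQQASAB | 00010010 | 00000100 | 00010000 | 00000001 | 00100000 | 00000001 | | |
| type=video, features=hd+4k | EgYQASABcAE= | 00010010 | 00000110 | 00010000 | 00000001 | 00100000 | 00000001 | 01110000 | 00000001 |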

Now we're talking.

B0 is a constant (at least in current sample space) and probably a header.

B1 changed from 00000100 to 00000110 after the introduction of an additional feature. Is it a counter of active filters? The first two sps contain two filters (10 in binary), and the third has three filters (11 in binary). Why doesn't the counting start at the rightmost bit, though?

B2 is the same for all the samples. It most likely identifies the Type: Video filter, which has been active in all the configurations.

Moreover, the following may hold:

  • 00100000 identifies the Features: HD filter
  • 01110000 identifies the Features: 4K filter
  • 00000001 looks like some spacer between filters

We can easily validate the counter assumption by applying an additional filter:

| Label | SP | B0 | B1 | ... |
| --- | --- | --- | --- | --- |
| type=video, features=hd+4k+live | EgQQAXAB | 00010010 | 00001000 | ... |

It's official: B1 contains a counter that is shifted one bit to the left for an unknown reason. We can consider it part of the header and draft a specification:

Specification v1

The sp is base64 encoded binary. The first 2 bytes are a header. The header is followed by one or more filters.

| B0 constant | B1 counter |
| --- | --- |
| 00010010 | b7 - b1: counter; b0: 0 |

Example

| Header - Const | Header - Count: 2 | Type: Video | Spacer? | Features: 4K | Spacer? |
| --- | --- | --- | --- | --- | --- |
| 00010010 | 00000100 | 00010000 | 00000001 | 01110000 | 00000001 |
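As a quick sanity check of Specification v1, the header can be read programmatically and compared against the samples seen so far. This is only a sketch under the assumptions made above (a constant first byte and a counter stored in bits 7 to 1 of the second byte); `read_header` and `SAMPLES` are hypothetical names introduced for this example.

```python
import base64

# (sp, number of active filters) pairs taken from the samples above.
SAMPLES = [
    ('EgQQAXAB', 2),      # type=video, features=4k
    ('EgQQASAB', 2),      # type=video, features=hd
    ('EgYQASABcAE=', 3),  # type=video, features=hd+4k
]


def read_header(sp):
    """Return (constant byte, counter, lowest bit) of the assumed header."""
    raw = base64.b64decode(sp)
    constant, counter_byte = raw[0], raw[1]
    # The counter is assumed to sit one bit to the left, so shift it back.
    return constant, counter_byte >> 1, counter_byte & 1


for sp, expected_count in SAMPLES:
    constant, count, low_bit = read_header(sp)
    print(sp, format(constant, '08b'), count, low_bit, count == expected_count)
```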

We have some understanding of the header and shape of some filters. Now we can apply different filters to see whether our assumptions hold for those.

Additional filter types

Let's look at the EgQIAhAB sp, which encodes the Type: Video and Upload date: Today filters:

| B0 Header - Const | B1 Header - Count: 2 | B2 ? | B3 ? | B4 Type: Video | B5 Spacer? |
| --- | --- | --- | --- | --- | --- |
| 00010010 | 00000100 | 00001000 | 00000010 | 00010000 | 00000001 |

B0 and B1 still match our assumptions. However, the Type: Video filter, together with its spacer, moved from B2 and B3 to B4 and B5 respectively. That means B2 should identify Upload date: Today, but B3 is not the spacer byte we assumed.

Maybe looking at yet another filter type may be of help. Let's check Type: Video and Duration: 4 - 20 minutes given by EgQQARgD sp:

| B0 Header - Const | B1 Header - Count: 2 | B2 Type: Video | B3 ? | B4 ? | B5 ? |
| --- | --- | --- | --- | --- | --- |
| 00010010 | 00000100 | 00010000 | 00000001 | 00011000 | 00000011 |

Type: Video is back at B2, but the spacer assumption no longer holds: B5, which follows the assumed Duration: 4 - 20 minutes filter, is different yet again.

Here's an idea. What if there's no such thing as a spacer, and each piece of information in the sp is a two-byte-wide word? And what if the order of the filter words doesn't really matter?

We can try to decode a combination of all known filters to see whether all the assumptions fit well.

The EgoIAhABGAMgAXAB sp contains the following filters:

  • Type: Video
  • Features: 4K + HD
  • Duration: 4 - 20 minutes
  • Upload date: Today

From the previous observations and assumptions, we know that the header counter should contain 5 (101 in binary, which makes B1 00001010 once shifted one bit to the left) and that the filters should decode as:

| Type: Video | Features: 4K | Features: HD | Upload date: Today | Duration: 4 - 20 mins |
| --- | --- | --- | --- | --- |
| 00010000,00000001 | 01110000,00000001 | 00100000,00000001 | 00001000,00000010 | 00011000,00000011 |

Let's try to find a match in whole decoded sp:

| Header Count: 5 | Upload date: Today | Type: Video | Duration: 4-20mins | Features: HD | Features: 4K |
| --- | --- | --- | --- | --- | --- |
| 00010010,00001010 | 00001000,00000010 | 00010000,00000001 | 00011000,00000011 | 00100000,00000001 | 01110000,00000001 |

All good, it seems like our assumptions are valid. Thus we can update our specification.
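The same check can be scripted. The sketch below assumes the two-byte word layout proposed above and treats the filter words as an unordered set; `split_words` and `EXPECTED_WORDS` are names made up for this example.

```python
import base64

# Filter words collected from the earlier samples (two bytes each).
EXPECTED_WORDS = {
    bytes([0b00010000, 0b00000001]),  # Type: Video
    bytes([0b01110000, 0b00000001]),  # Features: 4K
    bytes([0b00100000, 0b00000001]),  # Features: HD
    bytes([0b00001000, 0b00000010]),  # Upload date: Today
    bytes([0b00011000, 0b00000011]),  # Duration: 4 - 20 minutes
}


def split_words(sp):
    """Split a decoded sp into its header word and a list of filter words."""
    raw = base64.b64decode(sp)
    words = [raw[i:i + 2] for i in range(0, len(raw), 2)]
    return words[0], words[1:]


header, filters = split_words('EgoIAhABGAMgAXAB')
print(header[1] >> 1)                  # counter value: 5
print(set(filters) == EXPECTED_WORDS)  # True
```

Comparing sets rather than lists reflects the guess that the order of the filter words doesn't matter.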

Specification v2

The sp is base64 encoded binary. The binary consists of 2-byte-wide words. The first word is a header, which is followed by one or more filter words.

Header

The first byte is constant, and the second contains a counter of filter words.

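| B0 constant | B1 |
| --- | --- |
| 00010010 | b7 - b1: counter; b0: 0 |

Filters catalog

Type

| Label | Binary value |
| --- | --- |
| Video | 00010000,00000001 |

Features

| Label | Binary value |
| --- | --- |
| 4K | 01110000,00000001 |
| HD | 00100000,00000001 |

Upload date

| Label | Binary value |
| --- | --- |
| Today | 00001000,00000010 |

Duration

| Label | Binary value |
| --- | --- |
| 4 - 20 minutes | 00011000,00000011 |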

Example

| Header Const | Header Count: 2 | Type: Video | Features: 4K |
| --- | --- | --- | --- |
| 00010010 | 00000100 | 00010000,00000001 | 01110000,00000001 |

Breaking the rules

We can try to tinker with some additional combinations to see whether we fully cracked the encoding logic. What about the usual Type: Video combined with Features: HDR, given by the EgUQAcgBAQ== sp?

| B0 Header Const | B1 What? | B2-3 Type: Video | B4 ? | B5 ? | B6 ? |
| --- | --- | --- | --- | --- | --- |
| 00010010 | 00000101 | 00010000,00000001 | 11001000 | 00000001 | 00000001 |

Now, this is different.

Bit 0 of the counter byte finally got set. B2 and B3 clearly identify the Type: Video filter, which is followed by 3 bytes.

The simplest conclusion consistent with the data is that bit 0 of the counter byte marks the presence of a special 3-byte-long filter. That filter seems to be included in the counter, and I assume it will always be the last filter in the sp. Otherwise parsing would be a nightmare.

We can again apply all the known filters together with Features: HDR to validate the assumptions. The Eg0IAhABGAMgAXAByAEB sp contains the following filters:

  • Type: Video
  • Duration: 4 - 20 minutes
  • Upload date: Today
  • Features: 4K + HD + HDR (new)

It decodes to the binary 00010010 00001101 00001000 00000010 00010000 00000001 00011000 00000011 00100000 00000001 01110000 00000001 11001000 00000001 00000001, which can be broken down into the following sections:

Header:

| Const | Count: 6; 3B filter on |
| --- | --- |
| 00010010 | 00001101 |

Filters:

| Upload date: Today | Type: Video | Duration: 4-20min | Features: HD | Features: 4K | Features: HDR |
| --- | --- | --- | --- | --- | --- |
| 00001000,00000010 | 00010000,00000001 | 00011000,00000011 | 00100000,00000001 | 01110000,00000001 | 11001000,00000001,00000001 |

Another 3 bytes wide filter is Features: VR180 given by EgPQAQE= (this time without any additional filter):

| Header Const | Header Count: 1; 3B filter on | Features: VR180 |
| --- | --- | --- |
| 00010010 | 00000011 | 11010000,00000001,00000001 |
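The parsing rule implied by these samples can be sketched in a few lines. This assumes, as above, that bit 0 of the second header byte marks a single 3-byte filter at the end and that the remaining bits hold the filter count; `split_filters` is a hypothetical helper name.

```python
import base64


def split_filters(sp):
    """Split an sp into raw filter chunks, honoring the 3-byte tail toggle."""
    raw = base64.b64decode(sp)
    count = raw[1] >> 1            # bits 7-1: number of filters
    has_tail = bool(raw[1] & 1)    # bit 0: last filter is 3 bytes wide
    body = raw[2:]

    two_byte_count = count - 1 if has_tail else count
    filters = [body[2 * i:2 * i + 2] for i in range(two_byte_count)]
    if has_tail:
        # The tail starts right after the 2-byte filters.
        filters.append(body[2 * two_byte_count:2 * two_byte_count + 3])
    return filters


for sp in ('EgUQAcgBAQ==', 'EgPQAQE=', 'Eg0IAhABGAMgAXAByAEB'):
    print(sp, [chunk.hex() for chunk in split_filters(sp)])
```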

With that cracked we can update our specification and fill it with all the possible filters.

Specification v3

The sp is base64 encoded binary. The binary consists of a 2-byte-wide header followed by 2-byte-wide filters. When the lowest bit (b0) of the second header byte is set, the last filter (tail) is 3 bytes wide.

Header

The first byte is constant, and the second contains a 3-bytes filter tail toggle and a counter of filters.

| B0 constant | B1 |
| --- | --- |
| 00010010 | b7 - b1: counter; b0: 3-bytes filter tail toggle |

3-bytes filter tail toggle

When set, the last filter is 3 bytes wide.

Filters catalog

Type

| Label | Binary value |
| --- | --- |
| Video | 00010000,00000001 |
| Channel | 00010000,00000010 |
| Playlist | 00010000,00000011 |
| Movie | 00010000,00000100 |

Features

| Label | Binary value |
| --- | --- |
| Live | 01000000,00000001 |
| 4K | 01110000,00000001 |
| HD | 00100000,00000001 |
| Subtitles/CC | 00101000,00000001 |
| Creative commons | 00110000,00000001 |
| 360° | 01111000,00000001 |
| VR180 (3B) | 11010000,00000001,00000001 |
| 3D | 00111000,00000001 |
| HDR (3B) | 11001000,00000001,00000001 |
| Location (3B) | 10111000,00000001,00000001 |
| Purchased | 01001000,00000001 |

Upload date

| Label | Binary value |
| --- | --- |
| Last hour | 00001000,00000001 |
| Today | 00001000,00000010 |
| This week | 00001000,00000011 |
| This month | 00001000,00000100 |
| This year | 00001000,00000101 |

Duration

| Label | Binary value |
| --- | --- |
| Under 4 minutes | 00011000,00000001 |
| 4 - 20 minutes | 00011000,00000011 |
| Over 20 minutes | 00011000,00000010 |

Examples

| Header Const | Header Count: 2 | Type: Video | Features: 4K |
| --- | --- | --- | --- |
| 00010010 | 00000100 | 00010000,00000001 | 01110000,00000001 |

| Header Const | Header Count: 1; 3B filter on | Features: VR180 |
| --- | --- | --- |
| 00010010 | 00000011 | 11010000,00000001,00000001 |
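Specification v3 is small enough to turn into a decoder. The following is a minimal sketch that follows the specification as written above and has only been checked against the sample sps from this article; `decode_filters` and `CATALOG` are hypothetical names, and the catalog values are the binary values from the tables rewritten as hex.

```python
import base64

# Filters catalog from Specification v3, written as hex for brevity.
CATALOG = {
    bytes.fromhex(code): label
    for code, label in {
        '1001': 'Type: Video', '1002': 'Type: Channel',
        '1003': 'Type: Playlist', '1004': 'Type: Movie',
        '4001': 'Features: Live', '7001': 'Features: 4K',
        '2001': 'Features: HD', '2801': 'Features: Subtitles/CC',
        '3001': 'Features: Creative commons', '7801': 'Features: 360°',
        'd00101': 'Features: VR180', '3801': 'Features: 3D',
        'c80101': 'Features: HDR', 'b80101': 'Features: Location',
        '4801': 'Features: Purchased',
        '0801': 'Upload date: Last hour', '0802': 'Upload date: Today',
        '0803': 'Upload date: This week', '0804': 'Upload date: This month',
        '0805': 'Upload date: This year',
        '1801': 'Duration: Under 4 minutes',
        '1803': 'Duration: 4 - 20 minutes',
        '1802': 'Duration: Over 20 minutes',
    }.items()
}


def decode_filters(sp):
    """Return the filter labels encoded in sp, per Specification v3."""
    raw = base64.b64decode(sp)
    if len(raw) < 2 or raw[0] != 0b00010010:
        raise ValueError('unexpected header')
    count, has_tail = raw[1] >> 1, raw[1] & 1
    # Every filter is 2 bytes wide, except a 3-byte tail when the toggle is set.
    widths = [2] * count
    if has_tail and widths:
        widths[-1] = 3
    labels, pos = [], 2
    for width in widths:
        word = raw[pos:pos + width]
        # Unknown words are reported as hex instead of raising.
        labels.append(CATALOG.get(word, 'unknown ' + word.hex()))
        pos += width
    return labels


print(decode_filters('EgQQAXAB'))  # ['Type: Video', 'Features: 4K']
print(decode_filters('EgPQAQE='))  # ['Features: VR180']
```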

Closing thoughts

I believe we cracked the filters quite accurately. I successfully tested a couple of combinations against YouTube, and all were accepted and correctly parsed.

Of course, there are some additional nuances to the filters. For example, when the Type: Movie filter is active, only certain additional filters can be appended. Also, I didn't cover sorting.

The core workings behind the sp should be covered though.

One might write an encoder/decoder of the sp based on this breakdown. As I don't have a use for such a program, I'll leave it to fellow readers who would benefit from it.
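As a possible starting point, here is a minimal encoder sketch that follows Specification v3. It is an illustration only: `encode_sp` and `CODES` are hypothetical names, the table holds just a subset of the catalog, and the output has only been compared with the sample sps in this article. It keeps the 2-byte filters in the order given by the caller and refuses more than one 3-byte filter, since the specification describes a single tail.

```python
import base64

# A subset of the Specification v3 catalog, written as hex for brevity.
CODES = {
    'Type: Video': '1001',
    'Features: HD': '2001',
    'Features: 4K': '7001',
    'Features: HDR': 'c80101',    # 3-byte filter
    'Features: VR180': 'd00101',  # 3-byte filter
    'Upload date: Today': '0802',
    'Duration: 4 - 20 minutes': '1803',
}


def encode_sp(labels):
    """Build an sp string from filter labels, per Specification v3."""
    words = [bytes.fromhex(CODES[label]) for label in labels]
    tails = [word for word in words if len(word) == 3]
    if len(tails) > 1:
        raise ValueError('Specification v3 describes only one 3-byte tail')
    # Keep 2-byte filters in the given order and move the 3-byte one last.
    ordered = [word for word in words if len(word) == 2] + tails
    # Second header byte: counter in bits 7-1, tail toggle in bit 0.
    header = bytes([0b00010010, (len(ordered) << 1) | len(tails)])
    return base64.b64encode(header + b''.join(ordered)).decode('ascii')


print(encode_sp(['Type: Video', 'Features: 4K']))  # EgQQAXAB
print(encode_sp(['Features: VR180']))              # EgPQAQE=
```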