TransWikia.com

Decoding unknown ID format parts

Reverse Engineering Asked by siikamiika on May 23, 2021

I have a data set where events have a unique ID. Some examples:

CkUKGkNNZmY1TkhObi1zQ0ZWT0h3Z0VkaExJQ0xnEidDSlB0bG9UTG4tc0NGWmhpaXdvZDFxOE5fdzE1OTc1NzY4NzU4Mzg%3D
CjoKGkNOWHBsX1hJbi1zQ0ZTWks3UW9kRDRNSVdREhxDSXI4emVuSW4tc0NGZm5WVEFJZEVRZ0JrUS0x
CjkKGkNMTzJ6TkRWbi1zQ0ZUNkh3Z0VkdGVVQnd3EhtDT3ZOajZYVm4tc0NGWUdGY0FvZGJIc09iZzE%3D
CjkKGkNMcnN4OHZKbi1zQ0ZRMXpnd29kUW9VQll3EhtDSTdfZzVmSm4tc0NGVWFBY0FvZGRsQUhFQTQ%3D
CjsKGkNMNjlrdFRObi1zQ0ZZMkN3Z0VkVWxnSGxREh1DSktpanFISm4tc0NGVHZCVEFJZEM3a01Udy00OA%3D%3D
CjkKGkNNQ2ItbzNWbi1zQ0ZkU013Z0VkX2RRRHNBEhtDSlAzcm8zUm4tc0NGV252T0FZZENkUUJyUTU%3D
CjoKGkNOTzhoZVRRbi1zQ0ZVYTBnZ29kN0VrTnlnEhxDSS1qbmRmUW4tc0NGY1pLaXdvZEgxRUFvUS0w
CkUKGkNQU0VocjNMbi1zQ0ZaaXNnZ29kQlNjTTZBEidDTVNmbDRiTG4tc0NGWDRBdHdBZF9lQUtPQTE1OTc1NzYyOTkyNDg%3D
CjkKGkNJLVh3NEhLbi1zQ0ZaV3JEUW9kUzNBSC1BEhtDTmFWeVpuSm4tc0NGZEtSY0FvZFdWNEVvQTE%3D
CjkKGkNKYnZpb3pSbi1zQ0ZVVkE3UW9kV0xzTnVBEhtDSXlKcktQUW4tc0NGZkhEVEFJZEJtOENGdzQ%3D
CjsKGkNJRGZsZnpVbi1zQ0ZjM3hnZ29kd1hZRnVREh1DS0t1bE96VW4tc0NGUlQ1V0FvZGE2OElSQS0yNg%3D%3D
CkUKGkNJVGhuZHpQbi1zQ0ZZMkJ3Z0VkUkk0Q1N3EidDUC1qbDVITm4tc0NGVnBsaXdvZGJ2b0hLUTE1OTc1Nzc0MzU0Mjg%3D
CkUKGkNPRzc4SkRPbi1zQ0ZkbWhnZ29kckprTWJREidDT2FCdmRUTW4tc0NGYnhDOVFVZExfc0lIQTE1OTc1NzcwMDk0NTQ%3D
CjoKGkNJN05fZURWbi1zQ0ZZRzJEUW9kamEwTXJ3EhxDTHJ5eVpMVm4tc0NGVWZDVEFJZEkyNEdIQS00
CkUKGkNNN1F6TERKbi1zQ0ZWbUd3Z0Vkb0RJSzlREidDT3ZBeUszSm4tc0NGU1FfWUFvZEl1UUZ4dzE1OTc1NzU3MzIyNTM%3D
CjkKGkNJeXlwOV9Kbi1zQ0ZkQzZnZ29kaGdBSWVnEhtDSVdRMzd6Qm4tc0NGWmlDckFJZEFNb05BZzU%3D
CjoKGkNKQ1YzWnpUbi1zQ0ZjWW5nd29kaUVJRkdREhxDSXVieHBqU24tc0NGVUZrS2dvZElRNElndy0x
CjkKGkNMVHJrNWpMbi1zQ0ZVa29nd29kSG1FRmtnEhtDS3VXdGRMS24tc0NGUVdIY0FvZFFuWUVfdzA%3D
CjkKGkNQVDhqOV9Tbi1zQ0ZaUlU3UW9kSGZZQkl3EhtDTV9rak9iUG4tc0NGUkVsandvZFdIa1A5QTg%3D
CkUKGkNKdmM4S0xRbi1zQ0ZSaktnZ29kdWdRTm5REidDS096NDRQTm4tc0NGUXQ3aXdvZDI1Z0c4dzE1OTc1Nzc1ODMwMDM%3D
CjoKGkNOM3ZnTXJQbi1zQ0ZaSk03UW9kWmJvQmx3EhxDSUdVcHBMSm4tc0NGWVptTUFvZFg3SVBZUTMy
CjkKGkNKYjZ6YmZMbi1zQ0ZRNkV3Z0VkaW9FRDZREhtDTV9UcG96TG4tc0NGWWpGY3dFZExiVUp0ZzA%3D
CjkKGkNPRHZ1Yl9Wbi1zQ0ZZeGU3UW9kMzZ3UG5nEhtDTmYzcFp2R24tc0NGUlpUandvZEt3SUswUTk%3D
CjkKGkNKcjV3NkRKbi1zQ0ZUNkh3Z0VkT1RZSGtBEhtDUFRJeXB6Sm4tc0NGZmp5T0FZZENqZ1BfZzA%3D
CkUKGkNOZlM4NlhKbi1zQ0ZRcWhnZ29ka3kwUEt3EidDSVBwcDVUSm4tc0NGZl9GVEFJZGhIa0ZiZzE1OTc1NzU3MDc1Njk%3D
CjkKGkNONmNsZmJKbi1zQ0ZkYTlnZ29kU3cwRnFBEhtDUEMyOEtMSG4tc0NGZkdDWXdZZDgtTUprUTk%3D
CkUKGkNLZTJ1TmpSbi1zQ0ZTTGdnZ29kcDJVSkdBEidDTW1YcHF2Um4tc0NGUmhQS2dvZGRPQURUdzE1OTc1Nzc4ODk4NjM%3D
CjoKGkNOang2NUxUbi1zQ0ZRSlE3UW9kN1JNRUlREhxDSlNRaDZqRm4tc0NGWFJSaFFvZFJmUUlZZzgx
CkUKGkNLT2dpYXZVbi1zQ0ZRV21EUW9kTngwRy1BEidDSm5lZ3FEUm4tc0NGUzVDaFFvZHlxb0V4dzE1OTc1Nzg2NzI5NjI%3D
CkUKGkNLaU8tZURKbi1zQ0ZlcUZ3Z0VkRGhrT1V3EidDTFdEcUtISm4tc0NGYXhDaFFvZFlSRUNmUTE1OTc1NzU4MzM3NDM%3D
CkUKGkNPMkJ1YlBLbi1zQ0ZlUGxnZ29kN3JvUFZBEidDTWlfdlozSm4tc0NGWVBKV0FvZDZwb05hQTE1OTc1NzYwMDY3ODk%3D
CkUKGkNKR09xS1hTbi1zQ0ZZUkM3UW9kV1FvRzh3EidDUFc2dEkzU24tc0NGUlI0WUFvZGdHb0ZKUTE1OTc1NzgxMjQ0MjY%3D
CjoKGkNNekd1N2JObi1zQ0ZjeXlnZ29kdmJZTXZBEhxDTnlnN003S24tc0NGVVZDOVFVZFktZ0Zwdy02
CjoKGkNPemd2cmJPbi1zQ0ZZTnRnd29kbzFvSy1nEhxDTEtMMnFfSm4tc0NGWmRBOVFVZGR5WUVrUTEz
CjoKGkNLbUxxZXpRbi1zQ0ZaUE1nZ29kbnAwSDRBEhxDTmVqbE96SW4tc0NGUTlIV0FvZDV5QUdaQTEy
CjsKGkNKcjMzSkhLbi1zQ0ZSaktnZ29kZ2RnS3N3Eh1DT1N3eXBESm4tc0NGYXhTaFFvZExJSUdlUS0xMw%3D%3D
CkUKGkNKT3luXzdSbi1zQ0ZhSkQ3UW9kLWlRTHNnEidDSUhnaVlqUm4tc0NGWVZCOVFVZFdqc05JQTE1OTc1NzgwNDM3MzE%3D
CjoKGkNQbWM5UHpQbi1zQ0ZSblBnZ29kTlU0Q3d3EhxDTXVEa3RqS24tc0NGZDVBOVFVZE43MEN1dzIz
CkUKGkNMX3k5UDdUbi1zQ0ZRU0d3Z0VkQ2tvR2t3EidDSXVqb19IVG4tc0NGU0pEOVFVZGRZMERyZzE1OTc1Nzg1ODAyODI%3D
CjkKGkNJSGJ0b2pMbi1zQ0ZXSDNnZ29kLW5vTlpREhtDSjNrMV8zS24tc0NGUWJ3T0FZZHlJa01rZzA%3D
CjoKGkNKT2I4ZHpObi1zQ0ZjcU13Z0VkWjkwRkpBEhxDTUtsc01uTW4tc0NGVnpPVEFJZERmWU9idy0z
CkUKGkNNSFVtS19Tbi1zQ0ZjT0J3Z0VkcHRVSXBnEidDTnVLbk9mUm4tc0NGYzVQS2dvZG53WURmUTE1OTc1NzgxNDQ2ODg%3D
CjkKGkNNYXJnS1BKbi1zQ0ZRcVJ3Z0VkemdzSGlREhtDT1RCcTdQQm4tc0NGVWIwT0FZZFFwSUpFUTM%3D
CkUKGkNJcnFqcGZLbi1zQ0ZVMkh3Z0VkOVpvSHB3EidDS2UyaGZ6Sm4tc0NGWnJIVEFJZDM4a1BjUTE1OTc1NzU5NDY4OTI%3D
CjkKGkNJdkM4bzdTbi1zQ0ZaV1BnZ29kWWN3SXd3EhtDTGlfby1MSm4tc0NGUXBCaFFvZDMzQU9wZzA%3D
CkUKGkNKYXRqc1RSbi1zQ0ZaQ2xnZ29kTTAwRGpBEidDSW5mejk3S24tc0NGWVJEaFFvZElNd043dzE1OTc1Nzc5MjE1NTQ%3D
CkUKGkNMSEgyZGpUbi1zQ0ZZVDBnZ29kOGRzR0tREidDT2lqdTRyUm4tc0NGWWRKWUFvZFg4TU8xdzE1OTc1Nzg1MDA5NDY%3D
CjoKGkNLQzN3cjNWbi1zQ0ZRRy1nZ29kQXMwSW5REhxDSXV5NXBuR24tc0NGUXotT0FZZHB6a0h5UTE1
CjkKGkNNXzZpWkRObi1zQ0ZZTHdnZ29kZ0JFRGlREhtDTlRWNElIS24tc0NGVXB3andvZHRoMENfZzQ%3D
CjoKGkNMNk56dXZQbi1zQ0ZSU3NEUW9kUnRjSGtREhxDTGF0c05yTW4tc0NGUVBoV0FvZHY5Z0xNUS05
CjkKGkNQVzJsTFRNbi1zQ0ZYNkh3Z0VkY25FTG53EhtDT21CajZmTW4tc0NGZE9CY0FvZHF0b0tsZzA%3D

Starting from the surface, they seem to

  1. be URL encoded
  2. be base64 encoded (although there are only 62 unique characters and = padding)
  3. contain a structure holding two URL-safe base64 strings (RFC 4648, section 5: https://tools.ietf.org/html/rfc4648#section-5)
    • the format is \n<length byte>\n<length byte><first string>\x12<length byte><second string>
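The three layers above can be peeled off in a few lines of Python. This is a minimal sketch, assuming every length fits in a single byte (true for the examples listed, not guaranteed in general); `b64url_decode` and `unwrap` are hypothetical helper names.

```python
import base64
from urllib.parse import unquote


def b64url_decode(text: str) -> bytes:
    """Decode URL-safe base64, restoring any stripped '=' padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def unwrap(event_id: str) -> tuple[bytes, bytes]:
    """Return the two inner strings (still base64 text, as bytes)."""
    raw = b64url_decode(unquote(event_id))
    # Outer header: 0x0a, total length, then 0x0a, length of the first string.
    # Assumption: each length is one byte.
    if raw[0] != 0x0A or raw[1] != len(raw) - 2 or raw[2] != 0x0A:
        raise ValueError("unexpected outer header")
    first_len = raw[3]
    first = raw[4:4 + first_len]
    rest = raw[4 + first_len:]
    # Second string: 0x12, length, payload.
    if rest[0] != 0x12 or rest[1] != len(rest) - 2:
        raise ValueError("unexpected second-string header")
    return first, rest[2:]


sample = (
    "CkUKGkNNZmY1TkhObi1zQ0ZWT0h3Z0VkaExJQ0xnEidDSlB0bG9UTG4tc0NG"
    "WmhpaXdvZDFxOE5fdzE1OTc1NzY4NzU4Mzg%3D"
)
first, second = unwrap(sample)
print(first.decode(), second.decode())
```

The header checks raise instead of silently slicing, so an ID that deviates from the assumed layout is noticed immediately.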

I know what the second string points to, but I’m interested in the first one.

Everything that I’ve seen so far matches this pattern:

grp1, grp2, grp3 = re.match(b'^\x08(.{7})\x02\x15(.{4})\x1d(.{4})$', part1_decoded, re.DOTALL).groups()
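For reference, a self-contained version of that match applied to the first string of the first example ID. `split_groups` is a hypothetical helper name, and the pattern only reflects the samples seen so far.

```python
import base64
import re

# Same layout as the pattern above, written as a raw bytes regex.
PATTERN = re.compile(rb"^\x08(.{7})\x02\x15(.{4})\x1d(.{4})$", re.DOTALL)


def split_groups(first_string: str) -> tuple[bytes, bytes, bytes]:
    """Base64-decode the first string and split it into grp1, grp2, grp3."""
    padded = first_string + "=" * (-len(first_string) % 4)
    part1_decoded = base64.urlsafe_b64decode(padded)
    match = PATTERN.match(part1_decoded)
    if match is None:
        raise ValueError("input does not match the observed layout")
    return match.groups()


grp1, grp2, grp3 = split_groups("CMff5NHNn-sCFVOHwgEdhLICLg")
print(grp1.hex(), grp2.hex(), grp3.hex())
```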

The bytes in grp1 take only 128 distinct values, whereas those in grp2 and grp3 take all 256. However, some values are more common than others.

  • In grp1 the most common ones (by a large margin) are 9f and eb. There’s also a spike between c9 and d5. 9f and eb are usually on the right side, so it might be a sequential little-endian integer such as a database index or a timestamp.
  • In grp2, characters like 01 \n \r 82 c2 are common. I can’t make anything out of it except that they could be separators.
  • grp3 looks otherwise random except characters from 00 to 0f are ~8 times more common than others.
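One way to make observations like these systematic is to count byte values per position rather than over the whole group. The sketch below does that for grp1 of the first two example IDs; `positional_counts` is a hypothetical helper, and two samples are only enough to show the mechanics. The high-bit count is included because a group limited to 128 values suggests that one bit is constant.

```python
from collections import Counter


def positional_counts(groups: list[bytes]) -> list[Counter]:
    """Return one Counter of byte values for each byte position."""
    width = len(groups[0])
    if any(len(g) != width for g in groups):
        raise ValueError("all groups must have the same length")
    return [Counter(g[i] for g in groups) for i in range(width)]


# grp1 of the first two example IDs, written as hex.
grp1_samples = [
    bytes.fromhex("c7dfe4d1cd9feb"),
    bytes.fromhex("d5e997f5c89feb"),
]

counts = positional_counts(grp1_samples)
for position, counter in enumerate(counts):
    common = ", ".join(f"{value:02x} x{n}" for value, n in counter.most_common(3))
    print(f"byte {position}: {common}")

# Count bytes whose most significant bit is set.
high_bit = sum(b >> 7 for g in grp1_samples for b in g)
total = sum(len(g) for g in grp1_samples)
print(f"{high_bit} of {total} bytes have the high bit set")
```

Run over the full data set, the same per-position view would show whether 9f and eb are tied to fixed offsets or merely frequent overall.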

I’m looking for any clues or analysis methods to find out what groups 1, 2 and 3 are used for.

One Answer

After the initial unwrapping (URL decode, then base64 decode), the entire thing seems to be protobuf. grp1, together with the 02 byte that follows it, is a varint-encoded timestamp, but I'm not yet sure what the two 32-bit values (decoded as floats below) are used for. It should be easier to go forward now that the format is known, though.

I found this tool helpful for decoding the data: https://github.com/omarroth/protodec

> echo 'CkUKGkNNZmY1TkhObi1zQ0ZWT0h3Z0VkaExJQ0xnEidDSlB0bG9UTG4tc0NGWmhpaXdvZDFxOE5fdzE1OTc1NzY4NzU4Mzg%3D' | php -r 'echo urldecode(fgets(STDIN));' | ./protodec -bp | jq
{
  "1:0:embedded": {
    "1:0:base64": {
      "1:0:varint": 1597576876470215,
      "2:1:float32": 7.14585257493996e-38,
      "3:2:float32": 2.971713153332445e-11
    },
    "2:1:string": "CJPtloTLn-sCFZhiiwod1q8N_w1597576875838"
  }
}

# the left part of the above string
> echo 'CJPtloTLn-sCFZhiiwod1q8N_w' | ./protodec -bp | jq
{
  "1:0:varint": 1597576176842387,
  "2:1:float32": 1.3422299960259732e-32,
  "3:2:float32": -1.883341397915719e+38
}
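
The protodec output can be cross-checked by parsing the inner message by hand. This is a minimal sketch that handles only wire types 0 (varint), 2 (length-delimited) and 5 (32-bit), which is all the sample above needs. Reading the varint as microseconds since the Unix epoch is an assumption based on its magnitude. Wire type 5 is shared by fixed32, sfixed32 and float, so the raw bytes are kept and both readings are printed. `read_varint` and `parse_fields` are hypothetical helper names.

```python
import base64
import struct
from datetime import datetime, timedelta, timezone


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, next position) for a base-128 varint starting at pos."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return value, pos
        shift += 7


def parse_fields(buf: bytes) -> dict[int, object]:
    """Map field number to value; later duplicates overwrite earlier ones."""
    fields: dict[int, object] = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            fields[number], pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            fields[number] = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:
            fields[number] = buf[pos:pos + 4]  # raw little-endian 32 bits
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


text = "CMff5NHNn-sCFVOHwgEdhLICLg"
inner = base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))
fields = parse_fields(inner)

# Assumption: field 1 counts microseconds since 1970-01-01 UTC.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
stamp = epoch + timedelta(microseconds=fields[1])
print(fields[1], stamp.isoformat())

for number in (2, 3):
    as_uint = struct.unpack("<I", fields[number])[0]
    as_float = struct.unpack("<f", fields[number])[0]
    print(f"field {number}: uint32={as_uint} float32={as_float!r}")
```

Field 1 comes out as 1597576876470215, matching the tool. Under the microsecond assumption it falls within a second of the 1597576875838 suffix on the second string, if that suffix is read as milliseconds.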

Answered by siikamiika on May 23, 2021
