Best topology for joining data from multiple sensors

Stack Overflow Asked by utxeee on October 23, 2020

I have n sensors generating measurements every t minutes to their own topic as follows:

Topic_1: {timestamp: 1, measurement: 1}, {timestamp: 2, measurement: 4}, ...

Topic_2: {timestamp: 1, measurement: 5}, {timestamp: 2, measurement: 3}, ...
Topic_n: {timestamp: 1, measurement: 3}, {timestamp: 2, measurement: 5}, ...

This number of sensors is dynamic but for sake of simplicity let’s assume I have 3 sensors, therefore, 3 topics getting data every t minutes.

What is the best topology for joining all measurements with the same timestamp as shown below?

{timestamp: 1, measurement: 1} 
{timestamp: 1, measurement: 5}  --------> {timestamp: 1, measurements: [1,5,3]}
{timestamp: 1, measurement: 3}

2 Answers

TANSTAFL: There Ain't No Such Thing As A Free Lunch

Every tradeoff will apply in different situations.

I recommend writing a stupidly simple service first, like an in memory default dictionary. Having something stupid and slow can validate your tests work and sometimes be run in parallel to make sure your complex algorithms work.

I've used 'hop and consolidate' star networks for sensors that collect and forward on a schedule (sleep 6 minutes, wake for 40 ms). Coupling with telemetry, this can give a very low transmission cost. Adds one bit/sensor for no bit received. Downside is it doesn't handle out of order readings, retransmission, etc. Also consolidation systems have a minimum latency.

There as been a lot of work on extremely compact, read-only database readings for logs. Basically, timestamps allow distributing your queries correctly across your compute and drive resources. Sensage and others did this.

Like most Stack Overflow questions, I'm just guessing about your actual problem. :)

Answered by Charles Merriam on October 23, 2020

You have a few options. You can use join and define a joiner to make the list. However it would have to be a windowed stream after the join. If your measurements always come in during the grace period then this should not be a problem.

A little more complicated, if your time stamps do not have duplicates you can groupByKey then aggregate into the lists. this will form a table with the results you want. If you need it to be a stream you can use toStream and filter out updates without a list of length n.

There are probably a few other ways of doing this as well, but these come to mind first.

Answered by wcarlson on October 23, 2020

Add your own answers!

Related Questions

Invalid shorthand property initializer

5  Asked on December 27, 2021 by pallab-ganguly


Drupal payment method getting access denied for admin

1  Asked on December 27, 2021 by vijayanand-premnath


How can I plot more 10k points using matplotlib?

1  Asked on December 27, 2021 by andkot


PHP; time conversion form distance and speed

2  Asked on December 27, 2021 by nway


Jest test with SpyOn

1  Asked on December 27, 2021 by user9855821


json.Decoding a TCP connection and logging the original data at the same time in Go

2  Asked on December 27, 2021 by mnv3akndilyweipfg0ocvwiuob


Check lists are equal whilst accounting for nulls

1  Asked on December 27, 2021


Not sure how to change the behavior of my CSS Grid

1  Asked on December 27, 2021 by m3hran


Fluentd how to get source from a file by executing a script

1  Asked on December 27, 2021 by snoogybunny


Var keyword in Java

5  Asked on December 27, 2021 by harold-ibouanga


shorten the string to the desired values

3  Asked on December 27, 2021


Ask a Question

Get help from others!

© 2022 All rights reserved. Sites we Love: PCI Database, MenuIva, UKBizDB, Menu Kuliner, Sharing RPP, SolveDir