# From Labels to Graph: what machine learning approaches to use?

Data Science Asked by Hamed on December 9, 2020

Imagine a world where we have places (e.g., cities, restaurants, national parks, etc.) but no roads connecting them.

Our objective is to build roads connecting any two places while going through some given places. For instance, connect Seattle to San Francisco and make sure the connection goes through restaurants X and Y and goes through the Red Wood national park. Note, there is not necessarily one possible road, hence multiple options as output are also acceptable.

We have some roads as examples to train the system. For instance, we have a road from Washington DC to Boston that goes through Baltimore.

So, the input to a machine learning system will be a list of strings (e.g., city names and places to go through) and the expected output is a tree/graph that connects the given points.

(Note, this cannot be a graph traversing problem, as a path the connects the points does not exist.)

I want a machine learning model that can produce/generate a novel graph in the output given a set of labels.

Any thoughts on what ML machine learning methods I can use and where to start?

## Update 1

Note that I have neither of the following:

1. A graph of route(s) from Seatle to San Francisco;
2. A larger graph that contains a route from Seatle to San Francisco.

Instead, I have graphs for routes from Los Angles to Texas (with stops for eating and seeing landscape), from Milan to Rome, and etc.

Therefore, I think this is not a graph searching problem.

## Update 2

I take that the initial question was not clear, so I reworded it.

Your graph is characterized by a set of nodes and edges. It means given a start point x, show me the next node z1, then from z1 show me z2 and so on. This looks like the definition of graph-searching algorithm.

However, maybe by definining a 'cost function' that you want to optimize (minimize or maximize) you can convert it to a supervised learning task. Basically, given a graph composed of weighted nodes and edges, you want to choose a subset (E,V) that minimzes the cost function.

I am not sure this problem fits a generative problem (as you emphasize on the word generate) whose goal is to learn a distribution.

Answered by YAS on December 9, 2020

## Related Questions

### How to handle unseen labels in test data?

1  Asked on August 22, 2020 by meysam

### Regression problem with trace regression model

0  Asked on August 22, 2020

### Why does my cost either increase or go down very slowly

1  Asked on August 21, 2020 by idk

### Regarding Class Balancing in Deep Neural Network

0  Asked on August 21, 2020 by goc

### In a list, find the numbers where they are bigger than the next numbers

1  Asked on August 21, 2020 by ellen-sheldon

### Correlation vs Multicollinearity

3  Asked on August 20, 2020

### Imbalanced classification or Regression? What is the best approach to my A/B testing related problem?

0  Asked on August 20, 2020 by pradical2190

### Is there a way to calculate the degree of classification of data points in a set?

0  Asked on August 19, 2020 by celina

### Problem of continuous training – Supervised learning

2  Asked on August 19, 2020 by sandeep-bhutani

### Classifiers for Page Numbers Sequences

0  Asked on August 19, 2020 by siya-m

### How to binary encode multi-valued categorical variable from Pandas dataframe?

1  Asked on August 19, 2020 by denis-l

### Positively skewed target label in regression

1  Asked on August 19, 2020 by elliot

### Suitable Autoencoder for Activity Recognition dataset Feature Extraction

1  Asked on August 18, 2020 by jemshit-iskenderov

### Neural networks: how to think of it?

0  Asked on August 18, 2020 by luca-di-mauro

### Can an applied mathematics degree get you into data science?

1  Asked on August 18, 2020

### MLPRegressor Output Range

2  Asked on August 18, 2020 by kam

### What are practical differences between kernel k-means and spectral clustering?

1  Asked on August 17, 2020 by kuba_

### Keras input for multivariate classification with LSTM using current features and previous timesteps features and y values

2  Asked on August 17, 2020 by user76478

### What is the best way to train a gradient boosted model on a binomial dataset where the number of “observations” for each instance varies?

0  Asked on August 15, 2020 by lavendarlemur

### Choose CNN architecture first, then optimize parameters – validation vs test performance to pick architecture?

1  Asked on August 15, 2020 by sob3kx

### Ask a Question

Get help from others!