Inserting random data from a list

Stack Overflow Asked by smiley on August 1, 2020

These are my table columns:

 ID || Date || Description || Priority

My goal is to insert 2000 rows of random test data, with dates ranging from 7/1/2019 to 7/1/2020 and a priority picked at random from the list (High, Medium, Low).

I know how to insert random numbers but I am stuck with the date and the priority fields.

If I need to write code, any pointers on how to do it?

Just to be clear: my issue is with randomizing values and inserting them from a given list.

2 Answers

CREATE TABLE mytable (
  id SERIAL PRIMARY KEY,
  date DATE NOT NULL,
  description TEXT,
  priority ENUM('High','Medium','Low') NOT NULL
);

INSERT INTO mytable (date, priority)
  SELECT '2019-07-01' + INTERVAL FLOOR(RAND()*365) DAY, 
      ELT(1+FLOOR(RAND()*3), 'High', 'Medium', 'Low') 
  FROM DUAL;

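DUAL is a dummy table name that MySQL treats as a special keyword. You can select from it, and it always returns exactly one row. But it has no real columns with data, so you can only select expressions.

Note that RAND() returns a value from 0 up to but not including 1, so FLOOR(RAND()*365) yields offsets 0 through 364 and the latest date generated is 2020-06-29. The requested range crosses a leap day, so 2020-07-01 is 366 days after 2019-07-01; use 367 instead of 365 if the end date itself must be possible.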

Do this INSERT a few times and you get:

mysql> select * from mytable;
+----+------------+-------------+----------+
| id | date       | description | priority |
+----+------------+-------------+----------+
|  1 | 2019-10-20 | NULL        | Medium   |
|  2 | 2020-05-17 | NULL        | High     |
|  3 | 2020-06-25 | NULL        | Low      |
|  4 | 2020-05-06 | NULL        | Medium   |
|  5 | 2019-09-30 | NULL        | High     |
|  6 | 2019-08-06 | NULL        | Low      |
|  7 | 2020-02-21 | NULL        | High     |
|  8 | 2019-11-10 | NULL        | High     |
|  9 | 2019-07-30 | NULL        | High     |
+----+------------+-------------+----------+
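If you would rather generate the rows in application code, the two expressions in the INSERT above translate fairly directly. The following is a minimal Python sketch and not part of the original answer; the function name `random_rows` and the optional seed are assumptions made for illustration. It picks the day offset so that both 2019-07-01 and 2020-07-01 can occur, and picks the priority uniformly from the list.

```python
import random
from datetime import date, timedelta

PRIORITIES = ["High", "Medium", "Low"]
START = date(2019, 7, 1)
END = date(2020, 7, 1)


def random_rows(count, seed=None):
    """Return `count` (date, priority) tuples with random values."""
    rng = random.Random(seed)
    span_days = (END - START).days  # 366, because the range crosses 2020-02-29
    rows = []
    for _ in range(count):
        # randint is inclusive on both ends, so START and END can both occur.
        day = START + timedelta(days=rng.randint(0, span_days))
        rows.append((day.isoformat(), rng.choice(PRIORITIES)))
    return rows


rows = random_rows(2000, seed=42)
print(len(rows))
print(rows[0])
```

The resulting tuples could then be handed to a database driver's bulk insert, for example a DB-API `cursor.executemany()` call with a parameterized `INSERT INTO mytable (date, priority) VALUES (%s, %s)`; the exact placeholder style depends on the driver you use.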

Here's a trick to use the number of rows in the table itself to insert the same number of rows, basically doubling the number of rows:

INSERT INTO mytable (date, priority)
  SELECT '2019-07-01' + INTERVAL FLOOR(RAND()*365) DAY, 
      ELT(1+FLOOR(RAND()*3), 'High', 'Medium', 'Low') 
  FROM mytable;

Just by changing FROM DUAL to FROM mytable, I go from selecting one row to selecting as many rows as the table currently holds. But the values I insert are still random expressions, not the values already in those rows. So I get new rows with new random values.

Then repeat this INSERT as many times as you want to double the number of rows.
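Because each run doubles the table, the row count grows as n, 2n, 4n, and so on, so it will usually overshoot a fixed target such as the 2000 rows asked for. The sketch below (Python, with a hypothetical helper named `doublings_needed`) works out how many repetitions are needed and how many surplus rows would result.

```python
def doublings_needed(current_rows, target_rows):
    """Return (number of doubling INSERTs, final row count) to reach target_rows."""
    if current_rows < 1:
        raise ValueError("the table needs at least one row before it can double")
    steps = 0
    rows = current_rows
    while rows < target_rows:
        rows *= 2
        steps += 1
    return steps, rows


steps, final_rows = doublings_needed(9, 2000)
print(steps, final_rows, final_rows - 2000)  # 8 2304 304
```

Starting from the nine rows shown earlier, seven repetitions give 1152 rows and the eighth gives 2304. One way to land on exactly 2000 would be to add LIMIT 848 (2000 - 1152) to the final INSERT ... SELECT instead of doubling fully.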

Read also about the ELT() function.
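As a rough illustration of what ELT() does, here is a Python stand-in (the name `elt` only mirrors the SQL function; it is not a real Python library call). ELT(N, str1, str2, ...) returns the N-th string using 1-based indexing and returns NULL when N is less than 1 or greater than the number of strings, which is why the INSERT adds 1 to FLOOR(RAND()*3).

```python
import math
import random


def elt(n, *strings):
    """Mimic MySQL ELT(): 1-based lookup, None (NULL) when n is out of range."""
    if n < 1 or n > len(strings):
        return None
    return strings[n - 1]


# Same shape as ELT(1+FLOOR(RAND()*3), 'High', 'Medium', 'Low').
index = 1 + math.floor(random.random() * 3)  # always 1, 2, or 3
print(elt(index, "High", "Medium", "Low"))
print(elt(0, "High", "Medium", "Low"))  # None: index 0 is out of range
```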

Correct answer by Bill Karwin on August 1, 2020

You seem to be looking for something like this. A basic random sample is:

select t.*
from t
where date >= '2019-07-01' and date < '2020-07-01'
order by random()
fetch first 2000 rows only;

Of course, the function for random() varies by database, as does the syntax for limiting rows; in MySQL, for example, you would use RAND() and LIMIT 2000. This should give about the same distribution of priorities as in the original data.
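In application terms, the query above filters to the date range, shuffles, and keeps the first 2000 rows. A small Python sketch of that idea follows; the `sample_rows` helper and the tuple layout (id, date, description, priority) are assumptions for illustration only.

```python
import random
from datetime import date


def sample_rows(rows, limit=2000, seed=None):
    """Filter rows to [2019-07-01, 2020-07-01) and return a random sample."""
    rng = random.Random(seed)
    in_range = [r for r in rows if date(2019, 7, 1) <= r[1] < date(2020, 7, 1)]
    # random.sample raises ValueError if asked for more items than exist,
    # so cap the sample size the way a row limit would.
    return rng.sample(in_range, min(limit, len(in_range)))


table = [
    (1, date(2019, 10, 20), None, "Medium"),
    (2, date(2020, 5, 17), None, "High"),
    (3, date(2018, 1, 1), None, "Low"),  # outside the range, never selected
]
print(sample_rows(table, limit=2, seed=0))
```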

If you want the rows to come by priority first, then use:

select t.*
from t
where date >= '2019-07-01' and date < '2020-07-01'
order by (case when priority = 'High' then 1 when priority = 'Medium' then 2 else 3 end),
         random()
fetch first 2000 rows only;
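The two-part ORDER BY can be pictured as sorting on a pair: a priority rank first, then a random tiebreaker within each rank. Here is a hedged Python sketch of that ordering; the helper name `order_by_priority_then_random` and the tuple layout are assumptions, and the date filter is left out for brevity.

```python
import random

PRIORITY_RANK = {"High": 1, "Medium": 2}  # anything else sorts last, like ELSE 3


def order_by_priority_then_random(rows, limit=2000, seed=None):
    """Sort rows by priority rank, shuffle within each rank, keep `limit` rows."""
    rng = random.Random(seed)
    # The key mirrors ORDER BY (CASE ... END), random(); sorted() calls it once per row.
    ordered = sorted(rows, key=lambda r: (PRIORITY_RANK.get(r[3], 3), rng.random()))
    return ordered[:limit]


table = [
    (1, "2019-10-20", None, "Medium"),
    (2, "2020-05-17", None, "High"),
    (3, "2020-06-25", None, "Low"),
    (4, "2019-09-30", None, "High"),
]
print([r[3] for r in order_by_priority_then_random(table)])
# ['High', 'High', 'Medium', 'Low']
```

Note that with a row limit, this ordering fills the result with High rows first, so Medium and Low rows appear only if fewer than 2000 High rows fall in the date range.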

Answered by Gordon Linoff on August 1, 2020
