# How to easily convert utf8 tables to utf8mb4 in MySQL 5.5

I have a database which now needs to support 4 byte characters (Chinese). Luckily I already have MySQL 5.5 in production.

So I would just like to make all collations which are utf8_bin to utf8mb4_bin.

I believe there is no performance loss/gain with this change other than a bit of storage overhead.
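For context on why utf8mb4 is needed at all: MySQL's legacy utf8 charset (really utf8mb3) stores at most 3 bytes per character, which covers common CJK but not supplementary-plane characters such as rare CJK ideographs or emoji. A quick Python check (purely illustrative, not part of the migration) shows the byte widths involved:

```python
# UTF-8 byte widths: BMP CJK fits in MySQL's 3-byte "utf8",
# supplementary-plane characters need utf8mb4's 4 bytes.
common_cjk = "中"        # U+4E2D, Basic Multilingual Plane
rare_cjk = "\U0002070E"  # U+2070E, CJK Unified Ideographs Extension B

print(len(common_cjk.encode("utf-8")))  # 3 bytes -> ok in utf8mb3
print(len(rare_cjk.encode("utf-8")))    # 4 bytes -> requires utf8mb4
```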

If you, like me, do not trust automation, this is how I have handled the problem.

First: Stop digging!

Start by altering the default charset of new tables, changing the DB definition (like in all other answers):

ALTER DATABASE database_name
CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;


Then generate sql to change the default charset for new columns of all existing tables:

SELECT concat("ALTER TABLE ",table_schema,".",table_name," DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_swedish_ci;") as _sql
FROM information_schema.TABLES
WHERE table_schema like "database_name" and TABLE_TYPE="BASE TABLE"
GROUP BY table_schema, table_name ;


Now we can handle the "legacy".

List character datatypes you are using:

select distinct data_type  from information_schema.columns where table_schema = "database_name" and CHARACTER_SET_NAME is not null;


For me that list was "varchar" and "text"

List the character sets in use:

select distinct character_set_name from information_schema.columns where table_schema = "database_name";


This gives me "utf8", "latin1", and "utf8mb4", which is one reason I do not trust automation: the latin1 columns risk containing dirty data.
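As a side note on what that "dirty data" often looks like: a common case is UTF-8 bytes that were written through a latin1 connection. A small Python sketch (illustrative only) of that mojibake pattern and its reversal:

```python
# "é" encoded as UTF-8 but interpreted as latin1 shows up as "Ã©".
raw = "é".encode("utf-8")         # b'\xc3\xa9'
mojibake = raw.decode("latin-1")  # what a mis-configured client sees
print(mojibake)                   # Ã©

# Reversing the misinterpretation recovers the original text.
fixed = mojibake.encode("latin-1").decode("utf-8")
print(fixed)                      # é
```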

Now you can make a list of all columns you need to update with:

select table_name, column_name, data_type, character_set_name, collation_name
from information_schema.columns
where table_schema = "database_name" and CHARACTER_SET_NAME is not null AND CHARACTER_SET_NAME <> "utf8mb4"
group by table_name, column_name, data_type, character_set_name, collation_name;


Tables containing only utf8 or utf8mb4 columns can be converted with "CONVERT TO CHARACTER SET" as Mathias and MrJingles describe in their answers, but then you risk MySQL changing the column types for you, so you may be better off running "CHANGE COLUMN" instead, since that gives you control over exactly what happens.
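To make that "control over exactly what happens" concrete, here is a hypothetical sketch (Python, with made-up column metadata, not part of the original answer) of generating one explicit CHANGE COLUMN statement per column instead of relying on CONVERT TO:

```python
# Hypothetical metadata, shaped like a row from information_schema.columns
col = {"table": "posts", "name": "title",
       "type": "varchar(255)", "nullable": "NO"}

stmt = (
    f"ALTER TABLE {col['table']} "
    f"CHANGE {col['name']} {col['name']} {col['type']} "
    f"CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
    + (" NULL;" if col["nullable"] == "YES" else " NOT NULL;")
)
print(stmt)
```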

If you have non-utf8 columns, these questions may give inspiration for checking the column data: https://stackoverflow.com/q/401771/671282 https://stackoverflow.com/q/9304485/671282

Since you probably know what you expect to find in most of the columns, something like this will handle most of them, after you modify the allowed non-ASCII characters to suit your needs:

SELECT distinct section FROM table_name WHERE column_name NOT REGEXP '^([A-Za-z0-9åäöÅÄÖ&.,_ -])*$';


When the above did not fit, I used the following, which has a bit "fuzzier" matching:

SELECT distinct
  CONVERT(CONVERT(column_name USING BINARY) USING latin1) AS latin1,
  CONVERT(CONVERT(column_name USING BINARY) USING utf8) AS utf8
FROM table_name
WHERE CONVERT(column_name USING BINARY)
  RLIKE CONCAT('[', UNHEX('C0'), '-', UNHEX('F4'), '][', UNHEX('80'), '-', UNHEX('FF'), ']')
LIMIT 5;


This query matches any two bytes that could start a utf8 character, thus allowing you to inspect those records; it may give you a lot of false positives. The utf8 conversion returns NULL if there is any character it cannot convert, so in a large field there is a good chance of it not being useful.

Answered by Samuel Åslund on August 14, 2020

Run the following SQL statements to get the result statements for updating character set and collation:

SET @database = "your_database_name";
SET @charset = "utf8mb4";
SET @collate = "utf8mb4_0900_ai_ci";

SELECT "SET foreign_key_checks = 0;"
UNION ALL
SELECT concat("ALTER DATABASE ", SCHEMA_NAME, " CHARACTER SET = ", @charset, " COLLATE = ", @collate, ";") AS _sql
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = @database
  AND (DEFAULT_CHARACTER_SET_NAME <> @charset OR DEFAULT_COLLATION_NAME <> @collate)
UNION ALL
SELECT concat("ALTER TABLE ", TABLE_SCHEMA, ".", TABLE_NAME, " CONVERT TO CHARACTER SET ", @charset, " COLLATE ", @collate, ";") AS _sql
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = @database
  AND TABLE_TYPE = "BASE TABLE"
  AND TABLE_COLLATION <> @collate
UNION ALL
SELECT concat("ALTER TABLE ", c.TABLE_SCHEMA, ".", c.TABLE_NAME, " CHANGE ", c.COLUMN_NAME, " ", c.COLUMN_NAME, " ", c.COLUMN_TYPE, " CHARACTER SET ", @charset, " COLLATE ", @collate, if(c.IS_NULLABLE = "YES", " NULL", " NOT NULL"), ";") AS _sql
FROM information_schema.COLUMNS c,
     information_schema.TABLES t,
     information_schema.COLLATION_CHARACTER_SET_APPLICABILITY a
WHERE c.TABLE_SCHEMA = t.TABLE_SCHEMA
  AND c.TABLE_NAME = t.TABLE_NAME
  AND t.TABLE_COLLATION = a.COLLATION_NAME
  AND c.TABLE_SCHEMA = @database
  AND c.DATA_TYPE IN ('varchar', 'char', 'text', 'tinytext', 'mediumtext', 'longtext')
  AND (c.CHARACTER_SET_NAME <> a.CHARACTER_SET_NAME OR c.COLLATION_NAME <> t.TABLE_COLLATION)
  AND t.TABLE_TYPE = "BASE TABLE"
UNION ALL
SELECT "SET foreign_key_checks = 1;";


Notice:

• It only changes databases, tables, or columns that are not consistent with the given character set or collation.

• It doesn't treat views as regular tables.

• It avoids foreign key violations by turning off the related checks.

• Update the variable values accordingly before running the above SQL statements.

• The default collation utf8mb4_0900_ai_ci may not be friendly to your preferred language. For example, it considers half-width parentheses () the same as full-width parentheses （）, which causes trouble in certain cases.

• Run SHOW COLLATION WHERE CHARSET = 'utf8mb4' AND COLLATION LIKE 'utf8mb4%0900%'; and choose a suitable collation for your needs.

Answered by Rockallite on August 14, 2020

I made a script which does this more or less automatically:

<?php
/**
* Requires php >= 5.5
*
* Use this script to convert utf-8 data in utf-8 mysql tables stored via latin1 connection
* This is a PHP port from: https://gist.github.com/njvack/6113127
*
* BACKUP YOUR DATABASE BEFORE YOU RUN THIS SCRIPT!
*
* Once the script ran over your databases, change your database connection charset to utf8:
*
* $dsn = 'mysql:host=localhost;port=3306;charset=utf8';
*
* DON'T RUN THIS SCRIPT MORE THAN ONCE!
*
* @author hollodotme
*
* @author derclops since 2019-07-01
*
*         I have taken the liberty to adapt this script to also do the following:
*
*         - convert the database to utf8mb4
*         - convert all tables to utf8mb4
*         - actually then also convert the data to utf8mb4
*
*/

$dsn      = 'mysql:host=localhost;port=3306;charset=utf8';
$user     = 'root';
$password = 'root';
$options  = [
    PDO::ATTR_CURSOR                   => PDO::CURSOR_FWDONLY,
    PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true,
    PDO::MYSQL_ATTR_INIT_COMMAND       => "SET CHARACTER SET latin1",
];

$dbManager = new PDO( $dsn, $user, $password, $options );

$databasesToConvert = [ 'database1', /** 'database2', ... */ ];
$typesToConvert     = [ 'char', 'varchar', 'tinytext', 'mediumtext', 'text', 'longtext' ];

foreach ( $databasesToConvert as $database ) {
    echo $database, ":\n";
    echo str_repeat( '=', strlen( $database ) + 1 ), "\n";

    $dbManager->exec( "USE {$database}" );

    echo "converting database to correct locale too ... \n";
    $dbManager->exec( "ALTER DATABASE {$database} CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci" );

    $tablesStatement = $dbManager->query( "SHOW TABLES" );
    while ( ($table = $tablesStatement->fetchColumn()) ) {
        echo "Table: {$table}:\n";
        echo str_repeat( '-', strlen( $table ) + 8 ), "\n";

        $columnsToConvert = [ ];

        $columsStatement = $dbManager->query( "DESCRIBE {$table}" );
        while ( ($tableInfo = $columsStatement->fetch( PDO::FETCH_ASSOC )) ) {
            $column = $tableInfo['Field'];
            echo ' * ' . $column . ': ' . $tableInfo['Type'];

            // strip the length, e.g. "varchar(255)" => "varchar"
            $type = preg_replace( "#\(\d+\)#", '', $tableInfo['Type'] );
            if ( in_array( $type, $typesToConvert ) ) {
                echo " => must be converted\n";
                $columnsToConvert[] = $column;
            } else {
                echo " => not relevant\n";
            }
        }

        // convert table also!!!
        $convert = "ALTER TABLE {$table} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
        echo "\n", $convert, "\n";
        $dbManager->exec( $convert );

        $databaseErrors = $dbManager->errorInfo();
        if ( !empty($databaseErrors[1]) ) {
            echo "\n !!!!!!!!!!!!!!!!! ERROR OCCURRED " . print_r( $databaseErrors, true ) . " \n";
            exit;
        }

        if ( !empty($columnsToConvert) ) {
            $converts = array_map(
                function ( $column ) {
                    //return "{$column} = IFNULL(CONVERT(CAST(CONVERT({$column} USING latin1) AS binary) USING utf8mb4),{$column})";
                    return "{$column} = CONVERT(BINARY(CONVERT({$column} USING latin1)) USING utf8mb4)";
                },
                $columnsToConvert
            );

            $query = "UPDATE IGNORE {$table} SET " . join( ', ', $converts );

            // alternative:
            // UPDATE feedback SET reply = CONVERT(BINARY(CONVERT(reply USING latin1)) USING utf8mb4) WHERE feedback_id = 15015;

            echo "\n", $query, "\n";
            $dbManager->exec( $query );

            $databaseErrors = $dbManager->errorInfo();
            if ( !empty($databaseErrors[1]) ) {
                echo "\n !!!!!!!!!!!!!!!!! ERROR OCCURRED " . print_r( $databaseErrors, true ) . " \n";
                exit;
            }
        }

        echo "\n--\n";
    }
    echo "\n";
}


Answered by clops on August 14, 2020

For people who might have this problem, the best solution is to first modify the columns to a binary type, according to this table:

1. CHAR => BINARY
2. TEXT => BLOB
3. TINYTEXT => TINYBLOB
4. MEDIUMTEXT => MEDIUMBLOB
5. LONGTEXT => LONGBLOB
6. VARCHAR => VARBINARY

And after that modify the column back to its former type and with your desired charset. E.g.:

ALTER TABLE [TABLE_SCHEMA].[TABLE_NAME] MODIFY [COLUMN_NAME] LONGBLOB;
ALTER TABLE [TABLE_SCHEMA].[TABLE_NAME] MODIFY [COLUMN_NAME] VARCHAR(140) CHARACTER SET utf8mb4;


I tried this in several latin1 tables and it kept all the diacritics. You can generate this query for all columns like so:

SELECT
  CONCAT('ALTER TABLE ', TABLE_SCHEMA, '.', TABLE_NAME, ' MODIFY ', COLUMN_NAME, ' VARBINARY;'),
  CONCAT('ALTER TABLE ', TABLE_SCHEMA, '.', TABLE_NAME, ' MODIFY ', COLUMN_NAME, ' ', COLUMN_TYPE, ' CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;')
FROM information_schema.columns
WHERE TABLE_SCHEMA IN ('[TABLE_SCHEMA]')
  AND COLUMN_TYPE LIKE 'varchar%'
  AND (COLLATION_NAME IS NOT NULL AND COLLATION_NAME NOT LIKE 'utf%');


Answered by MalachiteBR on August 14, 2020

I used the following shell script. It takes the database name as a parameter and converts all tables to another charset and collation (given by additional parameters or the default values defined in the script).

#!/bin/bash
# mycollate.sh <database> [<charset> <collation>]
# changes MySQL/MariaDB charset and collation for one database - all tables and
# all columns in all tables

DB="$1"
CHARSET="$2"
COLL="$3"

[ -n "$DB" ] || exit 1
[ -n "$CHARSET" ] || CHARSET="utf8mb4"
[ -n "$COLL" ] || COLL="utf8mb4_general_ci"

echo $DB
echo "ALTER DATABASE $DB CHARACTER SET $CHARSET COLLATE $COLL;" | mysql

echo "USE $DB; SHOW TABLES;" | mysql -s | (
    while read TABLE; do
        echo $DB.$TABLE
        echo "ALTER TABLE $TABLE CONVERT TO CHARACTER SET $CHARSET COLLATE $COLL;" | mysql $DB
    done
)


Answered by Petr Stastny on August 14, 2020

From my work, I saw that ALTERing the database and the tables is not enough. I had to go into each table and ALTER each of the text/mediumtext/varchar columns too.

Luckily I was able to write a script to detect the metadata of MySQL databases, so it could loop through the tables and columns and ALTER them automatically.

Long index for MySQL 5.6:

There is one thing you must have the DBA/SUPER USER privilege to do: setting these database parameters:

innodb_large_prefix : ON
innodb_file_format : Barracuda
innodb_file_format_max : Barracuda


The answers to this question include instructions on how to set the parameters above: https://stackoverflow.com/questions/35847015/mysql-change-innodb-large-prefix

Of course, in my article, there are instructions to do that too.

For MySQL version 5.7 or newer, the innodb_large_prefix is ON by default, and the innodb_file_format is also Barracuda by default.
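The arithmetic behind these settings can be sketched as follows (illustrative Python; 767 and 3072 bytes are InnoDB's documented index key prefix limits without and with large prefixes):

```python
# InnoDB index key prefix limits, in bytes
DEFAULT_LIMIT = 767        # without innodb_large_prefix
LARGE_PREFIX_LIMIT = 3072  # Barracuda + innodb_large_prefix = ON

def max_indexable_chars(limit_bytes, bytes_per_char):
    """Longest fully indexable VARCHAR for a given charset width."""
    return limit_bytes // bytes_per_char

print(max_indexable_chars(DEFAULT_LIMIT, 3))       # 255: utf8 VARCHAR(255) just fits
print(max_indexable_chars(DEFAULT_LIMIT, 4))       # 191: the utf8mb4 ceiling
print(max_indexable_chars(LARGE_PREFIX_LIMIT, 4))  # 768: with large prefixes on
```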

Answered by Châu Hồng Lĩnh on August 14, 2020

Ran into this situation; here's the approach I used to convert my database:

1. First, you need to edit my.cnf to make the default database connection (between applications and MySQL) utf8mb4_unicode_ci compliant. Without this, characters like emojis submitted by your apps won't make it into your tables with the right bytes/encoding (unless your application's DB connection params specify a utf8mb4 connection).

Instructions given here.

2. Execute the following SQL (no need to generate prepared SQL to change individual columns; the ALTER TABLE statements will do that).

Before you execute the code below, replace "DbName" with your actual DB name.

USE information_schema;

SELECT concat("ALTER DATABASE ",table_schema,
" CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;") as _sql
FROM TABLES
WHERE table_schema like "DbName"
GROUP BY table_schema;

SELECT concat("ALTER TABLE ",table_schema,".",table_name,
" CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;") as _sql
FROM TABLES
WHERE table_schema like "DbName"
GROUP BY table_schema, table_name;

3. Collect and save output of above SQL in a dot sql file and execute it.

4. If you get an error like #1071 - Specified key was too long; max key length is 1000 bytes. along with the problematic table name, it means the index key on some column of that table (which was supposed to be converted to utf8mb4) would become too big: a VARCHAR column should be <= 250 characters so that its index key stays within the 1000-byte maximum. Check the columns on which you have indexes, and if one of them is a varchar > 250 (most likely 255), then:

• Step 1: check the data in that column to make sure that the max string size in it is <= 250.

Example query:

select id,username, email,
length(username) as l1,
char_length(username) as l2,
length(email) as l3,
char_length(email) as l4
from jos_users
order by l4 Desc;

• Step 2: if the max char length of the indexed column's data is <= 250, change the column length to 250. If that is not possible, remove the index on that column.

• Step 3: run the ALTER TABLE query for that table again; the table should now be converted to utf8mb4 successfully.
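The length()/char_length() pair in the Step 1 query matters because the first counts bytes and the second counts characters. The same distinction, shown in Python purely as an illustration:

```python
# LENGTH() counts bytes, CHAR_LENGTH() counts characters.
s = "naïve"
print(len(s))                  # 5 characters (like CHAR_LENGTH)
print(len(s.encode("utf-8")))  # 6 bytes (like LENGTH on a utf8 column)
```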

Cheers!

Answered by Nav44 on August 14, 2020

I have a solution that will convert databases and tables by running a few commands. It converts all columns of type varchar, text, tinytext, mediumtext, longtext, and char. You should also back up your database in case something breaks.

Copy the following code into a file called preAlterTables.sql:

use information_schema;
SELECT concat("ALTER DATABASE ",table_schema," CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;") as _sql
FROM TABLES where table_schema like "yourDbName" and TABLE_TYPE='BASE TABLE' group by table_schema;
SELECT concat("ALTER TABLE ",table_schema,".",table_name," CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;") as _sql
FROM TABLES where table_schema like "yourDbName" and TABLE_TYPE='BASE TABLE' group by table_schema, table_name;
SELECT concat("ALTER TABLE ",table_schema,".",table_name, " CHANGE ",column_name," ",column_name," ",data_type,"(",character_maximum_length,") CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",IF(is_nullable="YES"," NULL"," NOT NULL"),";") as _sql
FROM COLUMNS where table_schema like "yourDbName" and data_type in ('varchar','char');
SELECT concat("ALTER TABLE ",table_schema,".",table_name, " CHANGE ",column_name," ",column_name," ",data_type," CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci",IF(is_nullable="YES"," NULL"," NOT NULL"),";") as _sql
FROM COLUMNS where table_schema like "yourDbName" and data_type in ('text','tinytext','mediumtext','longtext');


Replace all occurrences of "yourDbName" with the database you want to convert. Then run:

mysql -uroot < preAlterTables.sql | egrep '^ALTER' > alterTables.sql


This will generate a new file alterTables.sql, with all the queries you need to convert the database. Run the following command to start the conversion:

mysql -uroot < alterTables.sql


You can also adapt this to run through multiple databases by changing the condition on table_schema. For example, table_schema like "wiki_%" will convert all databases whose names start with wiki_. To convert all databases, replace the condition with table_type!='SYSTEM VIEW'.

An issue that might arise: I had some varchar(255) columns in MySQL keys, which causes an error:

ERROR 1071 (42000) at line 2229: Specified key was too long; max key length is 767 bytes


If that happens you can simply change the column to be smaller, like varchar(150), and rerun the command.
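A quick way to see why varchar(150) works while varchar(255) does not, under utf8mb4's 4-bytes-per-character worst case (illustrative Python, not part of the original answer):

```python
INNODB_KEY_LIMIT = 767  # default InnoDB index key prefix limit, in bytes

def fits_in_index(n_chars, bytes_per_char=4, limit=INNODB_KEY_LIMIT):
    """Worst-case byte size of an indexed utf8mb4 VARCHAR(n_chars)."""
    return n_chars * bytes_per_char <= limit

print(fits_in_index(255))  # False: 1020 bytes
print(fits_in_index(191))  # True: 764 bytes
print(fits_in_index(150))  # True: 600 bytes
```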

Please note: this answer converts the database to utf8mb4_unicode_ci instead of the utf8mb4_bin asked about in the question, but you can simply replace the collation.

Answered by MrJingles87 on August 14, 2020

From my guide How to support full Unicode in MySQL databases, here are the queries you can run to update the charset and collation of a database, a table, or a column:

For each database:

ALTER DATABASE
database_name
CHARACTER SET = utf8mb4
COLLATE = utf8mb4_unicode_ci;


For each table:

ALTER TABLE
table_name
CONVERT TO CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;


For each column:

ALTER TABLE
table_name
CHANGE column_name column_name
VARCHAR(191)
CHARACTER SET utf8mb4
COLLATE utf8mb4_unicode_ci;


(Don’t blindly copy-paste this! The exact statement depends on the column type, maximum length, and other properties. The above line is just an example for a VARCHAR column.)

Note, however, that you cannot fully automate the conversion from utf8 to utf8mb4. As described in step 4 of the abovementioned guide, you’ll need to check the maximum length of columns and index keys, as the number you specify has a different meaning when utf8mb4 is used instead of utf8.

Section 10.1.11 of the MySQL 5.5 Reference Manual has some more information on this.

Answered by Mathias Bynens on August 14, 2020

I would write a script (in Perl, or whatever) to use information_schema (TABLES and COLUMNS) to walk through all the tables, and do MODIFY COLUMN on every CHAR/VARCHAR/TEXT field. I would collect all the MODIFYs into a single ALTER for each table; this will be more efficient.
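Collecting the MODIFYs into a single ALTER per table could be sketched like this (Python, with hypothetical column data; not Rick's actual script):

```python
# Hypothetical columns, as one might read them from information_schema.COLUMNS
columns = [("title", "varchar(255)"), ("body", "text")]

# One MODIFY clause per character column, combined into a single ALTER
modifies = ", ".join(
    f"MODIFY {name} {ctype} CHARACTER SET utf8mb4 COLLATE utf8mb4_bin"
    for name, ctype in columns
)
stmt = f"ALTER TABLE posts {modifies};"
print(stmt)
```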

I think (but am not sure) that Raihan's suggestion only changes the default for the table.

Answered by Rick James on August 14, 2020
