MySQL CSV Upload
This section covers how you can stage information in comma separated value files and upload the data directly into a MySQL database. This is extremely useful when seeding data into relation data models and is a critical part of ETL (Extract, Transform, and Load) for data warehousing applications.
You have one approach to uploading CSV file data into a MySQL database. You use the LOAD DATA
or LOAD DATA LOCAL
statements. There two caveats about using these commands, and they are:
- You must convert dates to the default MySQL format – YYYYMMDD, or the four digit year, two digit month, and two digit day.
- You can’t use
AUTO_INCREMENT
from the source to the loading table.
It’s untrue that you must input dates as strings and convert them in the database. However, you can do that if you like. I’d suggest you open the file in Microsoft Excel, then create a column with the correct data format. You can do that by creating a custom format, or with a formula. I figure the custom format is easiest but sometimes a formula is necessary for applications.
Excel Formula to Convert a Date to MySQL Format ↓
This is an example formula that’ll convert any supported Excel date format into a CSV-compliant MySQL date string.
The formula is large because MySQL can’t process a date when a one character month or day occurs in the string. You can find more on the logic of the Excel mega formula in this blog post.
=CONCATENATE(CONCATENATE("",YEAR(A1)),IF(LEN(MONTH(A1)),CONCATENATE("0",MONTH(A1))),IF(LEN(DAY(A1)) = 1,CONCATENATE("0",DAY(A1)),CONCATENATE("",DAY(A1)))) |
You should note that loading a date at the end of a line has problems in MySQL. The easiest fix that I’ve found is to place a comma at the end of each line. In a Microsoft world, that eliminates the need for the \r
from the LINES TERMINATED BY
clause.
While this shows a trivial amount of data, here’s a CSV file to use in the test case. Create a directory like Data
off your base logical drive or mount point. Then create a file named transaction_upload.csv and put the following data set in it.
20090102,5 20091231,5 20091128,5 20090616,5 |
Create an importing.sql
file in the C:\Data
folder with the contents below (please note the line returns are only for Windows). It’s only a recommendation but generally import tables aren’t useful in between uploads, and that’s why I create the table as an in-memory table. In-memory tables are dropped when you recycle (boot) the MySQL instance.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | -- Conditionally drop the table. DROP TABLE IF EXISTS transaction_upload; -- Create the new upload target table. CREATE TABLE transaction_upload ( transaction_date DATE , transaction_amount FLOAT ) ENGINE=MEMORY; -- Load the data from a file, don't forget the \n after the \r on Windows or -- it won't work in Windows. You should use only \n in a Linux instance. LOAD DATA INFILE 'c:/Data/mysql/transaction_upload.csv' INTO TABLE transaction_upload FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\r\n'; -- Select the uploaded records. SELECT * FROM transaction_upload; |
Before you connect as anything other than the root account, you’ll need to grant the global FILE
privilege to the targeted user. This is true because you’ve not qualified a LOCAL
file in the import statement. Users that have all privileges on a given database also have read-write access to LOCAL
tables, which is read write to client-side files. When the LOCAL
key word is omitted, you’re instructing a read from the server by the client tool. You must therefore grant a global permission to enable a connected user can call a server-side file.
If you want to import without granting any additional global permissions, you can rewrite the statement like this:
1 2 3 4 5 6 | LOAD DATA LOCAL INFILE 'c:/Data/mysql/transaction_upload.csv' INTO TABLE transaction_upload FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\r\n'; |
Or, you can use this in the course’s Fedora Linux instance:
1 2 3 4 5 6 | LOAD DATA LOCAL INFILE '/u01/app/mysql/upload/transaction_upload.csv' INTO TABLE transaction_upload FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n'; |
If you don’t want to rewrite the statement, you’ll get the following error:
ERROR 1045 (28000): Access denied FOR USER 'student'@'localhost' (USING password: YES) Empty SET (0.01 sec) |
Grant Global File Permission ↓
This shows you how to grant a global file permissions to a restricted user. It is only necessary when you want the user to read or write server-side files.
C:\Data> mysql -uroot -pcangetin -P3306 |
Then, grant the global privilege to the user:
mysql> GRANT FILE ON *.* TO 'student'@'localhost' IDENTIFIED BY 'student'; |
Now, you can connect to the data base as the restricted user and read an external server-side file:
C:\Data> mysql -ustudent -pstudent -P3306 |
Once you’ve granted these permissions, the user is no longer truly a restricted user. I’d strongly discourage doing this if the user is accessed via web applications.
Run the script:
mysql> \. importing.sql |
Then, you should see the following:
+------------------+--------------------+ | transaction_date | transaction_amount | +------------------+--------------------+ | 2009-01-02 | 5 | | 2009-12-31 | 5 | | 2009-11-28 | 5 | | 2009-06-16 | 5 | +------------------+--------------------+ 4 rows in set (0.00 sec) |
Another quick tidbit, dates can be tricky when they’re the last column in the file. The best solution is to put a comma at the end of each string. If you put the comma at the end of each line, you can also dispense with the \r
on the Windows platform.
The “IF(LEN(MONTH(F1))” phrase is missing “=1” and the else portion of the same IF statemet “,CONCATENATE(“”,MONTH(F1))”
This formula works:
=CONCATENATE(CONCATENATE(“”,YEAR(E1)),IF(LEN(MONTH(E1))=1,CONCATENATE(“0”,MONTH(E1)),CONCATENATE(“”,MONTH(E1))),IF(LEN(DAY(E1)) = 1,CONCATENATE(“0”,DAY(E1)),CONCATENATE(“”,DAY(E1))))
Wil G.
9 Mar 10 at 10:48 pm
The clarification on this line
needs to be clarified. THere is a good explanation here.
Since it is
\r
on Mac and like OSes and\n
on Linux and like OSes and then\r\n
in Windows. I think an explanation outside code comment would be better.Matt
15 Nov 14 at 12:59 pm
Matt, That’s a good idea. I’ll add the reference.
michaelmclaughlin
15 Nov 14 at 4:06 pm
It is missing
LOCAL
:TA Matt
22 Nov 16 at 8:59 pm