{"id":1201,"date":"2009-12-11T16:43:02","date_gmt":"2009-12-11T15:43:02","guid":{"rendered":"http:\/\/www.devco.net\/?p=1201"},"modified":"2010-06-27T19:26:26","modified_gmt":"2010-06-27T18:26:26","slug":"splitting_mysql_dumps_by_table","status":"publish","type":"post","link":"https:\/\/www.devco.net\/archives\/2009\/12\/11\/splitting_mysql_dumps_by_table.php","title":{"rendered":"Splitting MySQL dumps by table"},"content":{"rendered":"

I often need to split large mysql dumps into smaller files so I can do selective imports from live to dev for example where you might not want all the data. Each time I seem to rescript some solution for the problem. So here’s my current solution to the problem, it’s a simple Ruby script, you give it the path to a mysqldump and it outputs a string of echo’s and sed commands to do the work.<\/p>\n

UPDATE: Please do not use this code, it’s too slow and inefficient, new code can be found here<\/a>.<\/em><\/p>\n

Just pipe it’s output to a file and run it via shell when you’re ready to do the splitting. At the end you’ll have a file per table in your cwd.<\/p>\n

<\/p>\n

\r\n#!\/usr\/bin\/ruby\r\n\r\nprevtable = \"\"\r\nprevline = 0\r\n\r\nif ARGV.length == 1\r\n    dumpfile = ARGV.shift\r\nelse\r\n    puts(\"Please specify a dumpfile to process\")\r\n    exit 1\r\nend\r\n\r\nif File.exist?(dumpfile)\r\n   %x[grep -n \"Table structure for table\" #{dumpfile}].each do |line|\r\n       if line =~ \/(\\d+):-- Table structure for table .(.+).\/\r\n           curline = $1.to_i\r\n           table = $2\r\n   \r\n           unless prevtable == \"\"\r\n               puts(\"echo \\\"\\`date\\`: Processing #{prevtable} - lines #{prevline - 1} to #{curline - 2}\\\"\")\r\n               puts(\"sed -n '#{prevline - 1},#{curline - 2}p;#{curline - 2}q' #{dumpfile} > #{prevtable}.sql\")\r\n               puts\r\n           end\r\n   \r\n           prevline = curline\r\n           prevtable = table\r\n       end\r\n   end\r\nelse\r\n   puts(\"Can't find dumpfile #{dumpfile}\")\r\n   exit 1\r\nend\r\n<\/pre>\n

<\/code><\/p>\n

It’s pretty fast, the heavy lifting is all done with grep and sed, ruby just there to drive those commands and parse a few lines of output.<\/p>\n

Running it produces something like this:<\/p>\n

<\/p>\n

\r\n$ split-mysql-dump.rb exim.sql\r\necho \"`date`: Processing domain_sender_whitelist - lines 32 to 47\"\r\nsed -n '32,47p;47q' exim.sql > domain_sender_whitelist.sql\r\n\r\necho \"`date`: Processing domain_valid_users - lines 48 to 64\"\r\nsed -n '48,64p;64q' exim.sql > domain_valid_users.sql\r\n<\/pre>\n

<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"

I often need to split large mysql dumps into smaller files so I can do selective imports from live to dev for example where you might not want all the data. Each time I seem to rescript some solution for the problem. So here’s my current solution to the problem, it’s a simple Ruby script, […]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","footnotes":""},"categories":[1],"tags":[121,34,13],"_links":{"self":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/1201"}],"collection":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/comments?post=1201"}],"version-history":[{"count":10,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/1201\/revisions"}],"predecessor-version":[{"id":1526,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/1201\/revisions\/1526"}],"wp:attachment":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/media?parent=1201"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/categories?post=1201"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/tags?post=1201"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}