{"id":1209,"date":"2009-12-14T14:41:21","date_gmt":"2009-12-14T13:41:21","guid":{"rendered":"http:\/\/www.devco.net\/?p=1209"},"modified":"2010-06-27T19:22:12","modified_gmt":"2010-06-27T18:22:12","slug":"splitting_mysql_dumps_by_table_-_take_2","status":"publish","type":"post","link":"https:\/\/www.devco.net\/archives\/2009\/12\/14\/splitting_mysql_dumps_by_table_-_take_2.php","title":{"rendered":"Splitting MySQL dumps by table – take 2"},"content":{"rendered":"

A few days ago I posted about splitting mysqldump files using sed<\/a> and a bit of Ruby to drive it, turns out that sucked, a lot.<\/p>\n

I eventually killed it after 2 days of not finishing, the problem is, obviously, that sed does not seek to the position, it reads the whole file. So pulling out the last line of a 150GB file requires reading 150GB of data, if you have 120 tables this is a huge problem.<\/p>\n

The below code is a new take on it, I am just reading the file with ruby and spitting out the resulting files with 1 read operation, start to finish on the same data was less than a hour. When run it gives you nice output like this:<\/p>\n

<\/p>\n

\r\nFound a new table: sms_queue_out_status\r\n    writing line: 1954 2001049770 bytes in 91 seconds 21989557 bytes\/sec\r\n\r\nFound a new table: sms_scheduling\r\n    writing line: 725 729256250 bytes in 33 seconds 22098674 bytes\/sec\r\n<\/pre>\n

<\/code><\/p>\n

The new code below:<\/p>\n

<\/p>\n

\r\n#!\/usr\/bin\/ruby\r\n\r\nif ARGV.length == 1\r\n    dumpfile = ARGV.shift\r\nelse\r\n    puts(\"Please specify a dumpfile to process\")\r\n    exit 1\r\nend\r\n\r\nSTDOUT.sync = true\r\n\r\nif File.exist?(dumpfile)\r\n    d = File.new(dumpfile, \"r\")\r\n\r\n    outfile = false\r\n    table = \"\"\r\n    linecount = tablecount = starttime = 0\r\n\r\n    while (line = d.gets)\r\n        if line =~ \/^-- Table structure for table .(.+).\/\r\n            table = $1\r\n            linecount = 0\r\n            tablecount += 1\r\n\r\n            puts(\"\\n\\n\") if outfile\r\n\r\n            puts(\"Found a new table: #{table}\")\r\n\r\n            starttime = Time.now\r\n            outfile = File.new(\"#{table}.sql\", \"w\")\r\n        end\r\n\r\n        if table != \"\" && outfile\r\n            outfile.syswrite line\r\n            linecount += 1\r\n            elapsed = Time.now.to_i - starttime.to_i + 1\r\n            print(\"    writing line: #{linecount} #{outfile.stat.size} bytes in #{elapsed} seconds #{outfile.stat.size \/ elapsed} bytes\/sec\\r\")\r\n        end\r\n    end\r\nend\r\n\r\nputs\r\n<\/pre>\n

<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"

A few days ago I posted about splitting mysqldump files using sed and a bit of Ruby to drive it, turns out that sucked, a lot. I eventually killed it after 2 days of not finishing, the problem is, obviously, that sed does not seek to the position, it reads the whole file. So pulling […]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","footnotes":""},"categories":[1],"tags":[121,34,13],"_links":{"self":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/1209"}],"collection":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/comments?post=1209"}],"version-history":[{"count":6,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/1209\/revisions"}],"predecessor-version":[{"id":1524,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/posts\/1209\/revisions\/1524"}],"wp:attachment":[{"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/media?parent=1209"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/categories?post=1209"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.devco.net\/wp-json\/wp\/v2\/tags?post=1209"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}