A few days ago I posted about splitting mysqldump files using sed with a bit of Ruby to drive it; turns out that sucked, a lot.
I eventually killed it after 2 days of not finishing. The problem is, obviously, that sed does not seek to the position, it reads the whole file, so pulling out the last line of a 150GB file requires reading 150GB of data. If you have 120 tables, each needing its own pass over the dump, this is a huge problem.
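To make the cost concrete, here is a rough sketch of what per-table, line-range extraction amounts to. This is not the original sed script; the table names and line ranges are invented for illustration:

# Hypothetical line ranges for two tables in the dump; with the sed-style
# approach every table is pulled out by scanning the file from the top again.
tables = {
  "sms_queue_out_status" => (1..1_954),
  "sms_scheduling"       => (1_955..2_679)
}

tables.each do |name, range|
  File.open("#{name}.sql", "w") do |out|
    lineno = 0
    File.foreach("bigdump.sql") do |line|   # sequential scan, starts at byte 0 every time
      lineno += 1
      out.write(line) if range.include?(lineno)
      break if lineno >= range.last         # even breaking early, later tables still
    end                                     # pay for everything that precedes them
  end
end

With around 120 tables in a 150GB dump that adds up to many terabytes of reads, versus a single 150GB pass for the script below.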
The code below is a new take on it: I just read the file with Ruby and spit out the resulting files in a single read operation. Start to finish on the same data took less than an hour. When run it gives you nice output like this:
Found a new table: sms_queue_out_status
 writing line: 1954 2001049770 bytes in 91 seconds 21989557 bytes/sec

Found a new table: sms_scheduling
 writing line: 725 729256250 bytes in 33 seconds 22098674 bytes/sec
The new code below:
#!/usr/bin/ruby

# Expect exactly one argument: the dump file to split
if ARGV.length == 1
  dumpfile = ARGV.shift
else
  puts("Please specify a dumpfile to process")
  exit 1
end

STDOUT.sync = true

if File.exist?(dumpfile)
  d = File.new(dumpfile, "r")

  outfile = false
  table = ""
  linecount = tablecount = starttime = 0

  while (line = d.gets)
    # mysqldump precedes each table with "-- Table structure for table `name`"
    if line =~ /^-- Table structure for table .(.+)./
      table = $1
      linecount = 0
      tablecount += 1

      puts("\n\n") if outfile

      puts("Found a new table: #{table}")
      starttime = Time.now
      outfile = File.new("#{table}.sql", "w")
    end

    # Everything from the table header onwards goes into <table>.sql
    if table != "" && outfile
      outfile.syswrite line
      linecount += 1
      elapsed = Time.now.to_i - starttime.to_i + 1
      print(" writing line: #{linecount} #{outfile.stat.size} bytes in #{elapsed} seconds #{outfile.stat.size / elapsed} bytes/sec\r")
    end
  end
end

puts
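Save it as, say, split_dump.rb (the name is up to you), make it executable and point it at the dump (./split_dump.rb bigdump.sql); you end up with one <table>.sql file per table in the current directory, ready to be loaded individually.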