
A few days ago I posted about splitting mysqldump files using sed, with a bit of Ruby to drive it. It turns out that sucked, a lot.

I eventually killed it after 2 days of it not finishing. The problem, obviously, is that sed does not seek to a position; it reads the whole file. Pulling out the last line of a 150GB file means reading 150GB of data, and if you have 120 tables that is a huge problem.
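
For context, the old approach boiled down to something along these lines (a rough sketch, not the exact script from that post, and the table name and line offsets are made up):

# Sketch of the sed-driven approach -- the table name and line offsets
# below are placeholders, not real values.
tables = { "sms_queue_out_status" => [1_000_000, 1_500_000] }

tables.each do |table, (first, last)|
    # sed -n 'M,Np' prints lines M..N, but it still has to read the file
    # from the top to get there, once per table
    system("sed -n '#{first},#{last}p' dump.sql > #{table}.sql")
end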

The code below is a new take on it: I simply read the file with Ruby and spit out the resulting files in a single pass, so start to finish on the same data took less than an hour. When run it gives you nice output like this:

Found a new table: sms_queue_out_status
    writing line: 1954 2001049770 bytes in 91 seconds 21989557 bytes/sec
 
Found a new table: sms_scheduling
    writing line: 725 729256250 bytes in 33 seconds 22098674 bytes/sec

The new code below:

#!/usr/bin/ruby
 
if ARGV.length == 1
    dumpfile = ARGV.shift
else
    puts("Please specify a dumpfile to process")
    exit 1
end
 
STDOUT.sync = true   # unbuffered output so the progress line updates as it goes
 
if File.exist?(dumpfile)
    d = File.new(dumpfile, "r")
 
    outfile = false
    table = ""
    linecount = tablecount = starttime = 0
 
    while (line = d.gets)
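        # mysqldump starts each table's section with a
        # "-- Table structure for table `name`" comment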
        if line =~ /^-- Table structure for table .(.+)./
            table = $1
            linecount = 0
            tablecount += 1
 
            puts("\n\n") if outfile
 
            puts("Found a new table: #{table}")
 
            starttime = Time.now
            outfile = File.new("#{table}.sql", "w")
        end
 
        # once we are inside a table, copy the line to its output file and
        # print a running progress counter (the \r keeps it on one line)
        if table != "" && outfile
            outfile.syswrite line
            linecount += 1
            elapsed = Time.now.to_i - starttime.to_i + 1
            print("    writing line: #{linecount} #{outfile.stat.size} bytes in #{elapsed} seconds #{outfile.stat.size / elapsed} bytes/sec\r")
        end
    end
end
 
puts
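
To use it, make the script executable and pass the dump file as its only argument; it creates one <table>.sql file per table in the current directory, named after the table headings it finds in the dump.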