alexmv at bestpractical.com
Wed Feb 25 15:12:23 EST 2015
On Wed, 11 Feb 2015 17:29:58 +0100 Christian Loos <cloos at netcologne.de> wrote:
> I just want to give some feedback from our internal testing of this
Thanks for the feedback!
> Attachment table rows: 1593072
> Attachment table datafile size: 61G.
> It took 105 minutes to extract the attachments.
> After extraction, the attachment directory size was 34G.
> After a "optimize table rt4.Attachments;" run (which took 40 minutes),
> the attachment table datafile size was 13G.
> Attachment datafile + attachment directory = 13G + 34G = 47G
> Compared to the previously 61G datafile size we saved 14G.
Hm -- interesting. I wouldn't have expected mysql to be that
inefficient at binary storage. Assuming the 13G is all text, that
means that it was using 61G-13G = 48G to store 34G worth of data.
> After some checks for duplicate attachments, which this extension only
> extract once (which actually isn't mentioned in the documentation), I
> found out that the de-duplication feature saved 1.5G.
Good catch that the de-duplication wasn't documented; I'll add a note
about that shortly.
> The only annoying thing with this extension is, that even if you have it
> configured to save attachments on disk, it first saves the attachments
> in the DB and you then have to extract them.
This is intentional. The codepath which does the decision as to
whether to leave the data in the database needs information about other
objects to decide, which is not available at EncodeLOB time. Storing
the data temporarily in the database also makes it durable to hard
drives filling, or network storage being unavailable (in the case of S3
> This makes an regular "optimize table rt4.Attachments;" necessary.
> As this operation locks the table (up to MySQL version 5.6.17), you have
> to plan a regularly downtime.
Hm -- why is the "optimize table" necessary? Does MySQL not reuse the
BLOB space until it sees such?
More information about the rt-devel