[rt-users] Option to store attachments on the filesystem

Geoff Mayes gmayes at uoregon.edu
Thu Dec 22 15:42:43 EST 2011


Hi Kevin and Joe,

Joe -- that's exactly what I did at a previous company where we served 30TB of Bugzilla attachments and it worked very well.  I plan to do that with RT too.

Kevin -- thanks so much for your detailed response.  Yeah, definitely, I think this is an important conversation to have and I am so happy that others are weighing in.

> As far as the assertion that "a lot of folks would benefit from this
> feature", I doubt that would be the case for the vast majority of RT
> users. Most users can handle "one-stop-shopping" type applications
> with far fewer problems.

Yes, you're right.  I was a bit overzealous in claiming that "a lot of folks would benefit from this feature."  By "a lot" I did not mean the majority but instead that a not insignificant number of RT users would be interested in this feature (per the previous list postings I referenced).

> Once you divorce the metadata repository
> from the actual ticket data, you add a whole slew of different failure
> modes that will require much more sophisticated administration processes
> to prevent, ameliorate, or recover from.

I don't think this is the case, especially if this feature is implemented well.  RT would only need to deal with filesystem attachments on the "Create Ticket" and "Display Ticket" pages, where users upload or view attachments.  When displaying a ticket, RT could fetch attachments from the local filesystem with a 5-second timeout (over AJAX, so the rest of the page still loads quickly from the database).  On a timeout or failure, RT would log the issue (and optionally email the admin) and display a user-friendly message in the attachments area stating that the attachments couldn't be displayed.
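A minimal sketch of the timeout-and-fallback behavior described above, written in Python rather than RT's Perl purely for illustration. The function name `fetch_attachment`, the attachment root directory, and the `FALLBACK` message are hypothetical, not part of RT. The optional admin email is left out; only logging is shown.

```python
import concurrent.futures
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("attachments")

# Shared, bounded worker pool: a read that hangs on a dead NFS mount occupies
# one worker thread, but the request waiting on it gives up after the timeout.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

# Hypothetical message shown in the attachments area on failure.
FALLBACK = "Attachments are temporarily unavailable; the administrator has been notified."


def fetch_attachment(root: Path, relpath: str, timeout: float = 5.0):
    """Return (content, None) on success, or (None, FALLBACK) on error or timeout."""
    path = root / relpath
    future = _pool.submit(path.read_bytes)
    try:
        # result() re-raises any exception raised by read_bytes in the worker.
        return future.result(timeout=timeout), None
    except concurrent.futures.TimeoutError:
        log.error("timed out after %.1fs reading %s", timeout, path)
    except OSError as exc:
        log.error("could not read %s: %s", path, exc)
    return None, FALLBACK


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    root = Path(tempfile.mkdtemp())
    (root / "1.txt").write_bytes(b"hello")
    print(fetch_attachment(root, "1.txt"))  # (b'hello', None)
    print(fetch_attachment(root, "missing.txt"))  # logs an error, returns the fallback
```

One design note: Python cannot kill a thread blocked on a hung NFS read, so the timeout only frees the waiting request. The stuck worker stays busy until the read returns, which is why the pool is bounded.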

I envision attachment searching being disabled in RT when the admin has chosen to store attachments on the filesystem, and this would of course be well documented for those choosing to install RT with filesystem attachments.  RT could still keep the Attachments table and store metadata about each file in it when a user uploads one, but this isn't strictly necessary and I'd let Best Practical decide.  Keeping attachments tied to the Transactions table isn't necessary either, but it fits the design of RT and would be nice to have.  Again, these are all internal RT implementation details, and if they are done well, the local RT administrator could have a very simple experience storing attachments on the filesystem instead of the database.
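One way to keep an Attachments table for metadata while moving the bytes to disk, sketched in Python with SQLite standing in for the real database. The table layout, the function names, and the content-addressed (SHA-256) file layout are all assumptions for illustration, not RT's actual schema or a Best Practical design.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create a simplified, hypothetical metadata table (not RT's real schema)."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS attachments (
               id INTEGER PRIMARY KEY,
               transaction_id INTEGER NOT NULL,
               filename TEXT NOT NULL,
               content_type TEXT NOT NULL,
               size INTEGER NOT NULL,
               storage_key TEXT NOT NULL)"""
    )
    return db


def _blob_path(root: Path, key: str) -> Path:
    # Two levels of fan-out keep any one directory from growing huge.
    return root / key[:2] / key[2:4] / key


def store_attachment(db, root: Path, transaction_id: int, filename: str,
                     content_type: str, content: bytes) -> int:
    # Content-addressed key: identical uploads share a single file on disk.
    key = hashlib.sha256(content).hexdigest()
    target = _blob_path(root, key)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        tmp = target.with_suffix(".tmp")
        tmp.write_bytes(content)
        tmp.replace(target)  # rename, so readers never see a half-written file
    cur = db.execute(
        "INSERT INTO attachments"
        " (transaction_id, filename, content_type, size, storage_key)"
        " VALUES (?, ?, ?, ?, ?)",
        (transaction_id, filename, content_type, len(content), key),
    )
    db.commit()
    return cur.lastrowid


def load_attachment(db, root: Path, attachment_id: int) -> bytes:
    row = db.execute(
        "SELECT storage_key FROM attachments WHERE id = ?", (attachment_id,)
    ).fetchone()
    if row is None:
        raise KeyError(attachment_id)
    return _blob_path(root, row[0]).read_bytes()


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    db = open_db()
    att = store_attachment(db, root, 1, "screenshot.png", "image/png", b"\x89PNG...")
    print(load_attachment(db, root, att)[:4])  # b'\x89PNG'
```

Hashing the content means duplicate uploads (e.g. the same file attached to several replies) are stored once, at the cost of needing reference counting before any file can safely be deleted.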

The local filesystem or fileserver itself is, granted, an additional piece to manage, but it is an incredibly simple piece.  I would much rather manage files on a filesystem than files in a database when doing so lets the database be 1.6GB instead of 15GB.  And there are so many free (as well as expensive) ways to manage that data (e.g. NFS, rsync, RAID, high-end redundant SANs, etc).  Most importantly, the binary data is separated from the textual data, which permits separate backup schedules and schemes.  Yay for modular design!  As I mentioned before, I administered 30TB of attachments over NFS for a different tracking system and it worked very well.

> Your reference to leveraging an existing SAN+SAN management team gives a hint to the increase in
> both complexity and cost of running an instance.

This cost is up to the user, depending on how secure and robust they would like their data to be.  RT would only need to provide the choice between filesystem and database attachment storage.  The user can then decide whether attachments are served from a cheap local SATA disk or from a super-expensive (or cheap) locally mounted fileserver.

> There are a wide range of RT users from systems that manage a handful
> of tickets a week all the way to systems handling thousands of tickets
> or more a week. Those on the small end can/should use whatever DB
> backend that they are familiar with to simplify administration and
> the "what did I do?!" errors due to a lack of familiarity.

Totally agree with this.  An option to store attachments on the filesystem, however, is database-agnostic, so RT admins can select this option with MySQL, Oracle, Postgres, SQLite, etc.

> As you move towards larger implementations, your DB backend needs to be
> chosen based on its viability in an enterprise/large-scale environment.
> I do not know the level of your local MySQL expertise and I am certainly
> not a MySQL expert, but a 15GB database does not strike me as particularly
> large, by any metric. Maybe you would benefit by changing your backend DB
> to something that scales better. I know that other DBs support tablespaces
> that can allow you to move certain tables to different filesystems/locations
> to provide for more parallel partitioning across more I/O resources.

Our desire to store attachments outside of the database, at this point, has little to do with application performance and everything to do with backups, disaster recovery, upgrades, and downtimes.  That being said, I do know that storing attachments outside of the database can bring big performance gains.  See the discussion of this issue by one of Bugzilla's core developers, along with their work-in-progress implementation: https://bugzilla.mozilla.org/show_bug.cgi?id=577532.  So moving attachments out of the database *is* an actual tuning option, just like the other options you mentioned.  Why do something drastic like changing the database backend or performing complicated, expert-level tuning/sharding/partitioning, when I could just add a few config options to RT_SiteConfig.pm and run a script (for a pre-existing instance) that sets up my instance to serve attachments from a filesystem instead of the db?
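The "run a script" step for a pre-existing instance might look roughly like this Python sketch, with SQLite standing in for MySQL. The `attachments(id, content, storage_key)` table and the `migrate_to_filesystem` function are hypothetical; a real RT migration would need to match RT's actual schema and should be tested against a copy of the database first.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def migrate_to_filesystem(db: sqlite3.Connection, root: Path, batch: int = 500) -> int:
    """Move non-NULL `content` BLOBs out to files under `root`; return rows moved.

    Assumes a hypothetical table attachments(id, content BLOB, storage_key TEXT).
    """
    moved = 0
    while True:
        rows = db.execute(
            "SELECT id, content FROM attachments WHERE content IS NOT NULL LIMIT ?",
            (batch,),
        ).fetchall()
        if not rows:
            return moved
        for att_id, content in rows:
            if isinstance(content, str):  # TEXT values come back as str
                content = content.encode("utf-8")
            key = hashlib.sha256(content).hexdigest()
            target = root / key[:2] / key[2:4] / key
            target.parent.mkdir(parents=True, exist_ok=True)
            tmp = target.with_suffix(".tmp")
            tmp.write_bytes(content)
            tmp.replace(target)
            # Clear the BLOB only after the file is in place.
            db.execute(
                "UPDATE attachments SET storage_key = ?, content = NULL WHERE id = ?",
                (key, att_id),
            )
            moved += 1
        db.commit()  # one transaction per batch keeps each commit small


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE attachments (id INTEGER PRIMARY KEY, content BLOB, storage_key TEXT)")
    db.executemany("INSERT INTO attachments (content) VALUES (?)", [(b"a",), (b"bb",)])
    print(migrate_to_filesystem(db, Path(tempfile.mkdtemp())))  # 2
```

Because each file is written before its row is updated, and each batch is committed separately, an interrupted run can simply be restarted: rows whose BLOBs were not yet cleared are picked up again and their files rewritten.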

Here's one recent example of how our current database size is hurting us:  We upgraded from 3.8.4 to 4.0.4 yesterday, and it took almost an hour to dump our database and almost an hour to import it (we were upgrading MySQL and the OSes as well).  Then we had to import it again because max_allowed_packet was set too small (which wouldn't have been a problem if attachments were outside the db; admittedly an anecdotal rather than a logical argument, but a real-world occurrence nonetheless, since errors happen), so add another hour instead of only another 10 minutes.  If attachments were stored outside of the database, we could have reduced just the backup and import phases from 3 hours to 20 minutes.  That is a huge difference, especially when your application is used by thousands of customers waiting to log back in.  The positive ramifications continue: internal development of RT is much faster with a small database because we can copy it around the network faster, perform imports in 1/10th the time, and keep our development database up-to-date much more easily.
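The 3-hours-to-20-minutes figure follows from the database sizes in the original post (15.3GB today, about 1.6GB without attachments), under the simplifying assumption that dump and import time scale linearly with database size. Real timings also depend on indexes, disks, and the network, so treat this as a rough check rather than a prediction.

```python
def scaled_minutes(observed_minutes: float, size_now_gb: float, size_after_gb: float) -> float:
    """Estimate a new duration, assuming dump/import time scales linearly with size."""
    return observed_minutes * size_after_gb / size_now_gb


# Roughly one hour each for the dump, the first import, and the repeated import.
observed = 3 * 60
estimate = scaled_minutes(observed, size_now_gb=15.3, size_after_gb=1.6)
print(f"{estimate:.0f} minutes")  # 19 minutes
```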

If someone knows of a simpler way to cut the dump and restore times to 1/10, I would love to hear it and would be totally open to a different solution.

The main point I would like to restate is that larger or quickly-growing instances of RT are very different from smaller or slowly-growing instances.  One pain point of the larger instances is the size of the database and how that affects backups, restores, disaster recoveries, and development.  Having the option to store attachments outside of the database allows larger RT instances to manage their data more easily for a much longer period of time.  Most importantly for the Best Practical folks, this option increases the appeal of RT to larger organizations beyond the "small- to medium-sized" market stated at http://requesttracker.wikia.com/wiki/ManualIntroduction.  This feature, along with the recent SphinxSE option, truly makes RT more feasible and attractive to larger organizations.

Kind regards, Geoff Mayes

________________________________________
From: rt-users-bounces at lists.bestpractical.com [rt-users-bounces at lists.bestpractical.com] on behalf of Joe Harris [drey111 at gmail.com]
Sent: Thursday, December 22, 2011 9:43 AM
To: rt-users at lists.bestpractical.com
Subject: Re: [rt-users] Option to store attachments on the filesystem

I am looking into this type of functionality as well. We were thinking of an NFS share in a web directory where the attachment is dropped, plus a way to put a link to it in the ticket. So the attachments might not even exist on the RT server, but the ticket would contain links to a web server that houses them.
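A rough Python sketch of the approach described above: write the file into a directory served by a web server (for example an NFS-mounted share) and hand back a URL to paste into the ticket. The function name, the directory layout, and the `files.example.org` base URL are hypothetical.

```python
import re
import tempfile
import uuid
from pathlib import Path
from urllib.parse import quote

# Anything outside this set in an uploaded filename is replaced with "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]")


def publish_attachment(share_root: Path, base_url: str, ticket_id: int,
                       filename: str, content: bytes) -> str:
    """Write content into the web-served share and return a URL for the ticket."""
    # Keep only the last path component so "../" tricks can't escape the share.
    safe_name = _UNSAFE.sub("_", Path(filename).name)
    if safe_name in ("", ".", ".."):
        safe_name = "attachment"
    # A random directory per upload avoids name collisions and guessable URLs.
    rel = Path(str(ticket_id)) / uuid.uuid4().hex / safe_name
    target = share_root / rel
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return base_url.rstrip("/") + "/" + quote(rel.as_posix())


if __name__ == "__main__":
    share = Path(tempfile.mkdtemp())  # stands in for the NFS-mounted web directory
    print(publish_attachment(share, "https://files.example.org/rt/", 42,
                             "report.pdf", b"%PDF-1.4"))
```

Files served this way sit outside RT's own permission checks; unless the web server enforces access control, the random directory name only makes URLs hard to guess, not private.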




On Dec 22, 2011, at 9:42 AM, "ktm at rice.edu" <ktm at rice.edu> wrote:

> On Wed, Dec 21, 2011 at 11:12:04PM +0000, Geoff Mayes wrote:
>> Hello RT Users and Developers,
>>
>> Our RT instance at the University of Oregon is outgrowing the standard settings in some ways.  One way is with attachments.  The size of our database is 15.3GB and 13.7GB of that comes from the Attachments table.  If our attachments were stored on a high-performance fileserver (or locally if you prefer), our database would shrink to 1.6GB.  This would have numerous positive ramifications:
>>
>> - Database dumps/backups would finish in 1/10 the time
>> - Database restores would finish in 1/10 the time
>> - Planned downtimes and disaster recovery situations could be more nimbly performed (scp'ing around the db dump, restoring, etc)
>> - Backups could be taken much more frequently
>> - More backups could be stored
>> - MySQL replication would be more robust with less binary data to chew on
>> - Larger attachments could be permitted because there would be less fear of the database growing too quickly
>> - Reduced database load querying/inserting/deleting/joining attachments
>>
>> I've read in previous posts to this mailing list (see below) that the arguments against this are that (1) attachments on the filesystem can't be searched and (2) the data backing the application will not be in one tidy database package but instead spread out across the db and filesystem.  For our instance we don't care about #1, and for #2, while I understand the argument, I would actually argue the opposite: when attachments are on a high-performance, redundant SAN managed by a dedicated storage team that I don't have to worry about, my job administering RT just got a whole lot easier because I only have to worry about ensuring the fileserver is mounted and $AttachmentsPath (just an example config option) is properly set.  I worked previously at a company that ran one of the largest instances of Bugzilla in the world and we served up 30TB of attachments over a fileserver without any problems.  Can you imagine those attachments in a MySQL database?  When ticket tracking systems are no longer small-ish, moving attachments out of the database becomes a must.
>>
>> I'm not asking the RT folks to switch attachment storage to the filesystem instead of the database.  My wish is that RT offers its administrators the ability to choose one or the other.  I know this has been a hot topic in the past, but I was hoping we could revisit the issue.  Best Practical folks -- are you open to this?  If so, would it help the process if I did all the work and submitted a patch?  If so, should I file a bug so that we can talk about the way you would like this implemented?
>>
>> Given my reading of the history of this issue, I think a lot of folks would benefit from this feature.  I've included previous postings about this issue below.  Let me know if I can help and how I can.  We would love to upstream a patch so our local instance doesn't diverge too severely from you all.
>>
>> Thanks for your consideration, Geoff Mayes
>>
>> One of the first, meaty discussions:
>> http://www.gossamer-threads.com/lists/rt/devel/706
>> http://www.gossamer-threads.com/lists/rt/devel/37733
>> http://www.gossamer-threads.com/lists/rt/users/39507
>> The best discussion of the issue:
>> http://www.gossamer-threads.com/lists/rt/users/67406
>> Best Practical has recently worked on this issue:
>> http://www.gossamer-threads.com/lists/rt/users/89596
>>
>
> Hi Geoff,
>
> I had thought that something like this had already been implemented
> by Best Practical for a customer. Hopefully, they can provide some
> feedback regarding the utility and possible problems of such an
> approach from personal experience. Maybe they would consider releasing
> it as an extension.
>
> As far as the assertion that "a lot of folks would benefit from this
> feature", I doubt that would be the case for the vast majority of RT
> users. Most users can handle "one-stop-shopping" type applications
> with far fewer problems. Once you divorce the metadata repository
> from the actual ticket data, you add a whole slew of different failure
> modes that will require much more sophisticated administration processes
> to prevent, ameliorate, or recover from. Your reference to leveraging
> an existing SAN+SAN management team gives a hint to the increase in
> both complexity and cost of running an instance.
>
> There are a wide range of RT users from systems that manage a handful
> of tickets a week all the way to systems handling thousands of tickets
> or more a week. Those on the small end can/should use whatever DB
> backend that they are familiar with to simplify administration and
> the "what did I do?!" errors due to a lack of familiarity. As you
> move towards larger implementations, your DB backend needs to be
> chosen based on its viability in an enterprise/large-scale environment.
> I do not know the level of your local MySQL expertise and I am certainly
> not a MySQL expert, but a 15GB database does not strike me as particularly
> large, by any metric. Maybe you would benefit by changing your backend DB
> to something that scales better. I know that other DBs support tablespaces
> that can allow you to move certain tables to different filesystems/locations
> to provide for more parallel partitioning across more I/O resources.
>
> Sorry for the slight ramble. I am looking forward to this discussion and
> if this feature is added some documentation describing when and when not
> to use it will be essential.
>
> Regards,
> Ken
>> --------
>> RT Training Sessions (http://bestpractical.com/services/training.html)
>> * Boston  March 5 & 6, 2012
>>