Box Eliminates Unlimited Storage for Education
In late 2019, Box announced changes to its pricing model for all educational institutions: it eliminated the unlimited file storage agreement at the current annual spend. (See: https://it.wisc.edu/news/new-storage-quotas-for-box-after-unexpected-contract-changes/, https://bconnected.berkeley.edu/projects/box-service-changes) We will look at one solution to this problem: migrating from Box to AWS S3.
Many universities stored petabytes of data in Box based on their previous contracts for unlimited storage. The contract changes are leaving many schools scrambling to find other solutions.
AWS S3 to the Rescue
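AWS has an inexpensive, highly available storage solution that is web addressable. As of July 2020, the S3 family of storage classes ranges from $0.023 per GB per month (S3 Standard) down to $0.00099 per GB per month (S3 Glacier Deep Archive). For comparison, 1 PB (1,048,576 GB) of data stored in S3 Standard is $22,583.30 per month, a figure that reflects S3 Standard's volume tiers, which start at $0.023 per GB per month and step down as usage grows. The same data stored at $0.00099 per GB per month is $1,038.09 per month.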
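The 1 PB comparison can be reproduced with a short calculation. This is a rough sketch, not a pricing tool: it assumes 1 PB = 1,024 TB = 1,048,576 GB, and it assumes the S3 Standard volume tiers for US East in July 2020 ($0.023 per GB for the first 50 TB, $0.022 for the next 450 TB, $0.021 beyond that). The tier table and function names are illustrative and should be checked against current AWS pricing.

```python
GB_PER_TB = 1024

# Assumed S3 Standard volume tiers (USD per GB-month, US East, July 2020).
# Each entry is (tier size in GB, rate); None means "everything that is left".
STANDARD_TIERS = [
    (50 * GB_PER_TB, 0.023),   # first 50 TB
    (450 * GB_PER_TB, 0.022),  # next 450 TB
    (None, 0.021),             # over 500 TB
]

# S3 Glacier Deep Archive is billed at a single flat rate.
DEEP_ARCHIVE_RATE = 0.00099


def tiered_monthly_cost(size_gb, tiers):
    """Monthly storage cost when each slice of data is billed at its tier rate."""
    remaining = size_gb
    total = 0.0
    for tier_size, rate in tiers:
        if remaining <= 0:
            break
        billed = remaining if tier_size is None else min(remaining, tier_size)
        total += billed * rate
        remaining -= billed
    return total


def flat_monthly_cost(size_gb, rate):
    """Monthly storage cost at a single per-GB rate."""
    return size_gb * rate


one_pb_gb = 1024 * GB_PER_TB
print(f"{tiered_monthly_cost(one_pb_gb, STANDARD_TIERS):,.2f}")    # 22,583.30
print(f"{flat_monthly_cost(one_pb_gb, DEEP_ARCHIVE_RATE):,.2f}")   # 1,038.09
```

These numbers cover storage only. Request, retrieval, and data transfer charges are billed separately and are not modeled here.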
If a university is storing petabytes of data in Box, it is probably not all users’ personal documents and data; it likely includes research data sets and other backups. This use case lends itself to long-term storage with infrequent access, such as S3 Glacier Deep Archive, the lowest end of the cost spectrum.
One Solution using AWS S3
Here is a brief outline of a Box to AWS S3 storage migration for a small group of users who use FTP to access their individual files, which other users cannot access. It was designed with Box to AWS S3 in mind but could be used for any AWS S3 storage project.
- Amazon S3 – used for storage of documents, with either individual buckets or prefixes in a single bucket (e.g. bucketname.s3.amazonaws.com/team-a).
- IAM Role – one for each user/team/bucket that needs to be separated from other users. Only users with the ability to assume the role would have access to the bucket or bucket prefix.
- AWS Single Sign-On – provides an easy-to-use way to grant users permission to assume an IAM role. It can be used standalone or integrated with AD or SAML for authentication.
- User access to files:
  - AWS Command Line Interface – users can install the AWS CLI and run simple commands such as “aws s3 sync” to keep their files in sync.
  - AWS Transfer for SFTP – users can use familiar tools like SFTP to transfer files to AWS S3.
  - AWS Storage Gateway – can be used to access files in S3 as a shared network drive (SMB or NFS), using a gateway VM running locally.
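To illustrate how an IAM role can be limited to one team's prefix in a shared bucket, here is a minimal sketch that builds the kind of policy document such a role might carry. The bucket name, prefix, statement IDs, and the helper name `prefix_policy` are all hypothetical, and the exact set of allowed actions is an assumption that would need to match how users actually access their files.

```python
import json


def prefix_policy(bucket, prefix):
    """Build an IAM policy document that limits access to a single prefix.

    Illustrative only: the action list is an assumption and would need to be
    adjusted for a real deployment.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing the bucket, but only for keys under the prefix.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Allow reading, writing, and deleting objects under the prefix.
                "Sid": "ReadWriteOwnPrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and team prefix.
print(json.dumps(prefix_policy("example-bucket", "team-a"), indent=2))
```

With one such policy attached to each team's role, a user who can assume the team-a role would see only objects under team-a/, while other prefixes in the same bucket stay out of reach.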
Final Thoughts
I don’t know of any universities that are willing to go from spending nothing (unlimited Box storage) to over $22,000 per PB per month. Careful use of S3 storage classes is the key to reducing the price. While close to $1,000 per PB per month is possible, the most likely scenario is using multiple storage classes for different storage use cases. Ideally, no more than a small subset of data is frequently accessed. Infrequently accessed or archival data that users can wait up to 12 hours to retrieve can be shifted to the lowest-cost storage class, with several intermediate options in between.
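One way to move data between storage classes automatically is an S3 lifecycle configuration. The sketch below builds such a configuration as a plain dictionary, in the shape that boto3's `put_bucket_lifecycle_configuration` accepts for its `LifecycleConfiguration` argument. The prefix, rule ID, helper name, and day thresholds are assumptions chosen for illustration, not recommendations.

```python
import json


def lifecycle_configuration(prefix, ia_after_days=30, archive_after_days=180):
    """Build a lifecycle configuration that steps objects down in cost over time.

    Objects under `prefix` move to S3 Standard-IA after `ia_after_days` and to
    S3 Glacier Deep Archive after `archive_after_days`. Both thresholds are
    illustrative defaults.
    """
    if ia_after_days < 30:
        # S3 does not accept lifecycle transitions to Standard-IA before 30 days.
        raise ValueError("ia_after_days must be at least 30")
    if archive_after_days <= ia_after_days:
        raise ValueError("archive_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-down-" + prefix.strip("/"),
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


# Hypothetical team prefix.
print(json.dumps(lifecycle_configuration("team-a/"), indent=2))
```

Applying the result would require AWS credentials and a call such as `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)` on a boto3 S3 client, which is left out here so the sketch runs without an AWS account.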
Using S3 for storage has other benefits, especially for researchers, such as:
- Amazon Athena can query data in S3 buckets using SQL.
- Amazon S3 Batch Operations can run Lambda functions on objects in S3 buckets.
- Amazon EMR can run big data operations on data in S3 buckets.
This is not meant to be an implementation guide, but rather a high-level solution outline. If you are interested in exploring such a solution, please reach out to us for more information.