s3cmd Tutorial
IMPORTANT NOTE - Please read before you proceed
- I wrote this tutorial for a course, Parallel Programming, which I TA’ed at Johns Hopkins. Some of the text relates to assignments in that course; if you are not a student in it, please ignore those points.
- If you find the commands confusing, you can use the S3 GUI in the AWS Console in your browser. Other GUI options include a Firefox plugin, a Chrome plugin, a Windows application and a macOS application.
- You cannot have an underscore '_' or uppercase letters in a bucket name. For more details, please refer to the AWS Bucket Restrictions and Limitations.
- For power users who intend to upload or download terabytes of data from AWS S3 (as might be the case with some people using NeuroData services), s3cmd might not be the ideal tool. The AWS team wrote an awesome tool called the AWS Command Line Interface, which can transfer data using multiple threads. Its documentation is extensive and is available here. Note the difference between aws s3 and aws s3api: the former provides high-level commands (such as cp, sync and ls) that work with files and directories, while the latter exposes the low-level S3 API operations one to one.
Installing s3cmd
- Choose the instructions that match your Linux distribution.
- Parallel Programming students: if you are not sure, use Ubuntu, which is our default.
Ubuntu, Debian-based Linux
- Install s3cmd on the system, for example with: sudo apt-get install s3cmd
CentOS, RHEL, Fedora-based Linux
- You will need to add an external repository before you can install this package. The repo file can be downloaded here.
- If you do not know how to add repo files, you can learn how here.
- Install s3cmd on the system, for example with: sudo yum install s3cmd
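Whichever distribution you use, you can check that the install worked. The sketch below is a hypothetical Python helper (s3cmd_version is not part of s3cmd). It looks for s3cmd on your PATH and runs s3cmd --version. It assumes Python 3.7 or newer.

```python
import shutil
import subprocess
from typing import Optional


def s3cmd_version() -> Optional[str]:
    """Return the output of `s3cmd --version`, or None if s3cmd is not on PATH."""
    exe = shutil.which("s3cmd")
    if exe is None:
        return None  # not installed, or not on PATH
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=True
    )
    # Accept the version string from stdout or stderr, whichever is non-empty.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = s3cmd_version()
    print(version if version else "s3cmd not found on PATH")
```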
Configuring s3cmd
- Next, link s3cmd to your AWS account by running s3cmd --configure. This is necessary because every bucket must be associated with a user (AWS has to attribute the storage cost to someone!!)
- s3cmd will prompt you for an Access Key and a Secret Key. To get them, do the following:
- Log in to your AWS account (I hope you have one!!!)
- Click on your username in the upper-right corner and select the My Security Credentials option
- Choose the option “Access Keys (Access Key ID and Secret Access Key)”
- Create a new Access Key
- The Access Key ID on AWS is the Access Key in s3cmd, and the Secret Access Key on AWS is the Secret Key in s3cmd.
- You can view the Secret Access Key only once. Either download it and keep it in a safe location, or copy it directly into the command line. If you lose access to it, AWS lets you create a new Access Key.
- Next come several options this tutorial does not need. Skip each one by pressing <Enter>:
- Encryption Password, Press <Enter> here to skip this step
- Path to GPG program, Press <Enter> here to skip this step
- Use HTTPS Protocol, Press <Enter> here to skip this step
- HTTP Proxy Server Name, Press <Enter> here to skip this step
- Test access with supplied credentials?, Type <y>
- You should then see the message "Success. Your access key and secret key worked fine :-)", which indicates that the setup is correct.
- Save Settings?, Type <y>
- Usually the configuration will be saved to a file called .s3cfg in your home directory.
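To double-check what s3cmd --configure saved, the hypothetical helper below reads the access_key and secret_key entries from .s3cfg with Python's configparser. It assumes the file uses the usual [default] section. Interpolation is disabled because some .s3cfg values contain %(...)s placeholders that configparser would otherwise try to expand.

```python
import configparser
from pathlib import Path
from typing import Dict


def load_s3cmd_keys(path: Path = Path.home() / ".s3cfg") -> Dict[str, str]:
    """Read the access/secret key pair saved by `s3cmd --configure`."""
    # interpolation=None: leave %(bucket)s-style placeholders untouched.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"no s3cmd config at {path}")
    section = parser["default"]  # assumption: s3cmd's usual section name
    keys = {name: section.get(name, "") for name in ("access_key", "secret_key")}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return keys
```

In practice, avoid printing the secret key itself. Checking that it is non-empty is usually enough.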
Usage
Fetch data from a bucket
- For this example, the bucket must already exist and its data must be publicly readable.
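A download takes the form s3cmd get --recursive s3://bucket-name/path/ local-dir/. The sketch below builds that argument list in Python and runs it with subprocess. build_get_cmd and run are hypothetical helpers, and the bucket and path names used with them are placeholders.

```python
import subprocess
from typing import List


def build_get_cmd(bucket: str, key: str, dest: str = ".",
                  recursive: bool = False) -> List[str]:
    """Build an `s3cmd get` command that downloads s3://bucket/key into dest."""
    cmd = ["s3cmd", "get"]
    if recursive:
        # Download every object under the key prefix, not just one file.
        cmd.append("--recursive")
    cmd += [f"s3://{bucket}/{key}", dest]
    return cmd


def run(cmd: List[str]) -> None:
    """Execute the command; raises CalledProcessError if s3cmd fails."""
    subprocess.run(cmd, check=True)
```

For example, run(build_get_cmd("some-public-bucket", "data/", "local/", recursive=True)) would fetch everything under data/ into local/.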
Create a bucket
- The bucket name must be unique across all buckets in the world. Think of it as a DNS name, which cannot be duplicated.
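A bad name is only rejected once the request reaches AWS, so it can help to check it locally before running s3cmd mb s3://bucket-name. The sketch below checks a simplified subset of the AWS naming rules: 3 to 63 characters; only lowercase letters, digits, dots and hyphens; a letter or digit at each end; no adjacent dots; and not formatted as an IP address. It cannot check global uniqueness, and is_valid_bucket_name and build_mb_cmd are hypothetical helpers.

```python
import ipaddress
import re
from typing import List

# Simplified subset of the AWS bucket naming rules (not exhaustive).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    if not _BUCKET_RE.fullmatch(name):
        return False  # bad length, uppercase, underscore, or bad first/last char
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.ip_address(name)
        return False  # names formatted like an IP address are not allowed
    except ValueError:
        return True


def build_mb_cmd(bucket: str) -> List[str]:
    """Build `s3cmd mb s3://bucket` after a local sanity check of the name."""
    if not is_valid_bucket_name(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return ["s3cmd", "mb", f"s3://{bucket}"]
```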
Put Data into a bucket
- Notice the subtle difference in syntax depending on what you want to do: the only change is a trailing slash on the local directory.
- Uploading the directory without a trailing slash (for example, my-dir) copies the directory itself and its contents to the S3 bucket.
- Uploading it with a trailing slash (my-dir/) copies only the contents of the directory. The directory itself is not created in the bucket, but any nested directories are.
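Because the two upload forms differ only in a trailing slash, they are easy to mix up by hand. The hypothetical helper below builds an s3cmd put --recursive command and sets the slash explicitly from a contents_only flag. The directory and bucket names are placeholders.

```python
from typing import List


def build_put_cmd(local_dir: str, bucket: str, contents_only: bool) -> List[str]:
    """Build an `s3cmd put --recursive` command for a local directory.

    The trailing slash on the local path selects the behaviour:
      my-dir   -> uploads the directory itself (s3://bucket/my-dir/...)
      my-dir/  -> uploads only its contents (s3://bucket/...)
    """
    src = local_dir.rstrip("/")
    if not src:
        raise ValueError("local_dir must name a directory, not the filesystem root")
    if contents_only:
        src += "/"
    return ["s3cmd", "put", "--recursive", src, f"s3://{bucket}/"]
```

Pass the resulting list to subprocess.run(cmd, check=True) to execute it.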
View Contents in a bucket
- Use s3cmd ls --recursive s3://bucket-name to list the directories and files in your bucket recursively. This also works for looking at the files and directories of any publicly readable bucket.
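To use a listing from a script, the sketch below runs s3cmd ls --recursive and extracts the s3:// URIs. It assumes each output line ends with the object's URI, preceded by the date, time and size (or by DIR for a directory). If your s3cmd version formats the listing differently, adjust uris_from_ls_output. All function names are hypothetical.

```python
import subprocess
from typing import List


def build_ls_cmd(bucket: str, recursive: bool = True) -> List[str]:
    cmd = ["s3cmd", "ls"]
    if recursive:
        cmd.append("--recursive")
    cmd.append(f"s3://{bucket}")
    return cmd


def uris_from_ls_output(output: str) -> List[str]:
    """Return the s3:// URI found on each listing line.

    Assumption: the URI is the last thing on the line.
    """
    uris = []
    for line in output.splitlines():
        _, sep, rest = line.partition("s3://")
        if sep:  # skip lines that do not mention a URI
            uris.append(sep + rest)
    return uris


def list_bucket(bucket: str) -> List[str]:
    """Run s3cmd ls and return the listed URIs."""
    result = subprocess.run(build_ls_cmd(bucket), capture_output=True,
                            text=True, check=True)
    return uris_from_ls_output(result.stdout)
```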
Set the ACL for a bucket
- By default, the contents of your bucket are private and can be read and written only by you.
- The setacl command with the --acl-public and --recursive options makes the contents of your bucket publicly readable.
- This command can also be written without a trailing slash after the bucket name. In that case, the bucket itself does not get the public permission, but its contents do.
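The sketch below is a hypothetical helper that builds the setacl command described above. It uses s3cmd's --acl-public and --acl-private options together with --recursive. The include_bucket flag only toggles the trailing slash, following the distinction drawn in the text.

```python
from typing import List


def build_setacl_cmd(bucket: str, public: bool = True,
                     include_bucket: bool = True) -> List[str]:
    """Build an `s3cmd setacl --recursive` command for a whole bucket.

    Per the tutorial text: with the trailing slash the bucket itself also
    gets the new ACL; without it, only the contents do.
    """
    acl_flag = "--acl-public" if public else "--acl-private"
    target = f"s3://{bucket}/" if include_bucket else f"s3://{bucket}"
    return ["s3cmd", "setacl", acl_flag, "--recursive", target]
```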
Check the ACL for a bucket
- The info command shows information about your bucket, including its ACL and HTTP link.
- In the output, look for an ACL entry that grants READ to anonymous users. Anonymous read access means the bucket is public.
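To check the ACL from a script, the sketch below runs s3cmd info and looks for an anonymous READ grant. It assumes s3cmd prints each grant on a line starting with "ACL:" that names the anonymous grantee and the permission. Verify this against your own output before relying on it. The function names are hypothetical.

```python
import subprocess


def is_public_readable(info_output: str) -> bool:
    """Return True if some ACL line grants READ to anonymous users.

    Assumption: grants look like "ACL:       *anon*: READ".
    """
    for line in info_output.splitlines():
        label, _, grant = line.strip().partition(":")
        if label == "ACL" and "anon" in grant and "READ" in grant:
            return True
    return False


def bucket_is_public(bucket: str) -> bool:
    """Run s3cmd info on the bucket and inspect its ACL lines."""
    result = subprocess.run(["s3cmd", "info", f"s3://{bucket}"],
                            capture_output=True, text=True, check=True)
    return is_public_readable(result.stdout)
```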
Delete a file from the bucket
- The del command deletes the given file from your bucket (with --recursive, everything under the given path).
- You cannot get this data back. Use with care.
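Because deletions cannot be undone, a script may want to build the del command explicitly and ask for confirmation before running it. The hypothetical helpers below do that. build_del_cmd refuses an empty key, so a typo cannot turn into a request against the whole bucket.

```python
from typing import List


def build_del_cmd(bucket: str, key: str, recursive: bool = False) -> List[str]:
    """Build an `s3cmd del` command for one object or, recursively, a prefix."""
    if not key:
        # Emptying an entire bucket should be a separate, deliberate step.
        raise ValueError("key must name an object or prefix")
    cmd = ["s3cmd", "del"]
    if recursive:
        cmd.append("--recursive")
    cmd.append(f"s3://{bucket}/{key}")
    return cmd


def confirm(cmd: List[str]) -> bool:
    """Ask on the terminal before running a destructive command."""
    answer = input(f"Run {' '.join(cmd)} ? [y/N] ")
    return answer.strip().lower() == "y"
```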
Remove a bucket
- The rb command removes your bucket permanently. You cannot get the data back if you do this. S3 only deletes empty buckets, so delete the bucket's contents first.
- Only do this once you have your grades for this assignment.
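Removing a bucket that still holds data takes two steps, because S3 only deletes empty buckets. The hypothetical helper below returns both commands in order. The first is s3cmd del --recursive --force, which empties the bucket; s3cmd expects --force before deleting every object in a bucket, so verify that with s3cmd --help. The second is s3cmd rb, which removes the bucket.

```python
from typing import List


def build_remove_bucket_cmds(bucket: str,
                             delete_contents: bool = False) -> List[List[str]]:
    """Return, in order, the s3cmd commands that remove a bucket."""
    uri = f"s3://{bucket}"
    cmds: List[List[str]] = []
    if delete_contents:
        # Empty the bucket first; --force confirms you mean every object.
        cmds.append(["s3cmd", "del", "--recursive", "--force", uri])
    cmds.append(["s3cmd", "rb", uri])
    return cmds
```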
There is much more you can do. Run s3cmd --help to see every available command and option.