Delete s3 data in AWS data pipeline

Server Management Service

It is possible to delete s3 data in the AWS data pipeline using the console or using the AWS CLI.

Here at Ibmi Media, we regularly help our Customers to fix AWS related tasks as part of our Server Management Services.

In this context, we shall look into the steps to delete s3 data in the AWS data pipeline.

More information about AWS Data Pipeline?

AWS Data Pipeline is a web service. It is mainly designed to make it easier for users to integrate data spread across multiple AWS services and analyze it from a single location.

Here are some of its features;

i. Simple and cost-efficient

The drag-and-drop feature makes the creation of pipelines easier on the console. Also, the visual pipeline creator provides a library of pipeline templates.

ii. Reliable

Its infrastructure is designed for fault-tolerant execution activities. In case if any failures occur in the activity logic or data sources, then AWS Data Pipeline automatically retries the activity. However, if the failure continues, then it will send a failure notification.

iii. Flexible

AWS Data Pipeline provides many features that include scheduling, tracking, error handling, etc. We can configure it such a way that it can take actions like run Amazon EMR jobs, execute SQL queries directly against databases, execute custom applications running on Amazon EC2, etc.

iv. Scalable

AWS Data Pipeline makes it easy to dispatch work to one machine or many, in serial or parallel. With its flexible design, processing a million files is as easy as processing a single file.

How to Set Up Data Pipeline ?

To create the pipeline, follow the steps below:

1. First, we sign-in to the AWS account.

2. Next, we use this link to Open AWS Data Pipeline console .

3. Here we select the region in the navigation bar.

4. Then we click the Create New Pipeline button.

5. After that, we fill in the required details in the respective fields;

i. In the Source field, we choose 'Build using a template' and then select this template − Getting Started using ShellCommandActivity.

ii. The Parameters section opens only when we select the template. We leave the S3 input folder and Shell command to run with their default values. After that, we click the folder icon next to the S3 output folder and select the buckets.

iii. In Schedule, we leave the values as default.

iv. In Pipeline Configuration, we leave the logging as enabled. Then we click the folder icon under the S3 location for logs and select the buckets.

v. After that, in Security/Access, we leave IAM roles values as default.

vi. Finally, we click the Activate button.

How to Delete S3 data in AWS Data Pipeline?

When we no longer need a pipeline, such as a pipeline that we created during application testing, we should delete it to remove it from active use. Deleting a pipeline will put it into a deleting state.

When the pipeline is in the deleted state, its run history and pipeline definition are gone. Therefore, we will no longer be able to perform operations on the pipeline, including describing it. We can’t restore a pipeline once we delete it. Moreover, deleting a pipeline will also delete all its associated objects.