Unix/Linux Basics for Data Engineers – Complete ETL Server Guide


Linux commands are essential for every Data Engineer working with ETL pipelines. Most production data pipelines run on Linux servers, and understanding command-line tools helps in debugging, automation, monitoring, and file processing.

1. pwd – Check Current Directory

Purpose: Shows your current working directory.

pwd

Example Output:

/home/etl_user/projects/sales_pipeline
ETL Scenario: Before running a pipeline script, confirm you're inside the correct project directory.
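Inside a script, the same check can be automated rather than done by eye. The sketch below assumes a hypothetical helper named require_dir (it is not a standard command); it compares the physical current directory with the one the caller expects and fails if they differ.

```shell
# require_dir is a hypothetical helper, not a standard command.
# It succeeds only when the current directory is the expected one.
require_dir() {
    expected=$1
    current=$(pwd -P)    # -P resolves symlinks to the physical path
    if [ "$current" != "$expected" ]; then
        echo "Wrong directory: $current (expected $expected)" >&2
        return 1
    fi
}

# Possible use at the top of a pipeline script:
# require_dir /home/etl_user/projects/sales_pipeline || exit 1
```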

2. ls -l – List Files with Permissions

Purpose: Displays files with detailed permissions and ownership.

ls -l

Example Output:

-rwxr-xr-- 1 etl_user data_team 2048 Mar 1 02:00 etl_job.sh
ETL Scenario: Verify whether your ETL script has execution permission.
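In the sample output, the leading rwx means the owner (etl_user) can read, write, and execute the script. A script can test the same thing without parsing ls output. This is a minimal sketch; check_executable is a hypothetical helper name.

```shell
# check_executable is a hypothetical helper name.
# Returns 0 if the file exists and the current user may execute it.
check_executable() {
    script=$1
    if [ ! -f "$script" ]; then
        echo "$script: not found" >&2
        return 2
    fi
    if [ ! -x "$script" ]; then
        echo "$script: not executable, try: chmod +x $script" >&2
        return 1
    fi
}

# Possible use before launching the job:
# check_executable ./etl_job.sh && ./etl_job.sh
```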

3. head – View Beginning of File

Purpose: View first few lines of a large CSV file.

head -n 10 sales.csv

Example Output:

id,name,amount
1,John,200
2,Alice,150
ETL Scenario: Validate headers before loading data into database.
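The header check can also be scripted so that a load stops early when the columns are not what the pipeline expects. The sketch below assumes a hypothetical helper named check_header and that the expected header is known to the caller.

```shell
# check_header is a hypothetical helper; the caller supplies the expected header.
check_header() {
    file=$1
    expected=$2
    # Read only the first line; drop carriage returns left by Windows-style files.
    actual=$(head -n 1 "$file" | tr -d '\r')
    if [ "$actual" != "$expected" ]; then
        echo "Header mismatch in $file" >&2
        echo "  expected: $expected" >&2
        echo "  actual:   $actual" >&2
        return 1
    fi
}

# Possible use:
# check_header sales.csv "id,name,amount" || exit 1
```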

4. tail – View End of Log File

Purpose: View last lines of log file.

tail -n 50 etl_job.log

Live Monitoring:

tail -f etl_job.log
ETL Scenario: When a job fails, the error is usually among the last lines of its log, so check there first.

5. grep – Search Errors in Logs

Purpose: Search specific keywords inside files.

grep -i "error" etl_job.log

Example Output:

ERROR: Database connection timeout
ETL Scenario: Quickly locate the error messages that point to the cause of a job failure.
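Two small variations are often useful when a log is long: counting the matching lines, and printing each match with its line number and a little surrounding context. The helper names below are hypothetical; the grep options used (-i, -c, -n, -C) are available in GNU and BSD grep.

```shell
# count_errors prints how many lines contain "error" in any letter case.
count_errors() {
    # grep -c prints the count; it exits non-zero when the count is 0,
    # so the status is ignored here.
    grep -ic "error" "$1" || true
}

# show_errors prints each match with its line number (-n)
# and 2 lines of context on each side (-C 2).
show_errors() {
    grep -in -C 2 "error" "$1"
}

# Possible use:
# count_errors etl_job.log
# show_errors etl_job.log
```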

6. wc -l – Count Records

Purpose: Count the number of lines in a file. The count includes the header line, if the file has one.

wc -l sales.csv

Example Output:

100001 sales.csv
ETL Scenario: Validate source vs target record count. For a CSV with a header, 100001 lines means 100000 data rows.
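A source-versus-target comparison could be sketched as below. It assumes both sides are available as CSV files with one header line each (for example, the target table exported to a file); the helper names data_rows and counts_match are hypothetical. Note that wc -l counts newline characters, so a file whose last line has no trailing newline is undercounted by one.

```shell
# data_rows prints the number of lines after the header line.
data_rows() {
    # tail -n +2 starts output at line 2; tr strips the padding
    # that some wc implementations put before the number.
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# counts_match fails when the two files have different data row counts.
counts_match() {
    src=$(data_rows "$1")
    tgt=$(data_rows "$2")
    if [ "$src" -ne "$tgt" ]; then
        echo "Row count mismatch: source=$src target=$tgt" >&2
        return 1
    fi
}

# Possible use (target_export.csv is a hypothetical file name):
# counts_match sales.csv target_export.csv || exit 1
```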

7. awk – Perform Calculations

Purpose: Perform column-based operations. The example sums the third column (amount); NR>1 skips the header row.

awk -F',' 'NR>1 {sum+=$3} END {print sum}' sales.csv

Example Output:

350000
ETL Scenario: Validate total revenue before loading to warehouse.
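Two details matter when the total is used as a validation figure. First, awk's print formats non-integer numbers with six significant digits by default, so a large decimal total can come out in scientific notation; printf with an explicit format avoids that. Second, a non-numeric amount silently counts as 0, so it helps to count such rows separately. The sketch below uses hypothetical helper names and assumes a simple CSV with no quoted fields containing commas.

```shell
# sum_amount prints the total of column 3 with two decimals, skipping the header.
sum_amount() {
    awk -F',' 'NR > 1 { sum += $3 } END { printf "%.2f\n", sum }' "$1"
}

# bad_amounts prints how many data rows have a non-numeric third column.
bad_amounts() {
    awk -F',' 'NR > 1 && $3 !~ /^-?[0-9]+(\.[0-9]+)?$/ { bad++ } END { print bad + 0 }' "$1"
}

# Possible use:
# sum_amount sales.csv
# bad_amounts sales.csv
```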

8. chmod – Fix Permission Issues

Purpose: Grant execution permission to script.

chmod +x etl_job.sh
ETL Scenario: Fix "Permission Denied" error in production.

9. ps -ef – Check Running Jobs

Purpose: See running processes. Writing the pattern as [e]tl keeps the grep command itself out of the results.

ps -ef | grep '[e]tl'
ETL Scenario: Check whether scheduled pipeline is still running.
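When the job's process ID is already known, a script can test for the process directly instead of searching the process list. The sketch below assumes the job saved its PID somewhere the script can read it (the PID file path in the comment is hypothetical). kill -0 sends no signal; it only reports whether the process exists and can be signalled, so it also fails for processes owned by another user.

```shell
# is_running succeeds when a process with the given PID exists
# and the current user is allowed to signal it.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Possible use, assuming the job wrote its PID to /tmp/etl_job.pid:
# if is_running "$(cat /tmp/etl_job.pid)"; then echo "still running"; fi
```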

10. kill -9 – Stop Stuck Job

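Purpose: Terminate a process forcefully. kill -9 sends SIGKILL, which the process cannot catch, so it gets no chance to remove temporary files or release locks. Try a plain kill (SIGTERM) first and use -9 only if the process does not exit.

kill -9 12345
ETL Scenario: Stop a frozen ETL process consuming high CPU.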
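A gentler sequence than going straight to kill -9 is to send SIGTERM, wait a short while, and send SIGKILL only if the process is still there. The helper name stop_job and the 10-second default below are assumptions for illustration.

```shell
# stop_job is a hypothetical helper: ask the process to exit, wait, then force it.
# Usage: stop_job PID [GRACE_SECONDS]
stop_job() {
    pid=$1
    grace=${2:-10}    # seconds to wait before sending SIGKILL
    # If SIGTERM cannot be delivered, the process is already gone
    # (or is not ours to signal), so there is nothing more to do.
    kill -TERM "$pid" 2>/dev/null || return 0
    waited=0
    while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$grace" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "PID $pid did not exit after SIGTERM, sending SIGKILL" >&2
        kill -KILL "$pid"
    fi
}

# Possible use:
# stop_job 12345 30
```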

11. crontab – Schedule ETL Job

Edit Cron:

crontab -e

Example (Run Daily at 2 AM):

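0 2 * * * /home/etl_user/etl_job.sh >> /home/etl_user/job.log 2>&1
ETL Scenario: Automate daily data warehouse load. Cron runs jobs with a minimal environment and not from your project directory, so use absolute paths for both the script and the log file.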

12. df -h – Check Disk Space

Purpose: Check used and available space on each mounted filesystem, in human-readable units (K, M, G).

df -h
ETL Scenario: Full disk is a common cause of ETL failures.
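A pipeline can check free space before it starts writing instead of failing halfway through. The sketch below reads the capacity column from df -P (the POSIX output format); the helper names and the 90 percent default limit are assumptions chosen for illustration.

```shell
# disk_used_pct prints the used percentage (digits only)
# of the filesystem that holds the given path.
disk_used_pct() {
    # -P forces one line per filesystem; column 5 is the capacity, e.g. "42%".
    df -P "$1" | awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# check_disk fails when usage is at or above the limit (default 90).
check_disk() {
    path=$1
    limit=${2:-90}
    used=$(disk_used_pct "$path")
    if [ "$used" -ge "$limit" ]; then
        echo "Disk usage on $path is ${used}%, limit is ${limit}%" >&2
        return 1
    fi
}

# Possible use before a large extract:
# check_disk /home/etl_user 90 || exit 1
```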

13. gzip – Compress Data Files

Purpose: Compress file before transfer.

gzip sales.csv

Result:

sales.csv.gz
ETL Scenario: Compress file before sending to S3 or FTP. Note that gzip replaces sales.csv with sales.csv.gz; use gzip -k sales.csv to keep the original.
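If the original file must stay in place and the archive should be verified before transfer, something like the following could be used. compress_keep is a hypothetical helper name; gzip -c and gzip -t are standard gzip options.

```shell
# compress_keep is a hypothetical helper: it writes FILE.gz next to FILE
# and leaves FILE untouched.
compress_keep() {
    file=$1
    # -c writes the compressed data to stdout, so the original is not replaced.
    gzip -c "$file" > "$file.gz" || return 1
    # -t tests the integrity of the compressed file without extracting it.
    gzip -t "$file.gz"
}

# Possible use:
# compress_keep sales.csv && echo "sales.csv.gz is ready to transfer"
```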

Conclusion

Mastering these Unix/Linux commands enables Data Engineers to debug production issues, monitor ETL pipelines, validate data, and automate workflows efficiently. Strong command-line knowledge significantly improves troubleshooting speed and reliability in real-world data engineering environments.
