Getting Structured Data From The Internet Running Web Crawlers/Scrapers On A Big Data Production Scale, Computer Science and Information Technology Books, Apress India

Getting Structured Data From The Internet Running Web Crawlers/Scrapers On A Big Data Production Scale by Patel, Apress India

Price: ₹ 1299.00/- [ 5.00% off ]

Seller Price: ₹ 1234.00

Estimated Delivery Time : 4-5 Business Days

Sold By: Meripustak

In Stock

We deliver across all postal codes in India

Outside India Order Estimated Delivery Time
7-10 Business Days

We Deliver Across 100+ Countries

MeriPustak’s Books are 100% New & Original

General Information
Author(s)	Patel
Publisher	Apress India
ISBN	9781484283844
Pages	397
Binding	Softcover
Language	English
Publish Year	January 2022

Description

Use web scraping at scale to quickly turn the large amounts of free data available on the web into a structured format. This book teaches you to use Python scripts to crawl websites at scale, scrape data from HTML and JavaScript-enabled pages, and convert it into structured formats such as CSV, Excel, or JSON, or load it into a SQL database of your choice.

The book goes beyond the basics of web scraping and covers advanced topics such as natural language processing (NLP) and text analytics to extract names of people, places, email addresses, contact details, etc., from a page at production scale, using distributed big data techniques on an Amazon Web Services (AWS)-based cloud infrastructure. It covers developing a robust data processing and ingestion pipeline on the Common Crawl corpus, a publicly available web crawl data set containing petabytes of data and hosted on AWS's registry of open data.

Getting Structured Data from the Internet also includes a step-by-step tutorial on deploying your own crawlers using a production web scraping framework (such as Scrapy) and dealing with real-world issues (such as breaking Captcha, proxy IP rotation, and more). Code used in the book is provided to help you understand the concepts in practice and write your own web crawler to power your business ideas.

You will:
Understand web scraping, its applications/uses, and how to avoid web scraping by hitting publicly available REST API endpoints to get data directly
Develop a web scraper and crawler from scratch using the lxml and BeautifulSoup libraries, and learn about scraping from JavaScript-enabled pages using Selenium
Use AWS-based cloud computing with EC2, S3, Athena, SQS, and SNS to analyze, extract, and store useful insights from crawled pages
Use SQL on PostgreSQL running on Amazon Relational Database Service (RDS) and on SQLite using SQLAlchemy
Review scikit-learn, Gensim, and spaCy to perform NLP tasks on scraped web pages such as name entity recogni
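As a rough illustration of the HTML-to-structured-data conversion the description mentions, the following minimal sketch extracts links from an HTML snippet and serializes them as CSV and JSON. It is not taken from the book: it uses only the Python standard library rather than lxml or BeautifulSoup, and the names `LinkExtractor`, `scrape_links`, `to_csv`, `to_json`, and the sample HTML are all hypothetical, chosen for this example.

```python
import csv
import io
import json
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href and visible text of every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per link found
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.rows.append({"url": self._href, "text": text})
            self._href = None


def scrape_links(html):
    """Parse an HTML string and return a list of {"url", "text"} dicts."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize the rows as a JSON array."""
    return json.dumps(rows, indent=2)


# Hypothetical sample page used only for this illustration.
SAMPLE_HTML = (
    '<html><body><a href="/books/1">First Book</a><p>no link</p>'
    '<a href="/books/2"> Second <b>Book</b> </a></body></html>'
)

if __name__ == "__main__":
    rows = scrape_links(SAMPLE_HTML)
    print(to_csv(rows))
    print(to_json(rows))
```

In a real crawler the HTML would come from an HTTP response instead of a literal string, and a dedicated parsing library would usually handle malformed markup more gracefully than this sketch does.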
