<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Web Scraping on Python Brasil — Aprenda Python em Português</title>
    <link>https://python.dev.br/tags/web-scraping/</link>
    <description>Recent content in Web Scraping on Python Brasil — Aprenda Python em Português</description>
    <generator>Hugo</generator>
    <language>pt-br</language>
    <lastBuildDate>Wed, 20 May 2026 10:33:48 +0000</lastBuildDate>
    <atom:link href="https://python.dev.br/tags/web-scraping/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Crawl4AI: AI-Powered Web Scraping in Python</title>
      <link>https://python.dev.br/blog/crawl4ai-web-scraping-ia-python/</link>
      <pubDate>Mon, 27 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://python.dev.br/blog/crawl4ai-web-scraping-ia-python/</guid>
      <description>&lt;p&gt;Traditional web scraping requires you to write CSS selectors, XPath expressions, or regular expressions for each site. When the page structure changes, everything breaks. &lt;strong&gt;Crawl4AI&lt;/strong&gt; takes a different approach: it uses &lt;strong&gt;large language models (LLMs)&lt;/strong&gt; to understand page content and extract data intelligently, without depending on a fixed HTML structure.&lt;/p&gt;&#xA;&lt;p&gt;With more than 30,000 stars on GitHub, Crawl4AI has become the leading open-source library for AI-powered web scraping in Python. In this article, we will see how to install, configure, and use Crawl4AI to extract structured data from any web page.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Web Scraping with Python: Complete Tutorial</title>
      <link>https://python.dev.br/blog/web-scraping-python/</link>
      <pubDate>Sun, 15 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://python.dev.br/blog/web-scraping-python/</guid>
      <description>&lt;p&gt;Web scraping is the technique of automatically extracting data from web pages. Python is one of the most popular languages for this, thanks to libraries such as &lt;strong&gt;requests&lt;/strong&gt; and &lt;strong&gt;BeautifulSoup&lt;/strong&gt;. In this tutorial, you will learn everything from the basics to complete hands-on projects.&lt;/p&gt;&#xA;&lt;h2 id=&#34;configuração-inicial&#34;&gt;Initial Setup&lt;/h2&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Install the required libraries&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# pip install requests beautifulsoup4 lxml&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;requests&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;bs4&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;BeautifulSoup&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h2 id=&#34;conceitos-básicos&#34;&gt;Basic Concepts&lt;/h2&gt;&#xA;&lt;h3 id=&#34;fazendo-requisições-http&#34;&gt;Making HTTP Requests&lt;/h3&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;requests&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Simple GET&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;response&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;requests&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;https://httpbin.org/get&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Status: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;response&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;status_code&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Content type: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;response&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;headers&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s1&#34;&gt;&amp;#39;content-type&amp;#39;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Size: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;nb&#34;&gt;len&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;response&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;text&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt; characters&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# With custom headers (good practice!)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;headers&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;s2&#34;&gt;&amp;#34;User-Agent&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Mozilla/5.0 (Windows NT 10.0; Win64; x64) &amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                  &lt;span class=&#34;s2&#34;&gt;&amp;#34;AppleWebKit/537.36 (KHTML, like Gecko) &amp;#34;&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;                  &lt;span class=&#34;s2&#34;&gt;&amp;#34;Chrome/120.0.0.0 Safari/537.36&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;s2&#34;&gt;&amp;#34;Accept-Language&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;pt-BR,pt;q=0.9,en;q=0.8&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;response&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;requests&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;https://httpbin.org/headers&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;headers&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;headers&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;response&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;json&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;())&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# With query parameters&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;params&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;q&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;python&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;page&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;}&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;response&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;requests&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;get&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;https://httpbin.org/get&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;params&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;params&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;sa&#34;&gt;f&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;Final URL: &lt;/span&gt;&lt;span class=&#34;si&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;response&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;url&lt;/span&gt;&lt;span class=&#34;si&#34;&gt;}&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;h3 id=&#34;entendendo-html&#34;&gt;Understanding HTML&lt;/h3&gt;&#xA;&lt;p&gt;Before scraping, it is essential to understand the basic structure of HTML:&lt;/p&gt;</description>
    </item>
    <item>
      <title>Selenium: What It Is and How It Works | Python Brasil</title>
      <link>https://python.dev.br/glossario/selenium/</link>
      <pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://python.dev.br/glossario/selenium/</guid>
      <description>&lt;h2 id=&#34;o-que-e-selenium&#34;&gt;What is Selenium?&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;Selenium&lt;/strong&gt; is a web browser automation tool that lets you control Chrome, Firefox, Edge, and other browsers programmatically from Python. Unlike Requests and BeautifulSoup, which only fetch and parse the HTML returned by the server, Selenium executes JavaScript, interacts with dynamic elements, and simulates real user actions such as clicking, typing, scrolling, and navigating between pages.&lt;/p&gt;&#xA;&lt;p&gt;Selenium is used in two main scenarios: &lt;strong&gt;web scraping&lt;/strong&gt; of sites that rely on JavaScript to render their content, and &lt;strong&gt;automated testing&lt;/strong&gt; of web interfaces.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Automation with Python: A Practical Guide</title>
      <link>https://python.dev.br/blog/automatizacao-com-python/</link>
      <pubDate>Wed, 10 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://python.dev.br/blog/automatizacao-com-python/</guid>
      <description>&lt;p&gt;One of the greatest strengths of Python is how easily it automates repetitive tasks. If you spend time doing something by hand on your computer, you can probably automate it with Python. In this practical guide, we will explore the most common kinds of automation, with ready-to-use examples.&lt;/p&gt;&#xA;&lt;h2 id=&#34;manipulação-de-arquivos-e-pastas&#34;&gt;Working with Files and Folders&lt;/h2&gt;&#xA;&lt;h3 id=&#34;organizando-arquivos-por-extensão&#34;&gt;Organizing files by extension&lt;/h3&gt;&#xA;&lt;p&gt;One of the most useful everyday scripts: tidying up that messy downloads folder.&lt;/p&gt;</description>
    </item>
    <item>
      <title>BeautifulSoup: What It Is and How It Works | Python Brasil</title>
      <link>https://python.dev.br/glossario/beautifulsoup/</link>
      <pubDate>Wed, 05 Nov 2025 00:00:00 +0000</pubDate>
      <guid>https://python.dev.br/glossario/beautifulsoup/</guid>
      <description>&lt;h2 id=&#34;o-que-e-beautifulsoup&#34;&gt;What is BeautifulSoup?&lt;/h2&gt;&#xA;&lt;p&gt;&lt;strong&gt;BeautifulSoup&lt;/strong&gt; is a Python library for parsing HTML and XML documents. It builds a navigable parse tree that lets you search, traverse, and extract data from web pages in a simple, intuitive way. BeautifulSoup is one of the most popular tools for &lt;strong&gt;web scraping&lt;/strong&gt; in Python, the practice of automatically extracting data from websites.&lt;/p&gt;&#xA;&lt;p&gt;The library does not make HTTP requests on its own. It works together with libraries such as &lt;code&gt;requests&lt;/code&gt;, which fetch the HTML; BeautifulSoup then parses that HTML and extracts the information you need.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Web Automation with Selenium and Python (2025) | Python Brasil</title>
      <link>https://python.dev.br/blog/python-e-selenium-automacao-web/</link>
      <pubDate>Mon, 25 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://python.dev.br/blog/python-e-selenium-automacao-web/</guid>
      <description>&lt;p&gt;Selenium is one of the most widely used tools for web browser automation. With Python and Selenium, you can automate tests, fill in forms, extract data from dynamic sites, and much more. In this guide, we will go from setup all the way to advanced automation techniques.&lt;/p&gt;&#xA;&lt;h2 id=&#34;instalacao-e-configuracao&#34;&gt;Installation and Setup&lt;/h2&gt;&#xA;&lt;p&gt;Install Selenium and the driver manager:&lt;/p&gt;&#xA;&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install selenium webdriver-manager&#xA;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The &lt;code&gt;webdriver-manager&lt;/code&gt; package automatically downloads the correct driver for your browser, so you do not have to configure it manually. Selenium 4.6 and later also bundle Selenium Manager, which can locate or download drivers on its own.&lt;/p&gt;</description>
    </item>
  </channel>
</rss>
