Automatically scrape web pages and entire sites, extract structured data, and map website content in bulk.
Copy the install instructions and let the AI complete the setup · Recommended for beginners
Please install the "Firecrawl Automation" skill from askskill for me: 1. Download https://raw.githubusercontent.com/ComposioHQ/awesome-claude-skills/master/composio-skills/firecrawl-automation/SKILL.md 2. Save it as ~/.claude/skills/firecrawl-automation/SKILL.md 3. Once it is installed, reload skills and tell me it is ready to use
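If you would rather perform steps 1 and 2 by hand, the minimal sketch below does it with the Python standard library. It assumes network access to raw.githubusercontent.com. The helper names `skill_path` and `install_skill` are hypothetical. Step 3, reloading skills, happens inside Claude Code and is not covered here.

```python
import urllib.request
from pathlib import Path
from typing import Optional

# Source URL and destination path copied from steps 1 and 2 of the install prompt.
SKILL_URL = (
    "https://raw.githubusercontent.com/ComposioHQ/awesome-claude-skills/"
    "master/composio-skills/firecrawl-automation/SKILL.md"
)


def skill_path(home: Optional[Path] = None) -> Path:
    """Return <home>/.claude/skills/firecrawl-automation/SKILL.md (home defaults to ~)."""
    base = home if home is not None else Path.home()
    return base / ".claude" / "skills" / "firecrawl-automation" / "SKILL.md"


def install_skill(home: Optional[Path] = None) -> Path:
    """Download SKILL.md and write it to skill_path(home), creating folders as needed."""
    target = skill_path(home)
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(SKILL_URL, timeout=30) as response:
        target.write_bytes(response.read())
    return target
```

Calling `install_skill()` performs the download and returns the path where the file was saved.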
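Use Firecrawl Automation to scrape the following 50 product pages. Extract the product name, price, specifications, stock status, and main image URL, and output the results as structured JSON. Mark any missing field as null.
A structured JSON list with the key fields for each product, ready for further analysis or import.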
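You can also enforce the "mark missing fields as null" requirement after extraction. The minimal sketch below assumes the extracted products arrive as a list of dicts. The keys in `PRODUCT_FIELDS` are hypothetical stand-ins for name, price, specifications, stock status, and main image URL. Treating empty strings and empty containers as missing is a design choice, not something the skill specifies.

```python
import json
from typing import Any

# Hypothetical keys for: product name, price, specifications, stock status, main image URL.
PRODUCT_FIELDS = ("name", "price", "specs", "stock_status", "image_url")


def normalize_products(records: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only PRODUCT_FIELDS; a missing or empty field becomes None."""
    rows = []
    for record in records:
        row = {}
        for field in PRODUCT_FIELDS:
            value = record.get(field)
            # Treat "", [] and {} as missing too (a design choice, not a skill rule).
            row[field] = None if value in ("", [], {}) else value
        rows.append(row)
    return rows


def products_to_json(records: list[dict[str, Any]]) -> str:
    """Serialize normalized records; None is written as JSON null."""
    return json.dumps(normalize_products(records), ensure_ascii=False, indent=2)
```

Because every row ends up with the same five keys, the output imports cleanly into spreadsheets or databases that expect a fixed set of columns.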
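Use Firecrawl Automation to crawl this blog, visit every article page, extract the title, author, publish date, category, summary, and URL, and sort the results newest first.
A chronologically sorted list of articles, useful for content audits, research, or topic planning.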
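Once the article list has been extracted, you can check or redo the newest-first ordering locally. The sketch below assumes each article is a dict with a hypothetical `published` key that holds a `YYYY-MM-DD` string. Articles with no parseable date are moved to the end rather than dropped.

```python
from datetime import date
from typing import Any


def sort_articles_newest_first(articles: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Order articles by their 'published' date (YYYY-MM-DD), newest first.

    Articles whose date is missing or unparseable are kept, but placed last
    in their original relative order.
    """

    def sort_key(article: dict[str, Any]) -> tuple[int, date]:
        try:
            return (1, date.fromisoformat(article.get("published") or ""))
        except (TypeError, ValueError):
            return (0, date.min)

    # reverse=True keeps Python's sort stable, so undated articles keep their order.
    return sorted(articles, key=sort_key, reverse=True)
```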
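Use Firecrawl Automation to analyze this website's structure. Output the main sections, the subpage hierarchy, important landing pages, and a list of accessible URLs, so I can quickly understand its information architecture.
A map of the site's structure, including the section hierarchy and page list, useful for navigation analysis and reviewing the site.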
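One way to turn a flat URL list into a section hierarchy is to nest the URL paths segment by segment. The minimal sketch below assumes all URLs belong to one site, so the host is ignored, and that they come from a crawl or mapping run. `build_url_tree` and `render_tree` are hypothetical helpers and are not part of Firecrawl.

```python
from urllib.parse import urlparse


def build_url_tree(urls: list[str]) -> dict:
    """Nest URL paths into a dict keyed by path segment (the host is ignored)."""
    tree: dict = {}
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        node = tree
        for segment in segments:
            node = node.setdefault(segment, {})
    return tree


def render_tree(tree: dict, depth: int = 0) -> list[str]:
    """Flatten the tree into indented lines, two spaces per level, sorted by name."""
    lines = []
    for name in sorted(tree):
        lines.append("  " * depth + name)
        lines.extend(render_tree(tree[name], depth + 1))
    return lines
```

For example, `https://ex.com/docs/api/scrape`, `https://ex.com/docs/guides`, and `https://ex.com/pricing` render as `docs` (with `api/scrape` and `guides` beneath it) followed by `pricing`.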
Run Firecrawl web crawling and extraction directly from Claude Code. Scrape individual pages, crawl entire sites, extract structured data with AI, batch process URL lists, and map website structures without leaving your terminal.
Toolkit docs: composio.dev/toolkits/firecrawl
https://rube.app/mcp
Fetch content from a URL in multiple formats with optional browser actions for dynamic pages.
Tool: FIRECRAWL_SCRAPE
Key parameters:
url (required) -- fully qualified URL to scrape
formats -- output formats: markdown (default), html, rawHtml, links, screenshot, json
onlyMainContent (default true) -- extract main content only, excluding nav/footer/ads
waitFor -- milliseconds to wait for JS rendering (default 0)
timeout -- max wait in ms (default 30000)
actions -- browser actions before scraping (click, write, wait, press, scroll)
includeTags / excludeTags -- filter by HTML tags
jsonOptions -- for structured extraction with schema and/or prompt
Example prompt: "Scrape the main content from https://example.com/pricing as markdown"
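Before calling FIRECRAWL_SCRAPE, it can help to assemble and sanity-check the arguments locally. The sketch below assumes the tool accepts a flat JSON object keyed by the parameter names listed above; the exact schema Composio expects is not confirmed here. `build_scrape_args` is a hypothetical helper. The check that `waitFor` is shorter than `timeout` is a local assumption, not a documented rule.

```python
from typing import Any, Optional
from urllib.parse import urlparse

# Format names as listed in the FIRECRAWL_SCRAPE parameters.
SCRAPE_FORMATS = {"markdown", "html", "rawHtml", "links", "screenshot", "json"}


def build_scrape_args(
    url: str,
    formats: Optional[list[str]] = None,
    only_main_content: bool = True,
    wait_for_ms: int = 0,
    timeout_ms: int = 30000,
) -> dict[str, Any]:
    """Assemble FIRECRAWL_SCRAPE arguments and catch obvious mistakes locally."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a fully qualified URL: {url!r}")
    formats = list(formats or ["markdown"])  # markdown is the documented default
    unknown = set(formats) - SCRAPE_FORMATS
    if unknown:
        raise ValueError(f"unknown formats: {sorted(unknown)}")
    # Local assumption: waiting longer than the overall timeout makes no sense.
    if not 0 <= wait_for_ms < timeout_ms:
        raise ValueError("waitFor must be non-negative and shorter than timeout")
    return {
        "url": url,
        "formats": formats,
        "onlyMainContent": only_main_content,
        "waitFor": wait_for_ms,
        "timeout": timeout_ms,
    }
```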
Discover and scrape multiple pages from a website with configurable depth, path filters, and concurrency.
Tool: FIRECRAWL_CRAWL_V2
Key parameters:
url (required) -- starting URL for the crawl
limit (default 10) -- max pages to crawl
maxDiscoveryDepth -- depth limit from the root page
includePaths / excludePaths -- regex patterns for URL paths
allowSubdomains -- include subdomains (default false)
crawlEntireDomain -- follow sibling/parent links, not just children (default false)
sitemap -- include (default), skip, or only
prompt -- natural language to auto-configure crawler settings
scrapeOptions_formats -- output format for each page
scrapeOptions_onlyMainContent -- main content extraction per page
Example prompt: "Crawl the docs section of firecrawl.dev, max 50 pages, only paths matching docs"
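Because includePaths and excludePaths are regular expressions, a quick local preview of which URLs they would keep can catch a bad pattern before a crawl uses up its `limit`. This is only a sketch. It assumes the patterns are matched anywhere in the URL path (`re.search`) and that exclusions take precedence. That approximates Firecrawl's behavior, but the exact matching rules are not confirmed here. `path_allowed` and `build_crawl_args` are hypothetical helpers.

```python
import re
from typing import Any, Optional
from urllib.parse import urlparse

SITEMAP_MODES = {"include", "skip", "only"}


def path_allowed(url: str, include_paths: Optional[list[str]] = None,
                 exclude_paths: Optional[list[str]] = None) -> bool:
    """Approximate the crawl filters: exclusions win, then any include must match."""
    path = urlparse(url).path or "/"
    if any(re.search(p, path) for p in exclude_paths or []):
        return False
    if include_paths:
        return any(re.search(p, path) for p in include_paths)
    return True


def build_crawl_args(url: str, limit: int = 10, include_paths: Optional[list[str]] = None,
                     exclude_paths: Optional[list[str]] = None,
                     sitemap: str = "include") -> dict[str, Any]:
    """Assemble FIRECRAWL_CRAWL_V2 arguments, failing fast on bad regexes or sitemap modes."""
    if sitemap not in SITEMAP_MODES:
        raise ValueError(f"sitemap must be one of {sorted(SITEMAP_MODES)}")
    for pattern in (include_paths or []) + (exclude_paths or []):
        re.compile(pattern)  # raises re.error on an invalid pattern
    args: dict[str, Any] = {"url": url, "limit": limit, "sitemap": sitemap}
    if include_paths:
        args["includePaths"] = include_paths
    if exclude_paths:
        args["excludePaths"] = exclude_paths
    return args
```

For the docs example, `build_crawl_args("https://firecrawl.dev", limit=50, include_paths=["docs"])` produces the filter, and `path_allowed` shows which known URLs it would keep.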
Extract structured JSON data from web pages using AI with a natural language prompt or JSON schema.
Tool: FIRECRAWL_EXTRACT
Key parameters:
urls (required) -- array of URLs to extract from (max 10 in beta). Supports wildcards like https://example.com/blog/*
prompt -- natural language description of what to extract
schema -- JSON Schema defining the desired output structure
enable_web_search -- allow crawling links outside initial domains (default false)
At least one of prompt or schema must be provided.
Check extraction status with FIRECRAWL_EXTRACT_GET using the returned job id.
Example prompt: "Extract company name, pricing tiers, and feature lists from https://example.com/pricing"
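With a 10-URL cap during the beta, a larger job such as the 50 product pages has to be split across several FIRECRAWL_EXTRACT calls. Each call returns its own job id, which you then poll with FIRECRAWL_EXTRACT_GET. The minimal sketch below chunks a URL list and enforces the prompt-or-schema rule. `build_extract_batches` and `PRICING_SCHEMA` are hypothetical, and the schema's field names are only illustrative.

```python
from typing import Any, Optional

MAX_EXTRACT_URLS = 10  # beta limit quoted in the parameter list; may change


def build_extract_batches(urls: list[str], prompt: Optional[str] = None,
                          schema: Optional[dict[str, Any]] = None) -> list[dict[str, Any]]:
    """Split urls into FIRECRAWL_EXTRACT argument sets of at most MAX_EXTRACT_URLS each."""
    if not prompt and not schema:
        raise ValueError("provide at least one of prompt or schema")
    batches = []
    for start in range(0, len(urls), MAX_EXTRACT_URLS):
        args: dict[str, Any] = {"urls": urls[start:start + MAX_EXTRACT_URLS]}
        if prompt:
            args["prompt"] = prompt
        if schema:
            args["schema"] = schema
        batches.append(args)
    return batches


# Illustrative JSON Schema for the pricing example; field names are hypothetical.
PRICING_SCHEMA = {
    "type": "object",
    "properties": {
        "company_name": {"type": "string"},
        "pricing_tiers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                    "features": {"type": "array", "items": {"type": "string"}},
                },
            },
        },
    },
}
```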
Scrape many URLs concurrently with shared configuration for efficient bulk data collection.
Tool: FIRECRAWL_BATCH_SCRAPE
Key parameters:
urls (required) -- array of URLs to scrape
formats -- output format for all pages (default markdown)
onlyMainContent (default true) -- main content extraction
maxConcurrency -- parallel scrape limit
ignoreInvalidURLs (default true) -- skip bad URLs instead of failing the batch
location -- geolocation settings with country code
actions -- browser actions applied to each page
blockAds (default true) -- block advertisements
Example prompt: "Batch scrape these 20 product page URLs as markdown with ad blocking"
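Since ignoreInvalidURLs silently skips bad entries, you may want to deduplicate and pre-check the list yourself so you know exactly which URLs were dropped. The sketch below assumes a flat argument object using the parameter names above, with `formats` passed as a list. `build_batch_scrape_args` is a hypothetical helper.

```python
from typing import Any, Optional
from urllib.parse import urlparse


def build_batch_scrape_args(urls: list[str], formats: Optional[list[str]] = None,
                            max_concurrency: Optional[int] = None,
                            block_ads: bool = True) -> tuple[dict[str, Any], list[str]]:
    """Deduplicate and pre-check URLs; return (FIRECRAWL_BATCH_SCRAPE args, rejected URLs)."""
    seen: set[str] = set()
    valid: list[str] = []
    rejected: list[str] = []
    for raw in urls:
        url = raw.strip()
        if url in seen:
            continue  # drop exact duplicates after trimming whitespace
        seen.add(url)
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(url)
        else:
            rejected.append(url)
    args: dict[str, Any] = {
        "urls": valid,
        "formats": list(formats or ["markdown"]),
        "onlyMainContent": True,
        "ignoreInvalidURLs": True,
        "blockAds": block_ads,
    }
    if max_concurrency is not None:
        args["maxConcurrency"] = max_concurrency
    return args, rejected
```

Returning the rejected URLs alongside the arguments lets you log or fix them, instead of discovering later that some pages never appeared in the results.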
…
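Automate client, project, and business-process operations in Agiled via Rube MCP.
Run Apify web-scraping tasks automatically, manage datasets, and retrieve structured scraping results.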