Use Apify to run web scraping jobs automatically, manage datasets, and retrieve structured results.
Copy the install prompt below and let the AI complete the setup (recommended for beginners).
Please install the "Apify Automation" skill from askskill for me:
1. Download https://raw.githubusercontent.com/ComposioHQ/awesome-claude-skills/master/composio-skills/apify-automation/SKILL.md
2. Save it as ~/.claude/skills/apify-automation/SKILL.md
3. Once it is installed, reload the skills and tell me it is ready to use
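Example prompt: "Use Apify to run an e-commerce product scraping Actor, collect the first 100 products for the keyword 'wireless earbuds', return the product name, price, rating, review count, and link, and save the results to a dataset."
Expected result: the output of a single scraping run, containing structured product data and the corresponding dataset information.
Example prompt: "Create a reusable task based on a news site scraping Actor that always scrapes the latest articles from the technology channel, and explain how to run this task directly afterwards to get the title, publication time, and article link."
Expected result: a reusable task configuration, plus instructions for running the task later and reading its results.
Example prompt: "Read the dataset from my most recent rental listing scrape in Apify, extract the city, rent, area, layout, and listing link, and list the 20 cheapest listings sorted by rent from low to high."
Expected result: a cleaned summary of the previously collected data, sorted and filtered as requested.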
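The post-processing in the rental example (keep a few fields, sort by rent ascending, take the first 20) can be sketched as below. This is a minimal illustration, not the skill's own implementation: the field names `city`, `rent`, `area`, `layout`, and `url` are hypothetical, since each Actor defines its own output schema.

```python
from typing import Any

# Hypothetical field names; a real Actor's dataset may use different keys.
WANTED_FIELDS = ("city", "rent", "area", "layout", "url")


def cheapest_listings(items: list[dict[str, Any]], top_n: int = 20) -> list[dict[str, Any]]:
    """Return the top_n cheapest listings, keeping only the wanted fields."""
    # Skip records that have no usable numeric rent.
    priced = [it for it in items if isinstance(it.get("rent"), (int, float))]
    # Sort from lowest to highest rent.
    priced.sort(key=lambda it: it["rent"])
    # Project each record onto the wanted fields (missing fields become None).
    return [{key: it.get(key) for key in WANTED_FIELDS} for it in priced[:top_n]]
```

Records without a numeric rent are dropped rather than sorted to one end, which keeps the ordering unambiguous.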
Run Apify web scraping Actors and manage datasets directly from Claude Code. Execute crawlers synchronously or asynchronously, retrieve structured data, create reusable tasks, and inspect run logs without leaving your terminal.
Toolkit docs: composio.dev/toolkits/apify
Rube MCP endpoint: https://rube.app/mcp
Execute an Actor and immediately retrieve its dataset items in a single call. Best for quick scraping jobs.
Tool: APIFY_RUN_ACTOR_SYNC_GET_DATASET_ITEMS
Key parameters:
actorId (required) -- Actor ID in the format username/actor-name (e.g., compass/crawler-google-places)
input -- JSON input object matching the Actor's schema. Each Actor has unique field names -- check apify.com/store for the exact schema.
limit -- max items to return
offset -- number of items to skip, for pagination
format -- json (default), csv, jsonl, html, xlsx, xml
timeout -- run timeout in seconds
waitForFinish -- max wait time (0-300 seconds)
fields -- comma-separated list of fields to include
omit -- comma-separated list of fields to exclude
Example prompt: "Run the Google Places scraper for 'restaurants in New York' and return the first 50 results"
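As a rough sketch, the arguments for APIFY_RUN_ACTOR_SYNC_GET_DATASET_ITEMS could be assembled and sanity-checked before the tool is called. The helper name `build_sync_run_args`, the slug pattern, and the `query` input field are assumptions made for illustration; only the parameter names and the 0-300 range come from the list above.

```python
import re
from typing import Any, Optional

# Assumed pattern for the "username/actor-name" form described above.
ACTOR_SLUG = re.compile(r"^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$")


def build_sync_run_args(
    actor_id: str,
    actor_input: dict[str, Any],
    limit: Optional[int] = None,
    wait_for_finish: Optional[int] = None,
    fields: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Build an argument dict using the parameter names listed above."""
    if not ACTOR_SLUG.match(actor_id):
        raise ValueError(f"expected username/actor-name, got {actor_id!r}")
    if wait_for_finish is not None and not 0 <= wait_for_finish <= 300:
        raise ValueError("waitForFinish must be between 0 and 300 seconds")

    args: dict[str, Any] = {"actorId": actor_id, "input": actor_input}
    if limit is not None:
        args["limit"] = limit
    if wait_for_finish is not None:
        args["waitForFinish"] = wait_for_finish
    if fields:
        # The tool expects a comma-separated list of field names.
        args["fields"] = ",".join(fields)
    return args


# "query" is a hypothetical input field; check the Actor's real schema first.
example = build_sync_run_args(
    "compass/crawler-google-places",
    {"query": "restaurants in New York"},
    limit=50,
)
```

Optional parameters are left out of the dict when unset, so the tool's own defaults apply.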
Trigger an Actor run without waiting for completion. Use for long-running scraping jobs.
Tool: APIFY_RUN_ACTOR
Key parameters:
actorId (required) -- Actor slug or ID
body -- JSON input object for the Actor
memory -- memory limit in MB (must be a power of 2, minimum 128)
timeout -- run timeout in seconds
maxItems -- cap on returned items
build -- specific build tag (e.g., latest, beta)
Follow up with APIFY_GET_DATASET_ITEMS to retrieve results using the run's datasetId.
Example prompt: "Start the web scraper Actor for example.com asynchronously with 1024MB memory"
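The memory constraint (a power of 2, at least 128 MB) is easy to get wrong, so a small pre-flight check may help. The sketch below is illustrative only: the helper names are hypothetical, and the check simply encodes the rule stated in the parameter list.

```python
from typing import Any, Optional


def is_valid_memory_mb(memory_mb: int) -> bool:
    """True if memory_mb is a power of 2 and at least 128."""
    # A positive integer is a power of 2 when it has exactly one bit set.
    return memory_mb >= 128 and (memory_mb & (memory_mb - 1)) == 0


def build_async_run_args(
    actor_id: str,
    body: dict[str, Any],
    memory_mb: Optional[int] = None,
    timeout_secs: Optional[int] = None,
) -> dict[str, Any]:
    """Build an argument dict for APIFY_RUN_ACTOR from the parameters above."""
    args: dict[str, Any] = {"actorId": actor_id, "body": body}
    if memory_mb is not None:
        if not is_valid_memory_mb(memory_mb):
            raise ValueError("memory must be a power of 2 and at least 128 MB")
        args["memory"] = memory_mb
    if timeout_secs is not None:
        args["timeout"] = timeout_secs
    return args
```

For instance, 1024 passes the check, while 1000 (not a power of 2) and 64 (below the minimum) are rejected before any run is started.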
Fetch data from a specific dataset with pagination, field selection, and filtering.
Tool: APIFY_GET_DATASET_ITEMS
Key parameters:
datasetId (required) -- dataset identifier
limit (default/max 1000) -- items per page
offset (default 0) -- pagination offset
format -- json (recommended), csv, xlsx
fields -- include only specific fields
omit -- exclude specific fields
clean -- remove Apify-specific metadata
desc -- reverse order (newest first)
Example prompt: "Get the first 500 items from dataset myDatasetId in JSON format"
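Because limit is capped at 1000 items per page, larger datasets have to be read in several calls by advancing offset. A minimal sketch of that loop follows; `fetch_page` is a hypothetical callable standing in for whatever actually invokes APIFY_GET_DATASET_ITEMS, and it is assumed to return a list with at most `limit` items.

```python
from typing import Any, Callable, Iterator, Optional

PAGE_MAX = 1000  # per-page maximum stated in the parameter list above

# Hypothetical signature: fetch_page(dataset_id, offset, limit) -> list of items
FetchPage = Callable[[str, int, int], list[dict[str, Any]]]


def iter_dataset_items(
    fetch_page: FetchPage,
    dataset_id: str,
    page_size: int = PAGE_MAX,
    max_items: Optional[int] = None,
) -> Iterator[dict[str, Any]]:
    """Yield dataset items page by page until the dataset or max_items runs out."""
    if not 1 <= page_size <= PAGE_MAX:
        raise ValueError(f"page_size must be between 1 and {PAGE_MAX}")
    offset = 0
    yielded = 0
    while True:
        # Ask for a full page, or only for what is still needed.
        want = page_size if max_items is None else min(page_size, max_items - yielded)
        if want <= 0:
            return
        page = fetch_page(dataset_id, offset, want)
        if not page:
            return
        yield from page
        yielded += len(page)
        offset += len(page)
        # A short page means the end of the dataset was reached.
        if len(page) < want:
            return
```

Passing the fetcher in as a parameter keeps the loop independent of how the tool is invoked and makes it straightforward to exercise with a fake in-memory dataset.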
View Actor metadata, input schema, and configuration before running it.
Tool: APIFY_GET_ACTOR
Key parameters:
actorId (required) -- Actor ID in the format username/actor-name or hex ID
Example prompt: "Show me the details and input schema for the apify/web-scraper Actor"
Configure reusable Actor tasks with preset inputs for recurring scraping jobs.
Tool: APIFY_CREATE_TASK
Configure a task once, then trigger it repeatedly with consistent input parameters. Useful for scheduled or recurring data collection workflows.
Example prompt: "Create an Apify task for the Google Search scraper with default query 'AI startups' and US location"
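A task is essentially an Actor reference plus a name and a preset input. The sketch below shows one way such a payload might look. The keys `actId`, `name`, and `input` are borrowed from the task-creation body of Apify's REST API and are an assumption here, since the source does not list the parameters APIFY_CREATE_TASK accepts; the input fields `queries` and `countryCode` are likewise hypothetical.

```python
from typing import Any


def build_task_payload(actor_id: str, name: str, task_input: dict[str, Any]) -> dict[str, Any]:
    """Build a task definition: which Actor to run, under what name, with what preset input."""
    if not actor_id or not name:
        raise ValueError("actor_id and name are both required")
    # Copy the input so later changes by the caller do not alter the payload.
    return {"actId": actor_id, "name": name, "input": dict(task_input)}


# Hypothetical preset for the example prompt above; verify field names
# against the Actor's input schema before using them.
payload = build_task_payload(
    "apify/google-search-scraper",
    "ai-startups-us",
    {"queries": "AI startups", "countryCode": "us"},
)
```

Keeping the preset input inside the task means later runs only need the task reference, not the full input again.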
List Actor runs, browse datasets, and inspect run details for monitoring and debugging.
Tools: APIFY_GET_LIST_OF_RUNS, APIFY_DATASETS_GET, APIFY_DATASET_GET, APIFY_GET_LOG
For listing runs:
…