Image2077

Describe your image
Landscape 16:9 sankey diagram of a pretraining data mixture, three stages with translucent colored ribbons. LEFT (8 source blocks, heights proportional to tokens): "Common Crawl (web) 540B" (muted navy, largest), "arXiv papers 180B" (dusty teal), "GitHub code 160B" (slate gray), "Wikipedia 40B" (soft terracotta), "StackExchange QA 30B" (warm copper), "Books (public domain) 25B" (pale olive), "Patents 18B" (pale navy), "Curated news & forums 15B" (dusty teal). MIDDLE (3 processing blocks, stacked): "Deduplicated (MinHash + exact)", "Quality-filtered (classifier + heuristics)", "PII-scrubbed (regex + NER)". RIGHT (3 final splits): "Pretraining set 1.4T tokens" (largest), "Instruction-tune pool 12B tokens", "RLHF preference pool 3B tokens". Flow ribbons inherit source color with mid-labels showing token counts ("85B", "320B", "44B"). Legend strip at bottom. Title: "LLM pretraining data mixture and downstream splits". Subtitle: "token counts after deduplication and quality filtering; ribbon thickness ∝ token flow."
Advanced settings: batch and post-processing parameters
Batch generation
Not yet available

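Requires a batch-task contract, per-task price aggregation, and queue status before it can be enabled. Until then, submissions stay single-job so cost estimates remain accurate.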

HD upscaling
Not yet available

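Requires a dedicated post-processing API and a pipeline for writing results back. Until that is in place, it is not shown as an actionable toggle.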

极影2077

Generate beyond limits, reach the future.
