Last edited by 宋志鹏 Apr 29, 2020

file

File input: file

Reads input from files.

Set the class parameter to file.FileDocReader.

Example:

company_name:  # name (user-defined)
    class: file.FileDocReader
    init:
      path: "hdfs://hdp-nn-001:8020/user/data/digest_company_name/"
      formater: company_name_digest
      pattern: "*.gz"
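  • path: file path; HDFS, FTP, and local files are supported
  • formater: the formatting processor
  • pattern: file name matching pattern. For example, *.py matches files ending in .py, and *.gz matches gzip-compressed files ending in .gz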
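
To make the path and pattern parameters more concrete, here is a minimal Python sketch of how a reader might interpret them. It is not the FileDocReader implementation: the helper names path_scheme and select_files are hypothetical, and it assumes that pattern follows shell-style wildcard rules (as the *.py and *.gz examples suggest) and that a path without a URL scheme refers to a local file.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def path_scheme(path):
    """Return the storage scheme of a path (hypothetical helper).

    Assumption: a path with no URL scheme is treated as a local file.
    """
    return urlparse(path).scheme or "local"


def select_files(names, pattern):
    """Return the file names matching a shell-style wildcard pattern.

    Assumption: the pattern parameter uses fnmatch-style wildcards.
    """
    return [name for name in names if fnmatchcase(name, pattern)]


# File names invented for illustration only.
names = ["part-0001.gz", "part-0002.gz", "_SUCCESS", "reader.py"]

print(path_scheme("hdfs://hdp-nn-001:8020/user/data/digest_company_name/"))  # hdfs
print(select_files(names, "*.gz"))  # ['part-0001.gz', 'part-0002.gz']
print(select_files(names, "*.py"))  # ['reader.py']
```

With the configuration above, only the files under the HDFS directory whose names end in .gz would be selected under these assumptions; other files, such as marker files, would be skipped.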