Pinot is a realtime, distributed OLAP datastore built to deliver ultra-low-latency analytics even at extremely high throughput. If you are not yet familiar with Pinot, first read the introductory article "Apache Pinot 基本介绍" (an introduction to Apache Pinot). This post shows how to run Pinot with Docker, which is the easiest way to get started for anyone who knows the basics of Docker.
Pull the image

docker pull apachepinot/pinot:latest

Alternatively, pin a specific Pinot version:

docker pull apachepinot/pinot:0.9.3

Run all components in a single Docker container
docker run \
    -p 9000:9000 \
    apachepinot/pinot:latest QuickStart \
    -type batch

Then open http://localhost:9000 in a browser to see the Pinot console.

The command above starts Pinot in batch QuickStart mode; refer to the Pinot documentation to try the other run modes.
Run Pinot across multiple containers with Docker Compose
Create a docker-compose.yml with the following content:
version: '3.7'
services:
  zookeeper:
    image: zookeeper:3.5.6
    hostname: zookeeper
    container_name: manual-zookeeper
    ports:
      - "2181:2181"
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  pinot-controller:
    image: apachepinot/pinot:latest
    command: "StartController -zkAddress manual-zookeeper:2181"
    container_name: "manual-pinot-controller"
    restart: unless-stopped
    ports:
      - "9000:9000"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms1G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-controller.log"
    depends_on:
      - zookeeper
  pinot-broker:
    image: apachepinot/pinot:latest
    command: "StartBroker -zkAddress manual-zookeeper:2181"
    restart: unless-stopped
    container_name: "manual-pinot-broker"
    ports:
      - "8099:8099"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-broker.log"
    depends_on:
      - pinot-controller
  pinot-server:
    image: apachepinot/pinot:latest
    command: "StartServer -zkAddress manual-zookeeper:2181"
    restart: unless-stopped
    container_name: "manual-pinot-server"
    environment:
      JAVA_OPTS: "-Dplugins.dir=/opt/pinot/plugins -Xms4G -Xmx16G -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xloggc:gc-pinot-server.log"
    depends_on:
      - pinot-broker

Save the above as a local docker-compose.yml file, then start the cluster with:
docker-compose --project-name pinot-demo up

Check the container status:
docker ps
As before, open http://localhost:9000 in a browser to see the Pinot console.
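Before importing data, you can also check from the command line that the components are answering. A minimal sketch, assuming the port mappings above and the /health endpoints that the Pinot controller and broker expose:

```python
from urllib import request

# Health endpoints on the ports mapped in docker-compose.yml above
# (assumption: the stack is running locally with the default mappings).
endpoints = {
    "controller": "http://localhost:9000/health",
    "broker": "http://localhost:8099/health",
}
for name, url in endpoints.items():
    try:
        with request.urlopen(url, timeout=5) as resp:
            print(name, resp.read().decode())
    except OSError as exc:
        # Connection refused / timeout: the component is not up yet.
        print(name, "not reachable:", exc)
```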

Importing batch data
In the steps above we brought up a Pinot environment in Docker; next we can import data and query it.
Change into the /tmp directory and run:
mkdir -p pinot-quick-start/rawdata
vim pinot-quick-start/rawdata/transcript.csv

Fill the new CSV file with the following data:
studentID,firstName,lastName,gender,subject,score,timestampInEpoch
200,Lucy,Smith,Female,Maths,3.8,1570863600000
200,Lucy,Smith,Female,English,3.5,1571036400000
201,Bob,King,Male,Maths,3.2,1571900400000
202,Nick,Young,Male,Physics,3.6,1572418800000
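A note on the last column: timestampInEpoch holds epoch timestamps in milliseconds, which is what the 1:MILLISECONDS:EPOCH format declared in the schema expects. A quick sketch to decode the sample values as UTC datetimes:

```python
from datetime import datetime, timezone

# timestampInEpoch values from transcript.csv are epoch *milliseconds*,
# so divide by 1000 before converting to a datetime.
millis_values = [1570863600000, 1571036400000, 1571900400000, 1572418800000]
dates = [datetime.fromtimestamp(ms / 1000, tz=timezone.utc) for ms in millis_values]
for d in dates:
    print(d.strftime("%Y-%m-%d %H:%M"))
```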
Create the schema file:

vim pinot-quick-start/transcript-schema.json

and fill it with the following content:
{
  "schemaName": "transcript",
  "dimensionFieldSpecs": [
    {
      "name": "studentID",
      "dataType": "INT"
    },
    {
      "name": "firstName",
      "dataType": "STRING"
    },
    {
      "name": "lastName",
      "dataType": "STRING"
    },
    {
      "name": "gender",
      "dataType": "STRING"
    },
    {
      "name": "subject",
      "dataType": "STRING"
    }
  ],
  "metricFieldSpecs": [
    {
      "name": "score",
      "dataType": "FLOAT"
    }
  ],
  "dateTimeFieldSpecs": [{
    "name": "timestampInEpoch",
    "dataType": "LONG",
    "format" : "1:MILLISECONDS:EPOCH",
    "granularity": "1:MILLISECONDS"
  }]
}
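This is not part of the quickstart, but it is a cheap sanity check that the schema covers the CSV: every header column should be declared in exactly one of the field-spec sections. A sketch with the header and field names from above inlined:

```python
# Inlined from transcript.csv and transcript-schema.json above.
csv_header = "studentID,firstName,lastName,gender,subject,score,timestampInEpoch"
schema = {
    "dimensionFieldSpecs": [
        {"name": n} for n in ("studentID", "firstName", "lastName", "gender", "subject")
    ],
    "metricFieldSpecs": [{"name": "score"}],
    "dateTimeFieldSpecs": [{"name": "timestampInEpoch"}],
}

# Collect every declared field name across the three sections.
declared = {
    field["name"]
    for section in ("dimensionFieldSpecs", "metricFieldSpecs", "dateTimeFieldSpecs")
    for field in schema[section]
}
assert declared == set(csv_header.split(",")), "schema and CSV header disagree"
print("schema covers all CSV columns")
```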
Create the table config:

vim pinot-quick-start/transcript-table-offline.json

and fill it with the following content:
{
  "tableName": "transcript",
  "segmentsConfig" : {
    "timeColumnName": "timestampInEpoch",
    "timeType": "MILLISECONDS",
    "replication" : "1",
    "schemaName" : "transcript"
  },
  "tableIndexConfig" : {
    "invertedIndexColumns" : [],
    "loadMode"  : "MMAP"
  },
  "tenants" : {
    "broker":"DefaultTenant",
    "server":"DefaultTenant"
  },
  "tableType":"OFFLINE",
  "metadata": {}
}
Run the following command to create the table (note that it joins the compose network, pinot-demo_default, so the manual-pinot-controller hostname resolves):

docker run --rm -ti \
    --network=pinot-demo_default \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-batch-table-creation \
    apachepinot/pinot:latest AddTable \
    -schemaFile /tmp/pinot-quick-start/transcript-schema.json \
    -tableConfigFile /tmp/pinot-quick-start/transcript-table-offline.json \
    -controllerHost manual-pinot-controller \
    -controllerPort 9000 -exec

You should see output confirming that the schema and table were created.

Data in a Pinot table is stored as Pinot segments. To generate segments, we first create a job spec YAML file. The job spec contains all the information about the data format, the input data location, and the Pinot cluster coordinates. You can copy the job spec below, saving it as /tmp/pinot-quick-start/docker-job-spec.yml (the path the ingestion command below expects). If you are using your own data, make sure to 1) replace transcript with your table name, and 2) set the correct recordReaderSpec.
executionFrameworkSpec:
  name: 'standalone'
  segmentGenerationJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner'
  segmentTarPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner'
  segmentUriPushJobRunnerClassName: 'org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner'
jobType: SegmentCreationAndTarPush
inputDirURI: '/tmp/pinot-quick-start/rawdata/'
includeFileNamePattern: 'glob:**/*.csv'
outputDirURI: '/tmp/pinot-quick-start/segments/'
overwriteOutput: true
pinotFSSpecs:
  - scheme: file
    className: org.apache.pinot.spi.filesystem.LocalPinotFS
recordReaderSpec:
  dataFormat: 'csv'
  className: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReader'
  configClassName: 'org.apache.pinot.plugin.inputformat.csv.CSVRecordReaderConfig'
tableSpec:
  tableName: 'transcript'
  schemaURI: 'http://manual-pinot-controller:9000/tables/transcript/schema'
  tableConfigURI: 'http://manual-pinot-controller:9000/tables/transcript'
pinotClusterSpecs:
  - controllerURI: 'http://manual-pinot-controller:9000'

Use the following command to generate the segments and upload the data:
docker run --rm -ti \
    --network=pinot-demo_default \
    -v /tmp/pinot-quick-start:/tmp/pinot-quick-start \
    --name pinot-data-ingestion-job \
    apachepinot/pinot:latest LaunchDataIngestionJob \
    -jobSpecFile /tmp/pinot-quick-start/docker-job-spec.yml

Once the import finishes, you can query the data from the web console.
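Besides the web console, queries can go straight to the broker's REST API. A minimal sketch, assuming the broker port 8099 mapped in docker-compose.yml above; POST /query/sql with a {"sql": ...} body is Pinot's standard SQL endpoint, and result rows come back under resultTable.rows:

```python
import json
from urllib import request

# Build the SQL payload for the broker's /query/sql endpoint.
payload = json.dumps(
    {"sql": "SELECT subject, AVG(score) FROM transcript GROUP BY subject"}
).encode()
req = request.Request(
    "http://localhost:8099/query/sql",
    data=payload,  # data= makes this a POST
    headers={"Content-Type": "application/json"},
)
try:
    with request.urlopen(req, timeout=10) as resp:
        body = json.load(resp)
    for row in body["resultTable"]["rows"]:
        print(row)
except OSError as exc:
    print("broker not reachable:", exc)
```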