arroyo 对于kafka 有着很不错的集成支持(目前版本可以说是优先支持的),使用原生kafka 是一个选择,但是部署以及管理感觉比较费事
以前简单介绍过redpanda,所以尝试下集成
环境准备
- docker-compose
包含了redpandadata console,connect,以及一个测试,对于arroyo 使用了与redpandadata 同一个网络
version: '3'networks:redpanda_network:driver: bridgevolumes:redpanda: nullservices:redpanda:image: docker.redpanda.com/redpandadata/redpanda:v23.1.4command:- redpanda start- --smp 1- --overprovisioned- --kafka-addr PLAINTEXT://0.0.0.0:29092,OUTSIDE://0.0.0.0:9092- --advertise-kafka-addr PLAINTEXT://redpanda:29092,OUTSIDE://localhost:9092- --pandaproxy-addr 0.0.0.0:8082- --advertise-pandaproxy-addr localhost:8082ports:- 8081:8081- 8082:8082- 9092:9092- 9644:9644- 29092:29092volumes:- redpanda:/var/lib/redpanda/datanetworks:- redpanda_network
console:image: docker.redpanda.com/redpandadata/console:v2.2.3entrypoint: /bin/shcommand: -c "echo \"$$CONSOLE_CONFIG_FILE\" > /tmp/config.yml; /app/console"environment:CONFIG_FILEPATH: /tmp/config.ymlCONSOLE_CONFIG_FILE: |kafka:brokers: ["redpanda:29092"]schemaRegistry:enabled: trueurls: ["http://redpanda:8081"]redpanda:adminApi:enabled: trueurls: ["http://redpanda:9644"]connect:enabled: trueclusters:- name: local-connect-clusterurl: http://connect:8083ports:- 8080:8080networks:- redpanda_networkdepends_on:- redpanda
owl-shop:image: quay.io/cloudhut/owl-shop:latestnetworks:- redpanda_network#platform: 'linux/amd64'environment:- SHOP_KAFKA_BROKERS=redpanda:29092- SHOP_KAFKA_TOPICREPLICATIONFACTOR=1- SHOP_TRAFFIC_INTERVAL_RATE=1- SHOP_TRAFFIC_INTERVAL_DURATION=0.1sdepends_on:- redpanda
connect:image: docker.redpanda.com/redpandadata/connectors:latesthostname: connectcontainer_name: connectnetworks:- redpanda_network#platform: 'linux/amd64'depends_on:- redpandaports:- "8083:8083"environment:CONNECT_CONFIGURATION: |key.converter=org.apache.kafka.connect.converters.ByteArrayConvertervalue.converter=org.apache.kafka.connect.converters.ByteArrayConvertergroup.id=connectors-clusteroffset.storage.topic=_internal_connectors_offsetsconfig.storage.topic=_internal_connectors_configsstatus.storage.topic=_internal_connectors_statusconfig.storage.replication.factor=-1offset.storage.replication.factor=-1status.storage.replication.factor=-1offset.flush.interval.ms=1000producer.linger.ms=50producer.batch.size=131072CONNECT_BOOTSTRAP_SERVERS: redpanda:29092CONNECT_GC_LOG_ENABLED: "false"CONNECT_HEAP_OPTS: -Xms512M -Xmx512MCONNECT_LOG_LEVEL: infoapp:image: ghcr.io/arroyosystems/arroyo-single:multi-archnetworks:- redpanda_networkports:- "8000:8000"- "8001:8001"启动&测试
- 启动
docker-compose up -d- 创建kafka topic

- 配置arroyo connections、source 以及sink
我们测试的时候因为部署了一个owl-shop 服务,可以直接将owl-shop 的topic 作为source,然后写入到demo topic中(sink)
connections 配置

kafka source 配置

json schema 定义
{"$schema": "http://json-schema.org/draft-04/schema#","title": "Product","type": "object","properties": {"version": { "type": "number" },"id": { "type": "string" },"customer": {"type": "object","properties": { "id": { "type": "string" }, "type": { "type": "string" } }},"type": { "type": "string" },"firstName": { "type": "string" },"lastName": { "type": "string" },"state": { "type": "string" },"street": { "type": "string" },"houseNumber": { "type": "string" },"city": { "type": "string" },"zip": { "type": "string" },"latitude": { "type": "number" },"longitude": { "type": "number" },"phone": { "type": "string" },"additionalAddressInfo": { "type": "string" },"createdAt": { "type": "string" },"revision": { "type": "number" }}}
kafka sink 创建

- 创建job (pipeline)
sql 查询

job 执行

redpanda console 效果

说明
arroyo+redpanda 的pipeline 还是一种不错的选择,至少redpanda 部署以及使用还是比较简单的,而且支持不少kafaka 周边的兼容,是一个不错的选择,
arroyo 测试目前来说支持的source 以及sink 不是很多(核心还是kafaka)但是基于sql 的处理模式真的很方便,相比kafaka 的stream 简单不少,值得尝试下
参考资料
https://github.com/rongfengliang/arroyo-redpanda-learning
https://github.com/ArroyoSystems/arroyo
https://github.com/redpanda-data/redpanda
https://redpanda.com/










