This Week in Databend #82

February 22, 2023 · 5 min read

PsiACE

Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

Features & Improvements ✨

AST

  • select from stage support uri with connection options (#10066)

Catalog

  • Iceberg/create-catalog (#9017)

Expression

  • type decimal support agg func min/max (#10085)
  • add sum/avg for decimal types (#10059)

Pipeline

  • enrich core pipelines processors (#10098)

Query

  • create stage, select stage, copy, infer_schema support named file format (#10084)
  • query result cache (#10042)

Storage

  • table data cache (#9772)
  • use drop_table_by_id api in drop all (#10054)
  • native storage format support nested data types (#9798)

Code Refactoring 🎉

Meta

  • add compatible layer for upgrade (#10082)
  • More elegant error handling (#10112, #10114, etc.)

Cluster

  • support exchange sorting (#10149)

Executor

  • add check processor graph completed (#10166)

Planner

  • apply constant folder at physical plan builder (#9889)

Query

  • use accumulating to impl single state aggregator (#10125)

Storage

  • adopt OpenDAL's batch delete support (#10150)
  • adopt OpenDAL query based metadata cache (#10162)

Build/Testing/CI Infra Changes 🔌

  • release deb repository (#10080)
  • release with systemd units (#10145)

Bug Fixes 🔧

Expression

  • no longer return Variant as common super type (#9961)
  • allow auto cast from string and variant (#10111)

Cluster

  • fix limit query hang in cluster mode (#10006)

Storage

  • wrong column statistics when contain tuple type (#10068)
  • compact not work as expected with add column (#10070)
  • fix add column min/max stat bug (#10137)

What's On In Databend

Stay connected with the latest news about Databend.

Query Result Cache

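Databend now supports caching query results! In the pipeline below, each upstream block is duplicated, and a shuffle step routes one copy downstream while the other copy goes to the result-cache writer.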

             ┌─────────┐ 1  ┌─────────┐ 1
             │         ├───►│         ├───►Dummy───►Downstream
Upstream────►│Duplicate│ 2  │         │ 3
             │         ├───►│         ├───►Dummy───►Downstream
             └─────────┘    │         │
                            │ Shuffle │
             ┌─────────┐ 3  │         │ 2  ┌─────────┐
             │         ├───►│         ├───►│  Write  │
Upstream────►│Duplicate│ 4  │         │ 4  │ Result  │
             │         ├───►│         ├───►│  Cache  │
             └─────────┘    └─────────┘    └─────────┘
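
To make the port routing in the diagram concrete, here is a minimal Python sketch. It is not Databend's implementation (which is written in Rust); the names `duplicate`, `shuffle`, `run_query`, and the dict-based cache are hypothetical, and a "block" is modeled as a plain list of integers.

```python
from typing import Dict, List, Tuple

Block = List[int]  # hypothetical stand-in for a block of rows


def duplicate(block: Block) -> Tuple[Block, Block]:
    """Emit two independent copies of one upstream block."""
    return list(block), list(block)


def shuffle(ports: List[Block]) -> List[Block]:
    """Reorder ports [1, 2, 3, 4] into [1, 3, 2, 4], as in the diagram.

    First copies (ports 1, 3) end up in front, second copies (ports 2, 4) at the back.
    """
    return ports[0::2] + ports[1::2]


def run_query(upstreams: List[Block], cache: Dict[str, List[Block]], key: str) -> List[Block]:
    """Duplicate every upstream block, then route one copy each way."""
    ports: List[Block] = []
    for block in upstreams:
        ports.extend(duplicate(block))

    shuffled = shuffle(ports)
    n = len(upstreams)
    downstream, to_cache = shuffled[:n], shuffled[n:]

    cache[key] = to_cache  # plays the role of "Write Result Cache"
    return downstream      # passes through "Dummy" to the downstream consumer


# Assumption: the cache is keyed by some identifier of the query; "q1" is a placeholder.
cache: Dict[str, List[Block]] = {}
result = run_query([[1, 2], [3, 4]], cache, key="q1")
```

Under these assumptions, the query still receives every block, and the cache ends up with its own copy of the same data without stalling the downstream consumer.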

Table Data Cache

Databend now supports two kinds of table data cache:

  • Disk cache: the raw (compressed) column data of a data block.
  • In-memory cache (experimental): the deserialized column objects of a data block.

For cache-friendly workloads, the performance gains are significant.
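
The two levels can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not Databend's code: the class `TwoLevelColumnCache`, the `fetch_remote` callback, the key format, and the use of JSON plus zlib as the "compressed column" encoding are all hypothetical, and a dict stands in for files on local disk.

```python
import json
import zlib
from typing import Callable, Dict, List

Column = List[int]  # hypothetical stand-in for a deserialized column object


class TwoLevelColumnCache:
    """Memory level holds deserialized columns; disk level holds compressed bytes."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self.memory: Dict[str, Column] = {}  # deserialized column objects
        self.disk: Dict[str, bytes] = {}     # stand-in for raw compressed data on local disk
        self.fetch_remote = fetch_remote     # reads compressed bytes from remote storage
        self.remote_reads = 0

    def get(self, key: str) -> Column:
        # Level 1: a memory hit skips both I/O and deserialization.
        if key in self.memory:
            return self.memory[key]

        # Level 2: a disk hit skips the remote read but still needs decoding.
        raw = self.disk.get(key)
        if raw is None:
            raw = self.fetch_remote(key)
            self.remote_reads += 1
            self.disk[key] = raw

        column = json.loads(zlib.decompress(raw))
        self.memory[key] = column
        return column


def fake_object_storage(key: str) -> bytes:
    """Hypothetical remote store that returns one compressed column."""
    return zlib.compress(json.dumps([1, 2, 3]).encode())


cache = TwoLevelColumnCache(fake_object_storage)
first = cache.get("block-0/col-a")   # remote read, fills both levels
second = cache.get("block-0/col-a")  # served from memory
```

The split reflects the trade-off described above: the disk level saves remote reads while keeping data compact, and the memory level additionally saves decompression and deserialization at the cost of more RAM.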

Deb Source & Systemd Support

Databend now provides an official Deb package repository and ships systemd units for managing the services.

For DEB822 Source Format:

sudo curl -L -o /etc/apt/sources.list.d/datafuselabs.sources https://repo.databend.rs/deb/datafuselabs.sources
sudo apt update
sudo apt install databend
sudo systemctl start databend-meta
sudo systemctl start databend-query

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Service Activation Progress Report

When a Query/Meta node starts, it should run a series of checks and print each result explicitly, so that users can diagnose faults and confirm the node's status.

Example:

storage check succeed
meta check failed: timeout, no response. endpoints: xxxxxxxx .
status check failed: address already in use.
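
One possible shape for such a report, sketched in Python purely for illustration. The proposal itself does not prescribe an implementation; `run_startup_checks` and the three `check_*` functions are hypothetical, and the failures are hard-coded to reproduce the sample lines above.

```python
from typing import Callable, List, Tuple

Check = Tuple[str, Callable[[], None]]  # (name, function that raises on failure)


def run_startup_checks(checks: List[Check]) -> List[str]:
    """Run every check and return one status line per check.

    A failing check does not abort the run, so the user sees the full picture.
    """
    lines: List[str] = []
    for name, check in checks:
        try:
            check()
        except Exception as err:
            lines.append(f"{name} check failed: {err}")
        else:
            lines.append(f"{name} check succeed")
    return lines


def check_storage() -> None:
    """Hypothetical check: pretend the storage backend is reachable."""


def check_meta() -> None:
    """Hypothetical check: pretend the meta service never answered."""
    raise TimeoutError("timeout, no response. endpoints: xxxxxxxx .")


def check_status() -> None:
    """Hypothetical check: pretend the listening port is taken."""
    raise OSError("address already in use.")


report = run_startup_checks([
    ("storage", check_storage),
    ("meta", check_meta),
    ("status", check_status),
])
for line in report:
    print(line)
```

With these stubbed checks, the script prints the same three lines as the example above.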

Issue 10193: Feature: output the necessary progress when starting a query/meta node

Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandy, ariesdevil, b41sh, Big-Wuu, BohuTANG, cameronbraid,
Chasen-Zhang, ClSlaid, dantengsky, drmingdrmer, everpcpc, johnhaxx7,
lichuang, mergify[bot], PsiACE, RinChanNOWWW, soyeric128, sundy-li,
suyanhanx, TCeason, Xuanwo, xudong963, youngsofun, zhang2014,
zhyass

🎈Connect With Us

Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.

Join the Databend Community to try, get help, and contribute!