Logging Development

I found a blog post about compressing logs while still maintaining the ability to search them.

Result of Phase 1 compression: In a 30-day window, our entire Spark ecosystem generated 5.38PB of uncompressed INFO level unstructured logs, yet our CLP appender compressed them to only 31.4TB, amounting to an unprecedented 169x compression ratio. Now with CLP, we have restored our log verbosity from WARN back to INFO, and we can afford to retain all the logs for 1 month (as requested by our engineers).
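
As I understand it, the core trick is to split each message into a constant "log type" and its variable values, storing each distinct log type and value only once in dictionaries, so most searches can be answered against the comparatively tiny dictionaries instead of the raw text. Below is a toy Python sketch of that general idea only; the regex, placeholder byte, and function names are mine, not CLP's, and the real appender does far more on top of this.

```python
import re

# Toy sketch of dictionary/template encoding for searchable compressed
# logs. NOT Uber's CLP implementation, just the gist: split each message
# into a constant "log type" plus its variable values, and store each
# distinct log type and value only once.

log_types = {}   # template string -> template id
variables = {}   # variable value  -> variable id
encoded   = []   # one (template id, [variable ids]) tuple per message

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def encode(message: str) -> None:
    values = NUMBER.findall(message)
    template = NUMBER.sub("\x11", message)            # placeholder marker
    t_id = log_types.setdefault(template, len(log_types))
    v_ids = [variables.setdefault(v, len(variables)) for v in values]
    encoded.append((t_id, v_ids))

def search(keyword: str):
    # Match against the (small) log-type dictionary, then reinflate hits.
    hits = {t_id for tpl, t_id in log_types.items() if keyword in tpl}
    tpl_by_id = {t_id: tpl for tpl, t_id in log_types.items()}
    val_by_id = {v_id: v for v, v_id in variables.items()}
    for t_id, v_ids in encoded:
        if t_id in hits:
            message = tpl_by_id[t_id]
            for v_id in v_ids:
                message = message.replace("\x11", val_by_id[v_id], 1)
            yield message

encode("Task 4821 finished in 3.2 s")
encode("Task 4822 finished in 2.9 s")
print(list(search("finished")))   # both messages, one stored template
```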


Apache IoTDB 1.0 is an open-source time-series database focused on being lightweight and high performance, with usable features for IoT data collection, storage, and analysis. Apache IoTDB can integrate with other Apache projects like Spark and Hadoop.


This article talks about column databases, a less common type of database that can be used for application performance monitoring.

Note: This article was written and paid for by the vendor mentioned in the article.

The first reason the article gives is improved compression. Because every value in a column has the same data type, rather than a row mixing types, a column database can pick the optimal compression algorithm for each column. This not only reduces storage cost on disk but also improves performance: fewer disk seeks are needed, and more data fits into RAM.
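
To make the per-column codec point concrete, here is a minimal Python sketch (my own illustration, not from the article or any particular column store): a monotonically increasing timestamp column collapses to a start value plus tiny deltas, and a low-cardinality status column collapses to a handful of run-length pairs. A row of mixed types can't be encoded this way as a unit.

```python
# Per-column codec idea: because every value in a column has the same
# type, the store can pick an encoding suited to that column. Column
# names and codecs here are illustrative only.

timestamps = [1_700_000_000 + i for i in range(10_000)]   # one event per second
statuses   = ["OK"] * 9_990 + ["ERROR"] * 10              # low cardinality

def delta_encode(values):
    # Monotonic integers shrink to a start value plus tiny deltas.
    return values[0], [b - a for a, b in zip(values, values[1:])]

def run_length_encode(values):
    # Repeated values collapse to (value, run length) pairs.
    runs, prev, count = [], values[0], 1
    for v in values[1:]:
        if v == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = v, 1
    runs.append((prev, count))
    return runs

start, deltas = delta_encode(timestamps)
print("timestamp deltas:", set(deltas))             # {1} -> trivially compressible
print("status runs:", run_length_encode(statuses))  # [('OK', 9990), ('ERROR', 10)]
```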


I never took a database course, but a column database looks like a big Excel spreadsheet to me. I guess that is different from how relational databases are organized.
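
That intuition isn't far off: logically a column store is still a table of rows and columns, just like a relational table (or a spreadsheet). The difference is the physical layout on disk, which a toy sketch with made-up data can show:

```python
# Same logical table, two physical layouts (toy example, made-up data).
# A row store keeps whole records together; a column store keeps each
# column together, so a query reading one column touches less data.

# Row-oriented: one record after another, like rows in a spreadsheet.
row_store = [
    ("checkout", 200, 12),
    ("search",   200, 48),
    ("checkout", 500, 95),
]

# Column-oriented: the same values regrouped by column.
column_store = {
    "service":    ["checkout", "search", "checkout"],
    "status":     [200, 200, 500],
    "latency_ms": [12, 48, 95],
}

# An "average latency" query in a row store walks every full row...
avg_row = sum(r[2] for r in row_store) / len(row_store)
# ...while a column store reads just the one column it needs.
avg_col = sum(column_store["latency_ms"]) / len(column_store["latency_ms"])
assert avg_row == avg_col
```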