Functions Don't Work When Applied to Rows; Only the First Row's Result Is Returned
Introduction
The Netquack extension for DuckDB provides functions for parsing URLs and extracting parts of them, such as hosts, domains, and paths. However, its macro functions currently work only on scalar values, not on rows. When one is applied to a table column, only the first row gets a result, and the query is typically followed by a segfault. This article describes the bug, shows how to reproduce it, and states the expected behavior.
Describe the Bug
The macro functions in Netquack appear to work only on scalar values, not on rows. When one is applied to a column, only the first row receives a result, and this is typically followed by a segfault. A macro function should instead return a result for every row.
To Reproduce
The steps to reproduce this behavior are as follows:
Step 1: Create a Simple Table of URLs
Create a simple table of URLs using the following SQL query:
CREATE OR REPLACE TABLE urls AS
SELECT 'a.example.com' AS url
UNION ALL
SELECT 'https://b.a.example.com/path/path';
This creates a table with two rows: a bare host name and a full HTTPS URL.
Step 2: Query the Table
Query the table using the following SQL query:
SELECT extract_host(url), url
FROM urls;
This should return the host extracted from each URL alongside the original URL.
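For reference, the host each row should yield can be approximated outside DuckDB. The sketch below is a hypothetical stand-in (`expected_host` is not a Netquack function) built on Python's `urllib.parse`. It assumes the scheme is optional, as in the first row, and is only meant to produce values to compare against, not to mirror Netquack's own parsing rules.

```python
from urllib.parse import urlsplit


def expected_host(url: str) -> str:
    """Hypothetical reference for extract_host: return the host part of a URL.

    Assumes a bare host such as 'a.example.com' has no scheme, so '//' is
    prepended to make urlsplit treat it as a network location.
    """
    if "://" not in url:
        url = "//" + url
    return urlsplit(url).hostname or ""


# The two rows created in Step 1.
urls = ["a.example.com", "https://b.a.example.com/path/path"]

for url in urls:
    print(expected_host(url), url)
```

Both rows get a host here, which is what the Netquack query should also produce.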
Expected Result
The expected result is:
┌───────────────────┬───────────────────────────────────┐
│ extract_host(url) │                url                │
│      varchar      │              varchar              │
├───────────────────┼───────────────────────────────────┤
│ b.a.example.com   │ https://b.a.example.com/path/path │
│ a.example.com     │ a.example.com                     │
└───────────────────┴───────────────────────────────────┘
However, the actual result is:
┌───────────────────┬───────────────────────────────────┐
│ extract_host(url) │                url                │
│      varchar      │              varchar              │
├───────────────────┼───────────────────────────────────┤
│ a.example.com     │ a.example.com                     │
│                   │ https://b.a.example.com/path/path │
└───────────────────┴───────────────────────────────────┘
The host is extracted only for the first row; the extract_host value for the second row is empty.
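When testing a build for this bug, it helps to flag every row whose host comes back empty rather than scanning the table by eye. A minimal sketch, assuming the query results have been fetched as `(host, url)` pairs and that an empty cell may arrive as either `''` or `None` (the report does not say which). `rows_missing_host` is a hypothetical helper.

```python
from typing import Optional


def rows_missing_host(rows: list[tuple[Optional[str], str]]) -> list[str]:
    """Return the URLs whose extracted host is empty or NULL (None)."""
    return [url for host, url in rows if not host]


# The "actual result" rows from the report above.
actual = [
    ("a.example.com", "a.example.com"),
    ("", "https://b.a.example.com/path/path"),
]

print(rows_missing_host(actual))  # ['https://b.a.example.com/path/path']
```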
Workaround
To get the expected result, you can use the following SQL query:
SELECT extract_host(url), url
FROM (SELECT url FROM urls ORDER BY url DESC) AS t;
This returns the host for every row, in the same order as the expected-result table above.
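The workaround also accounts for the row order in the expected-result table: `ORDER BY url DESC` compares the strings character by character, and `'h'` (from `https://`) sorts after `'a'`, so the full URL comes first. The same ordering in Python, assuming DuckDB's default binary collation, which agrees with Python's code-point comparison for ASCII text:

```python
urls = ["a.example.com", "https://b.a.example.com/path/path"]

# Descending order, as in ORDER BY url DESC.
print(sorted(urls, reverse=True))
# ['https://b.a.example.com/path/path', 'a.example.com']
```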
Expected Behavior
The macro functions should return a result for every row. In this case, extract_host should return the host of each URL alongside the original URL.
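The expected behavior can be stated as a simple property: applying the function to a whole column must give the same values as calling it once per row. The sketch below checks that property. `column_fn` stands for any hypothetical batch implementation, `host` is a simplified scalar stand-in, and `first_row_only` is a toy that merely reproduces the reported symptom; none of them is Netquack's actual code.

```python
from typing import Callable, Optional


def matches_scalar(column_fn: Callable[[list[str]], list[Optional[str]]],
                   scalar_fn: Callable[[str], str],
                   values: list[str]) -> bool:
    """True if the batch result has one entry per row and each equals the scalar call."""
    result = column_fn(values)
    return len(result) == len(values) and all(
        r == scalar_fn(v) for r, v in zip(result, values))


def host(url: str) -> str:
    """Simplified scalar stand-in: drop any scheme, keep text up to the first '/'."""
    return url.split("://", 1)[-1].split("/", 1)[0]


def first_row_only(values: list[str]) -> list[Optional[str]]:
    """Toy model of the symptom: only the first row gets a value."""
    return ([host(values[0])] + [None] * (len(values) - 1)) if values else []


def row_wise(values: list[str]) -> list[Optional[str]]:
    """Correct behavior: one result per row."""
    return [host(v) for v in values]


urls = ["a.example.com", "https://b.a.example.com/path/path"]
print(matches_scalar(row_wise, host, urls))        # True
print(matches_scalar(first_row_only, host, urls))  # False
```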
Versions
The versions of Netquack and DuckDB used to reproduce this bug are:
- Netquack Version: d7f4bec
- DuckDB Version: v1.2.1
Additional Context
This bug was reproduced on an M1 Mac running macOS 15.3.
Frequently Asked Questions
The following questions and answers summarize the bug: Netquack's macro functions work on scalar values but not on rows, so only the first row gets a result, and the query is typically followed by a segfault.
Q: What is the Netquack extension?
A: Netquack is a DuckDB extension for parsing URLs. It provides macro functions that extract parts of a URL, such as the host, domain, and path.
Q: What is the bug in the Netquack extension?
A: The bug in the Netquack extension is that the macro functions only work on scalar values and not on rows. When applied to rows, only the result for the first row is returned, followed by a segfault.
Q: How do I reproduce the bug?
A: Create the two-row urls table from Step 1 above, run the extract_host query from Step 2, and observe that the host is extracted only for the first row.
Q: What is the expected behavior?
A: The macro functions should return a result for every row. In this example, extract_host should return the host of each URL alongside the original URL.
Q: How do I get the expected result?
A: Read the rows through an ordered subquery, as shown in the Workaround section above. This returns the host for every row.
Q: What are the versions of Netquack and DuckDB used to reproduce this bug?
A: The versions of Netquack and DuckDB used to reproduce this bug are:
- Netquack Version: d7f4bec
- DuckDB Version: v1.2.1
Q: What is the additional context for this bug?
A: This bug was reproduced on an M1 Mac running macOS 15.3.
Q: Is there a workaround for this bug?
A: Yes, there is a workaround for this bug. You can use the SQL query mentioned above to get the expected result.
Q: Is this bug fixed in the latest version of Netquack?
A: This report does not say whether the bug has been fixed. Check the Netquack repository for a newer release, then rerun the query from Step 2 to confirm.
Conclusion
In the versions tested (Netquack d7f4bec, DuckDB v1.2.1), Netquack's macro functions work on scalar values but not on rows: only the first row gets a result, and the query is typically followed by a segfault. Until a fix is available, the ordered-subquery workaround returns the host for every row.