You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Nov 19, 2021. It is now read-only.
If I have a table with a pipe (|) in a character field and try to export and write to a string, there's no checking for whether the delimiter exists in any of the fields:
import giraffez
sql = """SELECT * FROM
(
SELECT
CAST('first|field' AS VARCHAR(20)) AS first_char,
CAST('second field' AS VARCHAR(20)) AS second_char
) AS tbl
"""
with giraffez.BulkExport(sql) as export:
r = list(export.to_str(delimiter='|'))
print(r)
['first|field|second field']
I could bring it out with a different delimiter, or roll my own to_str that takes in a dict from to_dict/to_list, but wondering if there should be functionality pushed up into to_str to error out, warn, or remove the field if the delimiter exists in a column?
The text was updated successfully, but these errors were encountered:
Checking the delimiter for every byte of data would introduce a very significant performance change to cover this case. While I agree that it would be nice to find that the source data has a character that is being used for a delimiter, but I think for the most part this is why the to_list and to_dict exists so that in the case that this behavior is desired they can be used to produce those results with a higher level of safety (i.e. use to_list and pass to Python's csv module). Since to_str is building a buffer in C and returning that as text it would be a fairly large change to try and introduce the kind of logic that may address some of these concerns like a quote character. This was not originally in scope for this because to_str is suppose to be fast and straight-forward, and I also wish to discourage the use of character delimited formats altogether because they are not portable or safe to use in far too many cases, vs. something like JSON which fully supports unicode, is self-describing of all fields, and is portable regardless of OS/CPU Architecture.
I think it is possible to better abstract this so that it is either more apparent how to solve issues like in your example, or to just provide a better interface that would solve for this automatically. For example, to_str is still available but under the hood it just calls to_list and uses something like csv rather than providing the other option that builds the string in C. Thoughts?
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
If I have a table with a pipe (
|
) in a character field and try to export and write to a string, there's no checking for whether the delimiter exists in any of the fields:I could bring it out with a different delimiter, or roll my own
to_str
that takes in adict
fromto_dict
/to_list
, but wondering if there should be functionality pushed up intoto_str
to error out, warn, or remove the field if the delimiter exists in a column?The text was updated successfully, but these errors were encountered: