data.ref: versioning XML schemas of "factura electrónica" #211

Draft
wants to merge 2 commits into
base: develop


@yaselc yaselc commented Apr 12, 2021

  • The contents of the schemas-xml directory are split into subdirectories, one per set of sources used to build it.
    Each subdirectory is named after its source and the most recent modification timestamp among the files in the set.
  • Adds the enum cl_sii.base.constants.XmlSchemasVersionEnum to define the available XML schema versions.
  • Adds a new version of the XML schemas for "factura electrónica" (electronic invoice), built from the official XML schemas of the AEC (Archivo Electrónico de Cesión). Its latest modification timestamp is 2019-12-12.

Source: cl-sii-extraoficial/archivos-oficiales@c89dec5

Changelog:

  • SiiTypes_v10.xsd:
    • Replaces CRLF line endings with LF.
    • root: Adds a new simple type Dec14_4-0Type for
      non-negative decimals (0 is allowed).
    • TipoTransCOMPRA: Changes the base type and adds a restriction
      on the minimum and maximum value (1 to 7).
    • TipoTransVENTA: Adds a restriction on the minimum and maximum
      value (1 to 4).
  • DTE_v10.xsd:
    • Replaces CRLF line endings with LF.
    • IdDoc: Adds the element TipoFactEsp.
    • Receptor.Extranjero: Adds the element TipoDocID.
    • IndServicio: Adds a new item to the enumeration.
    • MntExeOtrMnda: Changes the type to Dec14_4-0Type.
    • MntTotOtrMnda: Changes the type to Dec14_4-0Type.
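
To make the new restrictions in the changelog concrete, here is a minimal Python sketch of the value checks they imply. It is not code from cl_sii or from the XSD files: the helper names are hypothetical, the ranges are the ones stated in the changelog, and only the "non-negative, 0 allowed" part of Dec14_4-0Type is modelled (its digit limits are not stated here, so they are left out).

```python
from decimal import Decimal, InvalidOperation
from typing import Tuple

# Inclusive ranges as stated in the changelog.
TIPO_TRANS_COMPRA_RANGE = (1, 7)
TIPO_TRANS_VENTA_RANGE = (1, 4)


def in_range(value: int, bounds: Tuple[int, int]) -> bool:
    """Return True if value lies within the inclusive bounds."""
    low, high = bounds
    return low <= value <= high


def is_non_negative_decimal(text: str) -> bool:
    """Check only the non-negativity rule attributed to Dec14_4-0Type."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        return False
    # is_finite() rejects NaN and infinities before the comparison runs.
    return value.is_finite() and value >= 0
```

A document that was valid under the old schemas could start failing only if it carries a value outside these ranges, which is what the sample validation reported in this PR is meant to rule out.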


yaselc commented Apr 12, 2021

@glarrain @jtrh I ran the schema validation on a sample of 73,839 DTEs. Validation failed for 5,039 of them, and that number is the same regardless of which version of the XML schemas is used, which suggests that the new version doesn't introduce new errors.
The DTEs in the sample were selected because each one was used at some point in a "cesión" (assignment) made through the FP platform and approved by the SII.
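
Equal failure counts do not by themselves prove that the same documents fail under both versions, so a stricter check compares the sets of failing documents. The sketch below assumes each schema version is wrapped in a callable that returns True for a valid document; `failing_ids`, `compare_schema_versions` and the validator callables are hypothetical names, not part of cl_sii.

```python
from typing import Callable, Dict, Set, Tuple

# A validator takes the raw XML bytes and returns True when the document is valid.
Validator = Callable[[bytes], bool]


def failing_ids(docs: Dict[str, bytes], validate: Validator) -> Set[str]:
    """Return the IDs of the documents that fail validation."""
    return {doc_id for doc_id, xml in docs.items() if not validate(xml)}


def compare_schema_versions(
    docs: Dict[str, bytes],
    validate_old: Validator,
    validate_new: Validator,
) -> Tuple[Set[str], Set[str]]:
    """Return (newly failing IDs, no longer failing IDs) for the new version."""
    old_failures = failing_ids(docs, validate_old)
    new_failures = failing_ids(docs, validate_new)
    return new_failures - old_failures, old_failures - new_failures
```

If both returned sets are empty, the two schema versions reject exactly the same documents in the sample.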


yaselc commented Apr 12, 2021

@glarrain @jtrh, please suggest what you think would be the most suitable name for the XML schema versions. To be authentic, I think we should remove the XML schemas taken from unofficial sources.

@@ -209,3 +209,19 @@ def emisor_is_vendedor(self) -> bool:
    @property
    def receptor_is_vendedor(self) -> bool:
        return self.is_factura_compra


class XmlSchemasVersionEnum(enum.Enum):
Contributor

Suggested change
- class XmlSchemasVersionEnum(enum.Enum):
+ class DteXmlSchemaVersionEnum(enum.Enum):

Author

@yaselc yaselc Apr 13, 2021

@glarrain The directory also contains the XML schemas for RTC and RCV, and this PR even includes changes that apply to the validation of the AEC.

Contributor

Then the enum shouldn't belong in cl_sii/dte/constants.py, right?

Author

Good catch. Where do you think it will fit best? cl_sii/base/constants.py?

Contributor

cl_sii/base/constants.py looks good to me.


class XmlSchemasVersionEnum(enum.Enum):
    """
    Enum of "SII XML Schema Versions".
Contributor

Suggested change
-     Enum of "SII XML Schema Versions".
+     Enum of "SII DTE XML Schema Version"

    LATEST = '2019_12_12'
    """Reference to the latest version available"""

    V2019_12_12 = '2019_12_12'
Contributor

Add a reference to the source, or to where more information can be found.


    """

    LATEST = '2019_12_12'
Contributor

In my opinion this is bound to create problems because at some point we will update this value and it will break things that depend on it. Also, enum values shouldn't change.
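
A related detail is visible directly in Python: with the definitions from the diff, LATEST and V2019_12_12 share a value, so the name defined second becomes an alias of the first rather than a separate member. The sketch below shows that behaviour and one possible alternative in which members never change and "latest" is computed. The class names `AliasedVersions` and `SchemaVersionSketch`, the `latest()` helper and the `V2013_02_07` member are illustrative assumptions, not cl_sii code.

```python
import enum


class AliasedVersions(enum.Enum):
    # Same member order and values as in the diff under review.
    LATEST = '2019_12_12'
    V2019_12_12 = '2019_12_12'


# The second name is only an alias: both names refer to one member.
assert AliasedVersions.V2019_12_12 is AliasedVersions.LATEST
assert AliasedVersions.V2019_12_12.name == 'LATEST'
assert [member.name for member in AliasedVersions] == ['LATEST']


class SchemaVersionSketch(enum.Enum):
    """Hypothetical alternative: members are fixed; "latest" is computed."""

    V2013_02_07 = '2013_02_07'  # illustrative older version
    V2019_12_12 = '2019_12_12'

    @classmethod
    def latest(cls) -> 'SchemaVersionSketch':
        # Zero-padded YYYY_MM_DD values sort chronologically as strings.
        return max(cls, key=lambda member: member.value)
```

With this shape, adding a newer version changes what `latest()` returns without changing the value of any existing member, so callers that pin a specific version are unaffected.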

@yaselc yaselc force-pushed the feature/versioning-dte-xml-schema branch from 7d0f631 to 6f9486a Compare April 13, 2021 18:39
@glarrain glarrain marked this pull request as draft April 15, 2021 03:53

jtrh commented Apr 15, 2021

@glarrain @jtrh, please suggest what you think would be the most suitable name for the XML schema versions.

  • Do all the files of a year-month-day (YMD) version (e.g. V2019_12_12) originate from the same source package (e.g. the same SII ZIP file)?
  • Could a version contain files with different timestamps?
  • Does it make sense to associate all the files of a specific version with a single date?

To be authentic, I think we should remove the XML schemas taken from unofficial sources.

For better or worse, we have already used those unofficial schemas, so it may be a good idea to keep them. Maybe we could add a suffix to unofficial versions (e.g. V2019_12_31_LibreDTE)?

@yaselc yaselc force-pushed the feature/versioning-dte-xml-schema branch from 6f9486a to 60ee41e Compare April 15, 2021 05:28

jtrh commented Apr 15, 2021

@glarrain @jtrh, please suggest what you think would be the most suitable name for the XML schema versions.

Another idea: Instead of version 2019_12_12, use something like sii_rtc_2019_12_12_schema_cesion or sii_rtc_2019_12_12 to make it easier to associate the version with its source in src/code/rtc/2019-12-12-schema_cesion.


codecov-io commented Apr 15, 2021

Codecov Report

Merging #211 (60ee41e) into develop (dcec499) will decrease coverage by 0.04%.
The diff coverage is 81.08%.

@@             Coverage Diff             @@
##           develop     #211      +/-   ##
===========================================
- Coverage    81.02%   80.98%   -0.05%     
===========================================
  Files           32       32              
  Lines         2525     2556      +31     
  Branches       375      378       +3     
===========================================
+ Hits          2046     2070      +24     
- Misses         306      310       +4     
- Partials       173      176       +3     
Impacted Files              Absolute   Relative   Impact
cl_sii/libs/xml_utils.py    78.35%     75.00%     -0.60%
cl_sii/dte/parse.py         81.75%     76.92%     -0.79%
cl_sii/rtc/parse_aec.py     89.08%     76.92%     -0.66%
cl_sii/base/constants.py    100.00%    100.00%    not affected

Last update dcec499...60ee41e.

@jtrh jtrh left a comment

@yaselc yaselc force-pushed the feature/versioning-dte-xml-schema branch from 60ee41e to 0464024 Compare April 20, 2021 02:58

codecov-commenter commented Apr 20, 2021

Codecov Report

Merging #211 (0464024) into develop (dcec499) will decrease coverage by 0.02%.
The diff coverage is 82.05%.

@@             Coverage Diff             @@
##           develop     #211      +/-   ##
===========================================
- Coverage    81.02%   81.00%   -0.03%     
===========================================
  Files           32       32              
  Lines         2525     2558      +33     
  Branches       375      378       +3     
===========================================
+ Hits          2046     2072      +26     
- Misses         306      310       +4     
- Partials       173      176       +3     
Impacted Files              Absolute   Relative   Impact
cl_sii/libs/xml_utils.py    78.35%     75.00%     -0.60%
cl_sii/dte/parse.py         81.75%     76.92%     -0.79%
cl_sii/rtc/parse_aec.py     89.08%     76.92%     -0.66%
cl_sii/base/constants.py    100.00%    100.00%    not affected

Last update dcec499...0464024.


yaselc commented Apr 20, 2021

  • Do all the files of a year-month-day (YMD) version (e.g. V2019_12_12) originate from the same source package (e.g. the same SII ZIP file)?

Not necessarily. For example, the first official version was built from several sets (ZIP files) available on the official site:
http://www.sii.cl/factura_electronica/schema_dte.zip
http://www.sii.cl/factura_electronica/schema_iecv.zip
http://www.sii.cl/factura_electronica/schema_cesion.zip

  • Could a version contain files with different timestamps?

Yes, this is practically always the case, but the latest modification timestamp indicates when the whole set was last updated.

  • Does it make sense to associate all the files of a specific version with a single date?

If we agree that the latest modification timestamp is an indicator of the freshness of the whole set, then it would make sense.
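
The dating rule discussed above (a set is dated by the latest modification timestamp among its files) can be sketched as follows. This assumes the files' modification times on disk still reflect the original ones, for example right after extracting an SII ZIP file; a fresh git checkout would not preserve them. The function name `set_version_date` is hypothetical.

```python
from datetime import datetime, timezone
from pathlib import Path


def set_version_date(directory: Path, pattern: str = '*.xsd') -> str:
    """Return the latest modification date among matching files as YYYY_MM_DD."""
    mtimes = [path.stat().st_mtime for path in directory.rglob(pattern)]
    if not mtimes:
        raise ValueError(f'no files matching {pattern!r} under {directory}')
    # Use UTC so the result does not depend on the local time zone.
    latest = datetime.fromtimestamp(max(mtimes), tz=timezone.utc)
    return latest.strftime('%Y_%m_%d')
```

Older files in the same set do not affect the result, which matches the idea that a single date stands for the freshness of the whole set.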

For better or worse, we have already used those unofficial schemas, so it may be a good idea to keep them. Maybe we could add a suffix to unofficial versions (e.g. V2019_12_31_LibreDTE)?

Excellent, I think it's a very good idea.

Another idea: Instead of version 2019_12_12, use something like sii_rtc_2019_12_12_schema_cesion or sii_rtc_2019_12_12 to make it easier to associate the version with its source in src/code/rtc/2019-12-12-schema_cesion.

I think using the date as a prefix is a good way to keep the sets sortable.
I applied the suffix idea where possible; for the first version it isn't possible because of the variety of sources.
Finally, I added a fuller description to each element of the enum to help make it self-contained.
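
A date-prefixed name with a source suffix can double as the subdirectory name, which keeps versions sortable and easy to trace back to their files. The sketch below illustrates the idea only: the enum name `SchemaSetVersion`, its properties and the second member's name are assumptions, while `2013_02_07_sii_official` and the `schemas-xml` path appear in this PR's MANIFEST.in change.

```python
import enum
from pathlib import PurePosixPath

SCHEMAS_ROOT = PurePosixPath('cl_sii/data/ref/factura_electronica/schemas-xml')


class SchemaSetVersion(enum.Enum):
    """Hypothetical enum whose values are the subdirectory names."""

    V2013_02_07_SII_OFFICIAL = '2013_02_07_sii_official'
    V2019_12_12_SII_RTC = '2019_12_12_sii_rtc'  # assumed name, for illustration

    @property
    def directory(self) -> PurePosixPath:
        # The value is the directory name, so no separate lookup table is needed.
        return SCHEMAS_ROOT / self.value

    @property
    def date_prefix(self) -> str:
        # The first 10 characters are the zero-padded YYYY_MM_DD prefix.
        return self.value[:10]


# Sorting by value orders the sets chronologically thanks to the date prefix.
ordered = sorted(SchemaSetVersion, key=lambda version: version.value)
```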

The contents of the `schemas-xml` directory are separated into
subdirectories, according to the sources used to build the set.
The subdirectories are named after the source used and the
most recent modification timestamp among the files in
the set.

Extra:
New enum `cl_sii.base.constants.XmlSchemasVersionEnum` to define
the available XML schema versions
Create a new version of the XML schemas for "factura electrónica"
from the official XML schemas of AEC (Archivo Electrónico de Cesión).
Last update timestamp is 2019-12-12.
Source: [cl-sii-extraoficial/archivos-oficiales@c89dec5](https://github.com/cl-sii-extraoficial/archivos-oficiales/tree/c89dec54f664281721dcb77af327c4f6c58ec4ff/src/code/rtc/2019-12-12-schema_cesion)

Changelog:
  - `SiiTypes_v10.xsd`:
    - Replaces CRLF line endings with LF.
    - `root`: A new simple type `Dec14_4-0Type` is added for
      non-negative decimals (admits 0)
    - `TipoTransCOMPRA`: The base type is changed and adds a restriction
      for the minimum and the maximum value (1 - 7)
    - `TipoTransVENTA`: Adds restriction for the minimum and maximum
      value (1 - 4)
  - `DTE_v10.xsd`
    - Replaces CRLF line endings with LF.
    - `IdDoc`: Adds the element `TipoFactEsp`
    - `Receptor.Extranjero`: Adds the element `TipoDocID`
    - `IndServicio`: Adds a new item to the enumeration
    - `MntExeOtrMnda`: Type changed to `Dec14_4-0Type`
    - `MntTotOtrMnda`: Type changed to `Dec14_4-0Type`
@yaselc yaselc force-pushed the feature/versioning-dte-xml-schema branch from 0464024 to 98e29dc Compare April 22, 2021 15:45
@@ -3,5 +3,7 @@ include LICENSE
include README.rst
recursive-include cl_sii *py
recursive-include cl_sii/data/cte/schemas-json *.schema.json
recursive-include cl_sii/data/ref/factura_electronica/schemas-xml *.xsd
recursive-include cl_sii/data/ref/factura_electronica/schemas-xml/2013_02_07_sii_official *.xsd
Author

@jtrh @glarrain I'm not entirely sure whether these changes are necessary. I know that recursive-include includes all matching files recursively, but does it preserve the directory structure?
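
For the source distribution, `recursive-include <dir> <pattern>` in MANIFEST.in matches files at any depth under the directory, and each file keeps its path relative to the project root, so the existing `schemas-xml` line should already cover the new subdirectories. The sketch below only emulates that matching with pathlib to show the resulting paths; it does not call setuptools, and `emulate_recursive_include` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def emulate_recursive_include(root: Path, directory: str, pattern: str) -> List[str]:
    """List files under root/directory matching pattern, relative to root."""
    base = root / directory
    return sorted(
        path.relative_to(root).as_posix()
        for path in base.rglob(pattern)  # rglob descends into every subdirectory
        if path.is_file()
    )
```

Whether the files also end up in an installed package depends on the package data settings in the setup configuration, which this sketch does not cover.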


yaselc commented Apr 26, 2021

@glarrain @jtrh


jtrh commented May 5, 2021

CC: @jtrobles-cdd

@yaselc yaselc assigned ycouce-cdd and unassigned yaselc May 17, 2021
@jtrobles-cdd jtrobles-cdd added enhancement New feature or request and removed feature labels Feb 3, 2022
@reviewpad reviewpad bot mentioned this pull request Mar 30, 2023
9 tasks