-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New legacy id generator #4875
New legacy id generator #4875
Conversation
To upgrade the id generator from
Finally, the value of |
@parrjd testing the fix for next range gaps in PR #4840 create problem with update. If fixed all the radix issues (there are few other operation errors) then the ranges do not match properly generating errors in deployed instances (it is OK for new CA). To avoid problems we are implementing this fix which provides a new generator so in cases instances have no problem with gaps there is no need to modify it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fmarco76 I'm still reviewing this, but could you add a test here (before switching to RSNv3) to show how to fix an existing CA that has a sequential serial number gap?
So basically the test might look like this:
- make the necessary changes to the CA
- allocate new ranges
- enroll certs to exhaust the current range
- repeat step 2 & 3 if necessary
- verify that no new gap is created
For simplicity it's not necessary to verify the config params or LDAP entries. The verification in step 5 should be sufficient. Thanks!
@fmarco76 sorry I did not see this earlier I think forcing the values in the CS.cfg to be read as base 10 is a bad idea and will only cause more problems for existing installs. What we have currently in our prod environment is a hex number including characters a-f that are allowed in a hex string, and if it is forced to be read as a base 10 that will likely break on startup, or when ever that vaule is required to be read. Using 100 as an example for both end and increment if you force it to be read and used as a decimal and the CA has already issued 0x80 then it will jump to the next range starting at 200 which will overlap the previous range. The only safe way to switch to Dec would be to convert the Incremement and the end values to Decimal so increment and end would need to be changed in the CS.cfg dbs.beginSerialNumber=1 dbs.serialIncrement=10000000 As a side note: pkispawn values For an install that did not override the begin and end for serial numbers CS.cfg with the pkispawn vars the first range has 268,435,456 possible serial numbers, and the increment has the same amount while the request only has 10,000,000 since the request is done in base 10. as long as range management is enabled it should not be an issue since the requests will role to the next range separate of the serials. The nextrange DS Values do not get populated for some time. We do not have them on most of our prod systems as they get populated at a point well after the CA is stood up when serial issuances reaches a point and populates the values to the DS instance. For a stand alone CA the default end serialNumer Value will result in most installs of a dogtag CA never rolling over to a new range. Active clones are more likely to role faster since they only get a very small fraction of the top end of the existing CA's serial range for a sequential CA. For a Random serial CA I believe it pulls the new nextRange when the CA is built, but may be wrong since I have not deployed a random serial CA yet. Ldap output from a serial CA that has a clone that has not reached the point where a range has been added to the DS instance.
I think the best solution would be to set a couple of extra values in the CS.cfg so that different subsystems can be contolled differently for like the TPS, TKS, or OSCP where everything I believe is in base 10, and treat all of the values for a type the same, so start, end, and increment for Serial would be base 16, and for request and replica they would be base 10. dbs.serialRadix=16 From DOGTAG_10_5_BRANCH
For a RHEL 7 amd RHEL 8 instance it can be assumed that for Serial should be interprtied as base 16, and for Requests and replica should be done as base 10. Since the math error in the code results in the value being recroded in the ds being higher than the value stored in the CS.cfg when taking base into account I am not seeing the risk to fixing it by fixing how the math is done. For dealing with existing systems I see a couple of paths. Handle at subsystem startup If nextRange values are not in DS, no issue since the problem code has not been executed. If nextRange exists in the DS, read the existing range from the cn=<start_of_current_range> from the DS and compare to what is in CS.cfg, if they match nothing to do, if they do not match the write back the End of the Range to CS.cfg so that they do match. This will prevent a gap from being created. Fix the code so that Reads from DS Range Values and the CS.cfg always bigInt with the Radix and toString(radix) so that the math is all done using the proper base for the values, and also stored in the same manner in both the DS and the CS.cfg so they can easily be compared. Switch to all Dec, but manually convert/ update the increment and end to be hex to dec values so that 100 would be 256 written back to the CS.cfg. The challenge here is I think this is a bigger lift, and anywhere in the code that currently uses the value as hex would have to be found, and anywhere it needs to be hex would need to do an integer.toHexString() to convert it. Basicly I am looking at the best way to prevent this from happening when a CA Rolls to the next range. Not looking to go back and try to fill in where we have ended up with gaps. We are already having to update Other systems that pull stuff from the CAs to handle, the gap, and we have had to do that before when we did not use the serial management provides and we did manual clones with completely different ranges. and had huge gaps so most of the other systems have a method to deal with gaps. If we read from DS and CS.cfg with a Radix then it is coming in to Bigint as a bigint from hex or Dec, and when writing back to either toString(radix), then it is put back in the expected value. I think adding 0x to designate it as hex is also likely a bad Idea as most of the code in the repository class is used by different subsystems that have different bases and then every value would need to be tested to see if the 0x needs to be stripped, off then bigint with a 16 radix if it had a 0x. where as if every subsystem provides the radix that it uses, it is just used vs having to add logic to determine the value type of dec/hex. I dont think a new id generator is really required since currently the existing has no potential of Serial overlap which would be really bad, it just can result in gaps since the math is not done the same way for the value written to the DS, and the value written to the CS.cfg |
b7dd26e
to
4fb546b
Compare
endRange: 48 | ||
host: primary.example.com | ||
|
||
EOF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you update the comment to indicate that the allocated cert range for the primary CA changed from 13-30 to 19-48.
Could you add a similar check for the request range too since we're migrating it to SSNv2 too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my understanding the beginRange
and endRange
attributes are always in decimal and we're not changing it in this PR:
https://github.com/dogtagpki/pki/blob/master/base/server/src/main/java/com/netscape/cmscore/dbs/Repository.java#L574-L578
The values are derived from nextRange
which was parsed as decimal:
https://github.com/dogtagpki/pki/blob/master/base/server/src/main/java/com/netscape/cmscore/dbs/Repository.java#L543-L544
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, they are written in decimal and derived from nextRange. However, nextRange is read as decimal when assigned to the range but it is read as hex when assigned in nextMinSerialNumber:
mNextMinSerialNo = new BigInteger(nextRange, mRadix); |
Therefore the value written in range match the value written in CS.cfg but the real value is different. With this patch the value will really match after the conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that makes it really complicated since we cannot really rely on the values written in the CS.cfg
or DS so there are more possibilities of problems. I need to look at the code to see how mNextMinSerialNo
is used.
So in this step the range size grows from 18 (which is the size specified during installation) to 30, and it temporarily overlaps the other server's range. Where does the 30 come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no real overlap because the one updated is in decimal while the other is in hex. If we took the range of the primary as example here. Initially the range was from 13 to 30 (30 = 13 + increment in dec - 1). The following range start from 31 and the begin is the real one. The gap is generated because in the CS.cfg the end is not 30 but 13 + incerement in hex - 1. = 24 and the next start from 31.
BTW, I think I found a possible conflict and I am updating the code.
- name: Check cert range config in primary CA | ||
run: | | ||
tests/ca/bin/ca-cert-range-config.sh primary | tee output | ||
|
||
cat > expected << EOF | ||
dbs.beginSerialNumber=0x13 | ||
dbs.endSerialNumber=0x30 | ||
dbs.serialCloneTransferNumber=0x9 | ||
dbs.serialIncrement=0x12 | ||
dbs.serialLowWaterMark=0x9 | ||
EOF | ||
|
||
diff expected output | ||
|
||
- name: Check cert range config in secondary CA | ||
run: | | ||
tests/ca/bin/ca-cert-range-config.sh secondary | tee output | ||
|
||
cat > expected << EOF | ||
dbs.beginSerialNumber=0x31 | ||
dbs.endSerialNumber=0x48 | ||
dbs.serialCloneTransferNumber=0x9 | ||
dbs.serialIncrement=0x12 | ||
dbs.serialLowWaterMark=0x9 | ||
EOF | ||
|
||
diff expected output |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the ranges changed from 0x13-0x24 to 0x13-0x30 in the primary CA and from 0x31-0x42 to 0x31 to 0x48 in the secondary CA. Is there any possibility that the new ranges would conflict with other existing ranges?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I get the math correct the range in CS.cfg are smaller then in DS and not full used but the start range are always correct in case of gap. The ID between 0x42 and 0x48 are lost because when a new range is computed it start from the value recorded in nextRange and do not take in account the range objects and/or configuration range so overlap should not be possible. Using the full recorded range we are requesting to utilise the full range.
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemRangeGeneratorUpdateCLI.java
Outdated
Show resolved
Hide resolved
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemRangeUpdateCLI.java
Outdated
Show resolved
Hide resolved
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemRangeUpdateCLI.java
Outdated
Show resolved
Hide resolved
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemRangeGeneratorUpdateCLI.java
Outdated
Show resolved
Hide resolved
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemRangeGeneratorUpdateCLI.java
Outdated
Show resolved
Hide resolved
4b71c97
to
da9f148
Compare
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemIdGeneratorUpdateCLI.java
Outdated
Show resolved
Hide resolved
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemIdGeneratorUpdateCLI.java
Outdated
Show resolved
Hide resolved
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemIdGeneratorUpdateCLI.java
Show resolved
Hide resolved
LDAPSearchResults ranges = conn.search(rangeDN, LDAPv3.SCOPE_SUB, "(objectClass=pkiRange)", null, false); | ||
|
||
BigInteger lastUsedSerial = BigInteger.ZERO; | ||
boolean nextRangeToUpdate = true; | ||
// Search for the last range entry. If it is associated to the CA to update or ranges are not defined | ||
// then the nextRange is | ||
while (ranges.hasMoreElements()) { | ||
LDAPEntry entry = ranges.next(); | ||
String endRange = entry.getAttribute("endRange").getStringValues().nextElement(); | ||
String host = entry.getAttribute("host").getStringValues().nextElement(); | ||
String port = entry.getAttribute("securePort").getStringValues().nextElement(); | ||
BigInteger next = new BigInteger(endRange, 16); | ||
if (lastUsedSerial.compareTo(next) < 0) { | ||
lastUsedSerial = next; | ||
nextRangeToUpdate = host.equals(hostName) && port.equals(securePort); | ||
|
||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC in this loop we go through all ranges and parse the endRange
as hexadecimal and use that to find the last range entry. However, in the following loop we're replacing the hexadecimal endRange
with its decimal value. So if we run this command again for another replica, wouldn't the now decimal endRange
be incorrectly parsed as hexadecimal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The nextRange
is updated only if last range is associated with the CA doing the generator update so only one CA will do the update and since the other are not creating new range there should be no conflict. In the following loop only the ranges associated with this CA are updated so if last range is not owned by this CA there is no update therefore there should not be any conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was actually concerned about the endRange
, not nextRange
. In the first loop above it's parsed as hexadecimal and used for comparison:
String endRange = entry.getAttribute("endRange").getStringValues().nextElement();
BigInteger next = new BigInteger(endRange, 16);
if (lastUsedSerial.compareTo(next) < 0) {
In the second loop below, it's parsed again as hexadecimal, but written as decimal, basically replacing the old value:
String endRange = entry.getAttribute("endRange").getStringValues().nextElement();
BigInteger endRangeNo = new BigInteger(endRange, 16);
attrs.add(new LDAPAttribute("endRange", endRangeNo.toString()));
So suppose we run the tool again for another clone, the first loop will run again, but this time it will be reading the endRange
that has already been converted to decimal by the second loop, and it will parse the value as hexadecimal again. Is this correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are the ranges in the DS just before the migration:
beginRange: 13
endRange: 30
host: primary.example.com
beginRange: 31
endRange: 48
host: secondary.example.com
nextRange: 49
So in the DS at least there seems to be no gap. Is it possible to take these values as decimals, then update CS.cfg
based on these values, but avoid updating the DS completely? Some serial numbers that used to belong to one server might "move" to another server, but since we're stopping all servers, fixing all ranges at once, and restarting the servers again, that might be OK. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have inverted the order of the operation. Update the range to decimal of the updated CA and then look for the next range. In case of not selecting the real last range is not a problem. Considering the case in CI but we suppose the primary range is from 13 to 31 and the secondary range is from 31 to 48. The first updated is the primary so it become 19 49 and it is considered as last. The nextRange will be 50. Then the other CA is update and the range is 50 -72 and the next range is updated again to 73. There are no other operation involved so it is not a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing is we're still mix up radixes in the second loop:
LDAPSearchResults ranges = conn.search(rangeDN, LDAPv3.SCOPE_SUB, "(objectClass=pkiRange)", null, false);
String endRange = entry.getAttribute("endRange").getStringValues().nextElement();
BigInteger next = new BigInteger(endRange); // could be decimal or hexadecimal that has not been converted to decimal
While the code might work with the above example, there isn't really a guarantee that it will work fine in all cases.
As discussed in Slack, we should explore some alternatives:
- avoid modifying DS at all
- fix the radix for all servers in one step, then fix the nextRange in a separate step
- store the new SSNv2 ranges in a separate subtree (Relocate SSNv2 range objects for new CA instances #4885)
String dn = "cn=" + beginRangeNo.toString() + "," + rangeDN; | ||
LDAPEntry rangeEntry = new LDAPEntry(dn, attrs); | ||
logger.info("SubsystemRangeGeneratorUpdateCLI.updateRanges: Remove entry " + entry.getDN()); | ||
conn.delete(entry.getDN()); | ||
logger.info("SubsystemRangeGeneratorUpdateCLI.updateRanges: Adding entry " + dn); | ||
conn.add(rangeEntry); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we're deleting the old cn=<hex>
and adding a new cn=<decimal>
. Is there a possibility that the cn=<decimal>
will conflict with an existing range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to compute this to exclude this possibility but I cannot. However, since the conversion between hex and dec is not linear, the results are not multiple and the value increase very quickly I cannot find combination of value where overlap could happen so I consider very unlikely. However, we could try to find if a combination is possible orwe could add a check and eventually do a correction (e.g. adding or reducing the range by one entry)
base/server/src/main/java/org/dogtagpki/server/cli/SubsystemIdGeneratorUpdateCLI.java
Outdated
Show resolved
Hide resolved
a36d41b
to
6a95c31
Compare
The current generator has a problem with converting from hex to decimal the range boundaries creating gaps between ranges. This a problem when third parties tools are used to with certificates because contiguous range are expected. This commit introduce the generator legacy2. This uses same configuration parameter but hex value are specified by the prefix '0x'. When value are written to the configuration value it is possible to set the radix with the options: - dbs.cert.id.radix (default to 16) - dbs.key.id.radix (default to 16) - dbs.request.id.radix (default to 10) Additionally, the new command `pki-server <subsystem>-id-generator-*` has been added to migrate from the legacy generator to the legacy2 or to random.
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are still some concerns about the migration process, but the SSNv2 itself looks fine. I think lets merge this PR so we can add tests for SSNv2 (without the migration) while we keep improving the migration process. Or if you prefer, feel free to split the PR and move the migration tools & tests into a separate PR. Thanks!
@edewata Thanks! I am merging this with the current migration implementation. It will be easier to test and improve to support the corner cases. |
Testing with just fixing the conversion of nextRange and increment will generate range overlaps and CA stop working so it is impossible to fix but this is a problem with master branch. In older releases, the increment is correctly evaluated and the problem is only in the nextRange. In fact, the CI test we have in master branch https://github.com/dogtagpki/pki/blob/master/.github/workflows/ca-ssnv1-test.yml will generate different gaps if tests with code from older branch (like DOGTAG_10_5_BRANCH). To have a single solution for all versions and to avoid modifying running instance the optional new generator seems a better solution. |
Current implementation of legacy id generator using serial number has the problem to leave gaps when new ranges are generated because some conversion problem from decimal to hexadecimal range value. The gap could create problem to third party tool/service if they expected all no gaps are present in the serials.
A new id generator is introduced to solve the problem. To use the new id generator the following value should be used in
CS.cfg
:This can be be specified in
pkispawn
with the parameters:The remaining configuration are similar to the legacy generator. The main difference is that hexadecimal value must have the prefix
0x
, if it is not present then the value is read as decimal.The number serial written in
CS.cfg
during range update can be forced to be only decimal or hexadecimal setting the valuedbs.numberRangeRadix
to 10 or 16 respectively. If not set or negative then the default for the specific generator will be used.