-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add smoothing spline baseline #299
base: main
Are you sure you want to change the base?
Conversation
I think there is a little tricky thing here regarding the testing. Because the distance should be used as-is (with baseline) and the force should be baseline corrected, you end up with an implicit equation in the trap position: `(trap_position - trap2_ref) = tether_length + 2 * bead_radius + 2 * (wlc_force + baseline(trap_position)) / stiffness` Note how trap position appears on the left and right thanks to the baseline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I've had a first look at this and come to the same conclusion as you, that the smoothing factor based optimization doesn't really work that well unless you also downsample. This suggests that we should provide a good default for this. Otherwise, looking pretty good already!
For tests, I think adding a test which tests unique_sorted
and one that tests the baseline (both code paths, with fixed and non-fixed smoothing factor) should suffice.
Adding a package involves adding it to setup.py
. For conda, it might be a bit trickier since I don't see this package on conda-forge. I'd have to look into it.
To muddy the waters a bit more, another class of algorithms that's probably worth exploring for this purpose is LO(W)ESS regression.
x = trap_position.data | ||
u, c = np.unique(x, return_counts=True) | ||
m = np.isin(x, [u[c < 2]]) | ||
ind = np.argsort(x[m]) | ||
|
||
return x[m][ind], force.data[m][ind] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could consider simplifying this to:
def unique_sorted(trap_position, force):
"""Sort and remove duplicates trap_position data to prepare for fit smoothing spline.
Parameters
----------
trap_position : array_like
Trap mirror position
force : array_like
Force data
"""
sorted_position, idx, count = np.unique(trap_position, return_index=True, return_counts=True)
return sorted_position[count < 2], force[idx][count < 2]
Saves you a search and a sort (note that the result from np.unique
is already sorted).
Given that you return the raw numpy arrays rather than slices, I would also consider just taking raw numpy arrays as input and extracting the data where its called (rather than passing a slice).
Trap mirror position data | ||
force : lumicks.pylake.Slice | ||
Force data | ||
smoothing_factor : float |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether it would make sense to merge smoothing_factor
and smoothing_factors
.
If the input is just a value or has only one element, you use it as fixed value and otherwise you perform the optimization. You can use np.atleast_1d
to upconvert a value to a numpy array so that you can always use the len
operation on it. The default can then just be a default list that generally works.
) | ||
model = csaps(x_sorted, y_sorted, smooth=smoothing_factor) | ||
|
||
return cls(model, trap_position, force) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One issue with calling the constructor like this is that you don't actually put the data that you used for fitting in now (you fitted to x_sorted
and y_sorted
but store the original raw data).
mse_test_vals = np.zeros(len(smoothing_factors)) | ||
x_sorted, y_sorted = unique_sorted(trap_position, force) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I find a bit surprising is that you choose to pass trap position and force to this function, despite already having x_sorted
and y_sorted
. Is there a specific reason you prefer this?
force, | ||
smoothing_factors, | ||
n_repeats, | ||
plot_smoothingfactor_mse, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering that the function has smoothing_factor
in the name, I reckon plot_mse
would be sufficient.
I added functions to fit a baseline with a smoothing spline.
The smoothing factor is optimized using k-fold cross validation.
Therefore I added a function to plot the mse on the test data vs the smoothing factors.
Let me know what you think.