-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Import cross sections from csv #41
base: master
Are you sure you want to change the base?
Import cross sections from csv #41
Conversation
This reverts part of commit 7984c15 and restores a newline that got lost while restoring test/test.py
galore/cross_sections.py
Outdated
with tarfile.open(tar_file_name) as tf: | ||
with tf.extractfile(file_path) as hello: | ||
data = hello.read().decode() | ||
a = data.split('\r\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use descriptive names for variables. It is hard to read a line of code operating on a
, b
, c
, and d
and understand what it is supposed to be doing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new name data_string
is a bit better because it is at least "greppable". But it's also a bit misleading because data_string
is not actually a string, it's a list. Maybe something like data_lines
would be better, as this conveys how it was split?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data_strings
would at least be better. If you read data_string[0]
it looks like it indexes a single letter from a string. Whereas data_strings[0]
clearly gets a longer string, which can be split.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry wrong push.
- Remove merge conflict - Remove unnecessary style changes Not that there's anything wrong with style cleanup, but it's better if it doesn't dominate the changes under review. This could be a separate PR and discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments on the updated read_csv_file: I'll take a closer look at the other parts tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, comments on the rest of the file!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Yaxuan,
I have made some comments but haven't yet tried running the new code. This looks pretty functional already, which is great.
Please do use tools to check PEP8 compliance and clean up the spacing as you go along, it makes the code easier to read/review. (And will be needed in the end anyway.)
I'm relieved to see that not many lines of code were needed to handle the installation locations, this looks simple and robust. The next step would be to add an environment variable for a user location, but that doesn't necessarily have to be part of this PR. Cleaning this up and handling/interpolating missing energy values is a higher priority.
galore/cli/galore_install_data.py
Outdated
def run(reference): | ||
if reference == 'Scofield' or reference == 'Yeh': | ||
|
||
url,data_file_dir,data_file_path =galore.cross_sections.get_csv_file_path(reference) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please follow PEP8 guidelines for spacing around ,
and =
, it makes the code easier to read.
galore/cli/galore_install_data.py
Outdated
return parser | ||
|
||
def run(reference): | ||
if reference == 'Scofield' or reference == 'Yeh': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another way to do this is if reference in ('Scofield', 'Yeh')
. What you have is fine here, but the other way is worth knowing if there are more options to compare 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another nice trick which is used elsewhere in the code is if reference.lower() in ('scofield', 'yeh')
; this makes it case-insensitive so our user doesn't have to mess around with shift keys.
galore/cross_sections.py
Outdated
|
||
|
||
if os.path.isfile(data_file_path)== True: | ||
print("Data file exists.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if somebody wants to update to a corrected newer version?
If we don't provide some kind of force=True
option, it would be helpful to print where the file exists so it can be examined/deleted.
|
||
try: | ||
os.mkdir(data_file_dir) | ||
except: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this except
catching? "Bare" exceptions like this can catch any error, including ones we don't expect. It's best to be very specific about which exceptions are ok to ignore.
electrons_numbers = np.array( | ||
[value for key, value in electron_counts_by_subshells.items() if 's' in key]) | ||
# get highest obital cross section of obital s | ||
highest_obital_cross_section = s_cross_sections[-1]/electrons_numbers[-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would IndexError if there is no s orbital data. (Say because we are looking at very low energy and only a p orbital is available.)
galore/cli/galore_get_cs.py
Outdated
@@ -54,28 +54,33 @@ def get_parser(): | |||
Space-separated symbols for elements in material.""") | |||
|
|||
parser.add_argument('--dataset', type=str, | |||
help='You can enter "Scofield" or "Yeh"') | |||
help= | |||
"""You can enter 'Scofield' or 'Yeh' """) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't clearer than the previous formatting, I think autopep8 is sneaking in again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would still be good to use choices
here, then Argparse is responsible for handling bad user input.
galore/cli/galore_get_cs.py
Outdated
logging = galore.cross_sections.cross_sections_info(cross_sections) | ||
logging.info("Photoionisation cross sections per electron:") | ||
if cross_sections is None: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which scenario is this catching? Could there be a more helpful logging message in that case? Users don't like it when programs do nothing.
galore/cross_sections.py
Outdated
cross_sections, closest_energy = _cross_sections_from_csv_data( | ||
energy, data, dataset) | ||
cross_sections_dict[element] = cross_sections | ||
print('The closest energy of input is {energy} keV'.format( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use logging instead of print here: the user should have a record of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general approach with sample test data looks like a good one for the current data setup.
The new test file looks functional, but isn't running at the moment because the filename needs tweaking.
@@ -0,0 +1,72 @@ | |||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have a different name than new_test.py
. It won't be "new" for long, and then it won't be easy to guess what it tests without looking inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More importantly, the test isn't actually running on the CI at the moment - I don't see them in the logs here https://github.com/SMTG-UCL/galore/pull/41/checks
That's because the test framework expects files containing tests to begin with the word "test" - this needs to be renamed to something like "test_csv_data.py" so they are found by setup.py test
.
|
||
##check the function goes well with above datatable | ||
cross_sections_scofield,_= _cross_sections_from_csv_data(800,data,'Yeh') | ||
self.assertAlmostEqual(cross_sections_scofield['s'], 0.00145) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this variable be cross_sections_yeh
? These tests are nearly duplicated, which suggests they might be more cleanly implemented by calling a common function.
(This is not urgent, I can clean it up later if need be.)
def test_scofield_directory_and_path(self): | ||
## Simulate expected correct directory and path for scofield data | ||
correct_path = os.path.join(correct_directory, 'Scofield_csv_database.zip') | ||
_,_,Scofield_file_path = get_csv_file_path('Scofield') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be scofield_file_path
(PEP8)
No description provided.