Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import cross sections from csv #41

Draft
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

Yaxuan-Lii
Copy link
Contributor

No description provided.

galore/__init__.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
with tarfile.open(tar_file_name) as tf:
with tf.extractfile(file_path) as hello:
data = hello.read().decode()
a = data.split('\r\n')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use descriptive names for variables. It is hard to read a line of code operating on a, b, c, and d and understand what it is supposed to be doing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new name data_string is a bit better because it is at least "greppable". But it's also a bit misleading because data_string is not actually a string, it's a list. Maybe something like data_lines would be better, as this conveys how it was split?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_strings would at least be better. If you read data_string[0] it looks like it indexes a single letter from a string. Whereas data_strings[0] clearly gets a longer string, which can be split.

galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@Yaxuan-Lii Yaxuan-Lii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry wrong push.

galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/__init__.py Outdated Show resolved Hide resolved
galore/__init__.py Outdated Show resolved Hide resolved
galore/__init__.py Outdated Show resolved Hide resolved
- Remove merge conflict
- Remove unnecessary style changes

Not that there's anything wrong with style cleanup, but it's better if
it doesn't dominate the changes under review. This could be a separate
PR and discussion.
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
Copy link
Member

@ajjackson ajjackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the updated read_csv_file: I'll take a closer look at the other parts tomorrow.

galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Show resolved Hide resolved
Copy link
Member

@ajjackson ajjackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, comments on the rest of the file!

galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
Copy link
Member

@ajjackson ajjackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Yaxuan,

I have made some comments but haven't yet tried running the new code. This looks pretty functional already, which is great.

Please do use tools to check PEP8 compliance and clean up the spacing as you go along, it makes the code easier to read/review. (And will be needed in the end anyway.)

I'm relieved to see that not many lines of code were needed to handle the installation locations, this looks simple and robust. The next step would be to add an environment variable for a user location, but that doesn't necessarily have to be part of this PR. Cleaning this up and handling/interpolating missing energy values is a higher priority.

galore/cli/galore_get_cs.py Outdated Show resolved Hide resolved
galore/cli/galore_get_cs.py Outdated Show resolved Hide resolved
galore/cli/galore_get_cs.py Outdated Show resolved Hide resolved
def run(reference):
if reference == 'Scofield' or reference == 'Yeh':

url,data_file_dir,data_file_path =galore.cross_sections.get_csv_file_path(reference)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please follow PEP8 guidelines for spacing around , and =, it makes the code easier to read.

return parser

def run(reference):
if reference == 'Scofield' or reference == 'Yeh':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to do this is if reference in ('Scofield', 'Yeh'). What you have is fine here, but the other way is worth knowing if there are more options to compare 😅

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another nice trick which is used elsewhere in the code is if reference.lower() in ('scofield', 'yeh'); this makes it case-insensitive so our user doesn't have to mess around with shift keys.

galore/cross_sections.py Outdated Show resolved Hide resolved


if os.path.isfile(data_file_path)== True:
print("Data file exists.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if somebody wants to update to a corrected newer version?

If we don't provide some kind of force=True option, it would be helpful to print where the file exists so it can be examined/deleted.


try:
os.mkdir(data_file_dir)
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this except catching? "Bare" exceptions like this can catch any error, including ones we don't expect. It's best to be very specific about which exceptions are ok to ignore.

galore/cross_sections.py Outdated Show resolved Hide resolved
galore/cross_sections.py Outdated Show resolved Hide resolved
electrons_numbers = np.array(
[value for key, value in electron_counts_by_subshells.items() if 's' in key])
# get highest obital cross section of obital s
highest_obital_cross_section = s_cross_sections[-1]/electrons_numbers[-1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would IndexError if there is no s orbital data. (Say because we are looking at very low energy and only a p orbital is available.)

@@ -54,28 +54,33 @@ def get_parser():
Space-separated symbols for elements in material.""")

parser.add_argument('--dataset', type=str,
help='You can enter "Scofield" or "Yeh"')
help=
"""You can enter 'Scofield' or 'Yeh' """)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't clearer than the previous formatting, I think autopep8 is sneaking in again?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would still be good to use choices here, then Argparse is responsible for handling bad user input.

logging = galore.cross_sections.cross_sections_info(cross_sections)
logging.info("Photoionisation cross sections per electron:")
if cross_sections is None:
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which scenario is this catching? Could there be a more helpful logging message in that case? Users don't like it when programs do nothing.

cross_sections, closest_energy = _cross_sections_from_csv_data(
energy, data, dataset)
cross_sections_dict[element] = cross_sections
print('The closest energy of input is {energy} keV'.format(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logging instead of print here: the user should have a record of this.

Copy link
Member

@ajjackson ajjackson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general approach with sample test data looks like a good one for the current data setup.

The new test file looks functional, but isn't running at the moment because the filename needs tweaking.

@@ -0,0 +1,72 @@
import numpy as np
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have a different name than new_test.py. It won't be "new" for long, and then it won't be easy to guess what it tests without looking inside.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More importantly, the test isn't actually running on the CI at the moment - I don't see them in the logs here https://github.com/SMTG-UCL/galore/pull/41/checks

That's because the test framework expects files containing tests to begin with the word "test" - this needs to be renamed to something like "test_csv_data.py" so they are found by setup.py test.


##check the function goes well with above datatable
cross_sections_scofield,_= _cross_sections_from_csv_data(800,data,'Yeh')
self.assertAlmostEqual(cross_sections_scofield['s'], 0.00145)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this variable be cross_sections_yeh? These tests are nearly duplicated, which suggests they might be more cleanly implemented by calling a common function.

(This is not urgent, I can clean it up later if need be.)

def test_scofield_directory_and_path(self):
## Simulate expected correct directory and path for scofield data
correct_path = os.path.join(correct_directory, 'Scofield_csv_database.zip')
_,_,Scofield_file_path = get_csv_file_path('Scofield')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be scofield_file_path (PEP8)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants