Import cross sections from csv #41

Draft pull request. Wants to merge 28 commits into base: master.

Changes shown from 23 of the 28 commits.
Commits:
0a0270c  add test for process_pdos (Jun 23, 2021)
d5c8552  fix the imports (Yaxuan-Lii, Jun 29, 2021)
553738f  removeed irrelevant files (Yaxuan-Lii, Jul 1, 2021)
5300d62  import is fixed (Yaxuan-Lii, Jul 1, 2021)
dab091b  try to remove irrelevant fils second time (Yaxuan-Lii, Jul 1, 2021)
d69d61b  remove irrelevant files second time (Yaxuan-Lii, Jul 1, 2021)
7984c15  delet irrelevant files (Yaxuan-Lii, Jul 1, 2021)
66b1c6f  remove irrelevant files (Yaxuan-Lii, Jul 1, 2021)
4fdafa7  delete .DS_Store (Yaxuan-Lii, Jul 1, 2021)
dca2d49  Modified test_process_pdos.py (Yaxuan-Lii, Jul 2, 2021)
eb079f6  add test.py (Yaxuan-Lii, Jul 2, 2021)
8190183  Restore some files that were accidentally deleted (ajjackson, Jul 2, 2021)
946efaa  use flake8 to optimise format (Yaxuan-Lii, Jul 5, 2021)
cd0f3c0  import cross-sections from CSV archives (Yaxuan-Lii, Jul 15, 2021)
8f48e5e  form modify (Yaxuan-Lii, Jul 17, 2021)
a959150  modify follow the comments (Yaxuan-Lii, Jul 21, 2021)
067c289  make corrections (Yaxuan-Lii, Jul 21, 2021)
a621ab0  make correction (Yaxuan-Lii, Jul 21, 2021)
13911d0  Revert changes to galore/__init__.py (ajjackson, Jul 21, 2021)
4fb993d  modified as comments (Yaxuan-Lii, Aug 2, 2021)
cd4dc0e  merge the change of galore/__init__.py (Yaxuan-Lii, Aug 2, 2021)
5c9c31c  add cli to install data and get cross sections (Yaxuan-Lii, Aug 9, 2021)
b138c4b  merge get_cross_sections_from_csv into get_cross_sections (Yaxuan-Lii, Aug 10, 2021)
bb8a526  modify and add new test (Yaxuan-Lii, Aug 24, 2021)
5cdb67b  add a IF statement (Yaxuan-Lii, Aug 31, 2021)
39651d5  modify as comments (Yaxuan-Lii, Sep 9, 2021)
a2441b5  modify as comments (Yaxuan-Lii, Sep 9, 2021)
f59b50a  mistakes fix (Yaxuan-Lii, Sep 12, 2021)
29 changes: 20 additions & 9 deletions galore/cli/galore_get_cs.py
@@ -22,26 +22,36 @@
from argparse import ArgumentParser
import galore.cross_sections


def main():
parser = get_parser()
args = parser.parse_args()
args = vars(args)
run(**args)


def get_parser():
parser = ArgumentParser()
parser.add_argument('energy', type=str,
help="""
Photon energy, expressed as source type: "he2" for He (II), "alka" for
Al k-alpha, (values from Yeh/Lindau (1985)) or as energy in keV (values
from polynomial fit to Scofield (1973)).""")
parser.add_argument('elements', type=str, nargs='+', help="""
Space-separated symbols for elements in material.""")
#parser.add_argument('energy', type=str,
# help="""
# Photon energy, expressed as source type: "he2" for He (II), "alka" for
# Al k-alpha, (values from Yeh/Lindau (1985)) or as energy in keV (values
# from polynomial fit to Scofield (1973)).""")
#parser.add_argument('elements', type=str, nargs='+', help="""
# Space-separated symbols for elements in material.""")



parser.add_argument('energy', type=str)
parser.add_argument('elements', nargs= '*')

parser.add_argument('--dataset', type=str, help = 'You can enter "Scofield" or "Yeh"')

return parser

def run(energy, elements):
cross_sections = galore.get_cross_sections(energy, elements)

def run(energy, elements,dataset):
cross_sections = galore.get_cross_sections(energy, elements,dataset)
logging = galore.cross_sections.cross_sections_info(cross_sections)
logging.info("Photoionisation cross sections per electron:")

@@ -60,3 +70,4 @@ def run(energy, elements):
else:
logging.info(" {0} {1}: {2:.3e}".format(element,
orbital, value))

25 changes: 25 additions & 0 deletions galore/cli/galore_install_data.py
@@ -0,0 +1,25 @@
from argparse import ArgumentParser
import galore.cross_sections

def main():
parser = get_parser()
args = parser.parse_args()
args = vars(args)
run(**args)

def get_parser():
parser = ArgumentParser()
parser.add_argument('reference', type=str,
help=""" 'Scofield' or 'Yeh'""")

return parser

def run(reference):
if reference == 'Scofield' or reference == 'Yeh':
Reviewer comment: Another way to do this is if reference in ('Scofield', 'Yeh'). What you have is fine here, but the other way is worth knowing if there are more options to compare 😅

Reviewer comment: Another nice trick which is used elsewhere in the code is if reference.lower() in ('scofield', 'yeh'); this makes it case-insensitive so our user doesn't have to mess around with shift keys.
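
A minimal sketch combining both suggestions above (hypothetical; not part of the submitted diff):

    # Hypothetical: one case-insensitive membership test
    if reference.lower() in ('scofield', 'yeh'):
        ...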


url,data_file_dir,data_file_path =galore.cross_sections.get_csv_file_path(reference)
Reviewer comment: Please follow PEP8 guidelines for spacing around , and =, it makes the code easier to read.

galore.cross_sections.galore_install_data(url,data_file_dir,data_file_path)

else:
print("You can enter reference 'Scofield' or 'Yeh' ")
Reviewer comment: The wording can be a bit firmer here, and it's also useful to give feedback, e.g. f"Reference '{reference}' was not recognised. Accepted values are 'Scofield' and 'Yeh'."

This can make it clearer if the user made a typo or a script inserted the wrong value.
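
A sketch of the firmer message, following the reviewer's example wording (hypothetical):

    # Hypothetical: echo the rejected value so a typo is easy to spot
    print(f"Reference '{reference}' was not recognised. "
          "Accepted values are 'Scofield' and 'Yeh'.")

Raising a ValueError instead of printing would also make the failure visible to calling scripts.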


293 changes: 277 additions & 16 deletions galore/cross_sections.py
@@ -1,3 +1,5 @@
import numpy as np

import os.path
from pkg_resources import resource_filename
from json import load as json_load
@@ -9,7 +11,7 @@
from numpy import exp, log


def get_cross_sections(weighting, elements=None):
def get_cross_sections(weighting, elements=None,dataset=None):
Reviewer comment: This new argument needs to be included in the docstring below.
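
A sketch of a docstring entry for the new argument (hypothetical wording):

    dataset (str or None): Name of a downloadable CSV dataset to use,
        'Scofield' or 'Yeh'. If None, the built-in Scofield fit and
        Yeh/Lindau tables are used as before.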

"""Get photoionization cross-section weighting data.

For known sources, data is based on tabulation of Yeh/Lindau (1985).[1]
@@ -46,27 +48,47 @@ def get_cross_sections(weighting, elements=None):
may be used to store metadata.

"""

try:
energy = float(weighting)
return get_cross_sections_scofield(energy, elements=elements)

except ValueError:
if isinstance(weighting, str):
if weighting.lower() in ('alka', 'he2', 'yeh_haxpes'):
return get_cross_sections_yeh(weighting)
elif weighting.lower() in ('xps', 'ups', 'haxpes'):
raise ValueError("Key '{0}' is no longer accepted for "
if dataset==None:
Reviewer comment: It is recommended in PEP8 that we always compare to None using is, i.e.

    if dataset is None:

(This is because in principle some maniac could be using a custom class/object which == None!)

try:
energy = float(weighting)
return get_cross_sections_scofield(energy, elements=elements)

except ValueError:
if isinstance(weighting, str):
if weighting.lower() in ('alka', 'he2', 'yeh_haxpes'):
return get_cross_sections_yeh(weighting)
elif weighting.lower() in ('xps', 'ups', 'haxpes'):
raise ValueError("Key '{0}' is no longer accepted for "
"weighting. Please use 'alka' for Al k-alpha,"
" 'he2' for He (II) or 'yeh_haxpes' for "
"8047.8 eV HAXPES".format(weighting))
else:
return get_cross_sections_json(weighting)
else:
return get_cross_sections_json(weighting)

# A string or a number would have hit a return statement by now
raise ValueError("Weighting not understood as string or float. ",
# A string or a number would have hit a return statement by now
raise ValueError("Weighting not understood as string or float. ",
"Please use a keyword, path to JSON file or an "
"energy value in eV")
else:
cross_sections_dict = {}
energy = weighting
Reviewer comment: Should this be float(weighting)? It makes sense to turn energy into a number while it's being renamed.
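
A sketch of that change (hypothetical):

    # Hypothetical: make energy a number as soon as it is renamed
    energy = float(weighting)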

metadata = _get_metadata(energy,dataset)
cross_sections_dict.update(metadata)


_,_,data_file_path = get_csv_file_path(dataset)
Reviewer comment: Good use of _ for unused return values 😄

if os.path.isfile(data_file_path)== True:
Reviewer comment: No need for == True; isfile() already returns a True/False that will make sense to if.

for element in elements:

file_name = '{element1}.csv'
Reviewer comment: What's the significance of the 1?
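
If the 1 carries no meaning, a positional placeholder is simpler (hypothetical sketch):

    # Hypothetical: no named placeholder needed
    file_name = '{}.csv'.format(element)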

data = read_csv_file(data_file_path,file_name.format(element1 = element))
elemental_cross_sections = _cross_sections_from_csv_data(energy,data,dataset)
cross_sections_dict[element] = elemental_cross_sections
else:
print("Do you want to install data? \n You can enter \n galore-install-data {dataset}".format(dataset = dataset))
Reviewer comment: It would be a good idea to clarify that this is because the file was not found. It might be that the user did install data but there is some other issue with paths.

Also, print is likely not the best option here; in Galore we generally use logging instead. Or would it make more sense in this case to raise an error? What happens when other invalid values are used?
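
A sketch of the logging-and-raise alternative, assuming a module-level logger (hypothetical; names are illustrative):

    import logging
    logger = logging.getLogger(__name__)  # hypothetical module-level logger

    # Hypothetical: say what is actually wrong (file not found) and how to fix it
    logger.error("Cross-section data file not found at %s", data_file_path)
    raise FileNotFoundError(
        "No data file at {0}. You can install it with: "
        "galore-install-data {1}".format(data_file_path, dataset))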


return cross_sections_dict



def cross_sections_info(cross_sections, logging=None):
@@ -281,3 +303,242 @@ def _eval_fit(energy, coeffs):
orb: _eval_fit(energy, np_fromstr(coeffs))
for orb, coeffs in orbitals_fits}})
return el_cross_sections


import zipfile
import numpy as np
import platform
import os
import urllib.request

def read_csv_file(data_file_path,file_name):
"""
Generate data from csv archive without decompression

Args:
tar_file_name (str): path to tarfile of CSV data
file_path(str): path to individual CSV file within tarfile

Returns:
dict: containing 'headers', 'electron_counts'
(lists of str and int respectively) and 'data_table',
a 2-D nested list of floats. Missing data is represented as None.

"""

## Open zipfile
with zipfile.ZipFile(data_file_path) as data_zf:
data = data_zf.read(file_name).decode()

## string to list
data_string = data.split('\r\n')



## get number of colunm headers
column_headers = [column_header for column_header in data_string[0].split(',') if column_header !='']
n_columns = len(column_headers)


## build main matrix
main_matrix = [row.split(',')[0:n_columns] for row in data_string]


## remove empty values
midterm=[row for row in main_matrix if any(row)==True]
data_table = midterm[1:-1]
electron_counts_list = [occupancy for occupancy in midterm[-1] if occupancy !=''][1:]

## replace '' in data table with NaN and change string list into float array
data_table = np.array([[float('NaN') if cross_section == '' else float(cross_section) for cross_section in row]
for row in data_table])
electron_counts = np.array([float(occupancy) for occupancy in electron_counts_list])


## build result dict
result_dict={}
result_dict['headers'] = column_headers
result_dict['electron_counts'] = electron_counts
result_dict['data_table'] = data_table

return result_dict

def _get_avg_orbital_cross_sections(subshells_cross_sections,electrons_number):
"""
Obtain average cross sections of subshell of each obital when process Scofield data.

Args:
subshell_cross_sections(np.array): expecting pairs of subshells cross sections of a certain obital except s-obital
correspond to a given input energy.
electrons_number(np.array): corresponding electrons number of subshells

Note: subshells are like ('2p1/2', '2p3/2', '3p1/2', '3p3/2',...) for obital p

Returns:
avg_orbital_cross_sections(np.array): average cross sections of obitals

Note: the above array are like ('2p', '3p',...) for obital p
"""


subshells_cross_section_arrays = np.array_split(subshells_cross_sections, 2)
electrons_numbers_arrays = np.array_split(electrons_number, 2)

subshells_cross_sections_sum = np.array(list(map(sum,subshells_cross_section_arrays)))

subshells_electrons_number_sum = np.array(list(map(sum,electrons_numbers_arrays)))

avg_orbital_cross_sections = np.true_divide(subshells_cross_sections_sum,subshells_electrons_number_sum)


return avg_orbital_cross_sections

def _cross_sections_from_csv_data(energy,data,dataset):
"""
Get cross sections from data dict
For known sources, data is based on tabulation of Yeh/Lindau (1985).[1]
Otherwise, energies in keV from 1-1500 are used with log-log polynomial
parametrisation of data from Scofield.[2]

References:
1. Yeh, J.J. and Lindau, I. (1985)
Atomic Data and Nuclear Data Tables 32 pp 1-155
2. J. H. Scofield (1973) Lawrence Livermore National Laboratory
Report No. UCRL-51326

Args:
energy(float): energy value
data(dict): data from read_csv_file()
reference(str): 'Scofield' or 'Yeh'

Returns:
orbitals_cross_sections_dict: containing orbitals 's', 'p', 'd', 'f' and
maximum cross sections of each orbital.
Missing data is represented as None.

"""

n_subshells = len(data['electron_counts'])
subshells_headers = data['headers'][-n_subshells:]
cross_sections_by_subshells = data['data_table'].T[-n_subshells:]

##build dicts for easy calculation.
electron_counts_by_subshells = dict(zip(subshells_headers,data['electron_counts']))
cross_sections_by_subshells = dict(zip(subshells_headers,data['data_table'].T[-n_subshells:]))

## match the import energy
energy_index = np.abs(data['data_table'].T[0] - float(energy)).argmin()
Reviewer comment: What happens if the requested energy is nowhere near a value in the table?
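
One hedged way to guard against an energy far outside the tabulated range (hypothetical; not part of the diff):

    # Hypothetical: refuse energies outside the tabulated range before
    # falling back to the nearest tabulated value
    table_energies = data['data_table'].T[0]
    if not (table_energies.min() <= float(energy) <= table_energies.max()):
        raise ValueError("Energy {0} is outside the tabulated range "
                         "{1}-{2}".format(energy, table_energies.min(),
                                          table_energies.max()))
    energy_index = np.abs(table_energies - float(energy)).argmin()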


## build result dict
orbitals_cross_sections_dict = {}

## result for s orbital
s_cross_sections = np.array([value[energy_index] for key, value in cross_sections_by_subshells.items() if 's' in key])
electrons_number = np.array([value for key, value in electron_counts_by_subshells.items() if 's' in key])
## get unit cross sections of s orbital
unit_cross_sections = s_cross_sections/electrons_number
## get highest cross section of obital s
highest_obital_cross_section = np.max(np.nan_to_num(unit_cross_sections))
orbitals_cross_sections_dict['s'] = highest_obital_cross_section
Reviewer comment: This isn't necessarily the correct data to use. We want the "highest" orbital from 1s, 2s, 3s... which is not necessarily the one with the largest scattering cross section.
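
A hedged sketch of picking the outermost s subshell (largest principal quantum number) rather than the largest value, assuming labels such as '1s', '2s' begin with the quantum number:

    # Hypothetical: choose the s subshell with the highest principal quantum number
    s_labels = [label for label in cross_sections_by_subshells if 's' in label]
    outermost = max(s_labels, key=lambda label: int(label[0]))
    orbitals_cross_sections_dict['s'] = (
        cross_sections_by_subshells[outermost][energy_index]
        / electron_counts_by_subshells[outermost])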




## result for 'p', 'd', 'f' orbitals
orbitals = ['p', 'd', 'f']

for orbital in orbitals:
interm_matrix = np.array([value for key, value in cross_sections_by_subshells.items() if orbital in key]).T
electrons_number = np.array([value for key, value in electron_counts_by_subshells.items() if orbital in key])

if np.shape(interm_matrix) != (0,):
if dataset == 'Scofield':
subshells_cross_sections = interm_matrix[energy_index]
result = _get_avg_orbital_cross_sections(subshells_cross_sections,electrons_number)

## get highest cross section of this obital
highest_obital_cross_section = np.max(np.nan_to_num(result))
orbitals_cross_sections_dict[orbital] = highest_obital_cross_section

elif dataset == 'Yeh':
obital_cross_sections = interm_matrix[energy_index]
## get unit cross sections of this orbital
unit_cross_sections = obital_cross_sections/electrons_number
## get highest cross section of this obital
highest_obital_cross_section = np.max(np.nan_to_num(unit_cross_sections))
orbitals_cross_sections_dict[orbital] = highest_obital_cross_section


return orbitals_cross_sections_dict

def _get_metadata(energy,dataset):
"""
Args:
energy(float): energy value
dataset(str): 'Scofield' or 'Yeh'

Note: 1.'Scofield' for J. H. Scofield (1973)
Lawrence Livermore National Laboratory Report No. UCRL-51326
2.'Yeh' for Yeh, J.J. and Lindau, I. (1985)
Atomic Data and Nuclear Data Tables 32 pp 1-155

Returns:
metadata_dict: containing the input energy value
and description of input reference

"""

metadata_dict = {}
metadata_dict['energy'] = energy
if dataset == 'Scofield':
Reviewer comment: Consider making these comparisons case-insensitive with e.g. if dataset.lower() == 'scofield'.

metadata_dict['citation'] = 'J.H. Scofield, Theoretical photoionization cross sections from 1 to 1500 keV'
metadata_dict['link'] = 'https://doi.org/10.2172/4545040'
elif dataset == 'Yeh':
metadata_dict['citation'] = 'Yeh, J.J. and Lindau, I. (1985) Atomic Data and Nuclear Data Tables 32 pp 1-155'
metadata_dict['link'] = 'https://doi.org/10.1016/0092-640X(85)90016-6'
else:
metadata_dict('dataset error: you can enter reference as "Scofield" or "Yeh" ' )
return metadata_dict



def galore_install_data(DOI,data_file_dir,data_file_path):


if os.path.isfile(data_file_path)== True:
print("Data file exists.")
Reviewer comment: What if somebody wants to update to a corrected newer version?

If we don't provide some kind of force=True option, it would be helpful to print where the file exists so it can be examined/deleted.
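
A sketch of an optional force re-download that also reports the existing location (hypothetical signature):

    # Hypothetical: skip the download only when the file exists and force is False
    def galore_install_data(DOI, data_file_dir, data_file_path, force=False):
        if os.path.isfile(data_file_path) and not force:
            print("Data file already exists at {0}".format(data_file_path))
            return
        ...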


else:
print("Downloading file...")


try:
os.mkdir(data_file_dir)
except:
Reviewer comment: What is this except catching? "Bare" exceptions like this can catch any error, including ones we don't expect. It's best to be very specific about which exceptions are ok to ignore.
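
Two hedged alternatives that only tolerate the "directory already exists" case:

    # Hypothetical: catch only the error we expect...
    try:
        os.mkdir(data_file_dir)
    except FileExistsError:
        pass

    # ...or avoid the try/except entirely
    os.makedirs(data_file_dir, exist_ok=True)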

pass

urllib.request.urlretrieve(DOI,data_file_path)


if os.path.isfile(data_file_path) == True:
print("Done")


def get_csv_file_path(dataset):

if platform.system()== "Windows":

data_file_dir = os.path.join(os.getenv('LOCALAPPDATA'),'galore_data')
else:
data_file_dir = os.path.join(os.path.expanduser('~'),'.galore_data')

if dataset == 'Scofield':
data_file_path = os.path.join(data_file_dir,'Scofield_csv_database.zip')
DOI = 'https://ndownloader.figshare.com/articles/15081888/versions/1'

elif dataset == 'Yeh':
data_file_path = os.path.join(data_file_dir,'Yeh_Lindau_1985_Xsection_CSV_Database.zip')
DOI = 'https://ndownloader.figshare.com/articles/15090012/versions/1'

return DOI,data_file_dir,data_file_path

