-
Notifications
You must be signed in to change notification settings - Fork 24
Hydrolysis families and tests #770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
15076b7 to
e86ce34
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
49622d7 to
ef9d75c
Compare
28d0988 to
262761d
Compare
alongd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some minor initial comments, I think that the commits should first be organized in terms of function definitions (there's a missing """ to one of the docstrings, making other functions hard to read)
arc/job/adapters/ts/heuristics.py
Outdated
| spc (ARCSpecies): The species to check. | ||
| Returns: | ||
| bool: Whether the species is water. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the issue is that there's no closing """ for the docstring
arc/job/adapters/ts/heuristics.py
Outdated
| return O_counter == 1 and H_counter == 2 | ||
|
|
||
|
|
||
| def get_neighbors_by_electronegativity(spc: ARCSpecies, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this commit deletes the def of another functions
alongd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Leen!
Please see the comments I added
|
Hi Alon, |
0440a42 to
2098e03
Compare
7df407b to
b91616f
Compare
carbonyl based, nitrile, and ether hydrolysis
carbonyl- based, nitrile, and ether hydrolysis
Adds the following:
1.Reaction setup and family filtering
-hydrolysis(reaction): Main entry to generate TS guesses for hydrolysis reactions
-get_products_and_check_families(): Retrieves hydrolysis product data and checks for carbonyl-based and ether hydrolysis families
-load_hydrolysis_parameters(): Loads family-specific parameters from hydrolysis_families_parameters.yml
2.Reactant and atom index mapping
-extract_reactant_and_indices(): Identifies reactants and key atomic indices (a, b, e, d, o, h1)
-process_hydrolysis_reaction(): Extracts water and main reactant from reactants
-is_water(): Recognizes water molecule
-get_neighbors_by_electronegativity(): Ranks neighbors by effective electronegativity, returns the top-ranked and remaining neighbors, excluding a specified atom.
3.Z-matrix setup and geometric manipulation
-setup_zmat_indices(): Converts XYZ indices to Z-matrix indices
-process_chosen_d_indices(): Applies bond stretching and generates TS variants with or without dihedral adjustment
-stretch_ab_bond(): Stretches the a–b bond based on family parameters
-get_matching_dihedrals(): Identifies dihedral angles in the Z-matrix that involve a specified set of atom indices.
-Add nitrile fallback second-pass with controlled dihedral adjustment
- First pass attempts nitrile TSG without forcing dihedral changes
- If nitrile hydrolysis is present but no nitrile TSG is formed, run an additional pass that forces dihedral adjustments
- Stop immediately once at least one nitrile TSG is found OR if all possible dihedral indices were exhausted without producing a valid TSG
-Use fixed factors; try original angle first, flip only if needed
- Use a fixed defined list of dihedral adjustment factors (instead of previous scattered handling)
- Apply these factors to modify the original angle first
- Only attempt the 180° flipped angle if no TS guesses are generated from the original angle variants
- Improve normalization and docstrings for clarity
4.TS generation and validation
-process_family_specific_adjustments(): Applies family-specific internal coordinates and generates TS guesses
-generate_hydrolysis_ts_guess(): Builds TS geometry, validates bonds, collisions, and uniqueness
-check_ts_bonds(): Validates water bond geometry in TS guess
-check_dao_angle(): Rejects guesses with near-linear angels for DAO angel
carbonyl-based, nitrile, and ether hydrolysis
-test_add_atoms_using_internal_coords() -Update formatting of tests and add tests -test_add_atoms_to_xyz_using_internal_coords()
-Allow setting multiple optimization methods via opt_methods -Default to ['SLSQP', 'Nelder-Mead', 'BFGS'] if not specified -Handle None values in BEST_GUESSES to prevent errors
Raise a ValueError when only one of solvation_method or solvent is defined. Both must be provided together or both must be None.
alongd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @LeenFahoum for this contribution!
Please see me comments below
| ANGL_PRECISION = 0.1 # rad (for both bond angle and dihedral) | ||
|
|
||
|
|
||
| global BEST_GUESSES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definition has no effect, please remove.
I suggest not using the BEST_GUESSES list as a global parameter.
It is only being used in add_atom_to_xyz_using_internal_coords() where it is defined, and by average_best_guesses_from_all_methods() which is called from the previous function.
instead, you can place the smaller average_best_guesses_from_all_methods function inside the larger function, and they'll both have access to the best_guesses list.
| methanol=self.methanol | ||
| rxn2 = ARCReaction(r_species=[methylformate, water], p_species=[formicacid, methanol]) | ||
| #RXN3 | ||
| ethylbenzoate=self.ethylbenzoate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please leave one white-space before and after each =
(PyCharm can format your code if you select it and do Ctrl+Shift+L)
| for i, guess in enumerate(xyz_guesses_total): | ||
| a, b, e, O, H1, d = guesses_indices[i] | ||
| xyz_str = xyz_to_str(guess) | ||
| print(reaction_families[i]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these print statement needed? or are they debugging leftovers?
| bnx_p3=self.bnx_p3 | ||
| rxn15=ARCReaction(r_species=[bnx_f3, water], p_species=[bnx_p3, methanol]) | ||
|
|
||
| tested_rxn=rxn2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand from this that we define above many reactions, but we only use rxn2 in the tests. Is this correct?
Maybe you want to loop through all your reactions for the test?
| print(a, b, e, d, O, H1) | ||
| print(xyz_str) | ||
| print() | ||
| if reaction_families[i]== 'carbonyl_based_hydrolysis':#the parameters of ether hydrolysis are checked in the following test section |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove the comment. If you think it is needed, then use two empty spaces before,
e.g.: code # comment
| self.long_kinetic_description = '' | ||
| self._family = None | ||
| if check_family_name(family): | ||
| self._family = family |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should simply be self.family = family (no underscore)
but it should come after the block of:
self._product_dicts = None
self._atom_map = None
and we should add self._family = None to the above block
| raise IndexError(f"Atom index {atom_index} out of range for distance matrix of size {d_matrix .shape[0]}.") | ||
|
|
||
| distances = [(i, d_matrix[atom_index, i]) for i in range(d_matrix.shape[0]) if i != atom_index] | ||
| return sorted(distances, key=lambda x: x[1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add an empty line at the end of the file
| job_type='opt', | ||
| project='test_9', | ||
| level=Level(method='xtb', solvent='water'), | ||
| level=Level(method='xtb', solvent='water', solvation_method="alpb"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is alpb? Is it a valid method in xTB?
| @@ -0,0 +1,49 @@ | |||
| # Configuration file for hydrolysis reaction parameters | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add here the citation to the paper that describes where the parameters are taken from. For now, just list authors and title, once it's out we;ll add the full citation details
| else: | ||
| rmg_families = list(family_sets[rmg_family_set]) \ | ||
| if isinstance(rmg_family_set, str) and rmg_family_set in family_sets else rmg_family_set | ||
| if isinstance(rmg_family_set, str) and rmg_family_set in family_sets else [rmg_family_set] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this commit fixes an issue introduced by this PR, then please squash it accordingly. If it was a previous bug, then OK to leave it.
new hydrolysis families and tests