Skip to content

Conversation

@LeenFahoum
Copy link
Contributor

new hydrolysis families and tests

Copy link

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some minor initial comments, I think that the commits should first be organized in terms of function definitions (there's a missing """ to one of the docstrings, making other functions hard to read)

spc (ARCSpecies): The species to check.
Returns:
bool: Whether the species is water.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the issue is that there's no closing """ for the docstring

return O_counter == 1 and H_counter == 2


def get_neighbors_by_electronegativity(spc: ARCSpecies,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this commit deletes the def of another functions

Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Leen!
Please see the comments I added

@LeenFahoum
Copy link
Contributor Author

Hi Alon,
I just finished updating the code according to your comments and force-pushed the changes.
Could you please take a look and let me know if any further adjustments are needed?

@LeenFahoum LeenFahoum force-pushed the hydrolysis branch 2 times, most recently from 0440a42 to 2098e03 Compare November 26, 2025 09:28
@LeenFahoum LeenFahoum force-pushed the hydrolysis branch 5 times, most recently from 7df407b to b91616f Compare December 30, 2025 15:45
carbonyl based, nitrile, and ether hydrolysis
carbonyl- based, nitrile, and ether hydrolysis
Adds the following:

1.Reaction setup and family filtering

    -hydrolysis(reaction): Main entry to generate TS guesses for hydrolysis reactions

    -get_products_and_check_families(): Retrieves hydrolysis product data and checks for carbonyl-based and ether hydrolysis families

    -load_hydrolysis_parameters(): Loads family-specific parameters from hydrolysis_families_parameters.yml

2.Reactant and atom index mapping

    -extract_reactant_and_indices(): Identifies reactants and key atomic indices (a, b, e, d, o, h1)

    -process_hydrolysis_reaction(): Extracts water and main reactant from reactants

    -is_water(): Recognizes water molecule

    -get_neighbors_by_electronegativity(): Ranks neighbors by effective electronegativity, returns the top-ranked and remaining neighbors, excluding a specified atom.

3.Z-matrix setup and geometric manipulation

    -setup_zmat_indices(): Converts XYZ indices to Z-matrix indices

    -process_chosen_d_indices(): Applies bond stretching and generates TS variants with or without dihedral adjustment

    -stretch_ab_bond(): Stretches the a–b bond based on family parameters
     -get_matching_dihedrals(): Identifies dihedral angles in the Z-matrix that involve a specified set of atom indices.

   -Add nitrile fallback second-pass with controlled dihedral adjustment
         - First pass attempts nitrile TSG without forcing dihedral changes 
         - If nitrile hydrolysis is present but no nitrile TSG is formed, run an additional pass that forces dihedral adjustments
         - Stop immediately once at least one nitrile TSG is found OR if all possible dihedral indices were exhausted without producing a valid TSG

   
   -Use fixed factors; try original angle first, flip only if needed
           - Use a fixed defined list of dihedral adjustment factors (instead of previous scattered handling)
           - Apply these factors to modify the original angle first
           - Only attempt the 180° flipped angle if no TS guesses are generated from the original angle variants
           - Improve normalization and docstrings for clarity

4.TS generation and validation

    -process_family_specific_adjustments(): Applies family-specific internal coordinates and generates TS guesses

    -generate_hydrolysis_ts_guess(): Builds TS geometry, validates bonds, collisions, and uniqueness

    -check_ts_bonds(): Validates water bond geometry in TS guess

    -check_dao_angle(): Rejects guesses with near-linear angels for DAO angel
carbonyl-based, nitrile, and ether hydrolysis
-test_add_atoms_using_internal_coords()

-Update formatting of tests and add tests

-test_add_atoms_to_xyz_using_internal_coords()
-Allow setting multiple optimization methods via opt_methods

-Default to ['SLSQP', 'Nelder-Mead', 'BFGS'] if not specified

-Handle None values in BEST_GUESSES to prevent errors
Raise a ValueError when only one of solvation_method or solvent is defined.
Both must be provided together or both must be None.
Copy link
Member

@alongd alongd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LeenFahoum for this contribution!
Please see me comments below

ANGL_PRECISION = 0.1 # rad (for both bond angle and dihedral)


global BEST_GUESSES
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This definition has no effect, please remove.
I suggest not using the BEST_GUESSES list as a global parameter.
It is only being used in add_atom_to_xyz_using_internal_coords() where it is defined, and by average_best_guesses_from_all_methods() which is called from the previous function.
instead, you can place the smaller average_best_guesses_from_all_methods function inside the larger function, and they'll both have access to the best_guesses list.

methanol=self.methanol
rxn2 = ARCReaction(r_species=[methylformate, water], p_species=[formicacid, methanol])
#RXN3
ethylbenzoate=self.ethylbenzoate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please leave one white-space before and after each =
(PyCharm can format your code if you select it and do Ctrl+Shift+L)

for i, guess in enumerate(xyz_guesses_total):
a, b, e, O, H1, d = guesses_indices[i]
xyz_str = xyz_to_str(guess)
print(reaction_families[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these print statement needed? or are they debugging leftovers?

bnx_p3=self.bnx_p3
rxn15=ARCReaction(r_species=[bnx_f3, water], p_species=[bnx_p3, methanol])

tested_rxn=rxn2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand from this that we define above many reactions, but we only use rxn2 in the tests. Is this correct?
Maybe you want to loop through all your reactions for the test?

print(a, b, e, d, O, H1)
print(xyz_str)
print()
if reaction_families[i]== 'carbonyl_based_hydrolysis':#the parameters of ether hydrolysis are checked in the following test section
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the comment. If you think it is needed, then use two empty spaces before,
e.g.: code # comment

self.long_kinetic_description = ''
self._family = None
if check_family_name(family):
self._family = family
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should simply be self.family = family (no underscore)
but it should come after the block of:

        self._product_dicts = None
        self._atom_map = None

and we should add self._family = None to the above block

raise IndexError(f"Atom index {atom_index} out of range for distance matrix of size {d_matrix .shape[0]}.")

distances = [(i, d_matrix[atom_index, i]) for i in range(d_matrix.shape[0]) if i != atom_index]
return sorted(distances, key=lambda x: x[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add an empty line at the end of the file

job_type='opt',
project='test_9',
level=Level(method='xtb', solvent='water'),
level=Level(method='xtb', solvent='water', solvation_method="alpb"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is alpb? Is it a valid method in xTB?

@@ -0,0 +1,49 @@
# Configuration file for hydrolysis reaction parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add here the citation to the paper that describes where the parameters are taken from. For now, just list authors and title, once it's out we;ll add the full citation details

else:
rmg_families = list(family_sets[rmg_family_set]) \
if isinstance(rmg_family_set, str) and rmg_family_set in family_sets else rmg_family_set
if isinstance(rmg_family_set, str) and rmg_family_set in family_sets else [rmg_family_set]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this commit fixes an issue introduced by this PR, then please squash it accordingly. If it was a previous bug, then OK to leave it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants