Skip to content

Conversation

@LukaszMrugala
Copy link
Contributor

@LukaszMrugala LukaszMrugala commented Jun 16, 2025

Currently, some tests change ALSABAT settings during their execution, but do not restore them to their prior state.

This causes tests that occur later in the execution order to intermittently fail.

To remedy this, all tests that use ALSA now use a common save_alsa_state function, which shall save current ALSA state to a file and then restore that state on the bash script EXIT signal.

Fixes #1275

@sofci
Copy link
Collaborator

sofci commented Jun 16, 2025

Can one of the admins verify this patch?

reply test this please to run this test once

@LukaszMrugala LukaszMrugala force-pushed the alsabat-test-state-restoration-quickfix branch 6 times, most recently from d4ca717 to 451e887 Compare June 17, 2025 15:32
frames=${OPT_VAL['n']}

start_test
save_alsa_state "${0##*/}".state
Copy link
Collaborator

@marc-hb marc-hb Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than have every caller trim and append .state, you can let save_alsa_state do it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to separate the concerns of handling ALSA and handling of the arbitrary filename. This approach was used to ensure no collisions between scripts.

case-lib/lib.sh Outdated
save_alsa_state()
{
dlogi "save_alsa_state called with ${1}"
trap "restore_alsa_state '${1}'" EXIT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already an EXIT handler: func_exit_handler(). You can't have both; how did you test this?

You can likely have func_exit_handler() invoke restore_alsa_state when the conditions are right.

@marc-hb
Copy link
Collaborator

marc-hb commented Jun 18, 2025

Isn't this a followup to #1270?

@LukaszMrugala LukaszMrugala force-pushed the alsabat-test-state-restoration-quickfix branch 4 times, most recently from 3fb569a to 5060e2e Compare June 18, 2025 08:58
@marc-hb
Copy link
Collaborator

marc-hb commented Jun 18, 2025

test this please

@LukaszMrugala LukaszMrugala force-pushed the alsabat-test-state-restoration-quickfix branch 2 times, most recently from 52064bc to 0edd342 Compare June 18, 2025 13:18
frames=${OPT_VAL['n']}

start_test
save_alsa_state "${0##*/}".state
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to separate the concerns of handling ALSA and handling of the arbitrary filename. This approach was used to ensure no collisions between scripts.

case-lib/lib.sh Outdated
Comment on lines 1218 to 1223
cur_exit_trap_code=$(trap -p EXIT | awk -F\' '{print $2}')
if [[ -z "${cur_exit_trap_code// }" ]]; then
trap "restore_alsa_state ${1}" EXIT
else
trap "restore_alsa_state ${1}; ${cur_exit_trap_code}" EXIT
fi
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This approach works perfectly in my local tests, but generates TIMEOUTs in the CI in runs that previously were FAILs.
I didn't want to dump the ALSA traps inside the func_exit_handler, as it's a generic solution for every test and not every test requires ALSA setup and teardown.
I welcome comments as to why this would cause func_exit_handler to not run.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to dump the ALSA traps inside the func_exit_handler, as it's a generic solution for every test and not every test requires ALSA setup and teardown.

func_exit_handler() easily can be "smart" and run restore_alsa_state() only when save_alsa_state() was run before; this does not have to be "dumped".

I welcome comments as to why this would cause func_exit_handler to not run.

I think my previous "it does not work" comment was on some earlier and quite different version.

But even now, I don't like "stacking" trap functions like this because it scatters the exit handling code across multiple places and it becomes more difficult to see what it does in any given test and situation and in which order. Who says that restore_alsa_state should run first on EXIT? Maybe it shouldn't be first...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No TIMEOUT problems anymore after I've moved the called inside the func_exit_handler.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered that the Jenkins runner waits forever for PASS or FAIL or SKIP to be printed, see commit 4395b60 and others. I'm afraid Jenkins does not care about the test process exiting and does not care about the exit value either. That's likely a (very old) design mistake.

This would tend to prove that the earlier version did not restore func_exit_handler() properly.

BTW can you set a trap from a trap? Dunno.

This discussion is obsolete and just for completeness, sorry.

@LukaszMrugala
Copy link
Contributor Author

Isn't this a followup to #1270?

It's connected, but it's not a follow-up per se. It aims to resolve the specific problem of ALSA tests changing machine's ALSA settings and messing with the following test executions. Note that it does nothing to e.g. prevent people from connecting to the machines and messing around in settings on an ad hoc basis.

@golowanow
Copy link
Member

Currently, some tests change ALSABAT settings during their execution, but do not restore them to their prior state.
This causes tests that occur later in the execution order to intermittently fail.

Do we really need to strive for ALSA state restore on_exit ?

Isn't it easier and just enough for our CI's purposes to restore the default state before at each test case where it is needed ?

This way it solves both these cases:

.. the specific problem of ALSA tests changing machine's ALSA settings and messing with the following test executions.

.. people from connecting to the machines and messing around in settings on an ad hoc basis.

@LukaszMrugala
Copy link
Contributor Author

Do we really need to strive for ALSA state restore on_exit ?

Isn't it easier and just enough for our CI's purposes to restore the default state before at each test case where it is needed ?

Yes, that's the ideal solution, given the assumption that we have a perfect description of the "default state" every time we run a test for every machine we run a test on. However, this quick fix allows us to be sure that tests do not interfere with each other no matter the deficiencies in our defined "default state" for a given machine at a given time.

@golowanow
Copy link
Member

Yes, that's the ideal solution, given the assumption that we have a perfect description of the "default state" every time we run a test for every machine we run a test on.

right, my assumption (it is already in #1270 comments) that each test plan has the DUT's default ALSA state set at the beginning of its execution, and it is saved for the following test cases to reset from it and have the common ground.

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's the ideal solution, given the assumption that we have a perfect description of the "default state" every time we run a test for every machine we run a test on.

That "assumption" is not really optional: if test results depend on some out-of-control configuration, then there is a test setup bug that must be fixed somewhere (BTW see area:Ansible Problems that Ansible can help with and the same label internally)

Tests cleaning up after themselves does not hurt but it's less important than comprehensive test setup. Also, cleanup must be easy to turn off when the test is failing. Anything that runs after a failure makes debugging harder - whether that's in interactive use or when staring at logs from automated runs.

case-lib/lib.sh Outdated
alsactl restore --file /var/tmp/"$1" --pedantic --no-ucm --no-init-fallback || dlogi "alsactl state restoration failed!"
rm /var/tmp/"$1" || dlogi "Old state file removal failed!"
fi
return "$status"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this... for instance it might print a misleading stack trace if the previous command failed.

case-lib/lib.sh Outdated
Comment on lines 1218 to 1223
cur_exit_trap_code=$(trap -p EXIT | awk -F\' '{print $2}')
if [[ -z "${cur_exit_trap_code// }" ]]; then
trap "restore_alsa_state ${1}" EXIT
else
trap "restore_alsa_state ${1}; ${cur_exit_trap_code}" EXIT
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to dump the ALSA traps inside the func_exit_handler, as it's a generic solution for every test and not every test requires ALSA setup and teardown.

func_exit_handler() easily can be "smart" and run restore_alsa_state() only when save_alsa_state() was run before; this does not have to be "dumped".

I welcome comments as to why this would cause func_exit_handler to not run.

I think my previous "it does not work" comment was on some earlier and quite different version.

But even now, I don't like "stacking" trap functions like this because it scatters the exit handling code across multiple places and it becomes more difficult to see what it does in any given test and situation and in which order. Who says that restore_alsa_state should run first on EXIT? Maybe it shouldn't be first...

@LukaszMrugala
Copy link
Contributor Author

Removed the trap stacking and moved the restore_ call inside the func_exit_handler. I'm using the existence of the .state file as an indicator whether to run the restore_ or not. Disabling that mechanism should be easy. Removed the filename passing as I've realised I can be using a preexisting variable instead.

@LukaszMrugala LukaszMrugala requested a review from marc-hb June 18, 2025 19:25
@LukaszMrugala LukaszMrugala marked this pull request as ready for review June 18, 2025 19:25
@LukaszMrugala LukaszMrugala requested review from a team, golowanow and lgirdwood as code owners June 18, 2025 19:25
}

# Restore ALSA settings after execution if state file exists
if [ -f /var/tmp/"${SCRIPT_NAME##*/}".state ]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been wanting to reduce SCRIPT_NAME usage in #546 for a few years :-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if [ -f /var/tmp/"${SCRIPT_NAME##*/}".state ]; then
if [ -f /tmp/"${SCRIPT_NAME##*/}".state ]; then

While different distros do things differently, /tmp is in general on tmpfs ("RAM disk") whereas /var/tmp is persistent across reboots - which I'm pretty sure we do not WANT.

BTW did you consider using the process ID in the name? I mean $$

There is also https://en.wikipedia.org/wiki/TMPDIR, it's complicated :-(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's another idea:

   SOF_TEST_ALSA_STATE=/tmp/"${SCRIPT_NAME##*/}".$$.state

... and then at restore time you can test -n SOF_TEST_ALSA_STATE? So the filename is defined in only one place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose /var/tmp/ at first because I remembered that the DUT is rebooted in our CI during testing, but you're right - that reboot occurs before running the test case .sh file, so /tmp should be used instead.
I've applied your last suggestion and squashed the commits.
I've taken a look at the failed shellcheck and it all seemed like problems that already existed in the codebase, rather than something newly introduced.
PR checked in CI run No. 54699

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but you're right - that reboot occurs before running the test case .sh file, so /tmp should be used instead.

There is no reboot test AFAIK right now:

PS: there are some suspend/resume tests but they are not very reliable (see #1017 and internal sof-framework/issues/408).

Currently, some tests change ALSABAT settings
during their execution, but do not restore them
to their prior state.

This causes tests that occur later
in the execution order to intermittently fail.

To remedy this, all tests that use ALSA now use
a common save_alsa_state function, which shall
save current ALSA state to a file and then restore
that state on the bash script EXIT signal.

Signed-off-by: Lukasz Mrugala <lukaszx.mrugala@intel.com>
@LukaszMrugala LukaszMrugala force-pushed the alsabat-test-state-restoration-quickfix branch from 6a22042 to 0aa7b2d Compare June 23, 2025 14:27
@LukaszMrugala LukaszMrugala requested a review from marc-hb June 24, 2025 07:50
@redzynix
Copy link
Contributor

SOFCI TEST

1 similar comment
@LukaszMrugala
Copy link
Contributor Author

SOFCI TEST

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Some tests modify machine state

5 participants