@rok-cesnovar
Member

Submission Checklist

This is just a re-do of #3081 done with the master branch as base. The changes are all Steve's.

Copyright and Licensing

Please list the copyright holder for the work you are submitting (this will be you or your assignee, such as a university or company):
Steve Bronder
Rok Češnovar (all I did was create the branch and open the PR though)

By submitting this pull request, the copyright holder is agreeing to license the submitted work under the following licenses:

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.5 | 3.55 | 0.99 | -1.36% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 1.01 | 1.05% faster |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 0.99 | -1.04% slower |
| gp_regr/gp_regr.stan | 0.14 | 0.14 | 1.0 | -0.06% slower |
| irt_2pl/irt_2pl.stan | 5.89 | 5.87 | 1.0 | 0.27% faster |
| performance.compilation | 92.98 | 91.01 | 1.02 | 2.12% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.13 | 8.06 | 1.01 | 0.83% faster |
| pkpd/one_comp_mm_elim_abs.stan | 31.84 | 30.77 | 1.03 | 3.36% faster |
| sir/sir.stan | 122.24 | 119.69 | 1.02 | 2.09% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 0.98 | -2.0% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.97 | 2.97 | 1.0 | -0.01% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.41 | 0.39 | 1.05 | 4.76% faster |
| arK/arK.stan | 2.82 | 2.09 | 1.35 | 25.96% faster |
| arma/arma.stan | 0.25 | 0.28 | 0.91 | -9.57% slower |
| garch/garch.stan | 0.72 | 0.61 | 1.18 | 15.6% faster |
Mean result: 1.03689639101
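
For readers checking the numbers, here is a minimal C++ sketch of how the Ratio and Performance change columns relate to the timings. The names `bench_row`, `ratio`, `perf_change`, and `mean_ratio` are hypothetical and are not part of the benchmark tooling. Treating "Mean result" as the arithmetic mean of the per-row ratios is an assumption: it is consistent with the table (the 15 rounded ratios average to about 1.036), but the bot output does not state it.

```cpp
#include <cassert>
#include <vector>

// One benchmark row: run time on the base branch (old) and on the PR branch (new).
// Hypothetical type for illustration only.
struct bench_row {
  double old_result;
  double new_result;
};

// "Ratio" column: old / new, so values above 1 mean the PR branch is faster.
inline double ratio(const bench_row& r) {
  assert(r.new_result > 0.0);
  return r.old_result / r.new_result;
}

// "Performance change" column: 1 - new / old, positive when the PR branch is faster.
inline double perf_change(const bench_row& r) {
  assert(r.old_result > 0.0);
  return 1.0 - r.new_result / r.old_result;
}

// Assumption: "Mean result" is the arithmetic mean of the per-row ratios.
inline double mean_ratio(const std::vector<bench_row>& rows) {
  assert(!rows.empty());
  double sum = 0.0;
  for (const bench_row& r : rows) {
    sum += ratio(r);
  }
  return sum / static_cast<double>(rows.size());
}
```

For the arK row (old 2.82, new 2.09), `ratio` gives about 1.35 and `perf_change` about 0.259, close to the reported 1.35 and 25.96%. The small gap is expected because the table shows rounded timings.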

Commit hash: 9c9aa47


Machine information
ProductName: Mac OS X
ProductVersion: 10.11.6
BuildVersion: 15G22010

CPU:
Intel(R) Xeon(R) CPU E5-1680 v2 @ 3.00GHz

G++:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

Clang:
Apple LLVM version 7.0.2 (clang-700.1.81)
Target: x86_64-apple-darwin15.6.0
Thread model: posix

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.51 | 3.61 | 0.97 | -2.99% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.96 | -4.44% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.1 | 0.96 | -3.84% slower |
| gp_regr/gp_regr.stan | 0.14 | 0.14 | 1.0 | -0.07% slower |
| irt_2pl/irt_2pl.stan | 5.87 | 5.88 | 1.0 | -0.2% slower |
| performance.compilation | 92.87 | 90.92 | 1.02 | 2.1% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.16 | 8.02 | 1.02 | 1.72% faster |
| pkpd/one_comp_mm_elim_abs.stan | 30.48 | 31.3 | 0.97 | -2.71% slower |
| sir/sir.stan | 121.47 | 116.99 | 1.04 | 3.69% faster |
| gp_regr/gen_gp_data.stan | 0.03 | 0.03 | 1.0 | 0.05% faster |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 2.97 | 3.0 | 0.99 | -0.9% slower |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.37 | 0.38 | 0.99 | -1.29% slower |
| arK/arK.stan | 2.81 | 2.1 | 1.34 | 25.4% faster |
| arma/arma.stan | 0.25 | 0.28 | 0.88 | -13.07% slower |
| garch/garch.stan | 0.72 | 0.6 | 1.19 | 15.88% faster |
Mean result: 1.02213081386

Commit hash: fe56a8d



SteveBronder previously approved these changes Nov 15, 2021
Collaborator

@SteveBronder SteveBronder left a comment

@rok-cesnovar I approved this (though I wrote the original code). If this looks good to you, then I think we can merge.

@rok-cesnovar
Member Author

I have been testing this over the weekend (with a change in cmdstan). Still seeing a perf regression :/ Will post more details a bit later.

@SteveBronder
Collaborator

Oh dang, can you send me the branch you are trying out? I think you just want to change this line to

      diagnostic_writers.emplace_back(
          std::make_unique<std::fstream>(diagnostic_filename,
                                         std::fstream::out, "#", true),

That should turn off all of the diagnostics, so we can tell whether or not writing diagnostics is the issue. You can also do the same for the other writer, the one for output. If that fixes the regression, then the actual issue is writing to the stringstream before writing to disk, which I think we can be looser about now that I'm thinking about it. At the time I assumed multiple threads could write to one unique_stream_writer<>, but each stream writer is associated with one output file, so we may not actually need that string stream for anything besides the loggers that write to stdout and stderr.
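
The trade-off described above can be sketched roughly as follows. This is a hypothetical `sketch_writer`, not Stan's `unique_stream_writer`; the `mute` and `buffer_first` constructor flags, and the class itself, are assumptions made purely for illustration. It shows the two write paths being compared (formatting each row into a temporary `std::stringstream` before copying it to the file stream, versus writing straight to the stream the writer owns) plus a mute switch for isolating the cost of output.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-file writer for illustration; NOT Stan's unique_stream_writer.
// Stream is expected to derive from std::ostream (e.g. std::ofstream).
template <typename Stream>
class sketch_writer {
 public:
  sketch_writer(std::unique_ptr<Stream> out, std::string comment_prefix,
                bool mute = false, bool buffer_first = false)
      : out_(std::move(out)),
        prefix_(std::move(comment_prefix)),
        mute_(mute),
        buffer_first_(buffer_first) {}

  // Write one comma-separated row of values.
  void operator()(const std::vector<double>& row) {
    if (mute_ || !out_) return;  // muted writers produce no output at all
    if (buffer_first_) {
      // Buffered path: format into a temporary string stream, then copy it out.
      std::stringstream ss;
      write_row(ss, row);
      *out_ << ss.str();
    } else {
      // Direct path: one writer owns one stream, so no intermediate buffer.
      write_row(*out_, row);
    }
  }

  // Write a comment line, prefixed with the comment marker.
  void operator()(const std::string& msg) {
    if (mute_ || !out_) return;
    *out_ << prefix_ << msg << '\n';
  }

  // Access the owned stream (must have been constructed with a non-null pointer).
  Stream& stream() {
    assert(out_ != nullptr);
    return *out_;
  }

 private:
  static void write_row(std::ostream& os, const std::vector<double>& row) {
    for (std::size_t i = 0; i < row.size(); ++i) {
      if (i > 0) os << ',';
      os << row[i];
    }
    os << '\n';
  }

  std::unique_ptr<Stream> out_;
  std::string prefix_;
  bool mute_;
  bool buffer_first_;
};
```

Both paths produce the same bytes; the buffered one pays for an extra stream construction and string copy per row, which is the kind of per-draw overhead that could matter for a model with 10000 parameters. Timing a run with `mute` set against a normal run is one way to check whether writing is where the time goes.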

@rok-cesnovar
Member Author

rok-cesnovar commented Nov 15, 2021

The model used for testing:

transformed data {
 int N = 10000;
}
parameters {
 vector[N] a;
}
model {
 a ~ normal(0,1);
}

Then various versions:

Between each switch, I ran make stan-update, make clean-all, and make build. I also set STANC3_VERSION=v2.x.y so that the matching stanc3 binary is used.

  • on 2.27.0:

Elapsed Time: 8.892 seconds (Warm-up)
13.219 seconds (Sampling)
22.111 seconds (Total)

  • on master (2.28.1):

Elapsed Time: 8.296 seconds (Warm-up)
19.1 seconds (Sampling)
27.396 seconds (Total)

  • on fix/diagnostic branch (points to this branch in stan):

Elapsed Time: 8.752 seconds (Warm-up)
12.481 seconds (Sampling)
21.233 seconds (Total)

@rok-cesnovar
Member Author

rok-cesnovar commented Nov 15, 2021

Link to branch: https://github.com/stan-dev/cmdstan/tree/fix_diagnostic_writer

So I guess I should have put it better: it looks better, but it is still not at the level of 2.27.0. Maybe that is fine and expected, and due to the stream stuff?

I would be fine with this amount of a perf. regression as long as we know where it comes from.

@SteveBronder
Collaborator

It looks like it's faster than 2.27.1, which would be before any of these changes?

@rok-cesnovar
Member Author

Yeah, it is faster than 2.28.1. Maybe the rest of it is due to the use of streams and there is not much we can do besides that.

@SteveBronder
Collaborator

Sorry am I not reading your posted times right?

on 2.27.1:
Elapsed Time: 8.892 seconds (Warm-up)
13.219 seconds (Sampling)
22.111 seconds (Total)

on fix/diagnostic branch (points to this branch in stan):
Elapsed Time: 8.752 seconds (Warm-up)
12.481 seconds (Sampling)
21.233 seconds (Total)

Is the top one 2.27.1 and the bottom this branch?

@rok-cesnovar
Member Author

Ah, sorry I think I botched something. Let me run the benchmark again. I think the last number is wrong. Will run again to avoid any confusion...

@stan-buildbot
Contributor


| Name | Old Result | New Result | Ratio | Performance change (1 - new / old) |
|---|---|---|---|---|
| gp_pois_regr/gp_pois_regr.stan | 3.56 | 3.58 | 0.99 | -0.51% slower |
| low_dim_corr_gauss/low_dim_corr_gauss.stan | 0.02 | 0.02 | 0.81 | -23.99% slower |
| eight_schools/eight_schools.stan | 0.09 | 0.09 | 1.05 | 4.89% faster |
| gp_regr/gp_regr.stan | 0.15 | 0.14 | 1.02 | 2.0% faster |
| irt_2pl/irt_2pl.stan | 5.86 | 5.73 | 1.02 | 2.09% faster |
| performance.compilation | 92.76 | 91.27 | 1.02 | 1.61% faster |
| low_dim_gauss_mix_collapse/low_dim_gauss_mix_collapse.stan | 8.1 | 8.12 | 1.0 | -0.23% slower |
| pkpd/one_comp_mm_elim_abs.stan | 31.54 | 32.27 | 0.98 | -2.3% slower |
| sir/sir.stan | 118.4 | 124.5 | 0.95 | -5.16% slower |
| gp_regr/gen_gp_data.stan | 0.03 | 0.04 | 0.94 | -6.66% slower |
| low_dim_gauss_mix/low_dim_gauss_mix.stan | 3.02 | 2.99 | 1.01 | 1.12% faster |
| pkpd/sim_one_comp_mm_elim_abs.stan | 0.38 | 0.38 | 0.99 | -0.79% slower |
| arK/arK.stan | 2.8 | 2.82 | 0.99 | -0.68% slower |
| arma/arma.stan | 0.25 | 0.28 | 0.91 | -9.53% slower |
| garch/garch.stan | 0.72 | 0.61 | 1.19 | 16.19% faster |
Mean result: 0.991841046687

Commit hash: 6716d83



@SteveBronder
Collaborator

Closing this as I think I found a simpler solution in #3087
