Skip to content

Conversation

@ANAMASGARD
Copy link
Contributor

Fixes #249

Summary

Adds file size information (total_MB, mean_MB, and rows) to the download status table, helping users understand disk usage for each geom.

Problem

Currently, the download status table shows how many TSV files need to be downloaded, but doesn't show file sizes. Users can't see if they're downloading 1MB or 100MB.

Test output:
Screenshot From 2025-10-29 10-50-36

Only added the test case in this first commit in the next commit will try to fix the issue !

Test verifies animint.js generates total_MB and mean_MB columns
in download status table. Currently fails, demonstrating these
size columns are not present in the table.

Addresses #249
@ANAMASGARD
Copy link
Contributor Author

Add download size info to status table (#249)

Adds total_MB, mean_MB, and rows columns to the download status table.
Calculates total data size and row count for each geom by iterating
through all downloaded chunks. Handles both array and object data
structures. Values update as data is downloaded and displayed.

Browser screenshot verification included showing new columns working.
Screenshot From 2025-10-30 21-04-08

@codecov
Copy link

codecov bot commented Oct 30, 2025

Codecov Report

❌ Patch coverage is 99.00990% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 77.89%. Comparing base (68b074e) to head (69bc4c7).

Files with missing lines Patch % Lines
R/z_animintHelpers.R 88.88% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #272      +/-   ##
==========================================
+ Coverage   73.06%   77.89%   +4.83%     
==========================================
  Files         164      164              
  Lines        8758     8845      +87     
  Branches        0      564     +564     
==========================================
+ Hits         6399     6890     +491     
+ Misses       2359     1955     -404     
Flag Coverage Δ
javascript 95.57% <100.00%> (+14.82%) ⬆️
r 69.59% <93.75%> (+0.05%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ANAMASGARD ANAMASGARD requested a review from tdhock October 30, 2025 16:06
ANAMASGARD and others added 3 commits October 31, 2025 13:22
- Calculate file sizes using file.size() in saveChunks()
- Store chunk_info (bytes, rows) in plot.json
- Update table cells after download (not in draw_geom)
- Remove 36-line JavaScript calculation loop
@tdhock
Copy link
Collaborator

tdhock commented Oct 31, 2025

No obvious timing issues in HEAD=add-download-size-info
Comparison Plot

Generated via commit 69bc4c7

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 2 minutes and 22 seconds
Installing different package versions 14 seconds
Running and plotting the test cases 3 minutes and 40 seconds

@tdhock
Copy link
Collaborator

tdhock commented Oct 31, 2025

Thanks! This is a good start. I see the result below using https://github.com/animint/animint2/blob/master/inst/examples/WorldBank-paper.R
image

Can you please make the following changes?

  • remove selected chunk and status columns
  • right justify numeric columns
  • remove mean MB column
  • combine downloaded and total columns into one column named "files" value "4/52"
  • same for MB and rows, value should be " # downloaded / # possible "
  • This data viz, for geom3 and geom4, makes 52 numbered chunk files (chunk1 to chunk52.tsv) plus a chunk_common.tsv file. Currently the common file is counted for the numbr downloaded "4" but not for the total number "52" but we should change so that the common file is counted for both. (and for each attribute, files/MB/rows)

ANAMASGARD and others added 2 commits October 31, 2025 21:47
- Remove 'selected chunk', 'status', and 'mean MB' columns
- Add combined 'files', 'MB', and 'rows' columns (downloaded/total format)
- Right-justify all numeric columns
- Fix common chunk tracking: count in both downloaded and total counts
- Track size (bytes/rows) for common chunks in R code
- Filter chunk_info to only include each geom's own chunks
- Count only .tsv files for accurate totals in JavaScript
- Update tests to verify new column structure
@ANAMASGARD ANAMASGARD requested a review from tdhock November 9, 2025 16:06
Use 1048576 (1024*1024) consistently for MB conversion instead of
mixing 1e6 (1000000) and 1048576.

This ensures consistent size reporting for both common chunks and
regular chunks in the download status table.
Switch test-download-status-table.R to validate rendered HTML instead of
the JS source. Verify headers and right-justified numeric columns via XPath.

Update test-renderer1-variable-value.R to read the combined
column ("downloaded / total") so the chunk count checks work again.

Remove the extra blank line in NEWS.md.
@ANAMASGARD ANAMASGARD requested a review from tdhock November 11, 2025 16:28
@tdhock
Copy link
Collaborator

tdhock commented Nov 11, 2025

please click Resolve so I can see which previous comments you have already addressed

@tdhock
Copy link
Collaborator

tdhock commented Nov 11, 2025

can you please post a screenshot like this one #272 (comment) but created with the new code?

@ANAMASGARD
Copy link
Contributor Author

Sir @tdhock Here's the updated screenshot with the new download status table format:

Screenshot From 2025-11-16 21-28-41

Changes implemented:

  • ✅ Removed "selected chunk" and "status" columns
  • ✅ Removed "mean MB" column
  • ✅ Combined columns into "files", "MB", and "rows" format (downloaded/total)
  • ✅ Right-justified all numeric columns
  • ✅ Common chunk now counted in BOTH downloaded and total for all metrics

All values update correctly as data downloads, and common chunk is properly counted in both downloaded and total counts.

Please give your feedback and suggest what changes I need to make so that the failing test can be passed on the GitHub , Thank You !

@tdhock
Copy link
Collaborator

tdhock commented Nov 17, 2025

#272 (comment) seems to have the wrong old screenshot, please revise

@ANAMASGARD
Copy link
Contributor Author

Updated screenshot with WorldBank-paper.R example showing new download status table format:

Screenshot From 2025-11-28 21-40-21
  • "files", "MB", and "rows" columns in "downloaded/total" format
  • Removed "selected chunk", "status", and "mean MB" columns
  • Right-justified numeric columns
  • Common chunk counted in both downloaded and total counts

g_info.data[tsv_name] = chunk;
g_info.tr.select("td.downloaded").text(d3.keys(g_info.data).length);
g_info.download_status[tsv_name] = "saved";

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please undo addition of empty lines

Comment on lines 14 to 16
expect_false("status" %in% header_text)
expect_false("mean MB" %in% header_text)
expect_false("selected chunk" %in% header_text)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove

table_headers <- getNodeSet(info$html, '//table[@id="download_status"]//th')
header_text <- sapply(table_headers, xmlValue)
expect_true("geom" %in% header_text)
expect_true("files" %in% header_text)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expect_in

@ANAMASGARD ANAMASGARD requested a review from tdhock December 3, 2025 15:38
ANAMASGARD and others added 2 commits December 5, 2025 07:48
- Change download_status table ID from fixed 'download_status' to
  viz_id + '_download_status' for unique IDs when multiple animints
  are on the same page
- Update test XPath selectors to use contains() for flexibility
@tdhock tdhock changed the title Add file size columns to download status table (#249) Add file size columns to download status table Dec 10, 2025
@ANAMASGARD ANAMASGARD requested a review from tdhock December 10, 2025 16:49
@tdhock
Copy link
Collaborator

tdhock commented Dec 10, 2025

using this branch I see
image

  • large number of rows would benefit from having commas, like 4,321 instead of 4321 -- can you please fix?
  • sometimes 0.00 is displayed for MB which could be confusing. I'm not sure what the best fix for this would be. Either change to 0 (without .00) or change column heading to "disk space" and use KB or MB in each row (depending on how big they are).

also please change the computation or column name "MB" to be consistent with "man du" which says

       The SIZE argument is an integer and optional unit (example: 10K is
       10*1024).  Units are K,M,G,T,P,E,Z,Y,R,Q (powers of 1024) or
       KB,MB,... (powers of 1000).  Binary prefixes can be used, too:
       KiB=K, MiB=M, and so on.

in the example above we see 15.62 MB for geom2.
but looking at the file sizes in R we see:

> sum(size.dt[grep("geom2",name), R_bytes/1000/1000])
[1] 16.38009
> sum(size.dt[grep("geom2",name), R_bytes/1024/1024])
[1] 15.62127

15.62 is the number you get when you divide by 1024, which should be MiB (not MB as currently written in column header) so please

  • change MB to MiB in header, or
  • change divisor from 1024 to 1000

thanks!

- Add commas to large row numbers (e.g., 4,321 instead of 4321)
- Show disk sizes in KiB or MiB based on size (binary 1024 divisor)
- Rename column from 'MB' to 'disk' since values now include units
- Update NEWS.md entry to reflect current features
@ANAMASGARD
Copy link
Contributor Author

  • Updated NEWS/DESCRIPTION to version 2025.12.4 with the final download table description.
  • Download status table now shows files/disk/rows in “downloaded / total”, rows use commas, disk uses KiB/MiB (1024 divisor, man du consistent), and all numeric columns are right‑aligned.
  • Tests now target rendered HTML (headers and alignment) and are passing; full CI (JS/R/CRAN) is green.

Sir @tdhock can you please again review this PR .

Copy link
Collaborator

@tdhock tdhock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also please post a screenshot of the new display?

and please add a test for the content of the output? (current test is only for header names and alignment)

- Add updateDownloadStatus() function to eliminate code duplication
- Add formatWithCommas() for row numbers (4,321 instead of 4321)
- Add formatBytes() for KiB/MiB display using 1024 divisor
- Add test to verify table content format (files/disk/rows)
@ANAMASGARD
Copy link
Contributor Author

ANAMASGARD commented Dec 15, 2025

Sir @tdhock Updated screenshot showing the new download status table format:

Screenshot From 2025-12-15 21-54-33

@ANAMASGARD ANAMASGARD requested a review from tdhock December 15, 2025 17:09
Replace \d with [0-9] in regex patterns - \d is not recognized in R's base regex engine
@tdhock
Copy link
Collaborator

tdhock commented Dec 15, 2025

the updated screenshot looks better, thanks!
it would be easier to read if there were a grey or black line to separate each row/column in the table. Can you add that please?

@tdhock
Copy link
Collaborator

tdhock commented Dec 16, 2025

also can you please post updated screenshot?

@ANAMASGARD
Copy link
Contributor Author

Yes Sir @tdhock I will definitely create a function and post the updated screenshot.
Actually my laptop screen is broken, ASAP it is fixed I will make the changes.

@ANAMASGARD
Copy link
Contributor Author

Sir @tdhock Updated screenshot with table borders :-
Screenshot From 2025-12-19 15-51-47

@ANAMASGARD ANAMASGARD requested a review from tdhock December 19, 2025 11:28
@ANAMASGARD
Copy link
Contributor Author

Sir @tdhock The CI test failures in test-compiler-ghpages.R are not related to this PR's changes.

  • The failures are caused by a race condition: both JS_coverage and R_coverage jobs run in parallel and use the same test repo (animint-test/animint2pages_test_repo), causing them to interfere with each other.

  • I can fix this by making each test suite use a unique repo name based on the TEST_SUITE environment variable. Should I push that fix to this PR, or would you prefer to handle it separately
    (Like I raise an issue regarding this one )?

  • The download status table feature is complete and working. The test failures are flaky tests that sometimes pass (run #1335 passed, #1339 failed with same code).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

download status table could show sizes

3 participants