From 1a51b4eb05937fd46adbec427249ebe70f05f9a6 Mon Sep 17 00:00:00 2001
From: Jonathan Rochkind
Date: Wed, 9 Apr 2025 13:51:56 -0400
Subject: [PATCH] Add substitute_with_block method to TextRun, that can take a
 block with access to MatchData with regex capture groups etc

I needed to do a `substitute` where the match arg was a regex, and if it
were gsub I'd be using a block to have access to capture groups in $1 $2 $3
etc.

Because of the weird way variables $1 $2 $3 are handled in ruby and block
scope, I couldn't provide a delegated block to give exactly the same API as
ordinary gsub. My original idea was to do that, added on to existing
#substitute. But instead, had to provide a new/alternate
#substitute_with_block method, with a block that actually gets a MatchData
object as arg, and can access whatever it needs from there, including
capture groups and match string.
---
 README.md                       | 10 +++++++++-
 lib/docx/containers/text_run.rb | 13 +++++++++++++
 spec/docx/document_spec.rb      | 13 +++++++++++++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 31b4b39..6894ead 100644
--- a/README.md
+++ b/README.md
@@ -130,6 +130,14 @@ doc.paragraphs.each do |p|
   end
 end
 
+# Substitute text with access to captures, note block arg is a MatchData, a bit
+# different than String.gsub. https://ruby-doc.org/3.3.7/MatchData.html
+doc.paragraphs.each do |p|
+  p.each_text_run do |tr|
+    tr.substitute_with_block(/total: (\d+)/) { |match_data| "total: #{match_data[1].to_i * 10}" }
+  end
+end
+
 # Save document to specified path
 doc.save('example-edited.docx')
 ```
@@ -145,7 +153,7 @@ doc = Docx::Document.open('tables.docx')
 # Iterate over each table
 doc.tables.each do |table|
   last_row = table.rows.last
-  
+
   # Copy last row and insert a new one before last row
   new_row = last_row.copy
   new_row.insert_before(last_row)
diff --git a/lib/docx/containers/text_run.rb b/lib/docx/containers/text_run.rb
index 55ed62c..18b83b1 100755
--- a/lib/docx/containers/text_run.rb
+++ b/lib/docx/containers/text_run.rb
@@ -57,6 +57,19 @@ def substitute(match, replacement)
           reset_text
         end
 
+        # Weird things with how $1/$2 in regex blocks are handled means we can't just delegate
+        # block to gsub to get block, we have to do it this way, with a block that gets a MatchData,
+        # from which captures and other match data can be retrieved.
+        # https://ruby-doc.org/3.3.7/MatchData.html
+        def substitute_with_block(match, &block)
+          @text_nodes.each do |text_node|
+            text_node.content = text_node.content.gsub(match) { |_unused_matched_string|
+              block.call(Regexp.last_match)
+            }
+          end
+          reset_text
+        end
+
         def parse_formatting
           {
             italic: !@node.xpath('.//w:i').empty?,
diff --git a/spec/docx/document_spec.rb b/spec/docx/document_spec.rb
index 81d57b8..b19c2ba 100755
--- a/spec/docx/document_spec.rb
+++ b/spec/docx/document_spec.rb
@@ -206,6 +206,19 @@
 
       expect(@doc.paragraphs[1].text).to eq('Multi-line paragraph line 1same paragraph line 2yet the same paragraph line3 ')
     end
+
+    it "should replace placeholder in any line of paragraph using substitute_with_block" do
+      expect(@doc.paragraphs[0].text).to eq('Page title')
+      expect(@doc.paragraphs[1].text).to eq('Multi-line paragraph line 1_placeholder2_ line 2_placeholder3_ line3 ')
+
+      @doc.paragraphs[1].each_text_run do |text_run|
+        text_run.substitute_with_block(/_placeholder(\d)_/) { |match_data|
+          "_replacement_#{match_data[1]}"
+        }
+      end
+
+      expect(@doc.paragraphs[1].text).to eq('Multi-line paragraph line 1_replacement_2 line 2_replacement_3 line3 ')
+    end
   end
 
   describe 'read formatting' do
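
Note on the scoping issue mentioned in the commit message: the sketch below tries to reproduce it on plain strings, without Docx text nodes. The helper names `delegate_gsub`, `gsub_with_match_data`, and `demo` are hypothetical and exist only for illustration; they are not part of the gem. It assumes standard Ruby behavior where `$~` (and so `$1`, `$2`, ...) is local to the method frame in which a block was written.

```ruby
# Hypothetical helper: passes the caller's block straight through to gsub.
# gsub records the match in *this* method's frame, but $1 inside the caller's
# block is looked up in the frame where the block was written.
def delegate_gsub(string, pattern, &block)
  string.gsub(pattern, &block)
end

# Hypothetical helper mirroring the approach in this patch: the block literal
# handed to gsub lives in the same method that calls gsub, so
# Regexp.last_match is the current match and can be passed on explicitly.
def gsub_with_match_data(string, pattern)
  string.gsub(pattern) { |_matched| yield Regexp.last_match }
end

def demo
  # No regex match has run in this method, so $1 is nil inside this block.
  delegated = delegate_gsub("total: 5", /total: (\d+)/) { "total: #{$1.inspect}" }

  # The block receives a MatchData and reads the capture group from it.
  wrapped = gsub_with_match_data("total: 5", /total: (\d+)/) { |m| "total: #{m[1].to_i * 10}" }

  [delegated, wrapped]
end

delegated, wrapped = demo
puts delegated
puts wrapped
```

With the delegated block the capture group is not visible (`$1` is `nil`), while the wrapped version sees capture group 1 through the `MatchData` argument. That is the reason `substitute_with_block` passes `Regexp.last_match` to the block instead of delegating the block to `gsub`.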