Regex match to newline characters on Windows 10 #16

@ChandlerLutz

Hi Anthony,

Thank you for the wonderful book.

I'm working through Ch. 13 on regular expressions on Windows 10, using PostgreSQL 10 in pgAdmin 4.

I was having trouble getting the regular expressions to work, for example in code listing 13-7.

I believe the issue is related to the way newline characters are handled on Windows.

This may also be related to issues #4 and #10 (I am using a clean version of the CSV file imported from the crime data).

I was able to solve this issue with this SO answer: https://stackoverflow.com/a/20056634. Apparently Windows may represent newlines as \r\n, so a pattern that expects a bare \n immediately after a word does not match.

Here is my SQL code for the crime type and the output, where crime_type_orig is the original pattern from the book, and crime_type2 and crime_type3 are based on the SO answer above:

select 
	regexp_match(original_text, '\n(?:\w+ \w+|\w+)\n(.*):') as crime_type_orig,
	-- See https://stackoverflow.com/a/20056634
	regexp_match(original_text, '\r\n(?:\w+ \w+|\w+)\r\n(.*):') as crime_type2,
	-- Based on https://stackoverflow.com/a/20056634
	regexp_match(original_text, '(?:\r\n|\r|\n)(?:\w+ \w+|\w+)(?:\r\n|\r|\n)(.*):') as crime_type3
from crime_reports;

Here is the output from pgAdmin:

image
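
For anyone hitting the same thing, the three-way alternation in crime_type3 can probably be shortened to \r?\n, which accepts either a Windows (\r\n) or a Unix (\n) line ending, though not a lone \r. Below is a minimal sketch of that idea. The sample CTE and its text are made up for illustration and are not rows from the book's crime_reports table; swap in crime_reports to try it on the real data.

```sql
-- Hypothetical sample row with Windows-style (\r\n) line endings.
-- The text is invented for illustration, not taken from crime_reports.
WITH sample (original_text) AS (
    VALUES (E'100 Block Example St.\r\nSterling\r\nLarceny: An item was reported missing.')
)
SELECT
    -- \r?\n matches both \r\n and a bare \n (but not a lone \r)
    regexp_match(original_text, '\r?\n(?:\w+ \w+|\w+)\r?\n(.*):') AS crime_type
FROM sample;
```

The E'...' prefix is only needed to put real carriage returns and line feeds into the sample string. The pattern itself is a plain string, as in the queries above, so the regex engine interprets \r and \n.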

