Regular Expression to find parts of a <script/img src=''> or <link href=''> attribute value
I have been using my go-to editor, regex101.com, to work this out, but I always have problems with URLs and filesystem paths. I generally have the 'https' URL/resource case in order.
I am trying to read and parse the link 'href' and img/script 'src' attribute values from the elements extracted from the markup.
The groupings/captures I want are:
- (i) The "path provider" (PowerShell terminology), basically the drive
- (ii) The path leading to the file part. I would prefer one group per segment between the path separators "\" or "/" (both must be accounted for), but will accept one long string
Thus, suppose D:\a\b\c\file.ext
This part can be grouped as '\a\b\c', but if it can be multiple groups '\a', '\b', '\c', even better.
One or more path separators are required.
- (iii) The file basename without path separator
- (iv) The file extension with the leading '.', which is the last '.' of the path
The pattern could be several more specific regular expressions joined by the alternation operator (|) instead of trying to match all the strings with a single expression.
I specifically include the '^' and '$' start and end assertions for the markup attribute value.
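To make the four captures concrete, here is a minimal sketch in Python. It is an illustration only, not the pattern I was testing on regex101.com. The group names (drive, dir, base, ext) and the helper split_path are hypothetical. The sketch assumes the drive is a single letter followed by ':' and that the per-segment split of item (ii) is done in a second step rather than inside the one expression.

```python
import re

# Hypothetical pattern; the group names are assumptions made for this sketch.
PATH_RE = re.compile(
    r"""
    ^
    (?P<drive>[A-Za-z]:)?      # (i) optional drive such as "D:"
    (?P<dir>.*)                # (ii) everything before the last separator
    [\\/]                      # the last path separator, either kind
    (?P<base>[^\\/]+?)         # (iii) file basename, no separators
    (?P<ext>\.[^.\\/]+)        # (iv) extension, starting at the last dot
    $
    """,
    re.VERBOSE,
)
SEP_RE = re.compile(r"[\\/]")


def split_path(value):
    """Return the captures as a dict, or None if the value does not match."""
    m = PATH_RE.match(value)
    if m is None:
        return None
    parts = m.groupdict()
    # Second step: break the directory part into its individual segments.
    parts["segments"] = [s for s in SEP_RE.split(parts["dir"]) if s]
    return parts


for sample in (
    "${SPREST_JS_FolderPath}/SPListREST.js",
    r"D:\dev\SharePoint\SPTools\src\pagestyle.css",
    "./js/SPREST/SPRestEmail.js",
):
    print(split_path(sample))
```

Splitting the segments afterwards sidesteps the fact that a single capture group cannot return a variable number of values in most regex flavors.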
Test string #1: ${SPREST_JS_FolderPath}/SPListREST.js
- No path provider/drive, so no Group 1 - OK
- Group 2: ${SPREST_JS_FolderPath} # Item (ii)
- Group 3: ${SPREST_JS_FolderPath} # repeat of Group 2 -- not wanted
- Group 4: SPListREST # file basename Item (iii)
- Group 5: js # file type/extension Item (iv)
Test string #2: D:\dev\SharePoint\SPTools\src\pagestyle.css
- Group 1: D: # Item (i)
- Group 2: \dev\SharePoint\SPTools\src # Item (ii) exactly as required if groupings by '\pathseg' not possible
- Group 3: \src # the last path segment--unwanted
- Group 4: pagestyle # file basename Item (iii)
- Group 5: css # file type/extension Item (iv)
Test string #3: ./js/SPREST/SPRestEmail.js
- No path provider/drive, so no Group 1
- Group 2: ./js/SPREST # Item (ii) exactly as required if groupings by '\pathseg' not possible
- Group 3: /SPREST # the last path segment--unwanted
- Group 4: SPRestEmail # file basename Item (iii)
- Group 5: js # file type/extension Item (iv)
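The unwanted Group 3 in these results is consistent with a quantified capture group: when a group is repeated with '+' or '*', most engines keep only the text of its last iteration. The sketch below reproduces that symptom for test string #2 in Python. The pattern REPEATED is hypothetical and written only for this demonstration, so it may differ from the expression that produced the results above.

```python
import re

# Hypothetical pattern, written only to reproduce the symptom.
# Group 3 sits inside the repeated part of group 2.
REPEATED = re.compile(r"^([A-Za-z]:)?((\\[^\\]+)+)\\([^\\.]+)\.(\w+)$")

m = REPEATED.match(r"D:\dev\SharePoint\SPTools\src\pagestyle.css")
print(m.group(2))  # \dev\SharePoint\SPTools\src
print(m.group(3))  # \src  (only the last repetition is kept)

# One way to get every segment: scan the directory capture a second time.
segments = re.findall(r"[\\/][^\\/]+", m.group(2))
print(segments)
```

Changing the inner group to a non-capturing one, (?:...), removes the unwanted group while leaving the numbering of the remaining groups one lower.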