Channel: CodeProject Latest postings for Regular Expressions

Regular Expression to find parts of a <script/img src=""> or <link href=""> attribute value


I have been using regex101.com, my go-to editor, to work this out, but I always have problems with URLs and filesystem paths. I generally have the 'https' URL/resource case in order.

I am trying to read and parse the 'href' attribute values of link elements and the 'src' attribute values of img/script elements extracted from the markup.
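For context, here is a minimal sketch of how those attribute values might be collected before any regex is applied. It assumes Python and its standard `html.parser` module, which the post does not mention; the class name `ResourceCollector` and the sample markup are made up for illustration, reusing the test strings from this post.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Collects href values from <link> and src values from <script>/<img>."""

    # Hypothetical mapping: element name -> attribute of interest
    WANTED = {"link": "href", "script": "src", "img": "src"}

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        wanted = self.WANTED.get(tag)  # None for elements we do not care about
        for name, value in attrs:
            if name == wanted and value:
                self.values.append(value)


# Sample markup (assumed, built from the test strings in this post)
MARKUP = r'''
<link href="D:\dev\SharePoint\SPTools\src\pagestyle.css">
<script src="${SPREST_JS_FolderPath}/SPListREST.js"></script>
<script src="./js/SPREST/SPRestEmail.js"></script>
'''

collector = ResourceCollector()
collector.feed(MARKUP)
for value in collector.values:
    print(value)
```

Each collected value is then a candidate input for the path pattern.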

The groupings/captures I want are
  1. "path provider" (PowerShell terminology), basically the drive
  2. The path leading to the file part. I prefer one group per segment between path separators ("\" or "/", both must be accounted for), but will accept one long string
    Thus, suppose D:\a\b\c\file.ext
    This part can be grouped as '\a\b\c', but if it can be multiple groups '\a', '\b', '\c', even better.
    One or more path separators are required
  3. The file basename without path separator
  4. The file extension with the leading '.' which is the last '.' of the path
My working pattern/RE is: `^([a-zA-Z]:)?(([/\\]?[^/\\]+)*)[/\\]([^.]+)\.(\S+)$`
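To make the group numbering easier to follow, here is the pattern spelled out in verbose form and run against the D:\a\b\c\file.ext example. This is a sketch in Python's `re` module rather than regex101, and it assumes the pattern is meant to have a ':' after the drive letter, escaped backslashes inside the character classes, and an escaped dot before the extension (the rendered post seems to have lost those characters).

```python
import re

# Assumed reading of the working pattern; the ':' and the backslash escapes
# are reconstructed from the reported group results, not quoted verbatim.
WORKING = re.compile(r"""
    ^
    ([a-zA-Z]:)?            # group 1: optional drive, e.g. 'D:'
    (                       # group 2: the whole directory part
      ([/\\]?[^/\\]+)*      # group 3: one segment per repetition, only the last is kept
    )
    [/\\]                   # separator in front of the file name
    ([^.]+)                 # group 4: basename, up to the first dot
    \.                      # literal dot
    (\S+)                   # group 5: extension, without the dot
    $
""", re.VERBOSE)

m = WORKING.match(r"D:\a\b\c\file.ext")
print(m.groups())
# ('D:', '\\a\\b\\c', '\\c', 'file', 'ext')
```

A capturing group inside a repetition keeps only its last iteration, which is why group 3 holds just the final segment.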

The solution might be several more specific regular expressions joined by the alternation operator (|), instead of trying to match every string with a single expression.
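As a rough illustration of the alternation idea, the sketch below joins two narrower branches with `|`: one for drive-qualified paths and one catch-all. The branch layout and the group names `drive`, `rooted` and `other` are assumptions for this example only. Python's `re` does not allow the same group name in two branches, so each branch gets its own names.

```python
import re

# Hypothetical two-branch pattern; branches are tried left to right.
ALTERNATIVES = re.compile(
    r"^(?:"
    r"(?P<drive>[a-zA-Z]:)(?P<rooted>[/\\].+)"  # branch 1: 'D:' plus a rooted path
    r"|"
    r"(?P<other>.+)"                            # branch 2: anything else
    r")$"
)


def matched_groups(value):
    """Return only the groups of the branch that matched, or None."""
    m = ALTERNATIVES.match(value)
    if m is None:
        return None
    return {name: text for name, text in m.groupdict().items() if text is not None}


print(matched_groups(r"D:\dev\SharePoint\SPTools\src\pagestyle.css"))
print(matched_groups("./js/SPREST/SPRestEmail.js"))
```

Groups from the branch that did not match come back unset, so the caller has to check which branch fired before reading them.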

I specifically include the '^' and '$' start and end assertions for the markup attribute value.

Test string #1: ${SPREST_JS_FolderPath}/SPListREST.js

  • No path provider/drive, so no Group 1 - OK
  • Group 2: ${SPREST_JS_FolderPath} # Item (ii)
  • Group 3: ${SPREST_JS_FolderPath} # repeat of Group 2 -- not wanted
  • Group 4: SPListREST # file basename Item (iii)
  • Group 5: js # file type/extension Item (iv)

Test string #2: D:\dev\SharePoint\SPTools\src\pagestyle.css

  • Group 1: D: # Item (i)
  • Group 2: \dev\SharePoint\SPTools\src # Item (ii) exactly as required if groupings by '\pathseg' not possible
  • Group 3: \src # the last path segment--unwanted
  • Group 4: pagestyle # file basename Item (iii)
  • Group 5: css # file type/extension Item (iv)

Test string #3: ./js/SPREST/SPRestEmail.js

  • No path provider/drive, so no group 1
  • Group 2: ./js/SPREST # Item (ii) exactly as required if groupings by '\pathseg' not possible
  • Group 3: /SPREST # the last path segment--unwanted
  • Group 4: SPRestEmail # file basename Item (iii)
  • Group 5: js # file type/extension Item (iv)
[composed in Markdown, so presentation affected by your settings/stylings]
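One possible way to get rid of the unwanted "last segment" group in the three results above is to make the repeated segment group non-capturing and split the directory part in a second step. The sketch below assumes Python's `re`; the names `PATH_RE`, `split_path`, `drive`, `dir`, `base`, `ext` and `segments` are hypothetical. It keeps the leading '.' on the extension, as item 4 asks, unlike the group 5 values listed above.

```python
import re

# Hypothetical rework: segments are matched but not captured, and the
# extension cannot contain '.', so it always starts at the last '.'.
PATH_RE = re.compile(r"""
    ^
    (?P<drive>[a-zA-Z]:)?                # optional drive such as 'D:'
    (?P<dir>[^/\\]*(?:[/\\][^/\\]+)*)    # directory part, no per-segment group
    [/\\]                                # at least one separator is required
    (?P<base>[^/\\]+)                    # basename, backs off to the last '.'
    (?P<ext>\.[^./\\]+)                  # extension including its leading '.'
    $
""", re.VERBOSE)


def split_path(value):
    """Return drive, dir, base, ext and the list of segments, or None."""
    m = PATH_RE.match(value)
    if m is None:
        return None
    parts = m.groupdict()
    # Each segment keeps its leading separator, e.g. '\\dev', '/js'
    parts["segments"] = re.findall(r"[/\\]?[^/\\]+", parts["dir"])
    return parts


for test in (
    "${SPREST_JS_FolderPath}/SPListREST.js",
    r"D:\dev\SharePoint\SPTools\src\pagestyle.css",
    "./js/SPREST/SPRestEmail.js",
):
    print(split_path(test))
```

A single match cannot return a variable number of per-segment groups in most regex flavours, so the second `findall` pass is the usual workaround. A value with no separator at all, such as a bare file name, returns None here.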
