cURL url_effective with Hash

Unix & Linux Asked by Zombo on January 5, 2022

If you put this link in a browser:

https://unix.stackexchange.com/q/453740#453743

it returns this:

https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743

However, cURL drops the hash:

$ curl -I https://unix.stackexchange.com/q/453740#453743
HTTP/2 302
cache-control: no-cache, no-store, must-revalidate
content-type: text/html; charset=utf-8
location: /questions/453740/installing-busybox-for-ubuntu

Does cURL have an option to keep the hash in the resulting URL? Essentially, I
am trying to write a script that resolves URLs the way a browser does. This is
what I have so far, but it breaks if the URL contains a hash:

$ set https://unix.stackexchange.com/q/453740#453743
$ curl -L -s -o /dev/null -w %{url_effective} "$1"
https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu

2 Answers

curl downloads whole pages.
A # points to a fragment.

The two are not compatible.


hash

The symbol # is used at the end of a web page link to mark a position inside a whole web page.

  • Fragment URLs

    ...convention called "fragment URLs" to refer to anchors within an HTML document.

  • What is it when a link has a pound "#" sign in it

    It's a "fragment" or "named anchor". You can use it to link to part of a document.

  • Wikipedia: Uniform Resource Locator (URL)

    An optional fragment component preceded by a hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element, and web browsers will scroll this element into view.

Its main use is to move the "presentation layer" (what is viewed) to the start of an item.

curl

curl has no "presentation layer": its goal is to download whole pages, not parts or fragments of pages. A fragment marker is therefore of no use to curl, which simply ignores it.
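
Since the fragment is handled entirely on the client side, a script can split it off itself before calling curl. The following is only a minimal sketch using POSIX parameter expansion; the function name split_fragment is made up for this illustration:

```shell
#!/bin/sh
# split_fragment URL: print the URL without its fragment on line 1
# and the fragment (possibly empty) on line 2.
# The function name is hypothetical, chosen only for this example.
split_fragment() {
    url=$1
    case $url in
        *'#'*)
            base=${url%%#*}      # everything before the first '#'
            fragment=${url#*#}   # everything after the first '#'
            ;;
        *)
            base=$url            # no '#': the whole URL is the base
            fragment=
            ;;
    esac
    printf '%s\n%s\n' "$base" "$fragment"
}
```

Calling split_fragment 'https://unix.stackexchange.com/q/453740#453743' should print the URL up to 453740 on the first line and 453743 on the second.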

Workaround

Re-append the fragment to the (redirected) link:

originallink='https://unix.stackexchange.com/q/453740#453743'
# Follow redirects, discard the body, and print the final URL.
wholepage=$(curl -Lso /dev/null -w '%{url_effective}' "$originallink")
# If the link contains a '#', stripping everything up to the last '#' changes it.
if [ "$originallink" != "${originallink##*#}" ]; then
    newlink="$wholepage#${originallink##*#}"
else
    echo "link contains no fragment"
    newlink="$wholepage"
fi
echo "$newlink"

Will print:

https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743

A considerably faster solution is to not download the final page at all, since its body goes to /dev/null anyway. Remove the -L option and instead ask curl what the URL would be if the (first) redirect were followed. Resolving only the first redirect works in this case and in most others.

wholepage=$(curl -so /dev/null -w '%{redirect_url}' "$originallink")
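
One caveat with this shortcut: as far as I know, %{redirect_url} is empty when the response is not a redirect, so a script probably needs a fallback. Browsers are also expected to keep the original fragment only when the redirect target has none of its own. Here is a sketch under those assumptions (resolve_once is a hypothetical name, and falling back to the original URL is my choice, not something curl does):

```shell
#!/bin/sh
# resolve_once URL: print where URL would go after its first redirect,
# carrying over the fragment of the original URL.
# Assumption: with no redirect, %{redirect_url} is empty and the original
# URL (minus its fragment) is the right answer.
resolve_once() {
    original=$1
    target=$(curl -s -o /dev/null -w '%{redirect_url}' "$original")
    [ -n "$target" ] || target=${original%%#*}
    case $target in
        *'#'*)
            # The redirect target has its own fragment: keep it as is.
            printf '%s\n' "$target"
            return
            ;;
    esac
    case $original in
        *'#'*) printf '%s#%s\n' "$target" "${original#*#}" ;;
        *)     printf '%s\n' "$target" ;;
    esac
}
```

Usage would be along the lines of: newlink=$(resolve_once "$originallink")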

Answered by ImHere on January 5, 2022

According to this thread on the curl website, titled "Re: How to send fragment part of URL?", the hashmark is meant for the browser and not the server, which is why curl truncates it.

The fragment part of a URI is not meant to be sent in the HTTP request - it is used to identify a specific section in the resource that will be fetched by using the particular URI. If you want to force #-letter into the request I think encoding it sounds like a perfect idea.

Looking around, I did not find any way for curl to persist it other than encoding it as %23, which I don't think is what you want.

Solution

Since it's the client that maintains the string after the hashmark, I'd "lean into it": simply parse it out and then re-append it to the URL returned by curl, as a true browser client would:

$ set 'https://unix.stackexchange.com/q/453740#453743'
$ echo "$(curl -I -L -s -o /dev/null -w %{url_effective} "$1")#$(echo "$1" | cut -d"#" -f2)"
https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743
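
One thing to watch for: cut prints the whole line when the delimiter is absent, so the one-liner above would append the entire URL after a # when the input has no fragment. A possible guard, sketched here with hypothetical helper names, is to use cut -s, which suppresses lines that lack the delimiter:

```shell
#!/bin/sh
# fragment_of URL: print the fragment of URL, or an empty line if none.
# cut -s suppresses lines that do not contain the delimiter; without -s,
# cut would print the whole URL when there is no '#'.
fragment_of() {
    printf '%s\n' "$1" | cut -s -d '#' -f 2-
}

# resolve_like_browser URL: follow redirects with curl, then re-append
# the original fragment if there was one. (Hypothetical helper name.)
# Assumes %{url_effective} carries no fragment, as in the output shown
# in the question.
resolve_like_browser() {
    final=$(curl -I -L -s -o /dev/null -w '%{url_effective}' "$1")
    frag=$(fragment_of "$1")
    if [ -n "$frag" ]; then
        printf '%s#%s\n' "$final" "$frag"
    else
        printf '%s\n' "$final"
    fi
}
```

With this, resolve_like_browser "$1" should behave like the one-liner for URLs that carry a fragment and return the plain effective URL for those that do not.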

Answered by slm on January 5, 2022
