AnswerBun.com

cURL url_effective with Hash

Unix & Linux Asked by Zombo on January 5, 2022

If you put this link in a browser:

https://unix.stackexchange.com/q/453740#453743

it returns this:

https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743

However, cURL drops the hash:

$ curl -I https://unix.stackexchange.com/q/453740#453743
HTTP/2 302
cache-control: no-cache, no-store, must-revalidate
content-type: text/html; charset=utf-8
location: /questions/453740/installing-busybox-for-ubuntu

Does cURL have an option to keep the hash in the resulting URL? Essentially, I
am trying to write a script that resolves URLs the way a browser does. This is
what I have so far, but it breaks if the URL contains a hash:

$ set https://unix.stackexchange.com/q/453740#453743
$ curl -L -s -o /dev/null -w %{url_effective} "$1"
https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu

2 Answers

curl downloads whole pages.
A # points to a fragment of a page.

The two are not compatible.


hash

The symbol # is used at the end of a web page link to mark a position inside a whole web page.

  • Fragment URLs

    ...convention called "fragment URLs" to refer to anchors within an HTML document.

  • What is it when a link has a pound "#" sign in it

    It's a "fragment" or "named anchor". You can use it to link to part of a document.

  • Wikipedia: Uniform Resource Locator (URL)

    An optional fragment component preceded by a hash (#). The fragment contains a fragment identifier providing direction to a secondary resource, such as a section heading in an article identified by the remainder of the URI. When the primary resource is an HTML document, the fragment is often an id attribute of a specific element, and web browsers will scroll this element into view.

Its main use is to move the "presentation layer" (what is viewed) to the start of an item.

curl

curl has no "presentation layer": its job is to download whole resources, not parts or fragments of them. A fragment marker is therefore of no use to curl, which simply drops it before sending the request.
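To make the split concrete, here is a minimal POSIX shell sketch of what a client does with such a URL: everything before the first # goes to the server, and the rest stays with the client. The function names url_base and url_fragment are hypothetical helpers, not part of curl.

```shell
#!/bin/sh
# Hypothetical helpers: separate the part of a URL that is sent to the
# server from the fragment that the client keeps for itself.

# Print everything before the first '#'.
url_base() {
    printf '%s\n' "${1%%#*}"
}

# Print the text after the first '#', or an empty line if there is none.
url_fragment() {
    case $1 in
        *'#'*) printf '%s\n' "${1#*#}" ;;
        *)     printf '\n' ;;
    esac
}
```

For the link in the question, url_base would give https://unix.stackexchange.com/q/453740 and url_fragment would give 453743.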

Workaround

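Re-append the fragment to the (redirected) link:

originallink='https://unix.stackexchange.com/q/453740#453743'
# Follow all redirects, discard the body, print only the final URL
wholepage=$(curl -Lso /dev/null -w '%{url_effective}' "$originallink")
# ${originallink##*#} strips everything up to the last '#', so it
# differs from the original only if the link contains a '#'
if [ "$originallink" != "${originallink##*#}" ]; then
    newlink=$wholepage#${originallink##*#}
else
    echo "link contains no fragment"
    newlink=$wholepage
fi
echo "$newlink"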

Will print:

https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743

A considerably faster solution is to not download the final page at all, since its body is discarded to /dev/null anyway. Remove the -L option and ask curl, via %{redirect_url}, what the link would be if the (first) redirect were followed. A single redirect is enough in this case and in most others. Note that %{redirect_url} is empty when the response is not a redirect.

wholepage=$(curl -so /dev/null -w '%{redirect_url}' "$originallink")
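Putting the faster variant together with the re-append step might look like the sketch below. It falls back to the original URL when there is no redirect, and it assumes the redirect target carries no fragment of its own. The function name resolve_first_redirect is hypothetical; only the curl options already used above are involved.

```shell
#!/bin/sh
# Sketch: resolve only the first redirect, then re-attach the fragment.
# resolve_first_redirect is a hypothetical name, not part of curl.
# Note: plain POSIX sh has no local variables, so these are global.
resolve_first_redirect() {
    original=$1
    base=${original%%#*}                # URL without the fragment
    # Without -L, %{redirect_url} is the target of the first redirect,
    # or empty when the response is not a redirect.
    target=$(curl -s -o /dev/null -w '%{redirect_url}' "$base") || return
    [ -n "$target" ] || target=$base    # no redirect: keep the URL as-is
    case $original in
        *'#'*) printf '%s#%s\n' "$target" "${original#*#}" ;;
        *)     printf '%s\n' "$target" ;;
    esac
}

# Usage (needs network access):
#   resolve_first_redirect 'https://unix.stackexchange.com/q/453740#453743'
```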

Answered by ImHere on January 5, 2022

According to a thread on the curl website titled "Re: How to send fragment part of URL?", the hashmark is meant for the browser and not the server, which is why curl truncates it.

The fragment part of a URI is not meant to be sent in the HTTP request - it is used to identify a specific section in the resource that will be fetched by using the particular URI. If you want to force #-letter into the request I think encoding it sounds like a perfect idea.

Looking through the options, I did not see any way for curl to persist it other than encoding it as %23, which I don't think is what you want.

Solution

Since it's the client that's maintaining the string after the hashmark, I'd "lean into it" and simply parse it out and then re-append it to the returned URL from curl as a true browser client would do it:

$ set 'https://unix.stackexchange.com/q/453740#453743'
$ echo "$(curl -I -L -s -o /dev/null -w %{url_effective} "$1")#$(echo "$1" | cut -d"#" -f2)"
https://unix.stackexchange.com/questions/453740/installing-busybox-for-ubuntu#453743
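One caveat with the one-liner above: when the delimiter is absent, cut prints the whole line, so a URL without a # would get a copy of itself appended as a fragment. A possible variant, wrapped in a hypothetical function called resolve_url, uses cut -s to suppress that case:

```shell
#!/bin/sh
# resolve_url is a hypothetical helper name, not part of curl.
resolve_url() {
    # Follow redirects with HEAD requests and print only the final URL.
    final=$(curl -I -L -s -o /dev/null -w '%{url_effective}' "$1") || return
    # -s makes cut print nothing when the URL has no '#';
    # -f 2- keeps everything after the first '#'.
    fragment=$(printf '%s\n' "$1" | cut -s -d '#' -f 2-)
    if [ -n "$fragment" ]; then
        printf '%s#%s\n' "$final" "$fragment"
    else
        printf '%s\n' "$final"
    fi
}

# Usage (needs network access):
#   resolve_url 'https://unix.stackexchange.com/q/453740#453743'
```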

Answered by slm on January 5, 2022
