Every documentary I've seen about people shocked by how much information different platforms have collected on them has revolved around information those people actually handed over themselves.
And again, that's really another facet of the "out of sight, out of mind" phenomenon. A Facebook profile has many optional fields that you can fill in to create an extensive story of your life - your education history, credentials, where you work and where you used to work, places you've lived, your favorite movies - just about anything you can think of. It even allows you, if you like, to choose a "hometown" distinct from the place you happen to live at present, because those are separate things for many people. People fill these in because the user-facing side of Facebook is supposed to be a social network - when people enter the high school they graduated from and their years of attendance, in their mind they've just added that information to their profile so that other people using Facebook who went to the same school at the same time can find them and "reconnect". They don't feel like they're giving that information to Facebook for Facebook to turn around and sell to advertising companies.
That conceptual disconnect is by design, and Facebook expends at least some effort to keep people feeling that way. A few years ago, encrypted chat app Signal attempted to put an extremely transparent ad campaign on Facebook using the promoted-posts system. Companies and advertisers who use it have access to the same targeting tools that external ad-server companies do, and all of these ads were designed to simply say out loud what data points were used to target a given recipient. It did not go over well, and Facebook quickly banned the campaign and the account.
The ads didn't break any terms of service or do anything that other ad campaigns don't also do; they were banned because being forthright about it would upset Facebook's users. A lot of people "kind of know", on an intellectual level, that the data they volunteer on social media can be, and likely is being, mined and exploited by...someone; but it hits differently when you're confronted with it in such a direct and explicit way.
I guess I'm not seeing how a site like Facebook could collect browser information when I'm not logged in if I took the simple step of clearing that information before logging into Facebook etc.
It isn't something unique to Facebook. As implied earlier, whenever your computer or device visits any webpage at any time, it sends specific information about your machine to the website. I'm not talking about cached information, history, or anything you can "clear"; I'm talking about telemetry that your machine sends about its software and hardware as an integral part of the process of establishing a connection with a website. It's not anything inherently nefarious - a website needs to "know" what kind of browser you're using, what size your screen is, what plugins it's running, what kind of fonts it has available, and other such things just so the website can display its information correctly. But this data is still just data, and the more data there is and the more unique certain bits of data in the set are, the higher the chance that it could only be coming from a relatively small set of people, or even just one person; and that fact is why some websites choose to store it and keep track of it.
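To make the "smaller and smaller set of people" point concrete, here is a minimal sketch in Python. Every visitor, attribute name, and value in it is invented for illustration; it only shows how the pool of possible matches shrinks as more attributes are compared, not how any real site does it.

```python
# Hypothetical visitors: all names and values here are made up for illustration.
visitors = [
    {"browser": "Firefox", "screen": "1920x1080", "timezone": "America/Chicago", "fonts": "set-a"},
    {"browser": "Firefox", "screen": "1920x1080", "timezone": "America/Chicago", "fonts": "set-b"},
    {"browser": "Firefox", "screen": "1920x1080", "timezone": "Europe/Berlin", "fonts": "set-a"},
    {"browser": "Firefox", "screen": "1366x768", "timezone": "America/Chicago", "fonts": "set-a"},
    {"browser": "Chrome", "screen": "1920x1080", "timezone": "America/Chicago", "fonts": "set-a"},
    {"browser": "Chrome", "screen": "1366x768", "timezone": "Europe/Berlin", "fonts": "set-b"},
]


def narrowing(target, population):
    """Return how many visitors still match as each attribute is added."""
    remaining = list(population)
    counts = []
    for key, value in target.items():
        # Keep only the visitors that share this attribute with the target.
        remaining = [v for v in remaining if v[key] == value]
        counts.append((key, len(remaining)))
    return counts


steps = narrowing(visitors[0], visitors)
for key, count in steps:
    print(f"after matching {key}: {count} candidate(s)")
```

With only six made-up visitors, four attributes are already enough to single one of them out. Real populations are vastly larger, but real attribute sets are also much longer.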
Whether you're logged in or not, Facebook sees this kind of data whenever you load their website. When you ARE logged in, Facebook is able to associate this data about your device with your user account. Once they've made that connection, they can use it to correlate data reports they get from the other sites you visit, even if you're logged out and have deleted any Facebook cookies from your actual machine. "Oh, someone from [particular IP address] who uses [browser] with [particular set of extensions] and a machine set to [particular time zone] just visited ESPN.com to check a game score? That data happens to match what we know about [specific Facebook user]. We can add an interest in the St. Louis Cardinals to his dataset." In a nutshell, that's how a significant part of Facebook's data collection works.
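The correlation step described above can be sketched roughly like this. To be clear, this is not Facebook's actual code or method; the function names (`fingerprint`, `record_login`, `record_visit`), the account name, and the device values are all hypothetical, and a real system would presumably use fuzzier matching than an exact hash comparison.

```python
import hashlib
import json


def fingerprint(attrs):
    """Hash a dict of device attributes into a short, stable identifier."""
    # Sorting the keys makes the hash independent of attribute order.
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


known_devices = {}  # fingerprint -> account name (hypothetical store)
interests = {}      # account name -> set of inferred interests


def record_login(account, attrs):
    """A logged-in visit ties the device fingerprint to an account."""
    known_devices[fingerprint(attrs)] = account


def record_visit(attrs, topic):
    """A logged-out visit elsewhere is matched by fingerprint alone."""
    account = known_devices.get(fingerprint(attrs))
    if account is not None:
        interests.setdefault(account, set()).add(topic)
    return account


# Made-up device data; 203.0.113.7 is a documentation-only IP address.
device = {
    "ip": "203.0.113.7",
    "browser": "Firefox",
    "extensions": ["ext-one", "ext-two"],
    "timezone": "America/Chicago",
}

record_login("example_user", device)
matched = record_visit(dict(device), "St. Louis Cardinals")
unmatched = record_visit({**device, "timezone": "Europe/Berlin"}, "something else")
print(matched, unmatched, interests)
```

Note that no cookie appears anywhere in the sketch. The match depends only on what the device reports about itself, which is why clearing cookies doesn't break it.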
You can't stop your device from sending this kind of information entirely, but some parts of it can be spoofed by extensions or browsers with strong privacy features. Practically speaking, you can still be tracked while using spoofed data as long as that data is consistent across your web activity; but at least the "you" being tracked is fictitious because the data doesn't accurately describe you.
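Here is a small sketch of why consistent spoofing still leaves a trail. The device values and site names are hypothetical, and `group_visits` is just an illustrative helper, not any tracker's real logic.

```python
from collections import defaultdict

# Made-up values: what the machine really is vs. what it claims to be.
real_device = {"browser": "Firefox", "screen": "1920x1080", "timezone": "America/Chicago"}
spoofed_device = {"browser": "Generic Browser", "screen": "1000x1000", "timezone": "UTC"}


def group_visits(visits):
    """Group visited sites by the exact attribute set the device reported."""
    profiles = defaultdict(list)
    for attrs, site in visits:
        key = tuple(sorted(attrs.items()))  # hashable stand-in for a fingerprint
        profiles[key].append(site)
    return dict(profiles)


# Every visit reports the same spoofed attributes.
visits = [
    (spoofed_device, "news.example"),
    (spoofed_device, "sports.example"),
    (spoofed_device, "shop.example"),
]

profiles = group_visits(visits)
print(len(profiles), "profile(s) built from", len(visits), "visits")
```

All three visits land in a single profile, so the browsing history is still linked together; the profile just describes a device that doesn't exist rather than the real one.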